subject:"\[jira\] Commented\: \(LUCENE\-2878\) Allow Scorer to expose positions and payloads aka. nuke spans"

[
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243949#comment-14243949
]

Alan Woodward commented on LUCENE-2878:
---

The remaining nocommits are:
* implement nextPosition() etc on TermAutomatonScorer
** this should be fairly simple, and I'm working on it now
* implement nextPosition() etc on SpanScorer
** this is not so simple, as Spans.next() will move to the next document if
it's exhausted the positions on the current doc, a completely different API.
I'm tempted to just not implement these, seeing as they already don't work with
'normal' queries. A followup JIRA will remove Span queries entirely from trunk
and deprecate them in 5.0 anyway.
* should PositionFilteredScorer enforce that all its subqueries should be on
the same field?
** again, I'm tempted not to. 1) this means adding getField() to all queries,
which is a far more invasive API change, and 2) we already have
FieldMaskingSpanQuery to get around this limitation in SpanQueries, which
suggests that it's not necessarily a good thing anyway. On the other hand,
most of the time you want phrases or proximity queries to be on the same field.
* how to deal with Payloads on intervals.
** At the moment we just return the payload at the starting position. This is
probably not what we want. Spans has a different API, returning a
Collectionbyte[] from getPayload rather than a BytesRef, which may be closer
to what should happen. I think this can be changed from a nocommit to a TODO,
however, and dealt with in a separate issue.

I also want to have a look at the MultiTermQuery rewrite API, as at the moment
it defaults to CONSTANT_SCORE_FILTER_REWRITE, which of course doesn't support
positions. It would be nice to have some way of telling the query whether
positions are needed or not at rewrite time. But again, that's for another
issue.

So I think this is close. I'll get TermAutomatonScorer sorted today.

Allow Scorer to expose positions and payloads aka. nuke spans
--

Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch,
PosHighlighter.patch, PosHighlighter.patch

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14244141#comment-14244141
 ] 

Alan Woodward commented on LUCENE-2878:
---

Another thing to think about is the interaction of freq() and nextPosition() 
calls.  At the moment the API contract is that you can call both, and indeed 
this was necessary to ensure that you didn't call nextPosition() too many 
times.  However, now that you can just call nextPosition() until you get 
NO_MORE_POSITIONS this isn't needed, and seeing as many Scorers actually need 
to consume all their positions to return an accurate freq(), I'd like to change 
this to say that you *can't* call nextPosition() after you've called freq().  
We can then have a default Scorer implementation of freq() that calls 
nextPosition() repeatedly to count instances, with TermScorer etc having their 
own specialized overrides.

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, 
 PosHighlighter.patch, PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-12-12 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14244145#comment-14244145
 ] 

Robert Muir commented on LUCENE-2878:
-

{quote}
 We can then have a default Scorer implementation of freq() that calls 
nextPosition() repeatedly to count instances, with TermScorer etc having their 
own specialized overrides.
{quote}

Can we not do the slow default implementation and just keep it abstract?

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, 
 PosHighlighter.patch, PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14244180#comment-14244180
 ] 

Alan Woodward commented on LUCENE-2878:
---

How about keeping it abstract, but having a concrete protected slowFreq() 
method?

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, 
 PosHighlighter.patch, PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-12-12 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14244233#comment-14244233
 ] 

Robert Muir commented on LUCENE-2878:
-

I would rather us not add slow methods at all to our scoring api. if these 
position iterators do not support freq(), then they do not support freq().

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, 
 PosHighlighter.patch, PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-12-09 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14244385#comment-14244385
 ] 

Alan Woodward commented on LUCENE-2878:
---

So this doesn't quite work, because things like CheckIndex want to validate 
frequencies and positions, and in fact the vast majority of DocsEnum 
implementations have no problem supporting this.  I'm now leaning towards 
making all these 'interval' Scorers just return 1 from freq(), and instead 
enforcing the rule that you can't call score() and nextPosition(), moving the 
limitation up to Scorer itself.

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, 
 PosHighlighter.patch, PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-12-12 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14244395#comment-14244395
 ] 

Robert Muir commented on LUCENE-2878:
-

Why would checkindex be calling this on an interval scorer? I dont understand 
the problem. 

I dont think we should add complex rules to Scorer, just throw UOE here from 
the interval ones.

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, 
 PosHighlighter.patch, PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14239418#comment-14239418
 ] 

ASF subversion and git services commented on LUCENE-2878:
-

Commit 1644050 from [~romseygeek] in branch 'dev/branches/lucene2878'
[ https://svn.apache.org/r1644050 ]

LUCENE-2878: UnorderedNearQuery scoring

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, 
 PosHighlighter.patch, PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-12-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237758#comment-14237758
 ] 

ASF subversion and git services commented on LUCENE-2878:
-

Commit 1643787 from [~romseygeek] in branch 'dev/branches/lucene2878'
[ https://svn.apache.org/r1643787 ]

LUCENE-2878: Make Interval package-private, remove PositionsCollector

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-12-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237768#comment-14237768
 ] 

ASF subversion and git services commented on LUCENE-2878:
-

Commit 1643788 from [~romseygeek] in branch 'dev/branches/lucene2878'
[ https://svn.apache.org/r1643788 ]

LUCENE-2878: Remove positions highlighter (for now); MemoryPostingsFormat fix

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-12-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237792#comment-14237792
 ] 

ASF subversion and git services commented on LUCENE-2878:
-

Commit 1643791 from [~romseygeek] in branch 'dev/branches/lucene2878'
[ https://svn.apache.org/r1643791 ]

LUCENE-2878: Remove postingsFeatures() from Collector

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-12-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237867#comment-14237867
 ] 

ASF subversion and git services commented on LUCENE-2878:
-

Commit 1643819 from [~romseygeek] in branch 'dev/branches/lucene2878'
[ https://svn.apache.org/r1643819 ]

LUCENE-2878: Fix compilation in TestGrouping; don't update extremes in 
PositionQueue with exhausted iterator

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-12-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237937#comment-14237937
 ] 

ASF subversion and git services commented on LUCENE-2878:
-

Commit 1643835 from [~romseygeek] in branch 'dev/branches/lucene2878'
[ https://svn.apache.org/r1643835 ]

LUCENE-2878: PostingsHighlighter uses Integer.MAX_VALUE to indicate empty 
docsenum

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-12-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237979#comment-14237979
 ] 

ASF subversion and git services commented on LUCENE-2878:
-

Commit 1643843 from [~romseygeek] in branch 'dev/branches/lucene2878'
[ https://svn.apache.org/r1643843 ]

LUCENE-2878: Get Solr compiling

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-12-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238002#comment-14238002
 ] 

ASF subversion and git services commented on LUCENE-2878:
-

Commit 1643846 from [~romseygeek] in branch 'dev/branches/lucene2878'
[ https://svn.apache.org/r1643846 ]

LUCENE-2878: Set NO_MORE_POSITIONS to be -1

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-12-06 Thread Alan Woodward (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14236938#comment-14236938
 ] 

Alan Woodward commented on LUCENE-2878:
---

The disadvantage of using -1 to indicate NO_MORE_POSITIONS is that having an 
exhausted iterator sort 'naturally' after an unexhausted one simplifies a lot 
of the interval spanning logic in things like the PositionQueue or in 
OrderedNearQuery.

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-12-06 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14236939#comment-14236939
 ] 

Robert Muir commented on LUCENE-2878:
-

I dont really see it as a choice, the current value is not a reserved value.

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-12-06 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14236945#comment-14236945
 ] 

Yonik Seeley commented on LUCENE-2878:
--

bq. I dont really see it as a choice, the current value is not a reserved value.

We could chose to make it a reserved value since we're going into a new major 
version.
Does anyone even have positions of MAX_VALUE?

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-12-06 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14236951#comment-14236951
 ] 

Robert Muir commented on LUCENE-2878:
-

Its more than that, indexwriter has to reject such positions and tests updated, 
old postings formats need to have a check and do something, etc. 

Because some people like large position increment gap values between field 
instances, someone can easily blow up positions to large values without 
actually having that many terms.

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-12-06 Thread Yonik Seeley (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14236963#comment-14236963
]

Yonik Seeley commented on LUCENE-2878:
--

bq. Its more than that, indexwriter has to reject such positions and tests
updated,

That part seems easy.

bq. old postings formats need to have a check and do something, etc.
That part could be more of a pain... not sure. But maybe not necessary if it's
never happened? (see below)

bq. Because some people like large position increment gap values between field
instances, someone can easily blow up positions to large values without
actually having that many terms.

Solr has defaulted to 100 forever... and even if a user changed to a *huge*
value, it would be super unlikely to hit MAX_INT exactly (esp since they would
get an exception if they went over... they are already in super dangerous
territory).

I think the only people that would have MAX_INT position are those that are
using positions to encode something other than real positions (i.e. using
positions like payloads) and have values going all the way up to MAX_INT, or
are using MAX_INT as a special value. It could very well be that no such user
exists.

Allow Scorer to expose positions and payloads aka. nuke spans
--

Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch,
PosHighlighter.patch

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-12-06 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14236966#comment-14236966
 ] 

Robert Muir commented on LUCENE-2878:
-

It does not matter to me really what solr defaults to, we cant be lenient here. 
Changing these constants requires work.

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-12-06 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14236974#comment-14236974
 ] 

Yonik Seeley commented on LUCENE-2878:
--

bq. It does not matter to me really what solr defaults to

I was just trying to determine the likelihood of *anyone* actually having 
pos=MAX_INT in their index.

bq.  Changing these constants requires work.

Sure - I was only pointing it out as an option (5.0 would be the right time), 
not strongly advocating for it.


 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14235660#comment-14235660
 ] 

Robert Muir commented on LUCENE-2878:
-

Can you explain the core api changes such as DocsAndPositionsEnum being folded 
into DocsEnum?

It looks inconsistent, with some impls returning -1 for the positions methods 
and some throwing UOE. 
And why do we still have an Interval class, if DPEnum has what we need?

I am also unsure of Collector.postingsFeatures, why do we need this? It seems 
enough to pass flags to Weight.scorer()

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-12-05 Thread Alan Woodward (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14235669#comment-14235669
 ] 

Alan Woodward commented on LUCENE-2878:
---

Hm, it is inconsistent at the moment, as I started out returning -1 and then 
started throwing UOE to find places that were returning the wrong thing.  
Probably what it ought to do is return NO_MORE_POSITIONS if it doesn't support 
positions at all (and -1 for offsets, as that's expected everywhere).  I'll 
update the patch.

Interval is there as a concrete POJO that can be stored, cached, passed around, 
etc, after the scorer has moved on.  Maybe it should be an interface instead, 
and DocsEnum should implement it?

Collector.postingsFeatures() is there for things like highlighting.  Normally a 
query wouldn't require offsets, for example, but if you're using the 
PostingsCollector for highlighting, then you want to retrieve offsets when 
pulling scorers.

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14235671#comment-14235671
 ] 

Robert Muir commented on LUCENE-2878:
-

I don't think the method shoudl be on collector, this is slow and unrelated to 
collection.

For highlighting, i think it should actually invoke Weight.scorer(..., PARAM, 
...) and advance() scorers to precise docids of interest. It isn't doing 
collection.

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-12-05 Thread Alan Woodward (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14235677#comment-14235677
 ] 

Alan Woodward commented on LUCENE-2878:
---

bq. I don't think the method shoudl be on collector, this is slow and unrelated 
to collection.

OK, I'll try ripping it out and see what stops working

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-12-05 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14235684#comment-14235684
 ] 

Robert Muir commented on LUCENE-2878:
-

{quote}
Interval is there as a concrete POJO that can be stored, cached, passed around, 
etc, after the scorer has moved on. Maybe it should be an interface instead, 
and DocsEnum should implement it?
{quote}

How often is this POJO needed? I see what you are saying, but we have a fairly 
advanced collection of scorers today, and none need that with Scorer :)
If this snapshot of docsenum is only needed for say one or two scorers, maybe 
it can be explicitly and (package-)privately used by them, whereas the rest 
just use docsenum?

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14235709#comment-14235709
 ] 

ASF subversion and git services commented on LUCENE-2878:
-

Commit 1643349 from [~romseygeek] in branch 'dev/branches/lucene2878'
[ https://svn.apache.org/r1643349 ]

LUCENE-2878: return NO_MORE_POSITIONS rather than throwing UOE

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-12-05 Thread Alan Woodward (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14235738#comment-14235738
 ] 

Alan Woodward commented on LUCENE-2878:
---

Interval (outside the posfilter package) and Collector.postingsFeatures() are 
basically only used by the PositionsCollector, which I'm using for highlighting 
and testing.  The highlighter can just be removed for now, and worked on in a 
separate issue - maybe as an extension of PostingsHighlighter?  I'll rework the 
tests as well.

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-11-12 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14235749#comment-14235749
 ] 

Robert Muir commented on LUCENE-2878:
-

In general, I think anything we can do to split it up is good :)

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-11-13 Thread Luc Vanlerberghe (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14209562#comment-14209562
 ] 

Luc Vanlerberghe commented on LUCENE-2878:
--

Any reason why the branch was not created in the branches folder alongside the 
other lucene«jira issue» branches?
Where it is now it is not replicated on git.apache.org, nor on GitHub...

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-11-13 Thread Alan Woodward (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14209827#comment-14209827
 ] 

Alan Woodward commented on LUCENE-2878:
---

Because I'm an idiot, mainly.  Will move the branch, thanks...

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14208082#comment-14208082
 ] 

ASF subversion and git services commented on LUCENE-2878:
-

Commit 1638800 from [~romseygeek]
[ https://svn.apache.org/r1638800 ]

Branch for LUCENE-2878

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-11-07 Thread JIRA


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202054#comment-14202054
 ] 

Jan Høydahl commented on LUCENE-2878:
-

This would be a huge selling point for Lucene5.x
NUKE SPANS!

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-11-07 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202058#comment-14202058
 ] 

David Smiley commented on LUCENE-2878:
--

Yeah, tell me about it.  I haven't looked at the details yet but I think 
scorers exposing positions would enable a new highlighter to be built that 
could accurately know where a match occurred.  Hopefully there is control to 
expose positions for queries that are normally position-less (e.g. wildcard 
queries).

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-11-07 Thread Alan Woodward (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202063#comment-14202063
 ] 

Alan Woodward commented on LUCENE-2878:
---

The good news is that I have somebody to sponsor me to actually do this work in 
the next month or so.

I have a heap of other things I need to do first though :-(

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-03-15 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936240#comment-13936240
 ] 

Simon Willnauer commented on LUCENE-2878:
-

Now we are talking

Sent from my iPhone



 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-03-15 Thread Alan Woodward (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936304#comment-13936304
 ] 

Alan Woodward commented on LUCENE-2878:
---

Ooh, hello.

So the LUCENE-2878 branch is a bit of a mess, in that it has two semi-working 
versions of this code: Simon's initial IntervalIterator API, in the 
o.a.l.search.intervals package, and my DocsEnum.nextPosition() API in 
o.a.l.search.positions.  Simon's code is much more complete, and I've been 
using a separately maintained version of that in production code for various 
clients, which you can see at 
https://github.com/flaxsearch/lucene-solr-intervals.  I think the 
nextPosition() API is nicer, but the IntervalIterator API has the advantage of 
actually working.

The github repository has some other stuff on it too, around making the 
intervals code work across different fields.  The API that I've come up with 
there is not very nice, though.

It would be ace to get this moving again!

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2013-02-24 Thread Alan Woodward (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13585466#comment-13585466
 ] 

Alan Woodward commented on LUCENE-2878:
---

I've been chipping away at this for a bit.  Here's a summary of what I've done:
- Applied LUCENE-4524, and also added startPosition() and endPosition() to 
DocsEnum
- Changed the postings readers to return NO_MORE_POSITIONS once nextPosition() 
has been called freq() times
- Extended the ConjunctionScorer and DisjunctionScorer implementations to 
return positions
- Added an abstract PositionFilteredScorer with reset(int doc) and 
doNextPosition() methods
- Added a bunch of concrete implementations (ExactPhraseQuery, NotWithinQuery, 
OrderedNearQuery, UnorderedNearQuery, RangeFilterQuery) with tests - these are 
all in the posfilter package

I still need to implement SloppyPhraseQuery and MultiPhraseQuery, but I 
actually think these won't be too difficult with this API.  Plus there are a 
bunch of nocommits regarding freq() calculations, and this doesn't work at all 
with BooleanScorer - we'll probably need a way to tell the scorer that we do or 
don't want position information.

[~simonw] and I talked about this on IRC the other day, about resolving 
collisions in ExactPhraseQuery, but I think that problem may go away doing 
things this way.  I may have misunderstood though - if so, could you add a test 
to TestExactPhraseQuery showing what I'm missing, Simon?

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
  Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, 
 mentor
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, 
 PosHighlighter.patch, PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2013-02-07 Thread Alan Woodward (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573337#comment-13573337
 ] 

Alan Woodward commented on LUCENE-2878:
---

I'm going to try applying the patch from LUCENE-4524 here and see if that 
helps.  Next step would be to add startPosition() and endPosition() to 
DocsEnum, and try re-implementing the filter queries using methods directly on 
child scorers, rather than pulling a separate interval iterator.

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
  Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, 
 mentor
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, 
 PosHighlighter.patch, PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2013-02-07 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573353#comment-13573353
 ] 

Simon Willnauer commented on LUCENE-2878:
-

Alan, I don't think you can cut over to DocsEnum or DocsAndPositionsEnum. 
DocsAndPosEnum has a significant problem that doesn't allow efficient 
PosIterator impl underneath it. It defines that DocsAndPosEnum#nextPosition 
should be called at max DocsEnum#freq() times which is fine on a low level but 
bogus for lazy pos iterators since we don't know ahead of time how many 
intervals we might have. I think we first need to fix this problem before we 
can go and do this refactoring, makes sense? PhraseQuery does only know his 
freq currently because it's greedy and pulls all intervals at once.

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
  Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, 
 mentor
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, 
 PosHighlighter.patch, PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2013-02-07 Thread Alan Woodward (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573382#comment-13573382
 ] 

Alan Woodward commented on LUCENE-2878:
---

Hm, OK.  So can we change the nextPosition() API to return -1 once the 
positions have been exhausted, rather than becoming undefined?  So a consumer 
would look something like:
{{monospaced}}
  int pos;
  while ((pos = dp.nextPosition()) != -1) {
// do stuff here
  }
{{monospaced}}

Implementations that need to know the frequency call freq(), others can iterate 
lazily.

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
  Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, 
 mentor
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, 
 PosHighlighter.patch, PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2013-02-07 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573386#comment-13573386
 ] 

Robert Muir commented on LUCENE-2878:
-

Alan: its a nice idea... we should seriously consider something this (-1 or 
NO_MORE_POSITIONS
or whatever) if it would allow this stuff to just work over the existing DP 
api.

I dont know what the cost would be to existing impls (e.g. 
Lucene41PostingsReader would need some code changes), but hopefully small or 
nil. 

And of course having an API like this would be well worth any small performance 
hit.


 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
  Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, 
 mentor
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, 
 PosHighlighter.patch, PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2013-02-07 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573389#comment-13573389
 ] 

Simon Willnauer commented on LUCENE-2878:
-

bq. Hm, OK. So can we change the nextPosition() API to return -1 once the 
positions have been exhausted, rather than becoming undefined? So a consumer 
would look something like:
yeah I was saying the same thing yesterday when I talked about this to rob. 
This would make stuff more consistent too. I will open an issue

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
  Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, 
 mentor
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, 
 PosHighlighter.patch, PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2013-02-07 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573392#comment-13573392
 ] 

Robert Muir commented on LUCENE-2878:
-

I think this should be explored in the branch versus a separate issue E.g. we 
shouldnt impose this on 
postings implementations unless it sorta works with the whole design here.

I'd also really recommend NO_MORE_POSITIONS not -1. -1 currently means 
invalid (e.g. you should not have called nextPosition).

Its not like any Scorer would need to check for this, because if you try to do 
prox operations on a field that omits position information, the user should be 
getting an exception up-front from the Weight.

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
  Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, 
 mentor
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, 
 PosHighlighter.patch, PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2013-01-21 Thread Simon Willnauer (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13558606#comment-13558606
]

Simon Willnauer commented on LUCENE-2878:
-

bq. The only Span functionality that's missing I think is payload queries. If
we want to have all the span functionality in here before it can land on trunk
I can work on that next.

I really think we can skip that for now.

bq. It would also be good to do some proper benchmarking. Do we already have
something that can compare sets of queries?
We do have LuceneUtil but its not like straight forward. I will take a look
what we can do here.

bq. Is it worth specialising here? Have two query types (or maybe just a flag
on the query), so you can optimize for query speed or for scoring. SpanScorer
always iterates over all spans, by comparison.

I think we should specialize the Scorere here. Visiting the least amount of
intervals possible is maybe worth it.

So from my perspective what we should try exploring is making the scorer a
DocsAndPosEnum in the branch and see if we can remove the Interval API mostly
in favor of the DocsAndPos API. The only problem I have with this is really
that if a given scorer consumes intervals from a subscorer it needs to buffer
all those if it's parent needs all of them too. Not sure if it is worth it at
this point. Ideally I would want to have DocsAndPosEnum to be folded into
DocsEnum first too.

Allow Scorer to expose positions and payloads aka. nuke spans
--

Key: LUCENE-2878
URL: https://issues.apache.org/jira/browse/LUCENE-2878
Project: Lucene - Core
Issue Type: Improvement
Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12,
mentor
Fix For: Positions Branch

Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch,
PosHighlighter.patch, PosHighlighter.patch

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2013-01-20 Thread Alan Woodward (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13558405#comment-13558405
 ] 

Alan Woodward commented on LUCENE-2878:
---

So at the moment, IntervalFilterScorer doesn't consume all the intervals on a 
given document when advancing, it just checks if the document has any matching 
intervals at all.  Which is great for speed, but bad for scoring - you want to 
iterate through the intervals on a document to get the within-doc frequency, 
which can then be passed to the docscorer.  You also need to iterate through 
everything to deal with payloads.

Is it worth specialising here?  Have two query types (or maybe just a flag on 
the query), so you can optimize for query speed or for scoring.  SpanScorer 
always iterates over all spans, by comparison.

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
  Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, 
 mentor
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, 
 PosHighlighter.patch, PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2013-01-18 Thread Alan Woodward (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13557667#comment-13557667
 ] 

Alan Woodward commented on LUCENE-2878:
---

Since the last patch went up I've fixed a bunch of bugs (BrouwerianQuery works 
properly now, as do various nested IntervalQuery subtypes that were throwing 
NPEs), as well as adding Span-type scoring and fleshing out the explain() 
methods.  The only Span functionality that's missing I think is payload 
queries.  If we want to have *all* the span functionality in here before it can 
land on trunk I can work on that next.

It would also be good to do some proper benchmarking.  Do we already have 
something that can compare sets of queries?



 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
  Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, 
 mentor
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, 
 PosHighlighter.patch, PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2013-01-17 Thread Alan Woodward (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556367#comment-13556367
 ] 

Alan Woodward commented on LUCENE-2878:
---

I want to get this moving again - will get the branch up to date tomorrow and 
then iterate from there.

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
  Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, 
 mentor
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, 
 PosHighlighter.patch, PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2013-01-17 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556372#comment-13556372
 ] 

Simon Willnauer commented on LUCENE-2878:
-

YEAH!!!

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
  Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, 
 mentor
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, 
 PosHighlighter.patch, PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2012-11-12 Thread Alan Woodward (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13495661#comment-13495661
 ] 

Alan Woodward commented on LUCENE-2878:
---

Just committed a massive test refactoring, which should show where the problems 
are in DisjunctionIntervalScorer and MultiTermQuery.  Lots of the tests fail 
now, as the previous ones weren't necessarily picking up false positives 
(UnorderedNearQuery is particularly bad for this).

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
  Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, 
 mentor
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, 
 PosHighlighter.patch, PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2012-11-11 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494874#comment-13494874
 ] 

Simon Willnauer commented on LUCENE-2878:
-

hey alan,

bq. I've started to use this branch in an (experimental!) system I'm developing 
for a client. 

very good news! cool stuff - can you provide more infos what you are doing 
there? Do you highlight too?

regarding your latest patch - commit it!

bq. 1) There isn't a replacement for SpanNotQueries - the BrouwerianIterator 
comes close, but doesn't quite cover all the use cases. I
can you provide a testcase what it doesn't cover? you can go ahead and commit 
it even if you don't have a fix.

bq. 2) The API is not very nice when it comes to subclassing Iterators. For 
example, I have 'anchor' terms at the start and end of documents, which allow 
 
I am not sure I understand this. if you have marker terms how do they differ 
from ordinary terms can't you just do a nearOrdered(X, _ENDMARKER_) query? 
I don't see where you need to subclass here. can you elaborate?

bq. 4) I found a bug in the iterators() method of DisjunctionSumScorer
great, can you submit the testcase?

bq. 3) MultiTermQueries don't return iterators unless you set their rewrite 
policies to something other than CONSTANT_SCORE_REWRITE.
yeah the problem here is that we use a filter instead of a scorer, you should 
see an exception right? I think it would make sense to have a MTQ rewrite a 
query on a ConstantScoreQuery instead of a filter - we can't get a interval 
iter from a filter :/

I think overall we should move out of this issue and create separate issues for 
all you cases. Also for the things robert mentioned like exploring Scorere 
extends DocsAndPosEnum

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
  Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, 
 mentor
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, 
 PosHighlighter.patch, PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2012-11-11 Thread Alan Woodward (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494940#comment-13494940
 ] 

Alan Woodward commented on LUCENE-2878:
---

Hi Simon, 

I'll open separate sub-tasks for the issues.

The system I'm building is basically an equivalent of the elasticsearch 
percolator - we register a bunch of queries, and then run them all against 
individual documents passing through the system.  We then emit the exact 
positions which have matched, which is a type of highlighting, I guess.  The 
point of the anchor terms is that we don't want to highlight them - if you're 
searching for a term within five positions of the start of a document, you 
don't want the first term of the document highlighted as well.

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
  Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, 
 mentor
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, 
 PosHighlighter.patch, PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2012-11-10 Thread Alan Woodward (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494785#comment-13494785
 ] 

Alan Woodward commented on LUCENE-2878:
---

I've started to use this branch in an (experimental!) system I'm developing for 
a client.  The good news is that performance is generally much better than the 
existing system that uses SpanQueries - faster query time and smaller memory 
footprint, and also nicer GC behaviour (I can't give exact numbers, but suffice 
to say that where the previous system regularly ran out of memory, this one 
hasn't yet)!

There are definitely some rough edges, though, which I'll try and smooth out 
and add as patches.

1) There isn't a replacement for SpanNotQueries - the BrouwerianIterator comes 
close, but doesn't quite cover all the use cases.  In this instance, I need to 
have the equivalent of a 'not within' operator - match intervals that do not 
fall within a given another interval.  I've written a new iterator, which I've 
called an 'InverseBrouwerianIntervalIterator' for want of a better name, but it 
definitely could do with some more eyes on it...

2) The API is not very nice when it comes to subclassing Iterators.  For 
example, I have 'anchor' terms at the start and end of documents, which allow 
users to query for terms within a certain distance from them.  These shouldn't 
be highlighted, so I created an AnchorTermQuery which returned a different type 
of IntervalIterator that didn't do anything in its collect() method.  To do 
this, I had to create an AnchorTermWeight, an AnchorTermScorer and an 
AnchorTermIntervalIterator, all of which were more or less copy-pastes of the 
equivalent Term* classes; it would be nice to make this easier...

3) MultiTermQueries don't return iterators unless you set their rewrite 
policies to something other than CONSTANT_SCORE_REWRITE.

4) I found a bug in the iterators() method of DisjunctionSumScorer - if all 
subscorers are PositionFilterScorers, then you can get NPEs if the subscorers 
have matches that don't pass the filters.  I'll add a test case shortly

5) I had to run this without my scoring patch (this case doesn't actually use 
scoring, so it doesn't matter that much), because MultiTermQueries can blow up 
in scoring if they get rewritten into blank queries; I guess this wasn't a 
problem with Span* queries, but I haven't had a chance to work out how to get 
round it.  Will add another test case for this as well.

All in all, though, these are looking much better than the equivalent 
SpanQueries.  Position filters on boolean queries in particular work much 
better - the semantics of SpanQueries are completely wrong for this, and 
involved generating very heavy queries for pretty simple cases.  Nice work!

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
  Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, 
 mentor
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, 
 PosHighlighter.patch, PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2012-11-03 Thread Alan Woodward (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13490116#comment-13490116
 ] 

Alan Woodward commented on LUCENE-2878:
---

Where do we stand on this now?  It sounds as though we need to get more 
implemented before everyone's happy with it being merged in.  I can make a 
start at cutting TestSpansAdvanced and TestSpansAdvanced2 over to intervals 
tests this week (at a glance they're the only tests we have for Span scoring at 
the moment), although I guess things are going to go on hold a bit for the 
ApacheCon.

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
  Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, 
 mentor
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2012-11-03 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13490119#comment-13490119
 ] 

Simon Willnauer commented on LUCENE-2878:
-

+1 to add scoring! go ahead this would be great.

will you be at apache con?

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
  Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, 
 mentor
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2012-11-03 Thread Alan Woodward (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13490120#comment-13490120
 ] 

Alan Woodward commented on LUCENE-2878:
---

 will you be at apache con?

Not this year :-(  Will try and make one next year, though!

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
  Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, 
 mentor
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2012-10-30 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486858#comment-13486858
 ] 

Michael McCandless commented on LUCENE-2878:


It seems like the new oal.search.interval queries are meant to replace spans?  
So ... should we remove spans?  Or is there functionality missing in intervals?

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
  Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, 
 mentor
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2012-10-30 Thread Simon Willnauer (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486864#comment-13486864
]

Simon Willnauer commented on LUCENE-2878:
-

{quote}
But then ... this would typically be used to find the locations to hilite,
right? Ie not for the main query? Because if you wanted to do this for the
main query you should really use one of the oal.search.intervals.* queries
instead, and those pull only a single DPEnum per Term I think?
{quote}

correct.

{quote}
It seems like the new oal.search.interval queries are meant to replace spans?
So ... should we remove spans? Or is there functionality missing in intervals?
{quote}

eventually yes. Currently they don't score based on positions they only filter.
My plan was to bring this on trunk including spans. Once on trunk move spans to
a module and cut over the functionality query by query. We are currently
missing payload support which I think we should add once we are on trunk. makes
sense?

Allow Scorer to expose positions and payloads aka. nuke spans
--

Key: LUCENE-2878
URL: https://issues.apache.org/jira/browse/LUCENE-2878
Project: Lucene - Core
Issue Type: Improvement
Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12,
mentor
Fix For: Positions Branch

Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch,
LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, PosHighlighter.patch,
PosHighlighter.patch

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2012-10-30 Thread Robert Muir (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486868#comment-13486868
]

Robert Muir commented on LUCENE-2878:
-

To me Interval.java looks a lot like a span. I think it would be good to
resolve this before landing on trunk.
If we cant score based on positions, it seems to me the api is not fully baked?
e.g. i think it would be better to score based on positions and run benchmarks
and so on first.

here we also get a lot more index option flags. I dont like how many of these
we have:
* indexoptions
* flags on docsenum
* flags on docsandpositionsenum
* now here, flags on scorer/collector/etc

I am working on some ideas to clean some of this up in trunk separate from this
branch. i think this makes the apis really confusing.

Allow Scorer to expose positions and payloads aka. nuke spans
--

Key: LUCENE-2878
URL: https://issues.apache.org/jira/browse/LUCENE-2878
Project: Lucene - Core
Issue Type: Improvement
Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12,
mentor
Fix For: Positions Branch

Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch,
LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, PosHighlighter.patch,
PosHighlighter.patch

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2012-10-30 Thread Simon Willnauer (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13487069#comment-13487069
]

Simon Willnauer commented on LUCENE-2878:
-

bq. Interval.java looks a lot like a span. I think it would be good to resolve
this before landing on trunk.
what exactly did you expect? I mean its basically the same thing but reusing
the name sucks. what do you wanna resolve here?

regarding your comments could you put your ideas up here in a somewhat more
compact form than 5 1-line comments? This would be great especially what kind
of ideas you have and you resovle / work on on trunk so we can maybe be more
productive here.

thanks

Allow Scorer to expose positions and payloads aka. nuke spans
--

Key: LUCENE-2878
URL: https://issues.apache.org/jira/browse/LUCENE-2878
Project: Lucene - Core
Issue Type: Improvement
Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12,
mentor
Fix For: Positions Branch

Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch,
LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch,
LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, PosHighlighter.patch,
PosHighlighter.patch

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2012-10-29 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13485899#comment-13485899
 ] 

Simon Willnauer commented on LUCENE-2878:
-

thanks mike for the commits! much apprecitated!

{quote}
OK so it sounds like I pull one IntervalIterator up front (and use it
for the whole time), and it's my job to call .scorerAdvanced(docID)
every time I either .nextDoc or .advance the original Scorer? Ie this
resets my IntervalIterator onto the current doc's intervals.
{quote}

exactly!

{quote}
Ahhh  so it visits all intervals in the query tree leading up to
the current match interval (of the top query) that you've iterated to?
OK. Maybe we can find a better name ... can't think of one now 
{quote}

a better name for collect? 

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
  Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, 
 mentor
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2012-10-29 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13485952#comment-13485952
 ] 

Michael McCandless commented on LUCENE-2878:


bq. a better name for collect?

Yeah, to somehow reflect that it's visiting/collecting/recursing on the full 
interval tree ... but nothing comes to mind ...

When I first saw it / read the docs I thought this was analogous to 
Scorer.score(Collector), ie that it would bulk collect all intervals from the 
iterator.

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
  Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, 
 mentor
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13485994#comment-13485994
 ] 

Robert Muir commented on LUCENE-2878:
-

I think the patch for review is incomplete? e.g. I see PostingsFeatures was 
added
to the Scorer api but its not in the patch.


 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
  Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, 
 mentor
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2012-10-29 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486013#comment-13486013
 ] 

Simon Willnauer commented on LUCENE-2878:
-

bq. I think the patch for review is incomplete? e.g. I see PostingsFeatures was 
added
its a inner class of Weight in that patch but we might move it out!

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
  Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, 
 mentor
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486060#comment-13486060
 ] 

Robert Muir commented on LUCENE-2878:
-

{quote}
Instead of PostingFeatures.isProximityFeature, can we just use
X.compareTo(PostingsFeatures.POSITIONS) = 0? (We do this for
IndexOptions).
{quote}

I think this is a confusing part of the current patch. For example:

{noformat}
// Check if we can return a BooleanScorer
-  if (!scoreDocsInOrder  topScorer  required.size() == 0) {
+  if (!scoreDocsInOrder  flags == PostingFeatures.DOCS_AND_FREQS  
topScorer  required.size() == 0) {
{noformat}

I don't think we should be doing these == comparisons. What if someone sends 
DOCS_ONLY? (which really ConstantScoreQuery i think should pass down to its 
subs and so on, so they can skip freq blocks, but thats another thing to 
tackle).


 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
  Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, 
 mentor
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans


[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486064#comment-13486064
 ] 

Robert Muir commented on LUCENE-2878:
-

There seems to be a lot of unrelated formatting changes in important classes 
like TermWeight.java etc

Can we factor these out and do any formatting changes separately?

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
  Labels: gsoc2011, gsoc2012, lucene-gsoc-11, lucene-gsoc-12, 
 mentor
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878-vs-trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries, the one which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do and at the end 
 of the day they are duplicating lot of code all over lucene. Span*Queries are 
 also limited to other Span*Query instances such that you can not use a 
 TermQuery or a BooleanQuery with SpanNear or anthing like that. 
 Beside of the Span*Query limitation other queries lacking a quiet interesting 
 feature since they can not score based on term proximity since scores doesn't 
 expose any positional information. All those problems bugged me for a while 
 now so I stared working on that using the bulkpostings API. I would have done 
 that first cut on trunk but TermScorer is working on BlockReader that do not 
 expose positions while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer, to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorere#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API and other simply 
 return null instead. 
 To show that the API really works and our BulkPostings work fine too with 
 positions I cut over TermSpanQuery to use a TermScorer under the hood and 
 nuked TermSpans entirely. A nice sideeffect of this was that the Position 
 BulkReading implementation got some exercise which now :) work all with 
 positions while Payloads for bulkreading are kind of experimental in the 
 patch and those only work with Standard codec. 
 So all spans now work on top of TermScorer ( I truly hate spans since today ) 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go one with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk 
 first but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't 
 look into the MemoryIndex BulkPostings API yet)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans