subject:"\[jira\] \[Commented\] \(LUCENE\-4312\) Index format to store position length per position"

[jira] [Commented] (LUCENE-4312) Index format to store position length per position

2019-07-25 Thread Michael Gibney (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892990#comment-16892990
 ] 

Michael Gibney commented on LUCENE-4312:


For the sake of facilitating discussion around something more concrete, I 
uploaded a patch ([^positionLength-postings.patch]) for a straw-man proposal 
for {{PostingsEnum}} modifications to support position length (also visible as 
a pseudo-PR here: [https://github.com/magibney/lucene-solr/pull/1]). The patch 
won't compile, of course (no corresponding modifications to subclasses of 
{{PostingsEnum}}).

The proposal goes a bit beyond simply adding a {{positionLength()}} method, 
with a few additional fundamental methods to support optimizations that proved 
helpful in implementing performant positional queries (for LUCENE-7398).

Any feedback would be much appreciated, especially given the acknowledged 
provisional (and potentially controversial) nature of this proposal.

[~jpountz], I've given some more thought to the challenge of not being able to 
"advance positions on terms in the order we want anymore". I think there should 
be a general-purpose way to preserve this ability (in a way that doesn't depend 
on the kind of corpus-specific shingle-based filtering that I previously 
suggested). I'm considering an approach leveraging something analogous to a 
reverse token filter, except rather than reversing the token text, it (sort of) 
reverses start/end positions: start position of the new token is end position 
of the original token, and end position of the new token is 
{{originalEndPosition + positionLength}}. Then you could use the least-cost 
term as an entrypoint, and build forward with original tokens, backward with 
the modified-positions tokens. Query implementation would be responsible for 
properly interpreting flipped positions.

 

> Index format to store position length per position
> --
>
> Key: LUCENE-4312
> URL: https://issues.apache.org/jira/browse/LUCENE-4312
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Affects Versions: 6.0
>Reporter: Gang Luo
>Priority: Minor
>  Labels: Suggestion
> Attachments: positionLength-postings.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Mike Mccandless said:TokenStreams are actually graphs.
> Indexer ignores PositionLengthAttribute.Need change the index format (and 
> Codec APIs) to store an additional int position length per position.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4312) Index format to store position length per position

2019-07-11 Thread Michael Gibney (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883103#comment-16883103
 ] 

Michael Gibney commented on LUCENE-4312:


That PR looks like a good start, [~romseygeek].

I'd still like to have a better sense of what bar we're trying to clear as far 
as demonstrating the usefulness of recording position length in the index. It 
seems we have consensus (basically unchanged since August of 2012), that:
 # Recording position length in the index is relatively straightforward, 
whether done via codecs and modified postings API or via Payloads and custom 
Intervals/Spans implementations.
 # Performant positional query implementation leveraging position length is a 
significant challenge.

Would people be comfortable evaluating usefulness/performance based on the 
Spans implementation proposed for LUCENE-7398, as opposed to holding out for an 
Intervals query implementation? Particularly if we're talking proof-of-concept, 
all of the performance implications are directly comparable (between Spans and 
Intervals).

Is there a _conditional_ consensus that indexed position length would be a 
worthwhile feature, assuming the implementation of performant positional 
queries to leverage it?

> Index format to store position length per position
> --
>
> Key: LUCENE-4312
> URL: https://issues.apache.org/jira/browse/LUCENE-4312
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Affects Versions: 6.0
>Reporter: Gang Luo
>Priority: Minor
>  Labels: Suggestion
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Mike Mccandless said:TokenStreams are actually graphs.
> Indexer ignores PositionLengthAttribute.Need change the index format (and 
> Codec APIs) to store an additional int position length per position.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4312) Index format to store position length per position

2019-07-10 Thread Alan Woodward (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882091#comment-16882091
 ] 

Alan Woodward commented on LUCENE-4312:
---

I've opened a PR with a simple implementation of this as an IntervalsSource: 
https://github.com/apache/lucene-solr/pull/772

Note that this currently works only with index-time graphs; currently the logic 
for rewriting query-time graphs relies on the fact that the only sources with a 
minimum extent of 1 are simple terms, and this new source breaks that 
assumption.  I think we can fix that with the addition of another method to 
IntervalsSource, but I'm trying to keep things as simple as possible to begin 
with.

> Index format to store position length per position
> --
>
> Key: LUCENE-4312
> URL: https://issues.apache.org/jira/browse/LUCENE-4312
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Affects Versions: 6.0
>Reporter: Gang Luo
>Priority: Minor
>  Labels: Suggestion
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Mike Mccandless said:TokenStreams are actually graphs.
> Indexer ignores PositionLengthAttribute.Need change the index format (and 
> Codec APIs) to store an additional int position length per position.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4312) Index format to store position length per position

2019-07-09 Thread Michael Gibney (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881477#comment-16881477
 ] 

Michael Gibney commented on LUCENE-4312:


This sounds potentially like a good way to proceed. I appreciate the need for a 
high bar for getting things into the index – I suppose I was invoking 
"chicken/egg" not directly as an argument for inclusion in the index, but 
rather to highlight the interdependence of these features.

Essentially all of the proof-of-concept work that we're discussing here is 
already implemented as part of the LUCENE-7398 work, and has been running in 
production (and iteratively improved) for over a year. Before proceeding, I'd 
like to get some consensus on what the best way is to move forward, and also 
perhaps have some discussion of what bar we have in mind for "once these prove 
useful".

Regarding usefulness, and the question of to what extent this represents a 
corner-case: anybody interested in index-time synonyms and precise positional 
queries needs this feature. So in some sense this boils down to a question of 
the usefulness of index-time synonyms (or other index-time TokenStream graphs) 
... and since the standing recommendation has for some time been to _avoid_ 
using index-time synonyms, we have another chicken/egg :). I can say that this 
has been considerably helpful in my use case, and the problem it addresses is 
at the root of a number of consistently reported issues, among users of both 
[Elasticsearch|https://discuss.elastic.co/t/not-getting-results-from-a-phrase-query-using-query-string-of-the-form-x-a1-abc-in-6-6-0/179191]
 and 
[Solr|https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201905.mbox/%3CCAF%3DheHETPCqxUcqyu13tFfKFcALzD__-QrToRBP-VVWh1S3-Wg%40mail.gmail.com%3E].

Practically speaking, I'm wondering what's the best way to get the most eyes on 
this feature set, with the goal of evaluating its usefulness and performance. 
The fix as currently implemented is basically a wholesale rewrite of some of 
the Spans classes, but it seeks to correctly support existing Spans contracts; 
implemented as a branch, I was able to rely on existing tests against various 
Spans. For performance reasons, changes were also introduced in indexing code 
(e.g., DefaultIndexingChain). For these reasons, my sense is that it would be 
quite challenging to extract these features into the sandbox module. Even if 
such "sandbox" extraction were possible, it would render the task of evaluation 
more difficult for all but the most dedicated users (currently it suffices to 
run a forked build to swap out the backend Spans implementation in all of the 
parsers and components that rely on Spans). Could these potentially be reasons 
to opt for a "branch" approach (as opposed to "sandbox")?

> Index format to store position length per position
> --
>
> Key: LUCENE-4312
> URL: https://issues.apache.org/jira/browse/LUCENE-4312
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Affects Versions: 6.0
>Reporter: Gang Luo
>Priority: Minor
>  Labels: Suggestion
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Mike Mccandless said:TokenStreams are actually graphs.
> Indexer ignores PositionLengthAttribute.Need change the index format (and 
> Codec APIs) to store an additional int position length per position.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4312) Index format to store position length per position

2019-07-09 Thread Michael McCandless (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881195#comment-16881195
 ] 

Michael McCandless commented on LUCENE-4312:


+1 to build on payloads, today, to break the chicken/egg situation.  This 
should just be a {{TokenFilter}} that converts {{PositionLengthAttribute}} into 
payloads?  Then [~mgibney] could contribute query-time code that can decode 
these payloads and implement correct positional queries.  Once these prove 
useful we can circle back later and optimize how we store position lengths in 
the index.

> Index format to store position length per position
> --
>
> Key: LUCENE-4312
> URL: https://issues.apache.org/jira/browse/LUCENE-4312
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Affects Versions: 6.0
>Reporter: Gang Luo
>Priority: Minor
>  Labels: Suggestion
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Mike Mccandless said:TokenStreams are actually graphs.
> Indexer ignores PositionLengthAttribute.Need change the index format (and 
> Codec APIs) to store an additional int position length per position.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4312) Index format to store position length per position

2019-07-08 Thread Robert Muir (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880828#comment-16880828
 ] 

Robert Muir commented on LUCENE-4312:
-

I don't think chicken and the egg description works well as an argument for 
something to add to the index. We should have a high bar in order to do that, 
because once something gets added its basically impossible to remove.

My earlier suggestion (payloads) was based on the fact that we are talking 
about corner-cases as far as search improvements, at a heavy complexity cost.

Maybe we could first address the search side with payload-based queries (maybe 
in sandbox, similar to what you already developed?) to try to address 
[~jpountz] concerns about scalability before actually optimizing it further by 
encoding in the index?

This way it wouldn't have to be all solved at once.

> Index format to store position length per position
> --
>
> Key: LUCENE-4312
> URL: https://issues.apache.org/jira/browse/LUCENE-4312
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Affects Versions: 6.0
>Reporter: Gang Luo
>Priority: Minor
>  Labels: Suggestion
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Mike Mccandless said:TokenStreams are actually graphs.
> Indexer ignores PositionLengthAttribute.Need change the index format (and 
> Codec APIs) to store an additional int position length per position.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4312) Index format to store position length per position

2019-07-08 Thread Michael Gibney (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880635#comment-16880635
 ] 

Michael Gibney commented on LUCENE-4312:


True, both good points. But it's kind of a chicken-or-egg situation ... there 
would have been no point to address these implied challenges, so long as 
position length has not been recorded in the index (and is thus not available 
at query time). That doesn't mean there _aren't_ ways to address the challenges.

Regarding the "A B C" example, I addressed this in the LUCENE-7398 work by 
indexing next start position as a lookahead. As a proof of concept this was 
done with Payloads, but in principle I could see slight modifications 
(somewhere at the intersection of codecs and postings API) that would natively 
read next start position "early" and expose it as a lookahead. This would avoid 
the type of problematic call to {{PostingsEnum.nextPosition()}} that would (as 
you correctly point out) result in the need to buffer all information 
associated with _every_ position. I've described this approach in more detail 
[here|https://michaelgibney.net/2018/09/lucene-graph-queries-2/#index-lookahead-don-t-buffer-positions-if-you-don-t-have-to].
{quote}we can't advance positions on terms in the order we want anymore.
{quote}
Yes, I'd argue that's the toughest challenge. I addressed it indirectly by 
constructing CommonGrams-style shingles used specifically for pre-filtering 
conjunctions in the "approximation" phase of two-phase iteration (ensuring that 
common terms at subclause index 0 don't kill performance). This is described in 
more detail 
[here|https://michaelgibney.net/2018/09/lucene-graph-queries-2/#shingle-based-pre-filtering-of-conjunctionspans].

I'm not intending this to be about these particular solutions, and you might 
take issue with the solutions themselves. The more general point I guess is 
that indexed position length is fundamental, and is a prerequisite for the 
development of ways to address these challenges.

> Index format to store position length per position
> --
>
> Key: LUCENE-4312
> URL: https://issues.apache.org/jira/browse/LUCENE-4312
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Affects Versions: 6.0
>Reporter: Gang Luo
>Priority: Minor
>  Labels: Suggestion
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Mike Mccandless said:TokenStreams are actually graphs.
> Indexer ignores PositionLengthAttribute.Need change the index format (and 
> Codec APIs) to store an additional int position length per position.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4312) Index format to store position length per position

2019-07-08 Thread Adrien Grand (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880606#comment-16880606
 ] 

Adrien Grand commented on LUCENE-4312:
--

bq. the complexity of query execution would be driven by what's actually in the 
index

I don't think this is true.

For instance an exact phrase query trying to match "A B C" that is currently 
positioned on A (position=3, length=1), B (position=4, length=1), C 
(position=6, length=1) would need to advance B to the next position in case 
there is another match on position 4 that has a length of 2. And then we should 
advance C first because maybe because it also has another match on position 4 
of a different length.

Also we can't advance positions on terms in the order we want anymore. Today we 
use the rarer term to lead the iteration of positions. If we had position 
lengths in the index we would need to advance positions in the order in which 
terms occur in the phrase query since the start position that B must have 
depends on the length of A on the current position: position starts are 
guaranteed to come in order in the index but position ends are not (at least we 
don't enforce it in token streams today).

> Index format to store position length per position
> --
>
> Key: LUCENE-4312
> URL: https://issues.apache.org/jira/browse/LUCENE-4312
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Affects Versions: 6.0
>Reporter: Gang Luo
>Priority: Minor
>  Labels: Suggestion
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Mike Mccandless said:TokenStreams are actually graphs.
> Indexer ignores PositionLengthAttribute.Need change the index format (and 
> Codec APIs) to store an additional int position length per position.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4312) Index format to store position length per position

2019-07-08 Thread Michael Gibney (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880490#comment-16880490
 ] 

Michael Gibney commented on LUCENE-4312:


Thank you for the feedback, [~sokolov] and [~jpountz]!
{quote}Recording position lengths in the index is the easy part of the problem 
in my opinion.
{quote}
Yes, this is my view as well; and looking to the future, _respecting_ position 
length would certainly add complexity to phrase queries. But in terms of 
performance impact, the complexity of query execution would be driven by what's 
actually in the index (so for many use cases performance should be roughly 
equivalent to that of an implementation that ignores position length).

Regarding the challenges of query implementation... I'm taking a fresh look at 
this issue in the context of work done on LUCENE-7398, which seeks to implement 
backtracking phrase queries in an efficient way (including sloppy, nested, 
etc.). Despite that issue being nominally about "nested Span queries", it's 
really more generally about "proximity search over variable-length subclauses", 
and the techniques used in the implementation for LUCENE-7398 would be 
transferable to interval queries as well.

It's a fair point about the arbitrariness of sloppy phrase queries with 
intervening multi-term synonyms, but I wouldn't call such queries 
"meaningless"; in any case, I think that problem already exists for multi-term 
indexed synonyms, and is not exacerbated by the introduction of indexed 
position length. Sloppy phrase queries (and, for that matter, tokenization 
itself) are somewhat arbitrary by nature. Following that tangent, I can imagine 
some potential ways to mitigate such arbitrariness ... all of which themselves 
rely on the ability to index token graph structure (i.e., position length).

> Index format to store position length per position
> --
>
> Key: LUCENE-4312
> URL: https://issues.apache.org/jira/browse/LUCENE-4312
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Affects Versions: 6.0
>Reporter: Gang Luo
>Priority: Minor
>  Labels: Suggestion
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Mike Mccandless said:TokenStreams are actually graphs.
> Indexer ignores PositionLengthAttribute.Need change the index format (and 
> Codec APIs) to store an additional int position length per position.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4312) Index format to store position length per position

2019-07-08 Thread Adrien Grand (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880422#comment-16880422
 ] 

Adrien Grand commented on LUCENE-4312:
--

Recording position lengths in the index is the easy part of the problem in my 
opinion. I'm concerned that this will introduce significant complexity to 
phrase queries (they will require backtracking in order to deal with the case 
that a term exists twice at the same position with different position lengths), 
and even make sloppy phrase queries and their spans/intervals counterparts 
meaningless (as terms could be very distant according to the index only because 
there is one term in-between that has a multi-term synonym indexed). 

> Index format to store position length per position
> --
>
> Key: LUCENE-4312
> URL: https://issues.apache.org/jira/browse/LUCENE-4312
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Affects Versions: 6.0
>Reporter: Gang Luo
>Priority: Minor
>  Labels: Suggestion
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Mike Mccandless said:TokenStreams are actually graphs.
> Indexer ignores PositionLengthAttribute.Need change the index format (and 
> Codec APIs) to store an additional int position length per position.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4312) Index format to store position length per position

2019-07-06 Thread Mike Sokolov (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16879757#comment-16879757
 ] 

Mike Sokolov commented on LUCENE-4312:
--

Yes, we're compromising precision today when we apply index-time synonyms (and 
other analysis that produces token graphs). I think this would be an awesome 
addition If the cost of indexing position length in postings is not too great, 
and I think you are right -- it will usually be 1, sometimes 0 and rarely > 1, 
so we should be able to encode compactly, and decode efficiently.

> Index format to store position length per position
> --
>
> Key: LUCENE-4312
> URL: https://issues.apache.org/jira/browse/LUCENE-4312
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Affects Versions: 6.0
>Reporter: Gang Luo
>Priority: Minor
>  Labels: Suggestion
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Mike Mccandless said:TokenStreams are actually graphs.
> Indexer ignores PositionLengthAttribute.Need change the index format (and 
> Codec APIs) to store an additional int position length per position.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4312) Index format to store position length per position

2019-07-05 Thread Michael Gibney (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16879425#comment-16879425
 ] 

Michael Gibney commented on LUCENE-4312:


Following up on discussion at Berlin Buzzwords with [~mikemccand], [~sokolov], 
[~simonw], and [~romseygeek]:

A lot of useful context (for, e.g., synonym generation, etc.) is available at 
index time that is not available at query time. Leveraging this context can 
result in index-time TokenStream manipulations that produce token graphs. Since 
position length is not indexed, it is impossible at query time to reconstruct 
index-time TokenStream "graph" structure.

Indexed position length is a prerequisite for any use case that calls for:
1. index-time graph TokenStreams
2. precise/accurate proximity query (via spans, intervals, etc.)

Could we discuss adding first-class support for this structural "position 
length" information?

Updating PostingsEnum to include endPosition() -- returning {{position+1}} by 
default -- would be a meaningful first step. This would facilitate the 
development of query implementations without requiring an API fork, and would 
signal an intention to move in the direction of supporting index-time token 
graphs.

Beyond that, I'm optimistic that codecs could be enhanced to index position 
length without introducing much additional overhead (I'd guess that position 
length for the common case of linear/non-graph index-time token streams could 
compress quite well).

> Index format to store position length per position
> --
>
> Key: LUCENE-4312
> URL: https://issues.apache.org/jira/browse/LUCENE-4312
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Affects Versions: 6.0
>Reporter: Gang Luo
>Priority: Minor
>  Labels: Suggestion
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Mike Mccandless said:TokenStreams are actually graphs.
> Indexer ignores PositionLengthAttribute.Need change the index format (and 
> Codec APIs) to store an additional int position length per position.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4312) Index format to store position length per position

2019-04-29 Thread Michael Gibney (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829599#comment-16829599
 ] 

Michael Gibney commented on LUCENE-4312:


(following on discussion at

[jira] [Commented] (LUCENE-4312) Index format to store position length per position

[jira] [Commented] (LUCENE-4312) Index format to store position length per position

[jira] [Commented] (LUCENE-4312) Index format to store position length per position

[jira] [Commented] (LUCENE-4312) Index format to store position length per position

[jira] [Commented] (LUCENE-4312) Index format to store position length per position

[jira] [Commented] (LUCENE-4312) Index format to store position length per position

[jira] [Commented] (LUCENE-4312) Index format to store position length per position

[jira] [Commented] (LUCENE-4312) Index format to store position length per position

[jira] [Commented] (LUCENE-4312) Index format to store position length per position

[jira] [Commented] (LUCENE-4312) Index format to store position length per position

[jira] [Commented] (LUCENE-4312) Index format to store position length per position

[jira] [Commented] (LUCENE-4312) Index format to store position length per position

[jira] [Commented] (LUCENE-4312) Index format to store position length per position

13 matches

Site Navigation

Mail list logo

Footer information