[jira] [Commented] (LUCENE-8250) Should FilteringTokenFilter handle positionLength

2018-04-13 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437157#comment-16437157
 ] 

Robert Muir commented on LUCENE-8250:
-

OK, I will dig into it. But I think it's coming back: the problem is that 
these bogus "holes" are incompatible with conversion of the stream to an 
automaton in this case. 

The idea of a "hole" is that, from a position perspective, the token is removed 
but we leave evidence that it was there. So it's wrong to modify positionLength 
when deleting a token if we are going to do holes.
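
To make the "hole" idea concrete, here is a minimal sketch in Python rather than Lucene code. Tokens are modeled as (term, position increment, position length) triples, and the helper names filter_leaving_holes and positions are hypothetical. The filter carries the increment of each removed token forward to the next kept token and never touches position length, which is the behavior described above.

```python
def filter_leaving_holes(tokens, stop):
    """Drop stop terms but leave a hole: the skipped increments are
    added to the next kept token; pos_len is deliberately untouched."""
    out, skipped = [], 0
    for term, pos_inc, pos_len in tokens:
        if term in stop:
            skipped += pos_inc  # remember the position the removed token used
        else:
            out.append((term, pos_inc + skipped, pos_len))
            skipped = 0
    return out


def positions(tokens):
    """Absolute position of each token, starting at 0."""
    pos, result = -1, []
    for term, pos_inc, _ in tokens:
        pos += pos_inc
        result.append((term, pos))
    return result


# Hypothetical three-token stream used only for illustration.
stream = [("over", 1, 1), ("the", 1, 1), ("moon", 1, 1)]
filtered = filter_leaving_holes(stream, {"the"})
```

In this model "moon" keeps position 2 after "the" is removed, so position 1 stays empty as evidence that a token was there.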

An alternative is to not leave a hole at all when deleting a token. Instead, 
FilteringTokenFilter would adjust posinc/poslen (position increment and 
position length) as needed to behave as if the token had never been there in 
the first place. It would need some additional buffering to do this correctly. 
So that's how LUCENE-4065 ties in.
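
One possible reading of this alternative, sketched in Python and not taken from LUCENE-4065 or any patch: buffer the whole graph, merge the two end nodes of every removed token, renumber the surviving nodes, and recompute posinc/poslen. The function name remove_without_holes and the token layout are assumptions; the layout is chosen to match the synonym rule "twd, the walking dead, the zombie show" from this issue. The sketch also assumes a removed token never runs parallel to a surviving path, and raises an error otherwise.

```python
def remove_without_holes(tokens, stop):
    """Rewrite (term, pos_inc, pos_len) tokens as if stop terms never existed."""
    # Buffer the full stream as (start_node, end_node, term) arcs.
    arcs, pos = [], -1
    for term, pos_inc, pos_len in tokens:
        pos += pos_inc
        arcs.append((pos, pos + pos_len, term))

    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            node = parent[node]
        return node

    kept = []
    for start, end, term in arcs:
        if term in stop:
            # Contract the removed arc: its two end nodes become one node.
            low, high = sorted((find(start), find(end)))
            parent[high] = low
        else:
            kept.append((start, end, term))

    # Renumber the surviving nodes densely, preserving their order.
    nodes = sorted({find(n) for start, end, _ in kept for n in (start, end)})
    index = {node: i for i, node in enumerate(nodes)}
    new_arcs = sorted((index[find(s)], index[find(e)], t) for s, e, t in kept)

    out, prev = [], -1
    for start, end, term in new_arcs:
        if end <= start:
            raise ValueError("removed token was parallel to a surviving path")
        out.append((term, start - prev, end - start))
        prev = start
    return out


# Assumed layout of the synonym graph for this example.
graph = [("twd", 1, 5), ("the", 0, 1), ("the", 0, 3), ("walking", 1, 1),
         ("dead", 1, 3), ("zombie", 1, 1), ("show", 1, 1)]
rewritten = remove_without_holes(graph, {"the"})
```

Under these assumptions the rewritten stream has no holes and its three paths read "twd", "walking dead", and "zombie show". Note that the whole graph has to be seen before any token can be emitted, which is the buffering cost mentioned above.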

> Should FilteringTokenFilter handle positionLength
> -
>
> Key: LUCENE-8250
> URL: https://issues.apache.org/jira/browse/LUCENE-8250
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Jim Ferenczi
> Priority: Major
> Attachments: LUCENE-8250.patch
>
>
> FilteringTokenFilter does not handle the position length graph attribute when 
> removing a token from the stream. This doesn't work well with graph token 
> streams that set position length, since removing a token from the stream can 
> invalidate the position length set on the previous tokens. 
> This issue was first discussed in 
> https://issues.apache.org/jira/browse/LUCENE-4065, but that issue has a 
> different purpose, which is why I am opening a new issue here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8250) Should FilteringTokenFilter handle positionLength

2018-04-13 Thread Jim Ferenczi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436960#comment-16436960
 ] 

Jim Ferenczi commented on LUCENE-8250:
--

I attached a small test that I hope illustrates the issue. The synonym rule is 
"twd, the walking dead, the zombie show", and removing "the" from the stream 
after the synonym graph makes "zombie show" a following path of "walking", so 
the output of the graph is "twd, walking dead, walking zombie show". It's 
unclear to me whether FilteringTokenFilter is doing the right thing here. I 
added the dot output of TokenStreamToAutomaton to the test. This class is able 
to fill the hole when a stop filter removes a token, but in this case I don't 
see how we can infer that "zombie show" is not after "walking". 
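
The effect can be reproduced with a small Lucene-independent model in Python. This is only a sketch: the token layout below is an assumption chosen to be consistent with the output reported above, drop_terms imitates hole-leaving removal, and paths uses a simplified stand-in for the hole filling that TokenStreamToAutomaton performs (a node with no incoming token gets a hole arc from the node before it).

```python
from collections import defaultdict


def drop_terms(tokens, stop):
    """Remove stop terms, carrying their increments forward; pos_len unchanged."""
    out, skipped = [], 0
    for term, pos_inc, pos_len in tokens:
        if term in stop:
            skipped += pos_inc
        else:
            out.append((term, pos_inc + skipped, pos_len))
            skipped = 0
    return out


def paths(tokens):
    """Enumerate the term sequences of all paths through the token graph."""
    outgoing, has_incoming, pos = defaultdict(list), set(), -1
    for term, pos_inc, pos_len in tokens:
        pos += pos_inc
        outgoing[pos].append((pos + pos_len, term))
        has_incoming.add(pos + pos_len)
    final = max(has_incoming)
    # Simplified hole filling: link each orphaned node to its predecessor.
    for node in range(final - 1, 0, -1):
        if node in outgoing and node not in has_incoming:
            outgoing[node - 1].append((node, None))

    def walk(node, words):
        if node == final:
            yield " ".join(words)
            return
        for end, term in outgoing.get(node, ()):
            yield from walk(end, words + ([term] if term else []))

    return list(walk(0, []))


# Assumed (term, pos_inc, pos_len) layout of the synonym graph.
graph = [("twd", 1, 5), ("the", 0, 1), ("the", 0, 3), ("walking", 1, 1),
         ("dead", 1, 3), ("zombie", 1, 1), ("show", 1, 1)]
before = paths(graph)
after = paths(drop_terms(graph, {"the"}))
```

In this model, before holds "twd", "the walking dead", and "the zombie show", while after holds "twd", "walking dead", and "walking zombie show". Both removed tokens have a position increment of 0, so nothing in the filtered stream records which branch "zombie" started.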







[jira] [Commented] (LUCENE-8250) Should FilteringTokenFilter handle positionLength

2018-04-12 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436277#comment-16436277
 ] 

Robert Muir commented on LUCENE-8250:
-

Can we make a simple test case for the issue? It would help me understand it 
better. In the last example we looked at, we tentatively decided that 
StopFilter was doing the right thing, so maybe we need another one.



