[jira] [Updated] (LUCENE-8651) Tokenizer implementations can't be reset

2019-01-22 Thread Alan Woodward (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8651:
--
Attachment: LUCENE-8651.patch

> Tokenizer implementations can't be reset
> 
>
> Key: LUCENE-8651
> URL: https://issues.apache.org/jira/browse/LUCENE-8651
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Dan Meehl
>Priority: Major
> Attachments: LUCENE-8650-2.patch, LUCENE-8651.patch, LUCENE-8651.patch
>
>
> The fine print here is that they can't be reset without calling setReader() 
> before every reset() call. The reason is that Tokenizer violates the contract 
> put forth by TokenStream.reset(), which is the following:
> "Resets this stream to a clean state. Stateful implementations must implement 
> this method so that they can be reused, just as if they had been created 
> fresh."
> Tokenizer implementations' reset() functions can't reset in that manner 
> because Tokenizer.close() removes the reference to the underlying Reader 
> (see LUCENE-2387). The catch-22 here is that we don't want to keep a Reader 
> around unnecessarily (memory leak), but we would like to be able to reset() 
> when necessary.
> The patches include an integration test that attempts to use a 
> ConcatenatingTokenStream to join an input TokenStream with a KeywordTokenizer 
> TokenStream. This test fails with an IllegalStateException thrown by 
> Tokenizer.ILLEGAL_STATE_READER.
>  
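A minimal sketch of the failure mode, for anyone following along (the class name and input strings below are made up for illustration; this is not the attached test). It walks a KeywordTokenizer through the documented consume cycle once and then tries to reset() it again without a fresh setReader() call, which leaves the tokenizer reading from the internal ILLEGAL_STATE_READER:

    import java.io.IOException;
    import java.io.StringReader;

    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.core.KeywordTokenizer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class TokenizerReuseSketch {
      public static void main(String[] args) throws IOException {
        Tokenizer tok = new KeywordTokenizer();
        CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);

        // First pass: the documented lifecycle works.
        tok.setReader(new StringReader("first"));
        tok.reset();
        while (tok.incrementToken()) {
          System.out.println(term.toString());   // prints "first"
        }
        tok.end();
        tok.close();   // close() drops the reference to the Reader (LUCENE-2387)

        // Second pass: reset() alone is not enough. Without another setReader()
        // the tokenizer's input is the internal ILLEGAL_STATE_READER, so the
        // next read throws an IllegalStateException.
        tok.reset();
        while (tok.incrementToken()) {   // throws IllegalStateException
          System.out.println(term.toString());
        }
      }
    }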



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8651) Tokenizer implementations can't be reset

2019-01-18 Thread Daniel Meehl (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Meehl updated LUCENE-8651:
-
Description: 
The fine print here is that they can't be reset without calling setReader() 
before every reset() call. The reason is that Tokenizer violates the contract 
put forth by TokenStream.reset(), which is the following:

"Resets this stream to a clean state. Stateful implementations must implement 
this method so that they can be reused, just as if they had been created fresh."

Tokenizer implementations' reset() functions can't reset in that manner because 
Tokenizer.close() removes the reference to the underlying Reader (see 
LUCENE-2387). The catch-22 here is that we don't want to keep a Reader around 
unnecessarily (memory leak), but we would like to be able to reset() when 
necessary.

The patches include an integration test that attempts to use a 
ConcatenatingTokenStream to join an input TokenStream with a KeywordTokenizer 
TokenStream. This test fails with an IllegalStateException thrown by 
Tokenizer.ILLEGAL_STATE_READER.

 

  was:
The fine print here is that they can't be reset without calling setReader() 
before every reset() call. The reason is that Tokenizer violates the contract 
put forth by TokenStream.reset(), which is the following:

"Resets this stream to a clean state. Stateful implementations must implement 
this method so that they can be reused, just as if they had been created fresh."

Tokenizer implementations' reset() functions can't reset in that manner because 
Tokenizer.end() removes the reference to the underlying Reader (see 
LUCENE-2387). The catch-22 here is that we don't want to keep a Reader around 
unnecessarily (memory leak), but we would like to be able to reset() when 
necessary.

The patches include an integration test that attempts to use a 
ConcatenatingTokenStream to join an input TokenStream with a KeywordTokenizer 
TokenStream. This test fails with an IllegalStateException thrown by 
Tokenizer.ILLEGAL_STATE_READER.

 


> Tokenizer implementations can't be reset
> 
>
> Key: LUCENE-8651
> URL: https://issues.apache.org/jira/browse/LUCENE-8651
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Daniel Meehl
>Priority: Major
> Attachments: LUCENE-8650-2.patch, LUCENE-8651.patch
>
>
> The fine print here is that they can't be reset without calling setReader() 
> before every reset() call. The reason is that Tokenizer violates the contract 
> put forth by TokenStream.reset(), which is the following:
> "Resets this stream to a clean state. Stateful implementations must implement 
> this method so that they can be reused, just as if they had been created 
> fresh."
> Tokenizer implementations' reset() functions can't reset in that manner 
> because Tokenizer.close() removes the reference to the underlying Reader 
> (see LUCENE-2387). The catch-22 here is that we don't want to keep a Reader 
> around unnecessarily (memory leak), but we would like to be able to reset() 
> when necessary.
> The patches include an integration test that attempts to use a 
> ConcatenatingTokenStream to join an input TokenStream with a KeywordTokenizer 
> TokenStream. This test fails with an IllegalStateException thrown by 
> Tokenizer.ILLEGAL_STATE_READER.
>  
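Roughly, the failing combination looks like the sketch below. This is not the attached integration test; for simplicity it concatenates two KeywordTokenizers instead of an arbitrary input TokenStream, but the lifecycle problem is the same. The first pass through the ConcatenatingTokenStream succeeds because both tokenizers still hold their pending Readers; the second reset() reaches the tokenizers without a fresh setReader(), and incrementToken() then throws the IllegalStateException from Tokenizer.ILLEGAL_STATE_READER:

    import java.io.IOException;
    import java.io.StringReader;

    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.core.KeywordTokenizer;
    import org.apache.lucene.analysis.miscellaneous.ConcatenatingTokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class ConcatenatedResetSketch {
      public static void main(String[] args) throws IOException {
        KeywordTokenizer first = new KeywordTokenizer();
        KeywordTokenizer second = new KeywordTokenizer();
        first.setReader(new StringReader("hello"));
        second.setReader(new StringReader("world"));

        TokenStream joined = new ConcatenatingTokenStream(first, second);
        CharTermAttribute term = joined.addAttribute(CharTermAttribute.class);

        // First consumption works: reset() propagates to both tokenizers.
        joined.reset();
        while (joined.incrementToken()) {
          System.out.println(term.toString());   // "hello", then "world"
        }
        joined.end();

        // Second consumption fails: reset() reaches the tokenizers again, but
        // without a new setReader() their input is ILLEGAL_STATE_READER, so
        // incrementToken() throws an IllegalStateException.
        joined.reset();
        while (joined.incrementToken()) {
          System.out.println(term.toString());
        }
      }
    }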






[jira] [Updated] (LUCENE-8651) Tokenizer implementations can't be reset

2019-01-18 Thread Daniel Meehl (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Meehl updated LUCENE-8651:
-
Lucene Fields: New,Patch Available  (was: New)

> Tokenizer implementations can't be reset
> 
>
> Key: LUCENE-8651
> URL: https://issues.apache.org/jira/browse/LUCENE-8651
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Daniel Meehl
>Priority: Major
> Attachments: LUCENE-8650-2.patch, LUCENE-8651.patch
>
>
> The fine print here is that they can't be reset without calling setReader() 
> before every reset() call. The reason is that Tokenizer violates the contract 
> put forth by TokenStream.reset(), which is the following:
> "Resets this stream to a clean state. Stateful implementations must implement 
> this method so that they can be reused, just as if they had been created 
> fresh."
> Tokenizer implementations' reset() functions can't reset in that manner 
> because Tokenizer.end() removes the reference to the underlying Reader 
> (see LUCENE-2387). The catch-22 here is that we don't want to keep a Reader 
> around unnecessarily (memory leak), but we would like to be able to reset() 
> when necessary.
> The patches include an integration test that attempts to use a 
> ConcatenatingTokenStream to join an input TokenStream with a KeywordTokenizer 
> TokenStream. This test fails with an IllegalStateException thrown by 
> Tokenizer.ILLEGAL_STATE_READER.
>  
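The "fine print" workaround from the description, spelled out: a Tokenizer can be reused, but only by running the full documented cycle each time, with close() and setReader() before every reset(). A minimal sketch (class name and values are hypothetical, for illustration only):

    import java.io.IOException;
    import java.io.StringReader;

    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.core.KeywordTokenizer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class TokenizerReuseWorkaround {
      // Consume one value through the full documented lifecycle:
      // setReader -> reset -> incrementToken* -> end -> close.
      static void consume(Tokenizer tok, String value) throws IOException {
        CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
        tok.setReader(new StringReader(value));
        tok.reset();
        while (tok.incrementToken()) {
          System.out.println(term.toString());
        }
        tok.end();
        tok.close();   // required before the next setReader() call
      }

      public static void main(String[] args) throws IOException {
        Tokenizer tok = new KeywordTokenizer();
        consume(tok, "first");    // works
        consume(tok, "second");   // works, but only because setReader() ran again
      }
    }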






[jira] [Updated] (LUCENE-8651) Tokenizer implementations can't be reset

2019-01-18 Thread Daniel Meehl (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Meehl updated LUCENE-8651:
-
Attachment: LUCENE-8650-2.patch

> Tokenizer implementations can't be reset
> 
>
> Key: LUCENE-8651
> URL: https://issues.apache.org/jira/browse/LUCENE-8651
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Daniel Meehl
>Priority: Major
> Attachments: LUCENE-8650-2.patch
>
>
> The fine print here is that they can't be reset without calling setReader() 
> before every reset() call. The reason is that Tokenizer violates the contract 
> put forth by TokenStream.reset(), which is the following:
> "Resets this stream to a clean state. Stateful implementations must implement 
> this method so that they can be reused, just as if they had been created 
> fresh."
> Tokenizer implementations' reset() functions can't reset in that manner 
> because Tokenizer.end() removes the reference to the underlying Reader 
> (see LUCENE-2387). The catch-22 here is that we don't want to keep a Reader 
> around unnecessarily (memory leak), but we would like to be able to reset() 
> when necessary.
> The patches include an integration test that attempts to use a 
> ConcatenatingTokenStream to join an input TokenStream with a KeywordTokenizer 
> TokenStream. This test fails with an IllegalStateException thrown by 
> Tokenizer.ILLEGAL_STATE_READER.
>  






[jira] [Updated] (LUCENE-8651) Tokenizer implementations can't be reset

2019-01-18 Thread Daniel Meehl (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Meehl updated LUCENE-8651:
-
Component/s: modules/analysis

> Tokenizer implementations can't be reset
> 
>
> Key: LUCENE-8651
> URL: https://issues.apache.org/jira/browse/LUCENE-8651
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Daniel Meehl
>Priority: Major
> Attachments: LUCENE-8650-2.patch, LUCENE-8651.patch
>
>
> The fine print here is that they can't be reset without calling setReader() 
> before every reset() call. The reason is that Tokenizer violates the contract 
> put forth by TokenStream.reset(), which is the following:
> "Resets this stream to a clean state. Stateful implementations must implement 
> this method so that they can be reused, just as if they had been created 
> fresh."
> Tokenizer implementations' reset() functions can't reset in that manner 
> because Tokenizer.end() removes the reference to the underlying Reader 
> (see LUCENE-2387). The catch-22 here is that we don't want to keep a Reader 
> around unnecessarily (memory leak), but we would like to be able to reset() 
> when necessary.
> The patches include an integration test that attempts to use a 
> ConcatenatingTokenStream to join an input TokenStream with a KeywordTokenizer 
> TokenStream. This test fails with an IllegalStateException thrown by 
> Tokenizer.ILLEGAL_STATE_READER.
>  






[jira] [Updated] (LUCENE-8651) Tokenizer implementations can't be reset

2019-01-18 Thread Daniel Meehl (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Meehl updated LUCENE-8651:
-
Attachment: LUCENE-8651.patch

> Tokenizer implementations can't be reset
> 
>
> Key: LUCENE-8651
> URL: https://issues.apache.org/jira/browse/LUCENE-8651
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Daniel Meehl
>Priority: Major
> Attachments: LUCENE-8650-2.patch, LUCENE-8651.patch
>
>
> The fine print here is that they can't be reset without calling setReader() 
> before every reset() call. The reason is that Tokenizer violates the contract 
> put forth by TokenStream.reset(), which is the following:
> "Resets this stream to a clean state. Stateful implementations must implement 
> this method so that they can be reused, just as if they had been created 
> fresh."
> Tokenizer implementations' reset() functions can't reset in that manner 
> because Tokenizer.end() removes the reference to the underlying Reader 
> (see LUCENE-2387). The catch-22 here is that we don't want to keep a Reader 
> around unnecessarily (memory leak), but we would like to be able to reset() 
> when necessary.
> The patches include an integration test that attempts to use a 
> ConcatenatingTokenStream to join an input TokenStream with a KeywordTokenizer 
> TokenStream. This test fails with an IllegalStateException thrown by 
> Tokenizer.ILLEGAL_STATE_READER.
>  


