[jira] [Updated] (LUCENE-8497) Rethink multi-term analysis handling

2018-10-30 Thread Alan Woodward (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8497:
--
Attachment: LUCENE-8497.patch

> Rethink multi-term analysis handling
> 
>
> Key: LUCENE-8497
> URL: https://issues.apache.org/jira/browse/LUCENE-8497
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8497.patch, LUCENE-8497.patch, LUCENE-8497.patch, 
> LUCENE-8497.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The current framework for handling term normalisation works via instanceof 
> checks for MultiTermAwareComponent and casts.  MultiTermAwareComponent itself 
> deals in AbstractAnalysisComponents, and so callers need to cast to the 
> correct component type before use, which is ripe for misuse.
> We should re-organise all this to be type-safe and usable without casts.  One 
> possibility is to add `normalize` methods to CharFilterFactory and 
> TokenFilterFactory that mirror their existing `create` methods.  The default 
> implementation would return the input unchanged, while filters that should 
> apply at normalization time can delegate to `create`.
> Related to this, we should deprecate and remove LowerCaseTokenizer, which 
> combines tokenization and normalization in a way that will break this API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8497) Rethink multi-term analysis handling

2018-10-29 Thread Alan Woodward (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8497:
--
Attachment: LUCENE-8497.patch

> Rethink multi-term analysis handling
> 
>
> Key: LUCENE-8497
> URL: https://issues.apache.org/jira/browse/LUCENE-8497
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8497.patch, LUCENE-8497.patch, LUCENE-8497.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The current framework for handling term normalisation works via instanceof 
> checks for MultiTermAwareComponent and casts.  MultiTermAwareComponent itself 
> deals in AbstractAnalysisComponents, and so callers need to cast to the 
> correct component type before use, which is ripe for misuse.
> We should re-organise all this to be type-safe and usable without casts.  One 
> possibility is to add `normalize` methods to CharFilterFactory and 
> TokenFilterFactory that mirror their existing `create` methods.  The default 
> implementation would return the input unchanged, while filters that should 
> apply at normalization time can delegate to `create`.
> Related to this, we should deprecate and remove LowerCaseTokenizer, which 
> combines tokenization and normalization in a way that will break this API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8497) Rethink multi-term analysis handling

2018-10-16 Thread Alan Woodward (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8497:
--
Attachment: LUCENE-8497.patch

> Rethink multi-term analysis handling
> 
>
> Key: LUCENE-8497
> URL: https://issues.apache.org/jira/browse/LUCENE-8497
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8497.patch, LUCENE-8497.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The current framework for handling term normalisation works via instanceof 
> checks for MultiTermAwareComponent and casts.  MultiTermAwareComponent itself 
> deals in AbstractAnalysisComponents, and so callers need to cast to the 
> correct component type before use, which is ripe for misuse.
> We should re-organise all this to be type-safe and usable without casts.  One 
> possibility is to add `normalize` methods to CharFilterFactory and 
> TokenFilterFactory that mirror their existing `create` methods.  The default 
> implementation would return the input unchanged, while filters that should 
> apply at normalization time can delegate to `create`.
> Related to this, we should deprecate and remove LowerCaseTokenizer, which 
> combines tokenization and normalization in a way that will break this API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8497) Rethink multi-term analysis handling

2018-10-16 Thread Alan Woodward (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8497:
--
Attachment: LUCENE-8497.patch

> Rethink multi-term analysis handling
> 
>
> Key: LUCENE-8497
> URL: https://issues.apache.org/jira/browse/LUCENE-8497
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8497.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The current framework for handling term normalisation works via instanceof 
> checks for MultiTermAwareComponent and casts.  MultiTermAwareComponent itself 
> deals in AbstractAnalysisComponents, and so callers need to cast to the 
> correct component type before use, which is ripe for misuse.
> We should re-organise all this to be type-safe and usable without casts.  One 
> possibility is to add `normalize` methods to CharFilterFactory and 
> TokenFilterFactory that mirror their existing `create` methods.  The default 
> implementation would return the input unchanged, while filters that should 
> apply at normalization time can delegate to `create`.
> Related to this, we should deprecate and remove LowerCaseTokenizer, which 
> combines tokenization and normalization in a way that will break this API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org