Christian Moen created LUCENE-6216:
--------------------------------------

Summary: Make it easier to modify Japanese token attributes downstream
Key: LUCENE-6216
URL: https://issues.apache.org/jira/browse/LUCENE-6216
Project: Lucene - Core
Issue Type: Improvement
Components: modules/analysis
Reporter: Christian Moen
Priority: Minor

Japanese-specific token attributes such as {{PartOfSpeechAttribute}}, {{BaseFormAttribute}}, etc. get their values from a {{org.apache.lucene.analysis.ja.Token}} through a {{setToken()}} method. This makes it cumbersome to change these token attributes later in the analysis chain, since {{Token}} instances are difficult to instantiate (they are effectively read-only objects).

I ran into this issue in LUCENE-3922 (JapaneseNumberFilter), where it would be appropriate to update the token attributes to also reflect Japanese number normalization.

I think it might be more practical to allow setting a specific value for these token attributes directly rather than through a {{Token}}, since this makes the APIs simpler, makes it easier to change attributes downstream, and makes it easier to support additional dictionaries.

The one drawback of this approach that I can think of is a performance hit, as we will miss out on the inherent lazy retrieval of these token attributes from the {{Token}} object (and the underlying dictionary/buffer). I'd like to do some testing to better understand the performance impact of this change.

Happy to hear your thoughts on this.
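
To make the trade-off concrete, here is a minimal sketch of an attribute that accepts either a lazily resolved source or a directly set value. It is not the actual Lucene implementation and not necessarily what this issue will end up doing: the class name {{LazyOrExplicitAttribute}} and its methods are hypothetical, and a {{Supplier<String>}} stands in for the dictionary-backed lookup that {{Token}} performs today.

```java
import java.util.function.Supplier;

/**
 * Hypothetical sketch only; this is NOT Lucene's PartOfSpeechAttributeImpl.
 * It holds either a lazy source (standing in for a dictionary-backed Token)
 * or a value set directly by a downstream filter.
 */
final class LazyOrExplicitAttribute {
    private Supplier<String> lazySource; // assumption: stands in for Token's lookup
    private String explicitValue;
    private boolean explicit;

    /** Lazy path: the lookup is deferred until get() is called. */
    void setLazySource(Supplier<String> source) {
        this.lazySource = source;
        this.explicitValue = null;
        this.explicit = false;
    }

    /** Direct path: a downstream filter overrides the value without a Token. */
    void setValue(String value) {
        this.explicitValue = value;
        this.lazySource = null;
        this.explicit = true;
    }

    /** Returns the explicit value if one was set, otherwise resolves lazily. */
    String get() {
        if (explicit) {
            return explicitValue;
        }
        return lazySource == null ? null : lazySource.get();
    }
}
```

In this sketch a tokenizer would call {{setLazySource()}} so that no lookup happens unless a consumer asks for the value, while a filter further down the chain could call {{setValue()}} to replace it. Whether keeping both paths is worth the extra state per attribute is exactly the kind of question the performance testing would need to answer.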