[jira] [Commented] (LUCENE-8866) Remove ICU dependency of kuromoji tools/test-tools
[ https://issues.apache.org/jira/browse/LUCENE-8866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869082#comment-16869082 ]

ASF subversion and git services commented on LUCENE-8866:

Commit 2adc8c6c13d1a74c3a371c2341a05507e893dabf in lucene-solr's branch refs/heads/branch_8x from Robert Muir
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2adc8c6 ]

LUCENE-8866: remove kuromoji/tools dependency on ICU

> Remove ICU dependency of kuromoji tools/test-tools
>
> Key: LUCENE-8866
> URL: https://issues.apache.org/jira/browse/LUCENE-8866
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Robert Muir
> Priority: Major
> Attachments: LUCENE-8866.patch
>
> The tooling stuff has an off-by-default option to normalize entries,
> currently using the ICU API.
> But I think since it's off-by-default, and just doing NFKC normalization at
> dictionary-build-time, it's a better tradeoff to use the JDK here?
> I would rather remove the ICU dependency for the tooling and look at
> simplifying the build to have fewer modules (e.g. investigate moving the
> tooling and tests into src/java and src/tools, so that [~msoko...@gmail.com]'s
> new tests in LUCENE-8863 are running by default, the dictionary tool is shipped
> as a command-line tool in the JAR, etc.)
> "ant regenerate" should be enough to prevent any chicken-and-egg problems in the
> dictionary construction code, so I don't think we need separate modules to
> enforce it.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
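The description proposes doing the optional NFKC normalization with the JDK instead of ICU. As an illustration only, here is a minimal sketch of what that could look like using the JDK's `java.text.Normalizer`. The class `DictionaryEntryNormalizer`, its `normalizeEntry` method, and the boolean flag standing in for the off-by-default option are hypothetical and are not taken from LUCENE-8866.patch.

```java
import java.text.Normalizer;

/**
 * Hypothetical helper (not from LUCENE-8866.patch): applies NFKC normalization
 * to dictionary entries at build time using only the JDK, with no ICU dependency.
 */
public class DictionaryEntryNormalizer {
  // Stands in for the tool's off-by-default "normalize entries" option (assumed).
  private final boolean normalizeEntries;

  public DictionaryEntryNormalizer(boolean normalizeEntries) {
    this.normalizeEntries = normalizeEntries;
  }

  /** Returns the entry unchanged when the option is off, otherwise its NFKC form. */
  public String normalizeEntry(String entry) {
    if (!normalizeEntries) {
      return entry;
    }
    // NFKC = compatibility decomposition followed by canonical composition.
    return Normalizer.normalize(entry, Normalizer.Form.NFKC);
  }

  public static void main(String[] args) {
    DictionaryEntryNormalizer normalizer = new DictionaryEntryNormalizer(true);

    // Full-width Latin letters and digits fold to ASCII.
    System.out.println(normalizer.normalizeEntry("\uFF21\uFF22\uFF23\uFF11\uFF12\uFF13")); // ABC123

    // Half-width KA followed by the half-width voiced sound mark composes
    // into a single full-width GA (U+30AC).
    String ga = normalizer.normalizeEntry("\uFF76\uFF9E");
    System.out.printf("length=%d U+%04X%n", ga.length(), (int) ga.charAt(0)); // length=1 U+30AC
  }
}
```

Since the normalization runs only once, when the dictionary is built, the JDK's `Normalizer` should cover this case; that is the tradeoff the description argues for. With the flag left at its default of off, the sketch returns entries untouched.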
[jira] [Commented] (LUCENE-8866) Remove ICU dependency of kuromoji tools/test-tools
[ https://issues.apache.org/jira/browse/LUCENE-8866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869078#comment-16869078 ]

ASF subversion and git services commented on LUCENE-8866:

Commit 91331d1a891d76173f6854287f11821e6ab41fae in lucene-solr's branch refs/heads/master from Robert Muir
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=91331d1 ]

LUCENE-8866: remove kuromoji/tools dependency on ICU
[jira] [Commented] (LUCENE-8866) Remove ICU dependency of kuromoji tools/test-tools
[ https://issues.apache.org/jira/browse/LUCENE-8866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868440#comment-16868440 ]

Robert Muir commented on LUCENE-8866:

If there are no objections, I will wait until LUCENE-8863 is merged before committing this. The patch here poached some build changes from Mike S's PR for LUCENE-8863 because I needed to run test-tools.
[jira] [Commented] (LUCENE-8866) Remove ICU dependency of kuromoji tools/test-tools
[ https://issues.apache.org/jira/browse/LUCENE-8866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16866916#comment-16866916 ]

Mike Sokolov commented on LUCENE-8866:

+1. If people have more precise normalization requirements, they can encode them in their dictionary. I think we can presume this is not noisy user data and that it should already have been cleaned.
[jira] [Commented] (LUCENE-8866) Remove ICU dependency of kuromoji tools/test-tools
[ https://issues.apache.org/jira/browse/LUCENE-8866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16866133#comment-16866133 ]

Robert Muir commented on LUCENE-8866:

Simple patch: I didn't move any code around, I just removed the external dependency.