[jira] [Commented] (LUCENE-8866) Remove ICU dependency of kuromoji tools/test-tools

2019-06-20 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869082#comment-16869082
 ] 

ASF subversion and git services commented on LUCENE-8866:
-

Commit 2adc8c6c13d1a74c3a371c2341a05507e893dabf in lucene-solr's branch 
refs/heads/branch_8x from Robert Muir
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2adc8c6 ]

LUCENE-8866: remove kuromoji/tools dependency on ICU


> Remove ICU dependency of kuromoji tools/test-tools
> --
>
> Key: LUCENE-8866
> URL: https://issues.apache.org/jira/browse/LUCENE-8866
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Robert Muir
> Priority: Major
> Attachments: LUCENE-8866.patch
>
>
> The tooling has an off-by-default option to normalize entries, 
> currently using the ICU API.
> But since it's off by default, and only does NFKC normalization at 
> dictionary-build time, I think it's a better tradeoff to use the JDK here.
> I would rather remove the ICU dependency for the tooling and look at 
> simplifying the build to have fewer modules (e.g. investigate moving the 
> tooling and tests into src/java and src/tools, so that the new tests from 
> [~msoko...@gmail.com] in LUCENE-8863 run by default, the dictionary tool is 
> shipped as a command-line tool in the JAR, etc.).
> "ant regenerate" should be enough to prevent any chicken-and-egg problems in 
> the dictionary construction code, so I don't think we need separate modules 
> to enforce it.
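
For illustration, a minimal sketch of NFKC normalization using only the JDK. This is not the code from LUCENE-8866.patch: the class name NfkcSketch and the helper normalizeEntry are hypothetical, and the boolean flag only stands in for the off-by-default option described above. The one real API used is java.text.Normalizer.

```java
import java.text.Normalizer;

// Hypothetical illustration only; not taken from the kuromoji tools.
final class NfkcSketch {

    // Returns the entry unchanged unless normalization is requested,
    // mirroring an off-by-default option. Uses only the JDK, no ICU.
    static String normalizeEntry(String entry, boolean normalize) {
        if (!normalize) {
            return entry;
        }
        return Normalizer.normalize(entry, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String fullWidth = "\uFF21\uFF22\uFF23"; // full-width Latin letters A, B, C
        String halfWidthGa = "\uFF76\uFF9E";     // half-width katakana KA + voiced sound mark

        // NFKC folds the full-width letters to ASCII and composes the
        // half-width pair into a single full-width katakana GA (U+30AC).
        System.out.println(normalizeEntry(fullWidth, true));
        System.out.println(normalizeEntry(halfWidthGa, true));

        // With the option off, the entry passes through untouched.
        System.out.println(normalizeEntry(fullWidth, false).equals(fullWidth));
    }
}
```

One caveat with this approach: the JDK and ICU may implement different Unicode versions, so results could differ for recently added characters. For already-clean dictionary data that seems unlikely to matter.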



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8866) Remove ICU dependency of kuromoji tools/test-tools

2019-06-20 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869078#comment-16869078
 ] 

ASF subversion and git services commented on LUCENE-8866:
-

Commit 91331d1a891d76173f6854287f11821e6ab41fae in lucene-solr's branch 
refs/heads/master from Robert Muir
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=91331d1 ]

LUCENE-8866: remove kuromoji/tools dependency on ICU





[jira] [Commented] (LUCENE-8866) Remove ICU dependency of kuromoji tools/test-tools

2019-06-20 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868440#comment-16868440
 ] 

Robert Muir commented on LUCENE-8866:
-

If there are no objections I will wait until LUCENE-8863 is merged. The patch 
here poached some build changes from Mike S's PR for LUCENE-8863 because I 
needed to run test-tools.




[jira] [Commented] (LUCENE-8866) Remove ICU dependency of kuromoji tools/test-tools

2019-06-18 Thread Mike Sokolov (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16866916#comment-16866916
 ] 

Mike Sokolov commented on LUCENE-8866:
--

+1. If people have more precise normalization requirements, they can encode them 
in their dictionary. I think we can presume this is not noisy user data and 
should already have been cleaned.




[jira] [Commented] (LUCENE-8866) Remove ICU dependency of kuromoji tools/test-tools

2019-06-17 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16866133#comment-16866133
 ] 

Robert Muir commented on LUCENE-8866:
-

Simple patch, I didn't move any code around, just removed the external dep.
