[GitHub] [lucene-solr] msokolov commented on issue #862: LUCENE-8971: Enable constructing JapaneseTokenizer with custom dictio…

2019-09-11 Thread GitBox
msokolov commented on issue #862: LUCENE-8971: Enable constructing 
JapaneseTokenizer with custom dictio…
URL: https://github.com/apache/lucene-solr/pull/862#issuecomment-530494478
 
 
   > Should it be marked experimental then ? The fact that we ship a single 
dictionary within the jar also ensures that it is built from the same version 
but this change breaks this assumption. What kind of compatibility are we 
expecting here ? Should we require users to rebuild binary dictionary on each 
minor version ?
   
   Yes, these are good questions. I think marking this experimental makes 
sense, given that we are not providing detailed documentation and realistically 
only experts with NLP knowledge will ever use it. Expert features carry no 
compatibility guarantee, so the recommended policy would be to rebuild the 
dictionary with each Lucene version. Users would be well advised to rebuild it 
whenever they build their software, treating the Kuromoji binary dictionary as 
an artifact produced from source code (the textual dictionary). Does that make 
sense?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] msokolov commented on issue #862: LUCENE-8971: Enable constructing JapaneseTokenizer with custom dictio…

2019-09-11 Thread GitBox
msokolov commented on issue #862: LUCENE-8971: Enable constructing 
JapaneseTokenizer with custom dictio…
URL: https://github.com/apache/lucene-solr/pull/862#issuecomment-530482206
 
 
   But to answer your question a little more fully: you run DictionaryBuilder, 
the same way the build system does to produce the built-in dictionary. Then you 
put the resulting dictionary on your classpath (or on the filesystem) and pass 
that path to the JapaneseTokenizer constructor.
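
   For illustration only, here is a minimal sketch of the "classpath or 
filesystem" lookup described above, in plain Java. It does not call any Lucene 
API: the class DictionaryResourceDemo, the Scheme enum and the open method are 
hypothetical names, and the actual arguments the JapaneseTokenizer constructor 
expects should be checked against the Javadoc of the Lucene version in use.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, NOT part of Lucene: opens one binary dictionary file
// either from the classpath or from the filesystem.
final class DictionaryResourceDemo {

  // Where the dictionary file lives; an illustrative enum, not Lucene's own type.
  enum Scheme { CLASSPATH, FILE }

  static InputStream open(Scheme scheme, String path) throws IOException {
    switch (scheme) {
      case CLASSPATH: {
        // getResourceAsStream returns null when the resource is missing.
        InputStream in =
            DictionaryResourceDemo.class.getClassLoader().getResourceAsStream(path);
        if (in == null) {
          throw new FileNotFoundException("Not on classpath: " + path);
        }
        return in;
      }
      case FILE:
        // Throws an IOException subclass if the file does not exist.
        return Files.newInputStream(Paths.get(path));
      default:
        throw new IllegalArgumentException("Unknown scheme: " + scheme);
    }
  }

  public static void main(String[] args) throws IOException {
    // Stand-in for a dictionary file produced by a build step.
    Path tmp = Files.createTempFile("custom-dict", ".dat");
    try {
      Files.write(tmp, new byte[] {1, 2, 3});
      try (InputStream in = open(Scheme.FILE, tmp.toString())) {
        System.out.println("first byte: " + in.read());
      }
    } finally {
      Files.deleteIfExists(tmp);
    }
  }
}
```

   Failing fast when the resource is missing is a deliberate choice in this 
sketch: a dictionary that is absent from the deployment is reported at the 
point where it is opened rather than surfacing later during tokenization.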




[GitHub] [lucene-solr] msokolov commented on issue #862: LUCENE-8971: Enable constructing JapaneseTokenizer with custom dictio…

2019-09-11 Thread GitBox
msokolov commented on issue #862: LUCENE-8971: Enable constructing 
JapaneseTokenizer with custom dictio…
URL: https://github.com/apache/lucene-solr/pull/862#issuecomment-530481447
 
 
   Jim, I'm sorry - there was no comment for several days, so I assumed it was 
OK to push. Let's discuss, and if there's a problem I can revert. 
DictionaryBuilder was moved into the main jar (no longer part of tools) in 
LUCENE-8871, so I think the software is all there; we have been using it in an 
automated build in our system with a patched version of Lucene. Arguably we 
could/should have documentation for this, which is clearly lacking, but given 
the complexity of this "feature" I feel it is pretty expert.




[GitHub] [lucene-solr] msokolov commented on issue #862: LUCENE-8971: Enable constructing JapaneseTokenizer with custom dictio…

2019-09-11 Thread GitBox
msokolov commented on issue #862: LUCENE-8971: Enable constructing 
JapaneseTokenizer with custom dictio…
URL: https://github.com/apache/lucene-solr/pull/862#issuecomment-530426989
 
 
   Also back-ported to the 8x branch - this was a clean cherry-pick, so I 
didn't post a PR.

