[jira] [Resolved] (LUCENE-8971) Enable constructing JapaneseTokenizer from custom dictionary

2019-09-11 Thread Mike Sokolov (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-8971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Sokolov resolved LUCENE-8971. -- Assignee: Mike Sokolov Resolution: Fixed > Enable constructing JapaneseTokenizer from

[jira] [Updated] (LUCENE-8971) Enable constructing JapaneseTokenizer from custom dictionary

2019-09-11 Thread Mike Sokolov (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-8971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Sokolov updated LUCENE-8971: - Fix Version/s: 8.3 > Enable constructing JapaneseTokenizer from custom dictionary >

[jira] [Updated] (LUCENE-8971) Enable constructing JapaneseTokenizer from custom dictionary

2019-09-06 Thread Mike Sokolov (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-8971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Sokolov updated LUCENE-8971: - Description: This is basically just finishing up what was started in LUCENE-8863. It adds a

[jira] [Created] (LUCENE-8971) Enable constructing JapaneseTokenizer from custom dictionary

2019-09-06 Thread Mike Sokolov (Jira)
Mike Sokolov created LUCENE-8971: Summary: Enable constructing JapaneseTokenizer from custom dictionary Key: LUCENE-8971 URL: https://issues.apache.org/jira/browse/LUCENE-8971 Project: Lucene - Core

[jira] [Commented] (LUCENE-8966) KoreanTokenizer should split unknown words on digits

2019-09-06 Thread Mike Sokolov (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924206#comment-16924206 ] Mike Sokolov commented on LUCENE-8966: -- > For complex number grouping and normalization, Namgyu Kim

[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-09-06 Thread Mike Sokolov (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924203#comment-16924203 ] Mike Sokolov commented on LUCENE-8920: -- If I understand you correctly, T1 is the threshold we

[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-09-05 Thread Mike Sokolov (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923530#comment-16923530 ] Mike Sokolov commented on LUCENE-8920: -- I like this! I would be happy to review if you want to post

[jira] [Commented] (LUCENE-8966) KoreanTokenizer should split unknown words on digits

2019-09-05 Thread Mike Sokolov (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923378#comment-16923378 ] Mike Sokolov commented on LUCENE-8966: -- Would you consider grouping numbers and (at least some)

[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-07-31 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16897113#comment-16897113 ] Mike Sokolov commented on LUCENE-8920: -- [~noble.paul] thanks for fixing - I thought I had been

[jira] [Updated] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-07-19 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Sokolov updated LUCENE-8920: - Status: Patch Available (was: Open) > Reduce size of FSTs due to use of direct-addressing

[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-07-19 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1675#comment-1675 ] Mike Sokolov commented on LUCENE-8920: -- bq. I'm making it a blocker for 8.3 since we haven't

[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-07-18 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888416#comment-16888416 ] Mike Sokolov commented on LUCENE-8920: -- Before digging in in earnest on FST size reduction, I'd

[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-07-17 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887089#comment-16887089 ] Mike Sokolov commented on LUCENE-8920: -- Note: I pushed the old-format Kuromoji dictionary and it

[jira] [Comment Edited] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-07-16 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886603#comment-16886603 ] Mike Sokolov edited comment on LUCENE-8920 at 7/17/19 1:27 AM: --- Yes, that

[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-07-16 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886603#comment-16886603 ] Mike Sokolov commented on LUCENE-8920: -- Yes, that makes sense. Because we reverted the "current

[jira] [Updated] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-07-15 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Sokolov updated LUCENE-8920: - Description: Some data can lead to worst-case ~4x RAM usage due to this optimization. Several

[jira] [Created] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-07-15 Thread Mike Sokolov (JIRA)
Mike Sokolov created LUCENE-8920: Summary: Reduce size of FSTs due to use of direct-addressing encoding Key: LUCENE-8920 URL: https://issues.apache.org/jira/browse/LUCENE-8920 Project: Lucene - Core

[jira] [Commented] (SOLR-13629) Remove whitespace only lines & trailing whitespace from analytics package

2019-07-13 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-13629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884528#comment-16884528 ] Mike Sokolov commented on SOLR-13629: - We don't want to remove all the blank (or whitespace-only)

[jira] [Commented] (SOLR-6672) function results' names should not include trailing whitespace

2019-07-08 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-6672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880853#comment-16880853 ] Mike Sokolov commented on SOLR-6672: Thanks! I had forgotten about this. Did you at least test

[jira] [Commented] (LUCENE-4312) Index format to store position length per position

2019-07-06 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16879757#comment-16879757 ] Mike Sokolov commented on LUCENE-4312: -- Yes, we're compromising precision today when we apply

[jira] [Commented] (LUCENE-8895) Switch all FSTs to use direct addressing optimization

2019-07-03 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877938#comment-16877938 ] Mike Sokolov commented on LUCENE-8895: -- Ah yes, thanks! I now deprecated the other one too. >

[jira] [Updated] (LUCENE-8895) Switch all FSTs to use direct addressing optimization

2019-07-02 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Sokolov updated LUCENE-8895: - Resolution: Fixed Status: Resolved (was: Patch Available) > Switch all FSTs to use

[jira] [Updated] (LUCENE-8895) Switch all FSTs to use direct addressing optimization

2019-07-02 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Sokolov updated LUCENE-8895: - Fix Version/s: 8.2 > Switch all FSTs to use direct addressing optimization >

[jira] [Resolved] (LUCENE-8781) Explore FST direct array arc encoding

2019-07-02 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Sokolov resolved LUCENE-8781. -- Resolution: Fixed > Explore FST direct array arc encoding >

[jira] [Commented] (LUCENE-8781) Explore FST direct array arc encoding

2019-07-02 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877104#comment-16877104 ] Mike Sokolov commented on LUCENE-8781: -- The extension of this feature to more use cases is tracked

[jira] [Updated] (LUCENE-8895) Switch all FSTs to use direct addressing optimization

2019-06-30 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Sokolov updated LUCENE-8895: - Status: Patch Available (was: Open) > Switch all FSTs to use direct addressing optimization >

[jira] [Created] (LUCENE-8895) Switch all FSTs to use direct addressing optimization

2019-06-30 Thread Mike Sokolov (JIRA)
Mike Sokolov created LUCENE-8895: Summary: Switch all FSTs to use direct addressing optimization Key: LUCENE-8895 URL: https://issues.apache.org/jira/browse/LUCENE-8895 Project: Lucene - Core

[jira] [Commented] (LUCENE-8781) Explore FST direct array arc encoding

2019-06-29 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875610#comment-16875610 ] Mike Sokolov commented on LUCENE-8781: -- Well, there is an easy fix for {{blocktreeords}}, but it

[jira] [Commented] (LUCENE-8781) Explore FST direct array arc encoding

2019-06-29 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875607#comment-16875607 ] Mike Sokolov commented on LUCENE-8781: -- Hmm, I found that {{blocktreeords}} codec has some

[jira] [Commented] (LUCENE-8781) Explore FST direct array arc encoding

2019-06-29 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875600#comment-16875600 ] Mike Sokolov commented on LUCENE-8781: -- Funny you should mention this - I just today tested

[jira] [Resolved] (LUCENE-8871) Move Kuromoji DictionaryBuilder tool from src/tools to src/

2019-06-29 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Sokolov resolved LUCENE-8871. -- Resolution: Fixed > Move Kuromoji DictionaryBuilder tool from src/tools to src/ >

[jira] [Updated] (LUCENE-8871) Move Kuromoji DictionaryBuilder tool from src/tools to src/

2019-06-29 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Sokolov updated LUCENE-8871: - Fix Version/s: 8.2 > Move Kuromoji DictionaryBuilder tool from src/tools to src/ >

[jira] [Commented] (SOLR-13571) Make recent RefGuide rank well in Google

2019-06-27 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-13571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874290#comment-16874290 ] Mike Sokolov commented on SOLR-13571: - Have we ever tried publishing a site map? Google used to have

[jira] [Commented] (LUCENE-8871) Move Kuromoji DictionaryBuilder tool from src/tools to src/

2019-06-27 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874014#comment-16874014 ] Mike Sokolov commented on LUCENE-8871: -- I see, thanks for explaining. I was reading the commit

[jira] [Commented] (LUCENE-8871) Move Kuromoji DictionaryBuilder tool from src/tools to src/

2019-06-27 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874009#comment-16874009 ] Mike Sokolov commented on LUCENE-8871: -- I see what you did there [~jpountz]! Thank you for fixing.

[jira] [Commented] (LUCENE-8871) Move Kuromoji DictionaryBuilder tool from src/tools to src/

2019-06-27 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874008#comment-16874008 ] Mike Sokolov commented on LUCENE-8871: -- I see what you did there! Thank you for fixing. I have to

[jira] [Commented] (LUCENE-8871) Move Kuromoji DictionaryBuilder tool from src/tools to src/

2019-06-25 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872790#comment-16872790 ] Mike Sokolov commented on LUCENE-8871: -- Thanks for reviewing. FYI I will be delayed a bit in

[jira] [Commented] (LUCENE-8871) Move Kuromoji DictionaryBuilder tool from src/tools to src/

2019-06-24 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871403#comment-16871403 ] Mike Sokolov commented on LUCENE-8871: -- This has been up for a day, and is I think pretty

[jira] [Commented] (LUCENE-8869) Build kuromoji system dictionary as a separated jar and load it from JapaneseTokenizer at runtime

2019-06-23 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870518#comment-16870518 ] Mike Sokolov commented on LUCENE-8869: -- [~tomoko] there might be some minor conflicts with

[jira] [Commented] (LUCENE-8870) Support numeric value in Field class

2019-06-22 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870200#comment-16870200 ] Mike Sokolov commented on LUCENE-8870: -- Personally I find the Field type-facade kind of annoying;

[jira] [Resolved] (LUCENE-8863) Improve Kuromoji DictionaryBuilder error handling, and enable loading external dictionary for testing

2019-06-20 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Sokolov resolved LUCENE-8863. -- Resolution: Fixed > Improve Kuromoji DictionaryBuilder error handling, and enable loading >

[jira] [Updated] (LUCENE-8863) Improve Kuromoji DictionaryBuilder error handling, and enable loading external dictionary for testing

2019-06-20 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Sokolov updated LUCENE-8863: - Fix Version/s: 8.2 > Improve Kuromoji DictionaryBuilder error handling, and enable loading >

[jira] [Commented] (LUCENE-8816) Decouple Kuromoji's morphological analyser and its dictionary

2019-06-19 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16867977#comment-16867977 ] Mike Sokolov commented on LUCENE-8816: -- LUCENE-8871 opened to cover moving dictionary builder tools

[jira] [Created] (LUCENE-8871) Move Kuromoji DictionaryBuilder tool from src/tools to src/

2019-06-19 Thread Mike Sokolov (JIRA)
Mike Sokolov created LUCENE-8871: Summary: Move Kuromoji DictionaryBuilder tool from src/tools to src/ Key: LUCENE-8871 URL: https://issues.apache.org/jira/browse/LUCENE-8871 Project: Lucene - Core

[jira] [Commented] (LUCENE-8863) Improve Kuromoji DictionaryBuilder error handling, and enable loading external dictionary for testing

2019-06-19 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16867970#comment-16867970 ] Mike Sokolov commented on LUCENE-8863: -- Agreed - I'll edit the description to indicate how we added

[jira] [Updated] (LUCENE-8863) Improve Kuromoji DictionaryBuilder error handling, and enable loading external dictionary for testing

2019-06-19 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Sokolov updated LUCENE-8863: - Summary: Improve Kuromoji DictionaryBuilder error handling, and enable loading external

[jira] [Comment Edited] (LUCENE-8781) Explore FST direct array arc encoding

2019-06-19 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16867966#comment-16867966 ] Mike Sokolov edited comment on LUCENE-8781 at 6/19/19 8:09 PM: --- re-closing

[jira] [Resolved] (LUCENE-8781) Explore FST direct array arc encoding

2019-06-19 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Sokolov resolved LUCENE-8781. -- Resolution: Fixed re-closing after pushing fix that handled missing case (in memory codec) >

[jira] [Commented] (LUCENE-8863) Improve handling of edge cases in Kuromoji's DIctionaryBuilder

2019-06-18 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16866918#comment-16866918 ] Mike Sokolov commented on LUCENE-8863: -- I'll push this in a couple of days if there are not other

[jira] [Commented] (LUCENE-8866) Remove ICU dependency of kuromoji tools/test-tools

2019-06-18 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16866916#comment-16866916 ] Mike Sokolov commented on LUCENE-8866: -- +1 if people have more precise normalization requirements,

[jira] [Comment Edited] (LUCENE-8863) Improve handling of edge cases in Kuromoji's DIctionaryBuilder

2019-06-17 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16865906#comment-16865906 ] Mike Sokolov edited comment on LUCENE-8863 at 6/17/19 7:10 PM: --- OK, I will

[jira] [Comment Edited] (LUCENE-8863) Improve handling of edge cases in Kuromoji's DIctionaryBuilder

2019-06-17 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16865906#comment-16865906 ] Mike Sokolov edited comment on LUCENE-8863 at 6/17/19 7:08 PM: --- OK, I will

[jira] [Commented] (LUCENE-8863) Improve handling of edge cases in Kuromoji's DIctionaryBuilder

2019-06-17 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16865906#comment-16865906 ] Mike Sokolov commented on LUCENE-8863: -- OK, I will check for empty base form and raise an

[jira] [Comment Edited] (LUCENE-8863) Improve handling of edge cases in Kuromoji's DIctionaryBuilder

2019-06-15 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864841#comment-16864841 ] Mike Sokolov edited comment on LUCENE-8863 at 6/15/19 7:56 PM: --- {quote}Can

[jira] [Commented] (LUCENE-8863) Improve handling of edge cases in Kuromoji's DIctionaryBuilder

2019-06-15 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864841#comment-16864841 ] Mike Sokolov commented on LUCENE-8863: -- {quote}Can we just throw an exception on empty base form?

[jira] [Commented] (LUCENE-8863) Improve handling of edge cases in Kuromoji's DIctionaryBuilder

2019-06-15 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864701#comment-16864701 ] Mike Sokolov commented on LUCENE-8863: -- I'll submit a patch soon. My initial idea was to maintain

[jira] [Commented] (LUCENE-8816) Decouple Kuromoji's morphological analyser and its dictionary

2019-06-15 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864672#comment-16864672 ] Mike Sokolov commented on LUCENE-8816: -- I opened LUCENE-8863 to cover some small, but blocking,

[jira] [Created] (LUCENE-8863) Improve handling of edge cases in Kuromoji's DIctionaryBuilder

2019-06-15 Thread Mike Sokolov (JIRA)
Mike Sokolov created LUCENE-8863: Summary: Improve handling of edge cases in Kuromoji's DIctionaryBuilder Key: LUCENE-8863 URL: https://issues.apache.org/jira/browse/LUCENE-8863 Project: Lucene -

[jira] [Commented] (LUCENE-8781) Explore FST direct array arc encoding

2019-06-15 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864668#comment-16864668 ] Mike Sokolov commented on LUCENE-8781: -- Thanks for testing, [~dsmiley], you definitely found a bug.

[jira] [Commented] (LUCENE-8816) Decouple Kuromoji's morphological analyser and its dictionary

2019-06-11 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16861622#comment-16861622 ] Mike Sokolov commented on LUCENE-8816: -- Thanks Robert, yeah I understand this was built for a

[jira] [Commented] (LUCENE-8816) Decouple Kuromoji's morphological analyser and its dictionary

2019-06-11 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16861609#comment-16861609 ] Mike Sokolov commented on LUCENE-8816: -- I see that in {{BinaryDictionaryWriter}} we restrict

[jira] [Commented] (LUCENE-8791) Add CollectorRescorer

2019-06-10 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860382#comment-16860382 ] Mike Sokolov commented on LUCENE-8791: -- bq. We distribute total number of results we are looking

[jira] [Commented] (LUCENE-8781) Explore FST direct array arc encoding

2019-06-08 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859227#comment-16859227 ] Mike Sokolov commented on LUCENE-8781: -- Got it, thanks. Yeah this was a tiny change, doesn't seem

[jira] [Updated] (LUCENE-8844) Bump FST Version (to 7)

2019-06-08 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Sokolov updated LUCENE-8844: - Summary: Bump FST Version (to 7) (was: Bump FST Version) > Bump FST Version (to 7) >

[jira] [Updated] (LUCENE-8844) Bump FST Version

2019-06-08 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Sokolov updated LUCENE-8844: - Description: In LUCENE-8781, we changed the FST encoding but did not bump the version number

[jira] [Assigned] (LUCENE-8844) Bump FST Version

2019-06-08 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Sokolov reassigned LUCENE-8844: Assignee: Mike Sokolov > Bump FST Version > > > Key:

[jira] [Created] (LUCENE-8844) Bump FST Version

2019-06-08 Thread Mike Sokolov (JIRA)
Mike Sokolov created LUCENE-8844: Summary: Bump FST Version Key: LUCENE-8844 URL: https://issues.apache.org/jira/browse/LUCENE-8844 Project: Lucene - Core Issue Type: Bug

[jira] [Commented] (LUCENE-8781) Explore FST direct array arc encoding

2019-06-08 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859220#comment-16859220 ] Mike Sokolov commented on LUCENE-8781: -- OK, I see we write a version header and then check it for

[jira] [Commented] (LUCENE-8781) Explore FST direct array arc encoding

2019-06-06 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858079#comment-16858079 ] Mike Sokolov commented on LUCENE-8781: -- I think I -- did not understand how to edit CHANGES.txt

[jira] [Comment Edited] (LUCENE-8816) Decouple Kuromoji's morphological analyser and its dictionary

2019-05-28 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16849731#comment-16849731 ] Mike Sokolov edited comment on LUCENE-8816 at 5/28/19 1:41 PM: --- What if we

[jira] [Commented] (LUCENE-8816) Decouple Kuromoji's morphological analyser and its dictionary

2019-05-28 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16849731#comment-16849731 ] Mike Sokolov commented on LUCENE-8816: -- What if we changed the various dictionary classes to

[jira] [Resolved] (LUCENE-8781) Explore FST direct array arc encoding

2019-05-27 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Sokolov resolved LUCENE-8781. -- Resolution: Fixed Pushed to 8.x (and 7.x, although it seems there will be no future 7.x

[jira] [Updated] (LUCENE-8781) Explore FST direct array arc encoding

2019-05-26 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Sokolov updated LUCENE-8781: - Fix Version/s: (was: 8.x) 8.2 > Explore FST direct array arc encoding >

[jira] [Updated] (LUCENE-8781) Explore FST direct array arc encoding

2019-05-26 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Sokolov updated LUCENE-8781: - Fix Version/s: 8.x > Explore FST direct array arc encoding >

[jira] [Reopened] (LUCENE-8781) Explore FST direct array arc encoding

2019-05-26 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Sokolov reopened LUCENE-8781: -- reopening to track backporting this improvement to 8.x and 7.x > Explore FST direct array arc

[jira] [Commented] (LUCENE-4012) Make all query classes serializable, and provide a query parser to consume them

2019-05-19 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-4012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843445#comment-16843445 ] Mike Sokolov commented on LUCENE-4012: -- I want to hijack this issue to be about making Query

[jira] [Updated] (LUCENE-4012) Make all query classes serializable, and provide a query parser to consume them

2019-05-19 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-4012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Sokolov updated LUCENE-4012: - Summary: Make all query classes serializable, and provide a query parser to consume them (was:

[jira] [Commented] (LUCENE-8798) Autogenerated ID for LeafReaderContexts Within An IndexSearcher

2019-05-13 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838538#comment-16838538 ] Mike Sokolov commented on LUCENE-8798: -- I think what confused me was the link to the other JIRA

[jira] [Commented] (LUCENE-8798) Autogenerated ID for LeafReaderContexts Within An IndexSearcher

2019-05-13 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838497#comment-16838497 ] Mike Sokolov commented on LUCENE-8798: -- [~atris] I glanced at the issue you referenced, but I don't

[jira] [Commented] (LUCENE-8780) Improve ByteBufferGuard in Java 11

2019-04-28 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16828826#comment-16828826 ] Mike Sokolov commented on LUCENE-8780: -- I don't have a good theory, but I was curious so I ran a

[jira] [Updated] (LUCENE-8781) Explore FST direct array arc encoding

2019-04-27 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Sokolov updated LUCENE-8781: - Description: This issue is for exploring an alternate FST encoding of Arcs as full-sized

[jira] [Updated] (LUCENE-8781) Explore FST direct array arc encoding

2019-04-27 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Sokolov updated LUCENE-8781: - Description: This issue is for exploring an alternate FST encoding of Arcs as full-sized

[jira] [Created] (LUCENE-8781) Explore FST direct array arc encoding

2019-04-27 Thread Mike Sokolov (JIRA)
Mike Sokolov created LUCENE-8781: Summary: Explore FST direct array arc encoding Key: LUCENE-8781 URL: https://issues.apache.org/jira/browse/LUCENE-8781 Project: Lucene - Core Issue Type:

[jira] [Commented] (LUCENE-8681) Prorated early termination

2019-04-15 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818266#comment-16818266 ] Mike Sokolov commented on LUCENE-8681: -- I updated the PR with a new patch that changes the API for

[jira] [Commented] (LUCENE-8753) New PostingFormat - UniformSplit

2019-04-03 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809124#comment-16809124 ] Mike Sokolov commented on LUCENE-8753: -- The behavior I'm referring to isn't a problem with the

[jira] [Commented] (LUCENE-8753) New PostingFormat - UniformSplit

2019-04-03 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808860#comment-16808860 ] Mike Sokolov commented on LUCENE-8753: -- I've been working on some other FST-related changes, and

[jira] [Commented] (LUCENE-8750) Implement setMissingValue for numeric ValueSource sortFields

2019-04-02 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807816#comment-16807816 ] Mike Sokolov commented on LUCENE-8750: -- Here's a PR: https://github.com/apache/lucene-solr/pull/631

[jira] [Created] (LUCENE-8750) Implement setMissingValue for numeric ValueSource sortFields

2019-04-02 Thread Mike Sokolov (JIRA)
Mike Sokolov created LUCENE-8750: Summary: Implement setMissingValue for numeric ValueSource sortFields Key: LUCENE-8750 URL: https://issues.apache.org/jira/browse/LUCENE-8750 Project: Lucene - Core

[jira] [Commented] (LUCENE-8700) Enable concurrent flushing when no indexing is in progress

2019-02-19 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772305#comment-16772305 ] Mike Sokolov commented on LUCENE-8700: -- Pull request for this issue:

[jira] [Created] (LUCENE-8700) Enable concurrent flushing when no indexing is in progress

2019-02-19 Thread Mike Sokolov (JIRA)
Mike Sokolov created LUCENE-8700: Summary: Enable concurrent flushing when no indexing is in progress Key: LUCENE-8700 URL: https://issues.apache.org/jira/browse/LUCENE-8700 Project: Lucene - Core

[jira] [Commented] (LUCENE-8681) Prorated early termination

2019-02-19 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771986#comment-16771986 ] Mike Sokolov commented on LUCENE-8681: -- I posted [a new

[jira] [Commented] (LUCENE-8681) Prorated early termination

2019-02-15 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769854#comment-16769854 ] Mike Sokolov commented on LUCENE-8681: -- bq. ... doMaxScore and trackTotalHits (did you mean

[jira] [Commented] (LUCENE-8681) Prorated early termination

2019-02-15 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769698#comment-16769698 ] Mike Sokolov commented on LUCENE-8681: -- There are a bunch of different ways to provide for opt-in

[jira] [Commented] (LUCENE-8681) Prorated early termination

2019-02-13 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16767360#comment-16767360 ] Mike Sokolov commented on LUCENE-8681: -- Yes, I guess it would be necessary to pass a

[jira] [Commented] (LUCENE-8681) Prorated early termination

2019-02-12 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766112#comment-16766112 ] Mike Sokolov commented on LUCENE-8681: -- bq. so from my perspective, api change is not really crazy

[jira] [Commented] (SOLR-13233) SpellCheckCollator ignores stacked tokens

2019-02-10 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-13233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764410#comment-16764410 ] Mike Sokolov commented on SOLR-13233: - I wonder if SpellCheckCollator should just ignore all stacked

[jira] [Commented] (LUCENE-8681) Prorated early termination

2019-02-09 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764163#comment-16764163 ] Mike Sokolov commented on LUCENE-8681: -- I hope I'm not reading this the right way (?!? :), but I do

[jira] [Commented] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

2019-02-07 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762692#comment-16762692 ] Mike Sokolov commented on LUCENE-8635: -- [~akjain] that's strange yeah -- this patch was supposed to

[jira] [Comment Edited] (LUCENE-8681) Prorated early termination

2019-02-07 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762656#comment-16762656 ] Mike Sokolov edited comment on LUCENE-8681 at 2/7/19 1:28 PM: -- bq. However

[jira] [Commented] (LUCENE-8681) Prorated early termination

2019-02-07 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762656#comment-16762656 ] Mike Sokolov commented on LUCENE-8681: -- bq. However I wonder if this could be implemented directly
