[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462497#comment-16462497
]
ASF subversion and git services commented on LUCENE-8231:
-
Commit
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462495#comment-16462495
]
ASF subversion and git services commented on LUCENE-8231:
-
Commit
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437242#comment-16437242
]
ASF subversion and git services commented on LUCENE-8231:
-
Commit
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437241#comment-16437241
]
ASF subversion and git services commented on LUCENE-8231:
-
Commit
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437162#comment-16437162
]
Uwe Schindler commented on LUCENE-8231:
---
+1 to backport
> Nori, a Korean analyzer based on
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437141#comment-16437141
]
Robert Muir commented on LUCENE-8231:
-
+1 to backport
> Nori, a Korean analyzer based on
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437065#comment-16437065
]
Jim Ferenczi commented on LUCENE-8231:
--
Thanks a lot, Robert! Any objections to backporting to 7x?
>
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437064#comment-16437064
]
ASF subversion and git services commented on LUCENE-8231:
-
Commit
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436664#comment-16436664
]
Robert Muir commented on LUCENE-8231:
-
+1 to commit the latest patch. Thanks for all the work here.
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436123#comment-16436123
]
Jim Ferenczi commented on LUCENE-8231:
--
I attached a new patch that restores the
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436086#comment-16436086
]
Robert Muir commented on LUCENE-8231:
-
We may want to make a new issue and link. LUCENE-4065 was
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436066#comment-16436066
]
Jim Ferenczi commented on LUCENE-8231:
--
Ok I'll restore the KoreanPartOfSpeechStopFilter then and we
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436063#comment-16436063
]
Robert Muir commented on LUCENE-8231:
-
I didn't really see consensus on this issue though (there was
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436042#comment-16436042
]
Jim Ferenczi commented on LUCENE-8231:
--
I think that the Japanese analyzer has the same issue and
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436030#comment-16436030
]
Robert Muir commented on LUCENE-8231:
-
I don't understand why it needs to change posLength, I think
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436019#comment-16436019
]
Jim Ferenczi commented on LUCENE-8231:
--
No because FilteringTokenFilter doesn't handle
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436006#comment-16436006
]
Robert Muir commented on LUCENE-8231:
-
Shouldn't FilteringTokenFilter be enough? It just requires you
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436001#comment-16436001
]
Jim Ferenczi commented on LUCENE-8231:
--
I agree this will also simplify the understanding of these
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435964#comment-16435964
]
Robert Muir commented on LUCENE-8231:
-
And I haven't looked into why the tokenizer takes stoptags.
It
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435951#comment-16435951
]
Robert Muir commented on LUCENE-8231:
-
Do you think we should remove
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435922#comment-16435922
]
Jim Ferenczi commented on LUCENE-8231:
--
Sure, I added two more ctors in the last patch, one with
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435888#comment-16435888
]
Robert Muir commented on LUCENE-8231:
-
If the UserDictionary is optional can we just have a no-arg
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435802#comment-16435802
]
Jim Ferenczi commented on LUCENE-8231:
--
Right, I changed the Analyzer but not the Tokenizer. I
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435770#comment-16435770
]
Robert Muir commented on LUCENE-8231:
-
I still don't see a KoreanTokenizer ctor that uses these
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435755#comment-16435755
]
Jim Ferenczi commented on LUCENE-8231:
--
I attached a new patch that passes precommit checks. The
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435685#comment-16435685
]
Robert Muir commented on LUCENE-8231:
-
I don't think it should. It is very specific to what
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435655#comment-16435655
]
David Smiley commented on LUCENE-8231:
--
I think I've seen this root arc caching technique in at
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435496#comment-16435496
]
Robert Muir commented on LUCENE-8231:
-
Very nice. May want to look at the generated javadocs (last patch
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435420#comment-16435420
]
Jim Ferenczi commented on LUCENE-8231:
--
Thanks Robert.
I attached a new patch that changes the enum
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434875#comment-16434875
]
Robert Muir commented on LUCENE-8231:
-
An easy win related to this is to make enum values have real
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434869#comment-16434869
]
Robert Muir commented on LUCENE-8231:
-
We may want to tweak the attributes reflection, or look at
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432078#comment-16432078
]
Jim Ferenczi commented on LUCENE-8231:
--
I attached a new patch that fixes an issue with offsets of
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425218#comment-16425218
]
Jim Ferenczi commented on LUCENE-8231:
--
Hi Robert,
I pushed another iteration that moves the
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423345#comment-16423345
]
Robert Muir commented on LUCENE-8231:
-
Hi Jim, the latest changes look great. Thanks for optimizing
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422791#comment-16422791
]
Jim Ferenczi commented on LUCENE-8231:
--
I attached a new patch with lots of cleanups and fixes. I
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421999#comment-16421999
]
Jim Ferenczi commented on LUCENE-8231:
--
Hi Robert, thanks for your testing and suggestions!
I
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421364#comment-16421364
]
Robert Muir commented on LUCENE-8231:
-
Another thing to look at is if we really need two bytes for
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421353#comment-16421353
]
Robert Muir commented on LUCENE-8231:
-
There is still quite a bit of redundancy in the compound
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421333#comment-16421333
]
Robert Muir commented on LUCENE-8231:
-
I looked at the recent patch, one thing we need to warn about
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421320#comment-16421320
]
Robert Muir commented on LUCENE-8231:
-
Hi Jim, I dug into this a bit more to explain your results,
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420467#comment-16420467
]
Jim Ferenczi commented on LUCENE-8231:
--
I attached a new patch that adds a better compression for
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420404#comment-16420404
]
Robert Muir commented on LUCENE-8231:
-
Thanks for uploading the patch! I will dig into this some,
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419178#comment-16419178
]
Jim Ferenczi commented on LUCENE-8231:
--
Sure I attached a new patch (LUCENE-8231-remap-hangul.patch)
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419140#comment-16419140
]
Robert Muir commented on LUCENE-8231:
-
Well, according to my commit years ago it was "smaller and much
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419121#comment-16419121
]
Jim Ferenczi commented on LUCENE-8231:
--
I tried this approach and generated a new FST with the remap
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418972#comment-16418972
]
Robert Muir commented on LUCENE-8231:
-
See code from that other analyzer for doing this kind of
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418952#comment-16418952
]
Robert Muir commented on LUCENE-8231:
-
Yeah, I think that's a good sign that the huge binary searches
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1641#comment-1641
]
Jim Ferenczi commented on LUCENE-8231:
--
{quote}
and looking more, you'd need full byte range to do
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418884#comment-16418884
]
Robert Muir commented on LUCENE-8231:
-
Yeah, it's just the general case that if you only have 256
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418863#comment-16418863
]
Dawid Weiss commented on LUCENE-8231:
-
Ah, sorry. I thought it was using FST directly. When I
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418845#comment-16418845
]
Robert Muir commented on LUCENE-8231:
-
{quote}
I think that root cache is already restricted to a
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418840#comment-16418840
]
Robert Muir commented on LUCENE-8231:
-
and looking more, you'd need full byte range to do that. So a
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418833#comment-16418833
]
Dawid Weiss commented on LUCENE-8231:
-
bq. Root arc caching of all syllables is heavy there, that's a
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418821#comment-16418821
]
Robert Muir commented on LUCENE-8231:
-
Just in case it's unclear in the above, I'm saying input of 한
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418793#comment-16418793
]
Robert Muir commented on LUCENE-8231:
-
I think it's ok if they don't share code initially. We can try
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418785#comment-16418785
]
Uwe Schindler commented on LUCENE-8231:
---
How about sharing code between the 2 extensions? I had no
[
https://issues.apache.org/jira/browse/LUCENE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418762#comment-16418762
]
Robert Muir commented on LUCENE-8231:
-
{quote}
The expression that contains the decompounds can also