[GitHub] [lucene-solr] dsmiley commented on pull request #1592: SOLR-14579 First pass at dismantling Utils
dsmiley commented on pull request #1592: URL: https://github.com/apache/lucene-solr/pull/1592#issuecomment-647082412 +1 Can you see who added these and get their attention here for their opinion? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14581) Document the way auto commits work in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-14581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17141299#comment-17141299 ] David Smiley commented on SOLR-14581: - Thanks for improving Solr's documentation! For reference, your patch is simply the following: bq. +TIP: Each node has its own auto commit timer which starts upon receipt of an update. While Solr promises eventual consistency, leaders will generally receive updates *before* replicas; it is therefore possible for replicas to lag behind somewhat. > TIP: If this is a tip then... well, what is the advice you are offering? Perhaps "NOTE:" is better (see `about-this-guide.adoc`). > Each node has its own auto commit timer No, each *core* (replica) has one. Nodes can host many cores which act independently. I'd like to propose the following new language. I thought about your approach of including some rationale but I think it's way more important to point out the consequences than the causes. bq. +NOTE: Using auto soft commit or commitWithin requires the client app to embrace the realities of "eventual consistency". Solr will make documents searchable at _roughly_ the same time across NRT replicas of a collection but there are no hard guarantees. Consequently, in rare cases, it's possible for a document to show up in one search only for it not to appear in a subsequent search occurring immediately after, when the second search is routed to a different replica. Also, documents added in a particular order (even in the same batch) might become searchable out of order of submission when there is sharding. CC [~erickerickson] > Document the way auto commits work in SolrCloud > --- > > Key: SOLR-14581 > URL: https://issues.apache.org/jira/browse/SOLR-14581 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. 
Issues are Public) > Components: documentation, SolrCloud >Affects Versions: master (9.0) >Reporter: Bram Van Dam >Priority: Minor > Attachments: SOLR-14581.patch > > > The documentation is unclear about how auto commits actually work in > SolrCloud. A mailing list reply by Erick Erickson proved to be enlightening. > Erick's reply verbatim: > {quote}Each node has its own timer that starts when it receives an update. > So in your situation, 60 seconds after any given replica gets its first > update, all documents that have been received in the interval will > be committed. > But note several things: > 1> commits will tend to cluster for a given shard. By that I mean > they’ll tend to happen within a few milliseconds of each other >‘cause it doesn’t take that long for an update to get from the >leader to all the followers. > 2> this is per replica. So if you host replicas from multiple collections >on some node, their commits have no relation to each other. And >say for some reason you transmit exactly one document that lands >on shard1. Further, say nodeA contains replicas for shard1 and shard2. >Only the replica for shard1 would commit. > 3> Solr promises eventual consistency. In this case, due to all the >timing variables it is not guaranteed that every replica of a single >shard has the same document available for search at any given time. >Say doc1 hits the leader at time T and a follower at time T+10ms. >Say doc2 hits the leader and gets indexed 5ms before the >commit is triggered, but for some reason it takes 15ms for it to get >to the follower. The leader will be able to search doc2, but the > follower won’t until 60 seconds later.{quote} > Perhaps the subject deserves a section of its own, but I'll attach a patch > which includes the gist of Erick's reply as a Tip in the "indexing in > SolrCloud"-section. 
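For context, the timers under discussion are configured per core via the update handler in solrconfig.xml; a minimal illustrative fragment (the 60-second values mirror the example above, not a recommendation) would look like:

{code:xml}
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit: flushes to stable storage; openSearcher=false keeps it
       from affecting search visibility -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commit: controls search visibility; this is the per-replica timer
       that starts when a replica receives its first update in an interval -->
  <autoSoftCommit>
    <maxTime>60000</maxTime>
  </autoSoftCommit>
</updateHandler>
{code}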
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9286) FST arc.copyOf clones BitTables and this can lead to excessive memory use
[ https://issues.apache.org/jira/browse/LUCENE-9286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17141249#comment-17141249 ] Tomoko Uchida commented on LUCENE-9286: --- Thanks Robert and Mike for your comments, bq. To get the benchmark to cover JapaneseAnalyzer (and the other CJK analyzers too, maybe?) we'd need to incorporate some documents that include text in ideographic scripts. I can work on preparing the corpus but I'm unusually busy for a while here; maybe I can start it next month... > FST arc.copyOf clones BitTables and this can lead to excessive memory use > - > > Key: LUCENE-9286 > URL: https://issues.apache.org/jira/browse/LUCENE-9286 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 8.5 >Reporter: Dawid Weiss >Assignee: Bruno Roustant >Priority: Major > Fix For: 8.6 > > Attachments: screen-[1].png > > Time Spent: 1h 50m > Remaining Estimate: 0h > > I see a dramatic increase in the amount of memory required for construction > of (arguably large) automata. It currently OOMs with 8GB of memory consumed > for bit tables. I am pretty sure this didn't require so much memory before > (the automaton is ~50MB after construction). > Something bad happened in between. Thoughts, [~broustant], [~sokolov]?
[jira] [Updated] (SOLR-14404) CoreContainer level custom requesthandlers
[ https://issues.apache.org/jira/browse/SOLR-14404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-14404: -- Description: caveats: * The class should be annotated with {{org.apache.solr.api.EndPoint}}. Which means only V2 APIs are supported * The path should have prefix {{/api/plugin}} add a plugin {code:java} curl -X POST -H 'Content-type:application/json' --data-binary ' { "add": { "name":"myplugin", "class": "full.ClassName" } }' http://localhost:8983/api/cluster/plugins {code} add a plugin from a package {code:java} curl -X POST -H 'Content-type:application/json' --data-binary ' { "add": { "name":"myplugin", "class": "pkgName:full.ClassName" , "version": "1.0" } }' http://localhost:8983/api/cluster/plugins {code} remove a plugin {code:java} curl -X POST -H 'Content-type:application/json' --data-binary ' { "remove": "myplugin" }' http://localhost:8983/api/cluster/plugins {code} The configuration will be stored in the {{clusterprops.json}} as {code:java} { "plugins" : { "myplugin" : {"class": "full.ClassName" } } } {code} example plugin {code:java} public class MyPlugin { private final CoreContainer coreContainer; public MyPlugin(CoreContainer coreContainer) { this.coreContainer = coreContainer; } @EndPoint(path = "/myplugin/path1", method = METHOD.GET, permission = READ) public void call(SolrQueryRequest req, SolrQueryResponse rsp){ rsp.add("myplugin.version", "2.0"); } } {code} This plugin will be accessible on all nodes at {{/api/myplugin/path1}}. It's possible to add more methods at different paths. Ensure that all paths start with {{myplugin}} because that is the name under which the plugin is registered. So {{/myplugin/path2}} , {{/myplugin/my/deeply/nested/path}} are all valid paths. It's possible that the user chooses to register the plugin with a different name. 
In that case, use a template variable as follows in paths {{$plugin-name/path1}} was: caveats: * The class should be annotated with {{org.apache.solr.api.EndPoint}}. Which means only V2 APIs are supported * The path should have prefix {{/api/plugin}} add a plugin {code:java} curl -X POST -H 'Content-type:application/json' --data-binary ' { "add": { "name":"myplugin", "class": "full.ClassName" } }' http://localhost:8983/api/cluster/plugins {code} add a plugin from a package {code:java} curl -X POST -H 'Content-type:application/json' --data-binary ' { "add": { "name":"myplugin", "class": "pkgName:full.ClassName" , "version": "1.0" } }' http://localhost:8983/api/cluster/plugins {code} remove a plugin {code:java} curl -X POST -H 'Content-type:application/json' --data-binary ' { "remove": "myplugin" }' http://localhost:8983/api/cluster/plugins {code} The configuration will be stored in the {{clusterprops.json}} as {code:java} { "plugins" : { "myplugin" : {"class": "full.ClassName" } } } {code} example plugin {code:java} @EndPoint(path = "/plugin/my/path", method = METHOD.GET, permission = READ) public class MyPlugin { private final CoreContainer coreContainer; public MyPlugin(CoreContainer coreContainer) { this.coreContainer = coreContainer; } @Command public void call(SolrQueryRequest req, SolrQueryResponse rsp){ rsp.add("myplugin.version", "2.0"); } } {code} This plugin will be accessible on all nodes at {{/api/plugin/my/path}} > CoreContainer level custom requesthandlers > -- > > Key: SOLR-14404 > URL: https://issues.apache.org/jira/browse/SOLR-14404 > Project: Solr > Issue Type: New Feature >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h
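For completeness (illustrative, assuming the plugin was registered under the name {{myplugin}} as in the example above), the registered endpoint would then be invoked like any other V2 API path:

{code:java}
curl http://localhost:8983/api/myplugin/path1
{code}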
[jira] [Comment Edited] (LUCENE-9394) Fix or suppress compile-time warnings
[ https://issues.apache.org/jira/browse/LUCENE-9394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17141165#comment-17141165 ] Erick Erickson edited comment on LUCENE-9394 at 6/20/20, 9:20 PM: -- Thanks, I propose that I just add the SuppressWarnings to the 8x code line. My reasoning is that, despite the effort I've been putting in to get clean compiles, what I've done hasn't actually _fixed_ anything. It has laid the groundwork for not getting worse is all (8,000 warnings in Solr sheesh!). Given that, it's hard for me to justify any changes affecting back-compat for a minor release, even if it's not that much of an inconvenience. Add to that that I imagine we'll be cutting 9.0 in the not too distant future and there'll be a limited amount of back-port pain. I could be persuaded otherwise, but that's my starting position... 
> Fix or suppress compile-time warnings > - > > Key: LUCENE-9394 > URL: https://issues.apache.org/jira/browse/LUCENE-9394 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael Sokolov >Assignee: Michael Sokolov >Priority: Major > Fix For: master (9.0) > > Time Spent: 50m > Remaining Estimate: 0h > > This is a spinoff from [~erickerickson]'s efforts over in SOLR-10778 > The goal is a warning-free compilation, followed by enforcement of build > failure on warnings, with the idea of suppressing innocuous warnings to the > extent that the remaining warnings be treated as build failure.
[jira] [Updated] (SOLR-14584) solr.in.cmd and solr.in.sh still reference obsolete jks files
[ https://issues.apache.org/jira/browse/SOLR-14584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aren Cambre updated SOLR-14584: --- Summary: solr.in.cmd and solr.in.sh still reference obsolete jks files (was: solr.in.cmd and solr.in.sh still reference jks files) > solr.in.cmd and solr.in.sh still reference obsolete jks files > - > > Key: SOLR-14584 > URL: https://issues.apache.org/jira/browse/SOLR-14584 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Server >Affects Versions: 8.5.2 >Reporter: Aren Cambre >Priority: Major > Labels: easyfix > Time Spent: 10m > Remaining Estimate: 0h > > When following the Enabling SSL documentation > ([https://lucene.apache.org/solr/guide/8_5/enabling-ssl.html]), the end > result is an error if you miss a critical detail: that you need to change the > *.jks* file extension in two lines to *.p12*. > Please update the default *bin/solr.in.cmd* and *bin/solr.in.sh* files to > reference *p12* files. It appears that the JKS format has been left behind, > so there's no reason to reference those by default.
[jira] [Updated] (SOLR-14584) solr.in.cmd and solr.in.sh still reference obsolete jks files
[ https://issues.apache.org/jira/browse/SOLR-14584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aren Cambre updated SOLR-14584: --- Description: When following the Enabling SSL documentation ([https://lucene.apache.org/solr/guide/8_5/enabling-ssl.html]), the end result is an error if you miss a critical detail: that you need to change the *.jks* file extension in two lines to *.p12*. Please update the default *bin/solr.in.cmd* and *bin/solr.in.sh* files to reference *p12* files. It appears that the JKS format is obsolete, so there's no reason to reference those by default. was: When following the Enabling SSL documentation ([https://lucene.apache.org/solr/guide/8_5/enabling-ssl.html]), the end result is an error if you miss a critical detail: that you need to change the *.jks* file extension in two lines to *.p12*. Please update the default *bin/solr.in.cmd* and *bin/solr.in.sh* files to reference *p12* files. It appears that the JKS format has been left behind, so there's no reason to reference those by default. > solr.in.cmd and solr.in.sh still reference obsolete jks files > - > > Key: SOLR-14584 > URL: https://issues.apache.org/jira/browse/SOLR-14584 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Server >Affects Versions: 8.5.2 >Reporter: Aren Cambre >Priority: Major > Labels: easyfix > Time Spent: 10m > Remaining Estimate: 0h > > When following the Enabling SSL documentation > ([https://lucene.apache.org/solr/guide/8_5/enabling-ssl.html]), the end > result is an error if you miss a critical detail: that you need to change the > *.jks* file extension in two lines to *.p12*. > Please update the default *bin/solr.in.cmd* and *bin/solr.in.sh* files to > reference *p12* files. It appears that the JKS format is obsolete, so there's > no reason to reference those by default. 
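Concretely, the change requested here amounts to updating the SSL keystore lines in the shipped startup config (the variable names are the ones used by the Enabling SSL docs; paths are illustrative):

{code}
# before: shipped defaults still point at JKS keystores
SOLR_SSL_KEY_STORE=etc/solr-ssl.keystore.jks
SOLR_SSL_TRUST_STORE=etc/solr-ssl.keystore.jks
# after: match the PKCS12 keystore produced by the Enabling SSL steps
SOLR_SSL_KEY_STORE=etc/solr-ssl.keystore.p12
SOLR_SSL_TRUST_STORE=etc/solr-ssl.keystore.p12
{code}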
[jira] [Updated] (SOLR-14584) solr.in.cmd and solr.in.sh still reference jks files
[ https://issues.apache.org/jira/browse/SOLR-14584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aren Cambre updated SOLR-14584: --- Description: When following the Enabling SSL documentation ([https://lucene.apache.org/solr/guide/8_5/enabling-ssl.html]), the end result is an error if you miss a critical detail: that you need to change the *.jks* file extension in two lines to *.p12*. Please update the default *bin/solr.in.cmd* and *bin/solr.in.sh* files to reference *p12* files. It appears that the JKS format has been left behind, so there's no reason to reference those by default. was: When following the [Enabling SSL|[https://lucene.apache.org/solr/guide/8_5/enabling-ssl.html]] documentation exactly, the end result is an error if you miss a critical detail: that you need to change the *.jks* file extension in two lines to *.p12*. Please update the default *bin/solr.in.cmd* file to reference *p12* files. It appears that the JKS format has been left behind, so there's no reason to reference those by default. > solr.in.cmd and solr.in.sh still reference jks files > > > Key: SOLR-14584 > URL: https://issues.apache.org/jira/browse/SOLR-14584 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Server >Affects Versions: 8.5.2 >Reporter: Aren Cambre >Priority: Major > Labels: easyfix > Time Spent: 10m > Remaining Estimate: 0h > > When following the Enabling SSL documentation > ([https://lucene.apache.org/solr/guide/8_5/enabling-ssl.html]), the end > result is an error if you miss a critical detail: that you need to change the > *.jks* file extension in two lines to *.p12*. > Please update the default *bin/solr.in.cmd* and *bin/solr.in.sh* files to > reference *p12* files. It appears that the JKS format has been left behind, > so there's no reason to reference those by default. 
[GitHub] [lucene-solr] arencambre opened a new pull request #1597: fixes SOLR-14584
arencambre opened a new pull request #1597: URL: https://github.com/apache/lucene-solr/pull/1597 # Description Please provide a short description of the changes you're making with this pull request. # Solution Please provide a short description of the approach taken to implement your solution. # Tests Please describe the tests you've developed or run to confirm this patch implements the feature or solves the problem. # Checklist Please review the following and check all that apply: - [ ] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [ ] I have created a Jira issue and added the issue ID to my pull request title. - [ ] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [ ] I have developed this patch against the `master` branch. - [ ] I have run `ant precommit` and the appropriate test suite. - [ ] I have added tests for my changes. - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
[jira] [Created] (SOLR-14584) solr.in.cmd and solr.in.sh still reference jks files
Aren Cambre created SOLR-14584: -- Summary: solr.in.cmd and solr.in.sh still reference jks files Key: SOLR-14584 URL: https://issues.apache.org/jira/browse/SOLR-14584 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: Server Affects Versions: 8.5.2 Reporter: Aren Cambre When following the [Enabling SSL|https://lucene.apache.org/solr/guide/8_5/enabling-ssl.html] documentation exactly, the end result is an error if you miss a critical detail: that you need to change the *.jks* file extension in two lines to *.p12*. Please update the default *bin/solr.in.cmd* file to reference *p12* files. It appears that the JKS format has been left behind, so there's no reason to reference those by default.
[jira] [Updated] (SOLR-14583) Spell suggestion is returned even if hits are non-zero when spellcheck.maxResultsForSuggest=0
[ https://issues.apache.org/jira/browse/SOLR-14583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Munendra S N updated SOLR-14583: Summary: Spell suggestion is returned even if hits are non-zero when spellcheck.maxResultsForSuggest=0 (was: Spell suggestions is returned even if hits are non-zero when spellcheck.maxResultsForSuggest=0) > Spell suggestion is returned even if hits are non-zero when > spellcheck.maxResultsForSuggest=0 > - > > Key: SOLR-14583 > URL: https://issues.apache.org/jira/browse/SOLR-14583 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Munendra S N >Assignee: Munendra S N >Priority: Major > > SOLR-4280 added support for fractional values of > spellcheck.maxResultsForSuggest. After SOLR-4280, > {{spellcheck.maxResultsForSuggest=0}} is treated the same as not specifying the > {{spellcheck.maxResultsForSuggest}} parameter. This can cause spell > suggestions to be returned even when hits are non-zero and greater than > {{spellcheck.maxResultsForSuggest}} (i.e, greater than 0)
[jira] [Created] (SOLR-14583) Spell suggestions is returned even if hits are non-zero when spellcheck.maxResultsForSuggest=0
Munendra S N created SOLR-14583: --- Summary: Spell suggestions is returned even if hits are non-zero when spellcheck.maxResultsForSuggest=0 Key: SOLR-14583 URL: https://issues.apache.org/jira/browse/SOLR-14583 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Reporter: Munendra S N Assignee: Munendra S N SOLR-4280 added support for fractional values of spellcheck.maxResultsForSuggest. After SOLR-4280, {{spellcheck.maxResultsForSuggest=0}} is treated the same as not specifying the {{spellcheck.maxResultsForSuggest}} parameter. This can cause spell suggestions to be returned even when hits are non-zero and greater than {{spellcheck.maxResultsForSuggest}} (i.e, greater than 0)
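To illustrate the intended semantics (the request shape is illustrative; {{delll}} is a deliberate misspelling and {{techproducts}} a placeholder collection), a client expecting suggestions only on zero-hit queries would send:

{code}
curl "http://localhost:8983/solr/techproducts/spell?q=delll&spellcheck=true&spellcheck.maxResultsForSuggest=0"
{code}

With the bug described here, suggestions can still come back even when the query matches documents.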
[jira] [Updated] (SOLR-14582) Expose IWC.setMaxCommitMergeWaitSeconds as an expert feature in Solr's index config
[ https://issues.apache.org/jira/browse/SOLR-14582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomas Eduardo Fernandez Lobbe updated SOLR-14582: - Summary: Expose IWC.setMaxCommitMergeWaitSeconds as an expert feature in Solr's index config (was: Exponse IWC.setMaxCommitMergeWaitSeconds as an expert feature in Solr's index config) > Expose IWC.setMaxCommitMergeWaitSeconds as an expert feature in Solr's index > config > --- > > Key: SOLR-14582 > URL: https://issues.apache.org/jira/browse/SOLR-14582 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Tomas Eduardo Fernandez Lobbe >Priority: Trivial > > LUCENE-8962 added the ability to merge segments synchronously on commit. This > isn't done by default and the default {{MergePolicy}} won't do it, but custom > merge policies can take advantage of this. Solr allows plugging in custom > merge policies, so if someone wants to make use of this feature they could, > however, they need to set {{IndexWriterConfig.maxCommitMergeWaitSeconds}} to > something greater than 0. > Since this is an expert feature, I plan to document it only in javadoc and > not the ref guide.
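If exposed as proposed, the knob would presumably live under {{<indexConfig>}} in solrconfig.xml; the element name below is hypothetical, since the issue does not pin down the syntax:

{code:xml}
<indexConfig>
  <!-- hypothetical element name: upper bound (seconds) that a commit waits
       for merges a custom MergePolicy elects to run on commit; 0 (the
       default) disables waiting -->
  <maxCommitMergeWaitSeconds>30</maxCommitMergeWaitSeconds>
</indexConfig>
{code}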
[jira] [Commented] (LUCENE-9286) FST arc.copyOf clones BitTables and this can lead to excessive memory use
[ https://issues.apache.org/jira/browse/LUCENE-9286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17141107#comment-17141107 ] Michael Sokolov commented on LUCENE-9286: - > We could improve the analyzers nightly benchmark That makes sense. There is also the commented out {{TestJapaneseTokenizer.testWikipedia}} that tests performance of Kuromoji specifically, but one has to remember to run it. To get the benchmark to cover JapaneseAnalyzer (and the other CJK analyzers too, maybe?) we'd need to incorporate some documents that include text in ideographic scripts. It looks as if the benchmarks use English Wikipedia docs exclusively right now. luceneutil data seems to be kept in [~mikemccand]'s Apache homedir. Simplest first step would be to add a Japanese Wikipedia dump to that, but we could also source the data from somewhere else if need be ... > FST arc.copyOf clones BitTables and this can lead to excessive memory use > - > > Key: LUCENE-9286 > URL: https://issues.apache.org/jira/browse/LUCENE-9286 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 8.5 >Reporter: Dawid Weiss >Assignee: Bruno Roustant >Priority: Major > Fix For: 8.6 > > Attachments: screen-[1].png > > Time Spent: 1h 50m > Remaining Estimate: 0h > > I see a dramatic increase in the amount of memory required for construction > of (arguably large) automata. It currently OOMs with 8GB of memory consumed > for bit tables. I am pretty sure this didn't require so much memory before > (the automaton is ~50MB after construction). > Something bad happened in between. Thoughts, [~broustant], [~sokolov]?
[jira] [Commented] (LUCENE-9394) Fix or suppress compile-time warnings
[ https://issues.apache.org/jira/browse/LUCENE-9394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17141102#comment-17141102 ] Michael Sokolov commented on LUCENE-9394: - > Do you (or anyone else) want to weigh in on whether to backport this fix or > just SuppressWarnings in 8x for Lucene? I think it's down to what our back compat policy is. If we're OK with introducing breaking API changes in a minor release, then we should fix rather than suppress, but I was under the impression that we only made such changes on major releases. I personally feel this would be OK - it's a compilation failure, not a behavior change, so there's no risk someone gets a surprise; they just have to fix up their Map types or add SuppressWarnings. > Fix or suppress compile-time warnings > - > > Key: LUCENE-9394 > URL: https://issues.apache.org/jira/browse/LUCENE-9394 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael Sokolov >Assignee: Michael Sokolov >Priority: Major > Fix For: master (9.0) > > Time Spent: 50m > Remaining Estimate: 0h > > This is a spinoff from [~erickerickson]'s efforts over in SOLR-10778 > The goal is a warning-free compilation, followed by enforcement of build > failure on warnings, with the idea of suppressing innocuous warnings to the > extent that the remaining warnings be treated as build failure.
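As a concrete illustration of the choice discussed here (my example, not code from the Lucene patch): a raw java.util.Map produces rawtypes/unchecked warnings, which under warnings-as-errors become compile failures unless the caller either parameterizes the type or adds SuppressWarnings.

```java
import java.util.HashMap;
import java.util.Map;

public class WarningsDemo {
    // Raw-type usage: compiles only with suppression once warnings are errors.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static Map rawCounts() {
        Map m = new HashMap();
        m.put("docs", 42); // unchecked call without the annotation
        return m;
    }

    // Parameterized fix: no warning and no suppression needed.
    static Map<String, Integer> typedCounts() {
        Map<String, Integer> m = new HashMap<>();
        m.put("docs", 42);
        return m;
    }

    public static void main(String[] args) {
        System.out.println(typedCounts().get("docs")); // 42
    }
}
```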
[jira] [Commented] (LUCENE-9411) Fail complation on warnings
[ https://issues.apache.org/jira/browse/LUCENE-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17141101#comment-17141101 ] Michael Sokolov commented on LUCENE-9411: - > I was tending to the fail-early (i.e. not just on precommit) for the same >reason. After a bit of annoyance, people should be able to write the code >right the first time OK, I agree - I think this is appropriate for things like compiler warnings. I just want to make sure that for more stringent checks like style checks, javadoc, etc. we don't move them up to compile phase. We want to be able to make some speculative changes without worrying about all the fine points. Once we have some code that seems worth committing, then we can polish up the imports, the lines with trailing whitespace and so on. I think that's how it works now - precommit handles these fussier checks, right? > Fail complation on warnings > --- > > Key: LUCENE-9411 > URL: https://issues.apache.org/jira/browse/LUCENE-9411 > Project: Lucene - Core > Issue Type: Improvement > Components: general/build >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Labels: build > Attachments: LUCENE-9411.patch, LUCENE-9411.patch, LUCENE-9411.patch, > annotations-warnings.patch > > > Moving this over here from SOLR-11973 since it's part of the build system and > affects Lucene as well as Solr. You might want to see the discussion there. > We have a clean compile for both Solr and Lucene, no rawtypes, unchecked, > try, etc. warnings. There are some peculiar warnings (things like > SuppressFBWarnings, i.e. FindBugs) that I'm not sure about at all, but let's > assume those are not a problem. Now I'd like to start failing the compilation > if people write new code that generates warnings. > From what I can tell, just adding the flag is easy in both the Gradle and Ant > builds. I still have to prove out that adding -Werrors does what I expect, > i.e. succeeds now and fails when I introduce warnings. 
> But let's assume that works. Are there objections to this idea generally? I > hope to have some data by next Monday. > FWIW, the Lucene code base had far fewer issues than Solr, but > common-build.xml is in Lucene.
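For reference, a minimal Gradle sketch of failing compilation on warnings (this is an assumed illustration, not the attached patch; note the javac flag is -Werror):

```groovy
// Applied from the root build: turn all javac lint warnings into errors.
allprojects {
    tasks.withType(JavaCompile) {
        options.compilerArgs += ["-Xlint:all", "-Werror"]
    }
}
```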
[jira] [Commented] (LUCENE-9411) Fail complation on warnings
[ https://issues.apache.org/jira/browse/LUCENE-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17141096#comment-17141096 ] Dawid Weiss commented on LUCENE-9411: - A single one is fine, I think, but I'd rather have it in a separate file, so it's clear it's a workaround for an odd behavior of javac, than have it scattered around various build files.
[jira] [Commented] (LUCENE-9411) Fail complation on warnings
[ https://issues.apache.org/jira/browse/LUCENE-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17141062#comment-17141062 ] Erick Erickson commented on LUCENE-9411: Sure. Are you thinking two different files, one for findbugs and one for error_prone? Or just a single file, something like gradle/hacks/annotations.gradle?
[jira] [Commented] (LUCENE-9286) FST arc.copyOf clones BitTables and this can lead to excessive memory use
[ https://issues.apache.org/jira/browse/LUCENE-9286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17141058#comment-17141058 ] Robert Muir commented on LUCENE-9286: - We could improve the analyzers nightly benchmark: https://people.apache.org/~mikemccand/lucenebench/analyzers.html
[jira] [Resolved] (LUCENE-9413) Add a char filter corresponding to CJKWidthFilter
[ https://issues.apache.org/jira/browse/LUCENE-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomoko Uchida resolved LUCENE-9413. --- Resolution: Won't Fix > Add a char filter corresponding to CJKWidthFilter > - > > Key: LUCENE-9413 > URL: https://issues.apache.org/jira/browse/LUCENE-9413 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Tomoko Uchida >Priority: Minor > > In association with issues in Elasticsearch > ([https://github.com/elastic/elasticsearch/issues/58384] and > [https://github.com/elastic/elasticsearch/issues/58385]), it might be useful > for Japanese default analyzer. > Although I don't think it's a bug to not normalize FULL and HALF width > characters before tokenization, the behaviour sometimes confuses beginners or > users who have limited knowledge about Japanese analysis (and Unicode). > If we have a FULL and HALF width character normalization filter in > {{analyzers-common}}, we can include it into JapaneseAnalyzer (currently, > JapaneseAnalyzer contains CJKWidthFilter but it is applied after tokenization > so some of FULL width numbers or latin alphabets are separated by the > tokenizer).
[jira] [Commented] (LUCENE-9413) Add a char filter corresponding to CJKWidthFilter
[ https://issues.apache.org/jira/browse/LUCENE-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17141009#comment-17141009 ] Tomoko Uchida commented on LUCENE-9413: --- The mecab-ipadic dictionary has entries which include FULL width characters, so this naive approach (FULL / HALF width character normalization before tokenizing) can break tokenization. :/ Maybe we could concatenate "unknown" word sequences which consist of only numbers or Latin alphabets after tokenization? {code} $ cut -d',' -f1 mecab-ipadic-all-utf8.csv | grep 1 12月 1番 11月 1月 10月 G7プラス1 小1 高1 1つ F1 中1 110番 G1 1 ファスニング21 G10 インパクト21 アルゴテクノス21 セルヴィ21 モクネット21 U19 どさんこワイド212 西15線北 北13線 西14線北 北14線 西10号南 南1条 東11号北 東12線北 西11号北 駒場北1条通 東1線南 第1安井牧場 西10号北 東11線北 美旗町中1番 南21線西 南17線西 西10線北 岩内町第1基線 北15線 南12線西 東13線南 西13線北 西1線北 南16線西 西10線南 西16線北 西11線北 西12号北 西11線南 東10線北 北1線 東1線北 南13号 南14線西 南1線 北11線 西12線南 西14線南 南13線西 浦臼第1 西13線南 東10号北 南19線西 北1条 南11線西 平泉外12入会 東10線南 東10号南 南18線西 南15線西 東11号南 東12号北 北10線 駒場南1条通 南1番通 南10線西 北12線 西1線南 太田1の通り 東11線南 西12線北 東12線南 大泉1区南部 M40A1 F15戦闘機 DF31 F15 G1 辞林21 R12 O157 DF41 スーパー301 GP125 北13条東 M1A2 アポロ11号 {code}
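A minimal sketch of the pre-tokenization folding under discussion (this is not CJKWidthFilter, which also folds halfwidth katakana; it only shows the FULLWIDTH-ASCII mapping): folding a string like fullwidth "Ｆ１" to "F1" before tokenization is exactly what would bypass dictionary entries that contain full-width characters.

```java
public class WidthFoldDemo {
    // Map FULLWIDTH ASCII variants (U+FF01..U+FF5E) onto their halfwidth
    // counterparts (U+0021..U+007E); other characters pass through unchanged.
    static String foldWidth(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '\uFF01' && c <= '\uFF5E') {
                c = (char) (c - '\uFF01' + '!'); // '!' is U+0021
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "\uFF26\uFF11" is fullwidth "F1", as in the dictionary entries above
        System.out.println(foldWidth("\uFF26\uFF11")); // F1
    }
}
```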
[jira] [Commented] (LUCENE-9411) Fail complation on warnings
[ https://issues.apache.org/jira/browse/LUCENE-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140997#comment-17140997 ] Dawid Weiss commented on LUCENE-9411: - Can you move all those blocks into a separate single file (and apply to those projects that need it), Erik? {code:java} + // Prometheus exporter classes reference this although it's not part of the exported classpath + // which causes odd warnings during compilation. Shut it up with an explicit-version + // compile-only dependency (!). + compileOnly 'com.google.errorprone:error_prone_annotations:2.1.3'{code} It should be included from top-level (gradle/hacks/findbugs-annotations.gradle) and look something like this: {code:java} configure([project(":solr:foo"), project(":solr:bar")]) { plugins.withType(JavaPlugin) { // blah blah dependencies { compileOnly 'com.google.errorprone:error_prone_annotations:2.1.3' } } }{code} The "withType" bit is needed just in case the file is included before the java plugin is applied - then the dependencies configuration wouldn't be resolved properly.
[jira] [Updated] (LUCENE-9413) Add a char filter corresponding to CJKWidthFilter
[ https://issues.apache.org/jira/browse/LUCENE-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomoko Uchida updated LUCENE-9413: -- Description: In association with issues in Elasticsearch ([https://github.com/elastic/elasticsearch/issues/58384] and [https://github.com/elastic/elasticsearch/issues/58385]), it might be useful for Japanese default analyzer. Although I don't think it's a bug to not normalize FULL and HALF width characters before tokenization, the behaviour sometimes confuses beginners or users who have limited knowledge about Japanese analysis (and Unicode). If we have a FULL and HALF width character normalization filter in {{analyzers-common}}, we can include it into JapaneseAnalyzer (currently, JapaneseAnalyzer contains CJKWidthFilter but it is applied after tokenization so some of FULL width numbers or latin alphabets are separated by the tokenizer). was: In association with issues in Elasticsearch ([https://github.com/elastic/elasticsearch/issues/58384] and [https://github.com/elastic/elasticsearch/issues/58385]), it might be useful for Japanese default analyzer. Although I don't think it's a bug to not normalize FULL and HALF width characters before tokenization, the behaviour sometimes confuses beginners or users who have limited knowledge about Japanese analysis (and Unicode). If we have a FULL and HALF width character normalization filter in {{analyzers-common}}, we can include it into JapaneseAnalyzer (currently, JapaneseAnalyzer contains CJKWidthFilter but it is applied after tokenization so some of FULL width numbers or alphabets are separated by the tokenizer). 
[jira] [Commented] (LUCENE-9413) Add a char filter corresponding to CJKWidthFilter
[ https://issues.apache.org/jira/browse/LUCENE-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140976#comment-17140976 ] Tomoko Uchida commented on LUCENE-9413: --- I cannot take time to work on this soon, but I wanted to log it as an issue... comments and thoughts are welcome.