[jira] [Resolved] (LUCENE-9344) Convert XXX.txt files to proper XXX.md
[ https://issues.apache.org/jira/browse/LUCENE-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tomoko Uchida resolved LUCENE-9344.
-----------------------------------
    Fix Version/s: master (9.0)
       Resolution: Fixed

> Convert XXX.txt files to proper XXX.md
> --------------------------------------
>
>                 Key: LUCENE-9344
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9344
>             Project: Lucene - Core
>          Issue Type: Improvement
>    Affects Versions: master (9.0)
>            Reporter: Tomoko Uchida
>            Assignee: Tomoko Uchida
>            Priority: Minor
>             Fix For: master (9.0)
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Text files that are (partially) written in markdown (such as "README.txt")
> can be converted to proper markdown files. This change was suggested on
> LUCENE-9321.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9344) Convert XXX.txt files to proper XXX.md
[ https://issues.apache.org/jira/browse/LUCENE-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091217#comment-17091217 ]

ASF subversion and git services commented on LUCENE-9344:
---------------------------------------------------------

Commit 75b648ce828f1131824330adc14a5ae1f850bc35 in lucene-solr's branch refs/heads/master from Tomoko Uchida
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=75b648c ]

LUCENE-9344: Use https url for lucene.apache.org
[GitHub] [lucene-solr] mocobeta commented on pull request #1449: LUCENE-9344: Convert XXX.txt files to proper XXX.md
mocobeta commented on pull request #1449:
URL: https://github.com/apache/lucene-solr/pull/1449#issuecomment-618810502

Thank you for reviewing, I just merged it to master.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[jira] [Commented] (LUCENE-9344) Convert XXX.txt files to proper XXX.md
[ https://issues.apache.org/jira/browse/LUCENE-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091213#comment-17091213 ]

ASF subversion and git services commented on LUCENE-9344:
---------------------------------------------------------

Commit c7697b088c955c9bcbd489145b396f1540c584d6 in lucene-solr's branch refs/heads/master from Tomoko Uchida
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c7697b0 ]

LUCENE-9344: Convert .txt files to properly formatted .md files (#1449)
[jira] [Commented] (SOLR-14423) static caches in StreamHandler ought to move to CoreContainer lifecycle
[ https://issues.apache.org/jira/browse/SOLR-14423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091198#comment-17091198 ]

David Smiley commented on SOLR-14423:
-------------------------------------

Great feedback AB. I agree on the dependency injection framework point.

> static caches in StreamHandler ought to move to CoreContainer lifecycle
> -----------------------------------------------------------------------
>
>                 Key: SOLR-14423
>                 URL: https://issues.apache.org/jira/browse/SOLR-14423
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: streaming expressions
>            Reporter: David Smiley
>            Priority: Major
>
> StreamHandler (at "/stream") has several statically declared caches. I think
> this is problematic, such as in testing wherein multiple nodes could be in
> the same JVM. One of them is more serious -- SolrClientCache which is
> closed/cleared via a SolrCore close hook. That's bad for performance but also
> dangerous since another core might want to use one of these clients!
> CC [~jbernste]
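As a rough illustration of the lifecycle concern above (plain Java, not Solr's actual classes -- StreamHandler's real caches and SolrClientCache are more involved), a static cache is shared by every node that happens to live in the same JVM, while a container-scoped cache lives and dies with its owner:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical container-scoped cache: entries live and die with the container.
class Container implements AutoCloseable {
    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    Object computeIfAbsent(String key) {
        // Reuses the container's own entry; never visible to other containers.
        return cache.computeIfAbsent(key, k -> new Object());
    }

    @Override
    public void close() {
        // Closing this container cannot clobber another container's clients.
        cache.clear();
    }
}

public class CacheScopeSketch {
    // The problematic pattern: one JVM-wide map shared by every node that
    // happens to live in the same JVM (e.g. multiple test nodes).
    static final Map<String, Object> STATIC_CACHE = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        try (Container node1 = new Container(); Container node2 = new Container()) {
            // Each container holds its own client instance.
            System.out.println(node1.computeIfAbsent("client") != node2.computeIfAbsent("client")); // prints "true"
        }
    }
}
```

The sketch only shows the scoping difference; the actual fix would hang the cache off CoreContainer so its lifetime matches the node rather than any single core.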
[jira] [Updated] (SOLR-14434) Multiterm Analyzer Not Persisted in Managed Schema
[ https://issues.apache.org/jira/browse/SOLR-14434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Trey Grainger updated SOLR-14434:
---------------------------------
    Description:

In addition to "{{index}}" and "{{query}}" analyzers, Solr supports adding an explicit "{{multiterm}}" analyzer to schema {{fieldType}} definitions. This allows for specific control over analysis for things like wildcard terms, prefix queries, range queries, etc.

For example, the following would cause the wildcard query for "{{hats*}}" to get stemmed to "{{hat*}}" instead of "{{hats*}}", and thus match on the indexed version of "{{hat}}".

{code:java}
{code}

This works fine if using a non-managed schema (i.e. {{schema.xml}} file) OR if you use managed schema (i.e. {{managed-schema}} file) and push your schema directly to Zookeeper. However, starting with Solr 8.0, if you use the Schema API to add a {{fieldType}}, the {{multiterm}} analyzers are not persisted (only {{index}} and {{query}} analyzers are).

This bug seems to have originated from LUCENE-8497, which refactored this code area substantially. The bug is caused by the managed schema being able to READ in the {{multiterm}} analyzers from the schema file, but then being unable to write them out. Since pushing the schema directly to Zookeeper only requires Solr reading them in, this bug would not have been obvious in initial testing. However, since the schema API reads in the schema file, writes an updated schema out to Zookeeper (where the bug occurs), and then reads the file back in, all of the {{multiTerm}} analyzers get stripped out.

I've identified the problematic code and am looking into an appropriate fix.

> Multiterm Analyzer Not Persisted in Managed Schema
> --------------------------------------------------
>
>                 Key: SOLR-14434
>                 URL: https://issues.apache.org/jira/browse/SOLR-14434
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: Schema and Analysis
>    Affects Versions: 8.0, 8.1, 8.2, 8.1.1, 8.3, 8.4, 8.3.1, 8.5, 8.4.1, 8.5.1
>            Reporter: Trey Grainger
>            Priority: Major
[jira] [Created] (SOLR-14434) Multiterm Analyzer Not Persisted in Managed Schema
Trey Grainger created SOLR-14434:
---------------------------------

             Summary: Multiterm Analyzer Not Persisted in Managed Schema
                 Key: SOLR-14434
                 URL: https://issues.apache.org/jira/browse/SOLR-14434
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: Schema and Analysis
    Affects Versions: 8.5.1, 8.4.1, 8.5, 8.3.1, 8.4, 8.3, 8.1.1, 8.2, 8.1, 8.0
            Reporter: Trey Grainger
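The {code} block in the SOLR-14434 description lost its XML content in the archive (only attribute fragments such as positionIncrementGap="100", termOffsets="true", termVectors="true", and a synonym filter with ignoreCase="true" synonyms="synonyms.txt" survive in some copies). As a stand-in, here is a minimal, hypothetical {{fieldType}} sketch -- the type name and filter choices are illustrative, not the reporter's original -- showing an explicit {{multiterm}} analyzer that stems wildcard terms so that "{{hats*}}" matches documents indexed as "{{hat}}":

```xml
<!-- Hypothetical sketch: type name and filter choices are illustrative. -->
<fieldType name="text_stemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
  <!-- The explicit multiterm analyzer is applied to wildcard/prefix/range
       terms, so "hats*" is stemmed to "hat*" before matching. -->
  <analyzer type="multiterm">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Per the bug report, a type like this round-trips fine when the schema is pushed directly to Zookeeper, but the {{multiterm}} analyzer is dropped when the same type is added through the Schema API.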
[jira] [Commented] (SOLR-14414) New Admin UI
[ https://issues.apache.org/jira/browse/SOLR-14414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091110#comment-17091110 ]

Marcus Eagan commented on SOLR-14414:
-------------------------------------

I think that makes sense. I will post some pros and cons of each solution in the SIP so that everyone has the information that I have.

> New Admin UI
> ------------
>
>                 Key: SOLR-14414
>                 URL: https://issues.apache.org/jira/browse/SOLR-14414
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: Admin UI
>    Affects Versions: master (9.0)
>            Reporter: Marcus Eagan
>            Priority: Major
>         Attachments: QueryUX-SolrAdminUIReboot.mov
>
> We have had a lengthy discussion on the mailing list about the need to build
> a modern UI that is both more secure and does not depend on deprecated,
> end-of-life code. In this ticket, I intend to familiarize the community with
> the efforts of the community to do just that. While we are nearing feature
> parity, but not there yet, as many have suggested we could complete this
> task in iterations, here is an attempt to get the ball rolling. I have
> mostly worked on it on weekend nights when I could find the time. Angular is
> certainly not my specialty, and this is my first attempt at using TypeScript
> besides a few brief learning exercises here and there. However, I will be
> engaging experts in both of these areas for consultation as our community
> tries to pull our UI into another era.
>
> Many of the components here can improve. One or two of them need to be
> rewritten, and there are at least three essential components of the app
> missing, along with some tests. Another thing missing is the V2 API, which I
> found difficult to build with in this context because it is not documented
> on the web. I understand that it is "self-documenting," but the most
> easy-to-use APIs are still documented on the web. Maybe it is entirely
> documented on the web, and I had trouble finding it. Forgive me, as that
> could be an area of assistance. Another area where I need assistance is
> packaging this application as a Solr package. I understand this app is not
> in the right place for that today, but it can be. There are still many
> improvements to be made in this Jira and certainly in this code.
>
> The project is located in {{lucene-solr/solr/webapp2}}, where there is a
> README with information on running the app. The app can be started from this
> directory with {{npm start}} for now. It can quickly be modified to start as
> part of the typical start commands as it approaches parity. I expect there
> will be a lot of opinions. I welcome them, of course. The community input
> should drive the project's success.
>
> Discussion in mailing list:
> https://mail-archives.apache.org/mod_mbox/lucene-dev/202004.mbox/%3CCAF76exK-EB_tyFx0B4fBiA%3DJj8gH%3Divn2Uo6cWvMwhvzRdA3KA%40mail.gmail.com%3E
[jira] [Comment Edited] (SOLR-14414) New Admin UI
[ https://issues.apache.org/jira/browse/SOLR-14414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091010#comment-17091010 ]

Marcus Eagan edited comment on SOLR-14414 at 4/24/20, 2:49 AM:
---------------------------------------------------------------

Ok Tomas. I intend to agree broadly. I even filed the disabled PR. It will be reviewed and merged sooner than the UI for sure. I think that it's fine if the package is pulled in as a pinned version and included in the CI pipeline. I'm particularly not looking forward to working with Noble, but I hope that we can be respectful to drive this project to completion. That way, the admin UI can iterate faster. But perhaps that should be a future state, because that requires a lot more work and bureaucracy from someone here.

I'm happy to do a lot of the legwork on the actual build, recruit devs to pitch in, and even host a server teaching people how to use the Admin UI. If the desire is to have it live in a separate repository, then [~janhoy], can you work on this ASAP? Jeremy, the developer who originally started the Angular project and wrote the first page of the project that I have been building upon, and I have discussed the effort and regular meetings. I think we could move really fast working together, but I won't be able to say how fast until the end of next week.
[jira] [Commented] (SOLR-13289) Support for BlockMax WAND
[ https://issues.apache.org/jira/browse/SOLR-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091106#comment-17091106 ]

Ishan Chattopadhyaya commented on SOLR-13289:
---------------------------------------------

[~tflobbe], I was working on this last month and I'm actually much farther along on the patch than what I put here. I'll put together an updated patch by next week, and we can collaborate on this from there. WDYT?

> Support for BlockMax WAND
> -------------------------
>
>                 Key: SOLR-13289
>                 URL: https://issues.apache.org/jira/browse/SOLR-13289
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Ishan Chattopadhyaya
>            Assignee: Ishan Chattopadhyaya
>            Priority: Major
>         Attachments: SOLR-13289.patch, SOLR-13289.patch
>
> LUCENE-8135 introduced BlockMax WAND as a major speed improvement. Need to
> expose this via Solr. When enabled, the numFound returned will not be exact.
[jira] [Commented] (SOLR-13325) Add a collection selector to triggers
[ https://issues.apache.org/jira/browse/SOLR-13325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091090#comment-17091090 ]

Shalin Shekhar Mangar commented on SOLR-13325:
----------------------------------------------

I'm looking at this again. I think we should change the syntax slightly and get rid of the {{#policy}} key name. Instead, this can operate on any collection property, such as policy, configName, or autoAddReplicas, that is part of the collection state. What's slightly complicating is that there are additional collection properties (stored in collectionprops.json). I don't intend to support those at the moment. On a related note, collection props have write APIs but no read APIs, which severely limits the usefulness of that feature. That's something we should fix separately.

Now once we have this working, it reduces the need for a separate AutoAddReplicasPlanAction, because you can get the same behavior by setting the following in ComputePlanAction:

{code}
"collection": {"autoAddReplicas": "true"}
{code}

However, there is a difference between the current implementation of "collections" in ComputePlanAction and how AutoAddReplicasPlanAction works: the former filters out suggestions for non-matching collections, while the latter pushes down the collection hint to the policy engine so that it doesn't even compute suggestions for non-matching collections in the first place. The latter is obviously more efficient. The one thing we have to be careful about is that the list of matching collections should be evaluated lazily, when the action is triggered, instead of early in the init method, so that it can *see* changes in the cluster state.

> Add a collection selector to triggers
> -------------------------------------
>
>                 Key: SOLR-13325
>                 URL: https://issues.apache.org/jira/browse/SOLR-13325
>             Project: Solr
>          Issue Type: Improvement
>          Components: AutoScaling
>            Reporter: Shalin Shekhar Mangar
>            Priority: Major
>             Fix For: master (9.0), 8.2
>
> Similar to SOLR-13273, it'd be nice to have a collection selector that
> applies to triggers. An example use-case would be to selectively add replicas
> on new nodes for certain collections only.
>
> Here is a selector that returns collections that match the given collection
> property/value pair:
> {code}
> "collection": {"property_name": "property_value"}
> {code}
>
> Here's another selector that returns collections that have the given policy
> applied:
> {code}
> "collection": {"#policy": "policy_name"}
> {code}
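For illustration, here is a hypothetical trigger configuration using the property-based selector discussed above -- the trigger and action names are made up, and the exact selector syntax was still under discussion at this point:

```json
{
  "set-trigger": {
    "name": "node_added_trigger",
    "event": "nodeAdded",
    "waitFor": "5s",
    "enabled": true,
    "actions": [
      {
        "name": "compute_plan",
        "class": "solr.ComputePlanAction",
        "collection": {"autoAddReplicas": "true"}
      },
      {
        "name": "execute_plan",
        "class": "solr.ExecutePlanAction"
      }
    ]
  }
}
```

With a selector like this, only collections whose state matches the given property/value pair would be considered when the plan is computed (or, in the more efficient variant, the hint would be pushed down so non-matching collections are never evaluated at all).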
[jira] [Commented] (SOLR-14414) New Admin UI
[ https://issues.apache.org/jira/browse/SOLR-14414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091022#comment-17091022 ]

Jan Høydahl commented on SOLR-14414:
------------------------------------

I prefer if we can work at the SIP level for now: expand on and understand the options, continue iterating mainly in the email thread (not in the PR or JIRA), understand the pros/cons of the various options, and finally end up with a SIP proposal that has broad support. Please understand, I have not *decided* on either a separate repo/subproject or on Vue; I am just trying to help guide the process of making sure we all consider our options before diving into code and getting vetoed. And as always, keeping things simple in the first phase is a good idea, i.e. forget about the package manager and running npm through Java for now.
[jira] [Commented] (SOLR-14414) New Admin UI
[ https://issues.apache.org/jira/browse/SOLR-14414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091010#comment-17091010 ]

Marcus Eagan commented on SOLR-14414:
-------------------------------------

Ok Tomas. I intend to agree broadly. I even filed the disabled PR. It will be reviewed and merged sooner than the UI for sure. I think that it's fine if the package is pulled in as a pinned version and included in the CI pipeline. I'm particularly not looking forward to working with Noble, but I hope that we can be respectful to drive this project to completion. That way, the admin UI can iterate faster. But perhaps that should be a future state, because that requires a lot more work and bureaucracy from someone here.

I'm happy to do a lot of the legwork on the actual build, recruit devs to pitch in, and even host a server teaching people how to use the Admin UI. If the desire is to have it live in a separate repository, then @Jan Høydahl, can you work on this ASAP? Jeremy, the developer who originally started the Angular project and wrote the first page of the project that I have been building upon, and I have discussed the effort and regular meetings. I think we could move really fast working together, but I won't be able to say how fast until the end of next week.
[jira] [Commented] (SOLR-14414) New Admin UI
[ https://issues.apache.org/jira/browse/SOLR-14414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091006#comment-17091006 ] Tomas Eduardo Fernandez Lobbe commented on SOLR-14414: -- Thanks for working on this Marcus. I personally agree 100% with what [~gus] said on the email thread. * The UI is critical for Solr. if it's a package for modularization purposes may be fine (I didn't look much into the packages design yet), but I think it needs to be on by default. Having the ability to "disable" it could be nice too, but not as important to me, I'd always want it on. * It's very difficult to have the UI live in a separate repo and not fall out of sync. Can't be compared with Kibana or any other enterprise product. I don't think this is a good idea. * I'd love for the UI to run in the same process as Solr and not have to start/monitor another app. > New Admin UI > > > Key: SOLR-14414 > URL: https://issues.apache.org/jira/browse/SOLR-14414 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Admin UI >Affects Versions: master (9.0) >Reporter: Marcus Eagan >Priority: Major > Attachments: QueryUX-SolrAdminUIReboot.mov > > > We have had a lengthy discussion in the mailing list about the need to build > a modern UI that is both more security and does not depend on deprecated, end > of life code. In this ticket, I intend to familiarize the community with the > efforts of the community to do just that that. While we are nearing feature > parity, but not there yet as many have suggested we could complete this task > in iterations, here is an attempt to get the ball rolling. I have mostly > worked on it in weekend nights on the occasion that I could find the time. > Angular is certainly not my specialty, and this is my first attempt at using > TypeScript besides a few brief learning exercises here and there. 
However, I > will be engaging experts in both of these areas for consultation as our > community tries to pull our UI into another era. > Many of the components here can improve. One or two of them need to be > rewritten, and there are even at least three essential components to the app > missing, along with some tests. Another missing piece is the V2 API, > which I found difficult to build against in this context because it is not > documented on the web. I understand that it is "self-documenting," but the > easiest-to-use APIs are still documented on the web. Maybe it is entirely > documented on the web and I had trouble finding it. Forgive me, as that > could be an area of assistance. Another area where I need assistance is > packaging this application as a Solr package. I understand this app is not in > the right place for that today, but it can be. There are still many > improvements to be made in this Jira and certainly in this code. > The project is located in {{lucene-solr/solr/webapp2}}, where there is a > README with information on running the app. > The app can be started from this directory with {{npm start}} for now. It > can quickly be modified to start as part of the typical start commands as > it approaches parity. I expect there will be a lot of opinions. I welcome > them, of course. Community input should drive the project's success. > Discussion in mailing list: > https://mail-archives.apache.org/mod_mbox/lucene-dev/202004.mbox/%3CCAF76exK-EB_tyFx0B4fBiA%3DJj8gH%3Divn2Uo6cWvMwhvzRdA3KA%40mail.gmail.com%3E -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14414) New Admin UI
[ https://issues.apache.org/jira/browse/SOLR-14414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomas Eduardo Fernandez Lobbe updated SOLR-14414: - Description: We have had a lengthy discussion in the mailing list about the need to build a modern UI that is both more secure and does not depend on deprecated, end-of-life code. In this ticket, I intend to familiarize the community with the efforts of the community to do just that. While we are nearing feature parity but are not there yet, and many have suggested we could complete this task in iterations, here is an attempt to get the ball rolling. I have mostly worked on it on weekend nights, when I could find the time. Angular is certainly not my specialty, and this is my first attempt at using TypeScript besides a few brief learning exercises here and there. However, I will be engaging experts in both of these areas for consultation as our community tries to pull our UI into another era. Many of the components here can improve. One or two of them need to be rewritten, and there are even at least three essential components to the app missing, along with some tests. Another missing piece is the V2 API, which I found difficult to build against in this context because it is not documented on the web. I understand that it is "self-documenting," but the easiest-to-use APIs are still documented on the web. Maybe it is entirely documented on the web and I had trouble finding it. Forgive me, as that could be an area of assistance. Another area where I need assistance is packaging this application as a Solr package. I understand this app is not in the right place for that today, but it can be. There are still many improvements to be made in this Jira and certainly in this code. The project is located in {{lucene-solr/solr/webapp2}}, where there is a README with information on running the app. The app can be started from this directory with {{npm start}} for now. 
It can quickly be modified to start as part of the typical start commands as it approaches parity. I expect there will be a lot of opinions. I welcome them, of course. Community input should drive the project's success. Discussion in mailing list: https://mail-archives.apache.org/mod_mbox/lucene-dev/202004.mbox/%3CCAF76exK-EB_tyFx0B4fBiA%3DJj8gH%3Divn2Uo6cWvMwhvzRdA3KA%40mail.gmail.com%3E was: We have had a lengthy discussion in the mailing list about the need to build a modern UI that is both more secure and does not depend on deprecated, end-of-life code. In this ticket, I intend to familiarize the community with the efforts of the community to do just that. While we are nearing feature parity but are not there yet, and many have suggested we could complete this task in iterations, here is an attempt to get the ball rolling. I have mostly worked on it on weekend nights, when I could find the time. Angular is certainly not my specialty, and this is my first attempt at using TypeScript besides a few brief learning exercises here and there. However, I will be engaging experts in both of these areas for consultation as our community tries to pull our UI into another era. Many of the components here can improve. One or two of them need to be rewritten, and there are even at least three essential components to the app missing, along with some tests. Another missing piece is the V2 API, which I found difficult to build against in this context because it is not documented on the web. I understand that it is "self-documenting," but the easiest-to-use APIs are still documented on the web. Maybe it is entirely documented on the web and I had trouble finding it. Forgive me, as that could be an area of assistance. Another area where I need assistance is packaging this application as a Solr package. I understand this app is not in the right place for that today, but it can be. 
There are still many improvements to be made in this Jira and certainly in this code. The project is located in {{lucene-solr/solr/webapp2}}, where there is a README with information on running the app. The app can be started from this directory with {{npm start}} for now. It can quickly be modified to start as part of the typical start commands as it approaches parity. I expect there will be a lot of opinions. I welcome them, of course. Community input should drive the project's success. > New Admin UI > > > Key: SOLR-14414 > URL: https://issues.apache.org/jira/browse/SOLR-14414 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Admin UI >Affects Versions: master (9.0) >Reporter: Marcus Eagan >Priority: Major > Attachments:
[GitHub] [lucene-solr] dsmiley opened a new pull request #1453: SOLR-14433: Improve SolrShardReporter default metrics list
dsmiley opened a new pull request #1453: URL: https://github.com/apache/lucene-solr/pull/1453 https://issues.apache.org/jira/browse/SOLR-14433# Now includes TLOG and UPDATE./update. These were small bugs to begin with, but from a user's perspective this is an incremental improvement. CC @sigram This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14433) Improve default metrics collected by SolrShardReporter
[ https://issues.apache.org/jira/browse/SOLR-14433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated SOLR-14433: Description: SolrShardReporter's default metric filters have two problems: * "Tlog.\*" should be "TLOG.\*" (a bug) * "UPDATE\\./update/.\*requests" should be "UPDATE\\./update.\*requests" (notice removal of one '/') Today, the first was fixed and tagged to the issue that incorrectly made this change – SOLR-12690. What remains is the other. CC [~ab] was: SolrShardReporter's default metric filters have two problems: * "Tlog.*" should be "TLOG.*" (a bug) * "UPDATE\\./update/.*requests" should be "UPDATE\\./update.*requests" (notice removal of one '/') Today, the first was fixed and tagged to the issue that incorrectly made this change – SOLR-12690. What remains is the other. CC [~ab] > Improve default metrics collected by SolrShardReporter > -- > > Key: SOLR-14433 > URL: https://issues.apache.org/jira/browse/SOLR-14433 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Reporter: David Smiley >Assignee: David Smiley >Priority: Minor > > SolrShardReporter's default metric filters have two problems: > * "Tlog.\*" should be "TLOG.\*" (a bug) > * "UPDATE\\./update/.\*requests" should be "UPDATE\\./update.\*requests" (notice removal of one '/') > Today, the first was fixed and tagged to the issue that incorrectly made this > change – SOLR-12690. What remains is the other. > CC [~ab] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-14433) Improve default metrics collected by SolrShardReporter
David Smiley created SOLR-14433: --- Summary: Improve default metrics collected by SolrShardReporter Key: SOLR-14433 URL: https://issues.apache.org/jira/browse/SOLR-14433 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: metrics Reporter: David Smiley Assignee: David Smiley SolrShardReporter's default metric filters have two problems: * "Tlog.*" should be "TLOG.*" (a bug) * "UPDATE\\./update/.*requests" should be "UPDATE\\./update.*requests" (notice removal of one '/') Today, the first was fixed and tagged to the issue that incorrectly made this change – SOLR-12690. What remains is the other. CC [~ab] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
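The two regex problems described above can be illustrated with plain java.util.regex. A minimal sketch; the metric names used below are hypothetical examples chosen for illustration, not actual Solr metric names:

```java
import java.util.regex.Pattern;

// Sketch of why the two default filter prefixes miss metrics.
public class MetricFilterDemo {
    // Buggy prefix: lowercase "Tlog" never matches the upper-case TLOG group.
    public static final Pattern BUGGY_TLOG = Pattern.compile("Tlog.*");
    public static final Pattern FIXED_TLOG = Pattern.compile("TLOG.*");

    // Buggy pattern requires a '/' right after "/update", so metrics of the
    // bare "/update" handler are skipped; dropping that '/' fixes it.
    public static final Pattern BUGGY_UPDATE = Pattern.compile("UPDATE\\./update/.*requests");
    public static final Pattern FIXED_UPDATE = Pattern.compile("UPDATE\\./update.*requests");

    // Full-string match, as a prefix-style metric filter would apply it.
    public static boolean matches(Pattern p, String metricName) {
        return p.matcher(metricName).matches();
    }
}
```

Running the fixed patterns against names like "TLOG.replay.remaining.logs" or "UPDATE./update.requests" shows them matching where the buggy ones do not.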
[jira] [Commented] (SOLR-14430) Authorization plugins should check roles from request
[ https://issues.apache.org/jira/browse/SOLR-14430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090943#comment-17090943 ] Jan Høydahl commented on SOLR-14430: The JWTAuth plugin wraps the user principal in the class {{JWTPrincipalWithUserRoles}}, which implements {{org.apache.solr.security.VerifiedUserRoles}}: {code:java} Set<String> getVerifiedRoles(); {code} Currently that class is not used other than in tests, but my next idea was to implement SOLR-12131, which adds a new class [ExternalRoleRuleBasedAuthorizationPlugin|https://github.com/apache/lucene-solr/pull/341/files#diff-1605e924a4ccb6bddd1f776e54b8f2cd] that reads the roles from the request (VerifiedUserRoles) instead of from a user->role mapping. I hope you can review my PR and tell me what you think of that approach. > Authorization plugins should check roles from request > - > > Key: SOLR-14430 > URL: https://issues.apache.org/jira/browse/SOLR-14430 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: security >Reporter: Mike Drob >Priority: Major > > The AuthorizationContext exposes {{getUserPrincipal}} to the plugin, but it > does not allow the plugin to interrogate the request for {{isUserInRole}}. If > we trust the request enough to get a principal from it, then we should trust > it enough to ask about roles, as those could have been defined and verified > by an authentication plugin. > This model would be an alternative to the current model where > RuleBasedAuthorizationPlugin maintains its own user->role mapping. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
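A minimal sketch of the roles-from-request idea discussed above. Only the VerifiedUserRoles/getVerifiedRoles names come from the comment; every other class and method here is illustrative, not Solr's actual API:

```java
import java.security.Principal;
import java.util.Set;

// The authorization side reads roles that the authentication plugin attached
// to the principal, instead of consulting its own user->role mapping.
public class RolesFromRequestSketch {

    // Mirrors the shape of org.apache.solr.security.VerifiedUserRoles.
    public interface VerifiedUserRoles {
        Set<String> getVerifiedRoles();
    }

    // A principal as an auth plugin (e.g. JWT) might produce it; hypothetical name.
    public static class PrincipalWithRoles implements Principal, VerifiedUserRoles {
        private final String name;
        private final Set<String> roles;
        public PrincipalWithRoles(String name, Set<String> roles) {
            this.name = name;
            this.roles = roles;
        }
        @Override public String getName() { return name; }
        @Override public Set<String> getVerifiedRoles() { return roles; }
    }

    // Authorization check: trust roles only if the principal actually carries them.
    public static boolean hasRole(Principal p, String role) {
        return p instanceof VerifiedUserRoles
            && ((VerifiedUserRoles) p).getVerifiedRoles().contains(role);
    }
}
```

The design point is that the role set travels with the already-verified principal, so no separate mapping can drift out of sync with the authentication layer.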
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1450: SOLR-14429: Convert XXX.txt files to proper XXX.md
dsmiley commented on a change in pull request #1450: URL: https://github.com/apache/lucene-solr/pull/1450#discussion_r414110471 ## File path: solr/example/files/README.md ## @@ -111,38 +120,46 @@ For further explanations, see the frequently asked questions at the end of the g * Another way to query the index is by manipulating the URL in your address bar once in the browse view. * i.e. : [http://localhost:8983/solr/files/browse?q=Lucene](http://localhost:8983/solr/files/browse?q=Lucene) - -##FAQs + +## FAQs * Why use -d when creating a core? * -d specifies a specific configuration to use. This example as a configuration tuned for indexing and query rich text files. * How do I delete a core? - * To delete a core (i.e. files), you can enter the following in your command shell: - bin/solr delete -c files - - * You should see the following output: + * To delete a core (i.e. files), you can enter the following in your command shell: - Deleting core 'files' using command: - http://localhost:8983/solr/admin/cores?action=UNLOAD=files=true=true=true +``` +bin/solr delete -c files +``` + + * You should see the following output: + + Deleting core 'files' using command: + + ``` + http://localhost:8983/solr/admin/cores?action=UNLOAD=files=true=true=true - {"responseHeader":{ - "status":0, - "QTime":19}} - - * This calls the Solr core admin handler, "UNLOAD", and the parameters "deleteDataDir" and "deleteInstanceDir" to ensure that all data associated with core is also removed + {"responseHeader":{ + "status":0, + "QTime":19}} +``` + + * This calls the Solr core admin handler, "UNLOAD", and the parameters "deleteDataDir" and "deleteInstanceDir" to ensure that all data associated with core is also removed * How can I change the /browse UI? - The primary templates are under example/files/conf/velocity. 
**In order to edit those files in place (without having to - re-create or patch a core/collection with an updated configuration)**, Solr can be started with a special system property - set to the _absolute_ path to the conf/velocity directory, like this: - - bin/solr start -Dvelocity.template.base.dir=/example/files/conf/velocity/ +The primary templates are under example/files/conf/velocity. **In order to edit those files in place (without having to +re-create or patch a core/collection with an updated configuration)**, Solr can be started with a special system property +set to the _absolute_ path to the conf/velocity directory, like this: -If you want to adjust the browse templates for an existing collection, edit the core’s configuration -under server/solr/files/conf/velocity. +``` +bin/solr start -Dvelocity.template.base.dir=/example/files/conf/velocity/ +``` + +If you want to adjust the browse templates for an existing collection, edit the core’s configuration +under server/solr/files/conf/velocity. === Review comment: At least on GitHub, this isn't showing as markup. (I assume this line is at the end directly about the provenance info) ## File path: solr/example/README.md ## @@ -1,57 +1,74 @@ -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. -# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. + Solr example This directory contains Solr examples. Each example is contained in a separate directory. To run a specific example, do: +``` bin/solr -e where is one of: cloud: SolrCloud example dih : Data Import Handler (rdbms, mail, atom, tika) schemaless : Schema-less example (schema is inferred from data during indexing) techproducts : Kitchen sink example providing comprehensive examples of Solr features +``` For instance, if you want to run the Solr Data Import Handler example, do: +``` bin/solr -e dih - +``` + To see all the options available when starting Solr: +``` bin/solr
[jira] [Commented] (SOLR-13132) Improve JSON "terms" facet performance when sorted by relatedness
[ https://issues.apache.org/jira/browse/SOLR-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090931#comment-17090931 ] Michael Gibney commented on SOLR-13132: --- Thanks, Hoss! Some initial responses re: some of the nocommit comments from [8fcd6271b6|https://github.com/magibney/lucene-solr/commit/8fcd6271b684da589ddae8e4319b564249ee76cb]: {code:java} // nocommit: for that matter: can we eliminate SweepingAcc as a class, // nocommit: and just roll that specific logic into CountSlotAcc? // nocommit: IIUC: there should only ever be a single SweepingAcc instance, // nocommit: and callers should never use/instantiate a SweepingAcc w/o using 'countAcc' ... correct? {code} That would definitely work. However, my initial inclination is to prefer leaving {{SweepingAcc}} as a separate class, because {{CountSlotAcc}} currently clearly does one specific thing, and folding the {{SweepingAcc}} functionality (which could be relatively complex -- potentially deduping {{DocSets}}, etc...) would mix in a different type of functionality that's only relevant in some of the contexts where {{CountSlotAcc}} is currently used. Accessing {{SweepingAcc}} via {{countAcc.getBaseSweepingAcc()}} strikes a balance of using {{countAcc}} as a coordination point for related but distinct functionality ... perhaps a cleaner separation of concerns? {code:java} // nocommit: since 'countAcc' is now the special place all sweeping is tracked, it seems // nocommit: unneccessary (and uneccessarly confusing) for it to also be a 'SweepableSlotAcc' // nocommit: any reason not to just remove this? abstract class CountSlotAcc extends SlotAcc implements ReadOnlyCountSlotAcc /*, SweepableSlotAcc ... nocommit... */ { ... // nocommit: CountSlotAcc no longer implements SweepableSlotAcc... 
// @Override // public CountSlotAcc registerSweepingAccs(SweepingAcc baseSweepingAcc) { // baseSweepingAcc.add(new SweepCountAccStruct(fcontext.base, false, this, this)); // baseSweepingAcc.registerMapping(this, this); // return null; // } {code} True, I'm glad you mentioned this. I left this in partly to illustrate another concrete case (aside from SKG) for which sweep collection might be useful. In its current state it admittedly seems a bit contrived, but my thinking was: although {{countAcc}} is currently the one and only {{CountSlotAcc}}, used to accumulate counts over the base domain {{DocSet}} only, there could be cases where extra {{CountSlotAccs}} are used more directly (e.g. as part of stats collection, analogous to how they're used indirectly for SKG sweep collection). In such a case, these "non-base" {{CountSlotAccs}} would respond as implemented in the above {{registerSweepingAccs(...)}} method. More practically speaking, it also occurred to me that one promising use of sweep collection would be to accumulate counts over all subfacet domains in a single sweep (for nested/sub-facets, pivot facets, what-have-you) -- not sure if that would be directly accommodated by the current incarnation of the "sweeping", but it might be a use case to consider. With all that said, I'm not at all opposed to removing the {{SweepableSlotAcc}} interface from {{CountSlotAcc}}; it should anyway be straightforward to add back in later should the need arise. 
> Improve JSON "terms" facet performance when sorted by relatedness > -- > > Key: SOLR-13132 > URL: https://issues.apache.org/jira/browse/SOLR-13132 > Project: Solr > Issue Type: Improvement > Components: Facet Module >Affects Versions: 7.4, master (9.0) >Reporter: Michael Gibney >Priority: Major > Attachments: SOLR-13132-with-cache-01.patch, > SOLR-13132-with-cache.patch, SOLR-13132.patch, SOLR-13132_testSweep.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > When sorting buckets by {{relatedness}}, JSON "terms" facet must calculate > {{relatedness}} for every term. > The current implementation uses a standard uninverted approach (either > {{docValues}} or {{UnInvertedField}}) to get facet counts over the domain > base docSet, and then uses that initial pass as a pre-filter for a > second-pass, inverted approach of fetching docSets for each relevant term > (i.e., {{count > minCount}}?) and calculating intersection size of those sets > with the domain base docSet. > Over high-cardinality fields, the overhead of per-term docSet creation and > set intersection operations increases request latency to the point where > relatedness sort may not be usable in practice (for my use case, even after > applying the patch for SOLR-13108, for a field with ~220k unique terms per > core, QTime for high-cardinality domain docSets were, e.g.: cardinality > 1816684=9000ms, cardinality 5032902=18000ms). > The attached
[jira] [Resolved] (LUCENE-9342) Collector's totalHitsThreshold should not be lower than numHits
[ https://issues.apache.org/jira/browse/LUCENE-9342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomas Eduardo Fernandez Lobbe resolved LUCENE-9342. --- Fix Version/s: 8.6 master (9.0) Resolution: Fixed > Collector's totalHitsThreshold should not be lower than numHits > --- > > Key: LUCENE-9342 > URL: https://issues.apache.org/jira/browse/LUCENE-9342 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Tomas Eduardo Fernandez Lobbe >Assignee: Tomas Eduardo Fernandez Lobbe >Priority: Minor > Fix For: master (9.0), 8.6 > > > While looking at SOLR-13289 I noticed this situation. If I create a collector > with {{numHits}} greater than {{totalHitsThreshold}}, and the number of hits > in the query is somewhere between those two numbers, the collector’s > {{totalHitRelation}} will be {{TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO}}, > however the count will be accurate in this case. While this doesn't violate > the current contract, the {{totalHitRelation}} could be changed to > {{TotalHits.Relation.EQUAL_TO}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9342) Collector's totalHitsThreshold should not be lower than numHits
[ https://issues.apache.org/jira/browse/LUCENE-9342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090919#comment-17090919 ] ASF subversion and git services commented on LUCENE-9342: - Commit edd00d933f6144293d74fc727fec6190f28c57a0 in lucene-solr's branch refs/heads/branch_8x from Tomas Eduardo Fernandez Lobbe [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=edd00d9 ] LUCENE-9342: Collector's totalHitsThreshold should not be lower than numHits (#1448) Use the maximum of the two, this is so that relation is EQUAL_TO in the case of the number of hits in a query is less than the collector's numHits > Collector's totalHitsThreshold should not be lower than numHits > --- > > Key: LUCENE-9342 > URL: https://issues.apache.org/jira/browse/LUCENE-9342 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Tomas Eduardo Fernandez Lobbe >Priority: Minor > > While looking at SOLR-13289 I noticed this situation. If I create a collector > with {{numHits}} greater than {{totalHitsThreshold}}, and the number of hits > in the query is somewhere between those two numbers, the collector’s > {{totalHitRelation}} will be {{TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO}}, > however the count will be accurate in this case. While this doesn't violate > the current contract, the {{totalHitRelation}} could be changed to > {{TotalHits.Relation.EQUAL_TO}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
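The commit's "use the maximum of the two" logic can be modeled in isolation. A simplified sketch of the relation decision, not Lucene's actual collector code:

```java
// When totalHitsThreshold < numHits, clamp it up to numHits: a top-numHits
// collector visits at least numHits competitive hits anyway, so counts up to
// that point are exact and the relation can honestly be EQUAL_TO.
public class HitsRelationDemo {
    public enum Relation { EQUAL_TO, GREATER_THAN_OR_EQUAL_TO }

    public static Relation relationFor(int numHits, int totalHitsThreshold, int actualHits) {
        // The fix: never use a counting threshold below the number of hits collected.
        int effectiveThreshold = Math.max(numHits, totalHitsThreshold);
        // The count stays exact as long as counting never stopped early.
        return actualHits <= effectiveThreshold
            ? Relation.EQUAL_TO
            : Relation.GREATER_THAN_OR_EQUAL_TO;
    }
}
```

With numHits=100, totalHitsThreshold=10 and 50 actual hits (the situation described in the issue), the clamped threshold makes the relation EQUAL_TO instead of GREATER_THAN_OR_EQUAL_TO.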
[jira] [Assigned] (LUCENE-9342) Collector's totalHitsThreshold should not be lower than numHits
[ https://issues.apache.org/jira/browse/LUCENE-9342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomas Eduardo Fernandez Lobbe reassigned LUCENE-9342: - Assignee: Tomas Eduardo Fernandez Lobbe > Collector's totalHitsThreshold should not be lower than numHits > --- > > Key: LUCENE-9342 > URL: https://issues.apache.org/jira/browse/LUCENE-9342 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Tomas Eduardo Fernandez Lobbe >Assignee: Tomas Eduardo Fernandez Lobbe >Priority: Minor > > While looking at SOLR-13289 I noticed this situation. If I create a collector > with {{numHits}} greater than {{totalHitsThreshold}}, and the number of hits > in the query is somewhere between those two numbers, the collector’s > {{totalHitRelation}} will be {{TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO}}, > however the count will be accurate in this case. While this doesn't violate > the current contract, the {{totalHitRelation}} could be changed to > {{TotalHits.Relation.EQUAL_TO}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090877#comment-17090877 ] Mike Drob commented on SOLR-14428: -- Adding some notes here as I've been diving through this... The queryResultsCache is built in [SolrIndexSearcher|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L278], and uses a {{QueryResultKey}}, which stores the full {{Query}} object, which in this case means the FuzzyQuery with the automata already built. However, we don't really need to store the automata that we built, since they aren't used for the equality comparison. Maybe there is an elegant way to store a stripped-down FuzzyQuery in the cache? > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Assignee: Andrzej Bialecki >Priority: Major > Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, > screenshot-2.png, screenshot-3.png, screenshot-4.png > > > I sent this to the mailing list: > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.3.1 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. 
> Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. > Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! > <1mb > With an empty cache, running this query > _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory > allocation > {noformat} > 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 > 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 > {noformat} > ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
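The "stripped down" cache-key idea from the comment above can be sketched with a key that carries only the fields equality needs, so the heavy compiled automata are never pinned by the cache. All names are illustrative; this is not Solr's QueryResultKey or Lucene's FuzzyQuery:

```java
import java.util.Objects;

// A lightweight cache key built from the query's identity-defining fields
// (here: field name, term text, max edit distance) rather than from the full
// query object, so per-query automata stay collectable.
public class FuzzyCacheKeySketch {
    public static final class Key {
        final String field;
        final String term;
        final int maxEdits;
        public Key(String field, String term, int maxEdits) {
            this.field = field;
            this.term = term;
            this.maxEdits = maxEdits;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return maxEdits == k.maxEdits && field.equals(k.field) && term.equals(k.term);
        }
        @Override public int hashCode() { return Objects.hash(field, term, maxEdits); }
    }
}
```

Two keys built independently for the same fuzzy query compare equal and hash identically, which is all the queryResultCache needs from the stored entry.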
[jira] [Updated] (SOLR-14432) SOLR Dataimport handler going to idle after some time
[ https://issues.apache.org/jira/browse/SOLR-14432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi kumar updated SOLR-14432: -- Description: I configured the data import handler to process bulk PDF documents. After processing 21000 documents, the process goes idle and does not process all the documents. When I checked the log I observed the following (log attached for reference). Please let me know if there is any way I can ignore this issue, or any setting I need to update. Error: 2020-04-23 18:39:55.749 INFO (qtp215219944-24) [ x:DMS] o.a.s.c.S.Request [DMS] webapp=/solr path=/dataimport params=\{indent=on=json=status&_=1587664092295} status=0 QTime=0 2020-04-23 18:39:55.972 WARN (Thread-14) [ ] o.a.p.p.COSParser The end of the stream is out of range, using workaround to read the stream, stream start position: 4748210, length: 2007324, expected end position: 6755534 2020-04-23 18:39:55.976 WARN (Thread-14) [ ] o.a.p.p.COSParser Removed null object COSObject\{50, 0} from pages dictionary 2020-04-23 18:39:55.976 WARN (Thread-14) [ ] o.a.p.p.COSParser Removed null object COSObject\{60, 0} from pages dictionary 2020-04-23 18:39:55.997 ERROR (Thread-14) [ ] o.a.p.c.o.s.SetGraphicsStateParameters name for 'gs' operator not found in resources: /R7 Regards, Ravi kumar was: I configured the data import handler to process bulk PDF documents. After processing 21000 documents, the process goes idle and does not process all the documents. When I checked the log I observed the following (log attached for reference). 
2020-04-23 18:39:55.749 INFO (qtp215219944-24) [ x:DMS] o.a.s.c.S.Request [DMS] webapp=/solr path=/dataimport params=\{indent=on=json=status&_=1587664092295} status=0 QTime=0 2020-04-23 18:39:55.972 WARN (Thread-14) [ ] o.a.p.p.COSParser The end of the stream is out of range, using workaround to read the stream, stream start position: 4748210, length: 2007324, expected end position: 6755534 2020-04-23 18:39:55.976 WARN (Thread-14) [ ] o.a.p.p.COSParser Removed null object COSObject\{50, 0} from pages dictionary 2020-04-23 18:39:55.976 WARN (Thread-14) [ ] o.a.p.p.COSParser Removed null object COSObject\{60, 0} from pages dictionary 2020-04-23 18:39:55.997 ERROR (Thread-14) [ ] o.a.p.c.o.s.SetGraphicsStateParameters name for 'gs' operator not found in resources: /R7 > SOLR Dataimport handler going to idle after some time > - > > Key: SOLR-14432 > URL: https://issues.apache.org/jira/browse/SOLR-14432 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - DataImportHandler >Affects Versions: 8.5.1 > Environment: Windows >Reporter: Ravi kumar >Priority: Major > Labels: dataimportHandler, solr > Attachments: solr.log > > > I configured the data import handler to process bulk PDF documents. After > processing 21000 documents, the process goes idle and does not process all the > documents. > When I checked the log I observed the following (log attached for reference). > Please let me know if there is any way I can ignore this issue, or any > setting I need to update. 
> Error: > 2020-04-23 18:39:55.749 INFO (qtp215219944-24) [ x:DMS] o.a.s.c.S.Request > [DMS] webapp=/solr path=/dataimport > params=\{indent=on=json=status&_=1587664092295} status=0 QTime=0 > 2020-04-23 18:39:55.972 WARN (Thread-14) [ ] o.a.p.p.COSParser > The end of the stream is out of range, using workaround to > read the stream, stream start position: 4748210, length: 2007324, expected > end position: 6755534 > 2020-04-23 18:39:55.976 WARN (Thread-14) [ ] o.a.p.p.COSParser Removed null > object COSObject\{50, 0} from pages dictionary > 2020-04-23 18:39:55.976 WARN (Thread-14) [ ] o.a.p.p.COSParser Removed null > object COSObject\{60, 0} from pages dictionary > 2020-04-23 18:39:55.997 ERROR (Thread-14) [ ] > o.a.p.c.o.s.SetGraphicsStateParameters name for 'gs' operator not found in > resources: /R7 > > Regards, > Ravi kumar -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-14432) SOLR Dataimport handler going idle after some time
Ravi kumar created SOLR-14432: - Summary: SOLR Dataimport handler going idle after some time Key: SOLR-14432 URL: https://issues.apache.org/jira/browse/SOLR-14432 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: contrib - DataImportHandler Affects Versions: 8.5.1 Environment: Windows Reporter: Ravi kumar Attachments: solr.log I configured the data import handler to process bulk PDF documents. After processing 21000 documents, the process goes idle and does not process the remaining documents. When I checked the log I observed the following (log attached for reference). 2020-04-23 18:39:55.749 INFO (qtp215219944-24) [ x:DMS] o.a.s.c.S.Request [DMS] webapp=/solr path=/dataimport params={indent=on=json=status&_=1587664092295} status=0 QTime=0 2020-04-23 18:39:55.972 WARN (Thread-14) [ ] o.a.p.p.COSParser The end of the stream is out of range, using workaround to read the stream, stream start position: 4748210, length: 2007324, expected end position: 6755534 2020-04-23 18:39:55.976 WARN (Thread-14) [ ] o.a.p.p.COSParser Removed null object COSObject{50, 0} from pages dictionary 2020-04-23 18:39:55.976 WARN (Thread-14) [ ] o.a.p.p.COSParser Removed null object COSObject{60, 0} from pages dictionary 2020-04-23 18:39:55.997 ERROR (Thread-14) [ ] o.a.p.c.o.s.SetGraphicsStateParameters name for 'gs' operator not found in resources: /R7 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-14431) SegmentsInfoRequestHandler.java does not release IndexWriter
Tiziano Degaetano created SOLR-14431: Summary: SegmentsInfoRequestHandler.java does not release IndexWriter Key: SOLR-14431 URL: https://issues.apache.org/jira/browse/SOLR-14431 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: Admin UI Affects Versions: 8.5.1, 8.1.1 Reporter: Tiziano Degaetano If withCoreInfo is false iwRef.decref() will not be called to release the reader lock, preventing any further writer locks. https://github.com/apache/lucene-solr/blob/3a743ea953f0ecfc35fc7b198f68d142ce99d789/solr/core/src/java/org/apache/solr/handler/admin/SegmentsInfoRequestHandler.java#L144 Line 130 should be moved inside the if statement L144. [~ab] FYI -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
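The leak reported in SOLR-14431 above is a classic acquire/release mismatch: the `IndexWriter` reference is obtained unconditionally but `iwRef.decref()` only runs on one branch. A minimal sketch of the pattern the fix restores, using a hypothetical stand-in for Solr's `RefCounted` (class and method names here are illustrative, not Solr's actual API):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for Solr's RefCounted<IndexWriter>; illustrates the
// acquire/release pattern, not the actual Solr classes.
class RefCountedExample {
    static class RefCounted<T> {
        private final T resource;
        final AtomicInteger refCount = new AtomicInteger(1);
        RefCounted(T resource) { this.resource = resource; }
        T get() { return resource; }
        void decref() { refCount.decrementAndGet(); }
    }

    // Buggy shape: decref() only happens on one branch, so the reference
    // (and with it the writer lock) leaks when withCoreInfo is false.
    static int useLeaky(RefCounted<String> ref, boolean withCoreInfo) {
        String writer = ref.get();
        if (withCoreInfo) {
            // ... gather core info ...
            ref.decref();
        }
        return ref.refCount.get();
    }

    // Fixed shape: release unconditionally in a finally block, on every path.
    static int useFixed(RefCounted<String> ref, boolean withCoreInfo) {
        try {
            String writer = ref.get();
            // ... gather segment info, optionally including core info ...
        } finally {
            ref.decref();
        }
        return ref.refCount.get();
    }
}
```

Moving the release into a `finally` (or scoping the acquire inside the conditional, as the report suggests) makes the count return to zero on both paths.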
[jira] [Commented] (LUCENE-9342) Collector's totalHitsThreshold should not be lower than numHits
[ https://issues.apache.org/jira/browse/LUCENE-9342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090849#comment-17090849 ] ASF subversion and git services commented on LUCENE-9342: - Commit a11b78e06a5947ffb43a9b66b37033ebe64753e0 in lucene-solr's branch refs/heads/master from Tomas Eduardo Fernandez Lobbe [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a11b78e ] LUCENE-9342: Collector's totalHitsThreshold should not be lower than numHits (#1448) Use the maximum of the two, this is so that relation is EQUAL_TO in the case of the number of hits in a query is less than the collector's numHits > Collector's totalHitsThreshold should not be lower than numHits > --- > > Key: LUCENE-9342 > URL: https://issues.apache.org/jira/browse/LUCENE-9342 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Tomas Eduardo Fernandez Lobbe >Priority: Minor > > While looking at SOLR-13289 I noticed this situation. If I create a collector > with {{numHits}} greater than {{totalHitsThreshold}}, and the number of hits > in the query is somewhere between those two numbers, the collector’s > {{totalHitRelation}} will be {{TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO}}, > however the count will be accurate in this case. While this doesn't violate > the current contract, the {{totalHitRelation}} could be changed to > {{TotalHits.Relation.EQUAL_TO}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
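The commit above takes the maximum of `numHits` and `totalHitsThreshold`, so that when a query matches fewer documents than `numHits` the count is exact and the relation can be `EQUAL_TO`. A self-contained sketch of that rule (names are illustrative; the real logic lives in Lucene's top-docs collectors):

```java
// Illustrative sketch of the LUCENE-9342 rule: count hits accurately up to
// max(numHits, totalHitsThreshold), so a query with fewer matches than
// numHits gets an exact count and relation EQUAL_TO.
class HitsThresholdExample {
    enum Relation { EQUAL_TO, GREATER_THAN_OR_EQUAL_TO }

    static int effectiveThreshold(int numHits, int totalHitsThreshold) {
        return Math.max(numHits, totalHitsThreshold);
    }

    // If the true hit count never exceeds the effective threshold, counting
    // finished before any early termination, so the count is exact.
    static Relation relation(int actualHits, int numHits, int totalHitsThreshold) {
        return actualHits <= effectiveThreshold(numHits, totalHitsThreshold)
                ? Relation.EQUAL_TO
                : Relation.GREATER_THAN_OR_EQUAL_TO;
    }
}
```

For example, with `numHits=10` and `totalHitsThreshold=3`, a query matching 7 documents now reports an exact count instead of a lower bound.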
[GitHub] [lucene-solr] madrob commented on issue #1450: SOLR-14429: Convert XXX.txt files to proper XXX.md
madrob commented on issue #1450: URL: https://github.com/apache/lucene-solr/pull/1450#issuecomment-618573464 Do we need to update anything in `rat-sources.gradle` or `check-source-patterns.groovy`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] madrob opened a new pull request #1452: Move audit logging docs under AAA section
madrob opened a new pull request #1452: URL: https://github.com/apache/lucene-solr/pull/1452 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] s1monw commented on issue #1451: LUCENE-9345: Separate MergeScheduler from IndexWriter
s1monw commented on issue #1451: URL: https://github.com/apache/lucene-solr/pull/1451#issuecomment-618546044 Thanks @jpountz - I was wondering if we should break this API in 8.6 already. it's very expert IMO. /cc @mikemccand This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] s1monw commented on a change in pull request #1451: LUCENE-9345: Separate MergeScheduler from IndexWriter
s1monw commented on a change in pull request #1451: URL: https://github.com/apache/lucene-solr/pull/1451#discussion_r413998721 ## File path: lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java ## @@ -516,18 +519,18 @@ public synchronized void merge(IndexWriter writer, MergeTrigger trigger) throws if (verbose()) { message("now merge"); - message(" index: " + writer.segString()); + message(" index(source): " + mergeSource.toString()); Review comment: I added it on purpose. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-12778) Support encrypted password for ZK cred/ACL providers
[ https://issues.apache.org/jira/browse/SOLR-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris M. Hostetter updated SOLR-12778: -- Attachment: SOLR-12778.patch Status: Open (was: Open) I'm attaching a patch that starts to flesh out support for a new "{{zkDigestEncryptFile}}" option used by both {{VMParamsAllAndReadonlyDigestZkACLProvider}} and {{VMParamsSingleSetCredentialsDigestZkCredentialsProvider}} to decrypt all the username/password options they read, if specified. The patch also includes a new {{public static String decodeAES(String base64CipherTxt, File encryptFile)}} method in {{CryptoKeys}} wrapping the existing {{decodeAES(String base64CipherTxt, String pwd)}} to simplify the common overhead code for plugins like this (but I did not refactor the existing File handling code from DIH because it has a lot of code smells I didn't want to propagate: assuming limits on the file size, calling {{new String(byte[])}}, etc...) Unfortunately this patch doesn't work at the moment because the {{CryptoKeys}} class is in solr-core and these plugins live in solr-solrj. I know there has been a lot of concern about the size & dependencies of solrj, so I'm not sure how people will/would feel about migrating CryptoKeys into solrj ... I think it can be done w/o increasing the ivy dependencies, but I have not yet attempted it. > Support encrypted password for ZK cred/ACL providers > > > Key: SOLR-12778 > URL: https://issues.apache.org/jira/browse/SOLR-12778 > Project: Solr > Issue Type: New Feature > Components: SolrCloud >Reporter: Jan Høydahl >Priority: Major > Attachments: SOLR-12778.patch > > > The {{VMParamsSingleSetCredentialsDigestZkCredentialsProvider}} takes a > {{zkDigestPassword}} in as a plain-text JVM param, and the > {{VMParamsAllAndReadonlyDigestZkACLProvider}} takes both {{zkDigestPassword}} > and {{zkDigestReadonlyPassword}}. 
> Propose to give an option to encrypt these password using the same mechanism > as DIH does: > # Add a new VM param "zkDigestPasswordEncryptionKeyFile" which is a path to > a file holding the encryption key > # Store an encryption key in above mentioned file and restrict access to > this file so only Solr user can read it. > # Encrypt the ZK passwords using the encryption key and provide the > encrypted password in place of the plaintext one > We could also create a utility command that takes the magic out of encrypting > the pw: > {noformat} > bin/solr util encrypt [-keyfile ] {noformat} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
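The File-based `decodeAES` wrapper described in the comment above can be sketched as follows. Everything here is illustrative — `readKeyFile` and the delegation target are hypothetical stand-ins for the patch's actual `CryptoKeys` code — but it shows the shape that avoids the DIH code smells mentioned (no assumed file-size limit, explicit charset instead of `new String(byte[])`):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch of a file-based decodeAES wrapper; the real method in
// the patch lives in CryptoKeys and delegates to decodeAES(String, String).
class KeyFileExample {
    // Read the whole key file with an explicit charset: no arbitrary size
    // cap, no default-charset new String(byte[]).
    static String readKeyFile(Path keyFile) throws IOException {
        return new String(Files.readAllBytes(keyFile), StandardCharsets.UTF_8).trim();
    }

    // Hypothetical stand-in for the existing decodeAES(String, String)
    // overload; a real implementation would perform the AES decryption.
    static String decodeAES(String base64CipherTxt, String pwd) {
        return "decoded(" + base64CipherTxt + "," + pwd + ")";
    }

    // The new wrapper: resolve the key from the file, then delegate.
    static String decodeAES(String base64CipherTxt, Path encryptFile) throws IOException {
        return decodeAES(base64CipherTxt, readKeyFile(encryptFile));
    }
}
```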
[jira] [Commented] (LUCENE-8811) Add maximum clause count check to IndexSearcher rather than BooleanQuery
[ https://issues.apache.org/jira/browse/LUCENE-8811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090733#comment-17090733 ] Alan Woodward commented on LUCENE-8811: --- TermInSetQuery is designed to be a more efficient replacement for a boolean disjunction of terms, so having it trip the max clauses check would defeat the point of having it in the first place. > Add maximum clause count check to IndexSearcher rather than BooleanQuery > > > Key: LUCENE-8811 > URL: https://issues.apache.org/jira/browse/LUCENE-8811 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Assignee: Alan Woodward >Priority: Minor > Fix For: master (9.0) > > Attachments: LUCENE-8811.patch, LUCENE-8811.patch, LUCENE-8811.patch, > LUCENE-8811.patch, LUCENE-8811.patch, LUCENE-8811.patch > > > Currently we only check whether boolean queries have too many clauses. > However there are other ways that queries may have too many clauses, for > instance if you have boolean queries that have themselves inner boolean > queries. > Could we use the new Query visitor API to move this check from BooleanQuery > to IndexSearcher in order to make this check more consistent across queries? > See for instance LUCENE-8810 where a rewrite rule caused the maximum clause > count to be hit even though the total number of leaf queries remained the > same. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] romseygeek commented on a change in pull request #1444: LUCENE-9338: Clean up type safety in SimpleBindings
romseygeek commented on a change in pull request #1444: URL: https://github.com/apache/lucene-solr/pull/1444#discussion_r413936078 ## File path: lucene/expressions/src/java/org/apache/lucene/expressions/SimpleBindings.java ## @@ -96,24 +90,51 @@ public DoubleValuesSource getDoubleValuesSource(String name) { case SCORE: return DoubleValuesSource.SCORES; default: -throw new UnsupportedOperationException(); +throw new UnsupportedOperationException(); } } - /** - * Traverses the graph of bindings, checking there are no cycles or missing references - * @throws IllegalArgumentException if the bindings is inconsistent + @Override + public DoubleValuesSource getDoubleValuesSource(String name) { +if (map.containsKey(name) == false) { Review comment: I'm pretty sure this won't be in the hot path - it's part of query setup, not query execution - and I think it reads more clearly. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] madrob commented on a change in pull request #1444: LUCENE-9338: Clean up type safety in SimpleBindings
madrob commented on a change in pull request #1444: URL: https://github.com/apache/lucene-solr/pull/1444#discussion_r413933007 ## File path: lucene/expressions/src/java/org/apache/lucene/expressions/SimpleBindings.java ## @@ -96,24 +90,51 @@ public DoubleValuesSource getDoubleValuesSource(String name) { case SCORE: return DoubleValuesSource.SCORES; default: -throw new UnsupportedOperationException(); +throw new UnsupportedOperationException(); } } - /** - * Traverses the graph of bindings, checking there are no cycles or missing references - * @throws IllegalArgumentException if the bindings is inconsistent + @Override + public DoubleValuesSource getDoubleValuesSource(String name) { +if (map.containsKey(name) == false) { Review comment: Is `containsKey` boolean check more clear than `get` with a null check? I think the latter is going to be more efficient because it's only a single map operation, but I guess this way might be better for the JVM's escape analysis? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
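The two idioms debated in the review thread above differ only in the number of map lookups; both reject unknown names the same way. A neutral side-by-side on a generic map (names are illustrative, not the SimpleBindings code itself):

```java
import java.util.Map;

// Side-by-side of the two idioms from the review: containsKey + get
// (two lookups, arguably reads more clearly) vs. a single get with a
// null check (one lookup).
class LookupExample {
    static int lookupContainsKey(Map<String, Integer> map, String name) {
        if (map.containsKey(name) == false) {
            throw new IllegalArgumentException("Invalid reference '" + name + "'");
        }
        return map.get(name);
    }

    static int lookupGet(Map<String, Integer> map, String name) {
        Integer value = map.get(name);
        if (value == null) {
            throw new IllegalArgumentException("Invalid reference '" + name + "'");
        }
        return value;
    }
}
```

One caveat worth noting: the single-`get` variant treats a key mapped to `null` the same as a missing key, so the two are only equivalent when the map never stores null values, which holds for bindings maps like this one.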
[jira] [Commented] (LUCENE-8811) Add maximum clause count check to IndexSearcher rather than BooleanQuery
[ https://issues.apache.org/jira/browse/LUCENE-8811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090728#comment-17090728 ] Ruben Q L commented on LUCENE-8811: --- Friendly reminder to see if someone can answer [~zabetak]'s question in the previous comment. > Add maximum clause count check to IndexSearcher rather than BooleanQuery > > > Key: LUCENE-8811 > URL: https://issues.apache.org/jira/browse/LUCENE-8811 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Assignee: Alan Woodward >Priority: Minor > Fix For: master (9.0) > > Attachments: LUCENE-8811.patch, LUCENE-8811.patch, LUCENE-8811.patch, > LUCENE-8811.patch, LUCENE-8811.patch, LUCENE-8811.patch > > > Currently we only check whether boolean queries have too many clauses. > However there are other ways that queries may have too many clauses, for > instance if you have boolean queries that have themselves inner boolean > queries. > Could we use the new Query visitor API to move this check from BooleanQuery > to IndexSearcher in order to make this check more consistent across queries? > See for instance LUCENE-8810 where a rewrite rule caused the maximum clause > count to be hit even though the total number of leaf queries remained the > same. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090721#comment-17090721 ] Mike Drob commented on SOLR-14428: -- If there are no terms in the index, then the fuzzy query should be collapsing pretty quickly and there would be no reason for it to take up so much memory. Do we only do that at query processing time now? I thought we would be doing that as aggressively as possible. > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Assignee: Andrzej Bialecki >Priority: Major > Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, > screenshot-2.png, screenshot-3.png, screenshot-4.png > > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.3.1 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. 
> Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! > <1mb > With an empty cache, running this query > _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory > allocation > {noformat} > 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 > 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 > {noformat} > ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-14430) Authorization plugins should check roles from request
Mike Drob created SOLR-14430: Summary: Authorization plugins should check roles from request Key: SOLR-14430 URL: https://issues.apache.org/jira/browse/SOLR-14430 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: security Reporter: Mike Drob The AuthorizationContext exposes {{getUserPrincipal}} to the plugin, but it does not allow the plugin to interrogate the request for {{isUserInRole}}. If we trust the request enough to get a principal from it, then we should trust it enough to ask about roles, as those could have been defined and verified by an authentication plugin. This model would be an alternative to the current model where RuleBasedAuthorizationPlugin maintains its own user->role mapping. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
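The proposal in SOLR-14430 above is for authorization to consult roles already verified on the request (in the spirit of the servlet API's `HttpServletRequest.isUserInRole`) rather than a plugin-maintained user-to-role map. A minimal sketch of that shape — the interface and method names below are hypothetical stand-ins, not Solr's actual `AuthorizationContext` API:

```java
import java.util.Set;

// Illustrative sketch of SOLR-14430's idea: the authorization plugin asks
// the request (whose roles an authentication plugin may have established)
// instead of consulting a plugin-private user->role mapping. These names
// are hypothetical, not Solr's real API.
class RoleCheckExample {
    interface AuthorizationContext {
        String getUserPrincipal();
        boolean isUserInRole(String role); // the proposed addition
    }

    // Authorized if the request vouches for any one of the required roles.
    static boolean isAuthorized(AuthorizationContext ctx, Set<String> requiredRoles) {
        return requiredRoles.stream().anyMatch(ctx::isUserInRole);
    }
}
```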
[GitHub] [lucene-solr] romseygeek commented on a change in pull request #1444: LUCENE-9338: Clean up type safety in SimpleBindings
romseygeek commented on a change in pull request #1444: URL: https://github.com/apache/lucene-solr/pull/1444#discussion_r413911753 ## File path: lucene/expressions/src/java/org/apache/lucene/expressions/SimpleBindings.java ## @@ -96,24 +90,51 @@ public DoubleValuesSource getDoubleValuesSource(String name) { case SCORE: return DoubleValuesSource.SCORES; default: -throw new UnsupportedOperationException(); +throw new UnsupportedOperationException(); } } - /** - * Traverses the graph of bindings, checking there are no cycles or missing references - * @throws IllegalArgumentException if the bindings is inconsistent + @Override + public DoubleValuesSource getDoubleValuesSource(String name) { +if (map.containsKey(name) == false) { + throw new IllegalArgumentException("Invalid reference '" + name + "'"); +} +return map.get(name).apply(this); + } + + /** + * Traverses the graph of bindings, checking there are no cycles or missing references + * @throws IllegalArgumentException if the bindings is inconsistent */ public void validate() { Review comment: Caching is a whole other conversation, which I think is related to the stuff that @mkhludnev is working on around grouping (in that we could plausibly have multiple references to the same iterator all moving in lockstep, where at the moment we pull separate iterators for each reference). But I think that's for a follow-up really, this issue is just a bit of refactoring to improve type safety. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] msokolov commented on a change in pull request #1444: LUCENE-9338: Clean up type safety in SimpleBindings
msokolov commented on a change in pull request #1444: URL: https://github.com/apache/lucene-solr/pull/1444#discussion_r413907462 ## File path: lucene/expressions/src/java/org/apache/lucene/expressions/SimpleBindings.java ## @@ -96,24 +90,51 @@ public DoubleValuesSource getDoubleValuesSource(String name) { case SCORE: return DoubleValuesSource.SCORES; default: -throw new UnsupportedOperationException(); +throw new UnsupportedOperationException(); } } - /** - * Traverses the graph of bindings, checking there are no cycles or missing references - * @throws IllegalArgumentException if the bindings is inconsistent + @Override + public DoubleValuesSource getDoubleValuesSource(String name) { +if (map.containsKey(name) == false) { + throw new IllegalArgumentException("Invalid reference '" + name + "'"); +} +return map.get(name).apply(this); + } + + /** + * Traverses the graph of bindings, checking there are no cycles or missing references + * @throws IllegalArgumentException if the bindings is inconsistent */ public void validate() { Review comment: Have you considered returning the map, or an immutable view on it, so that callers can use this to enumerate all the dependencies? In a similar framework, I've found this to be pretty helpful for analyzing query patterns. It's also nice to know if the same name occurs multiple times in the dependency tree; maybe one should cache its value in that case. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] mhitza edited a comment on issue #1435: SOLR-14410: Switch from SysV init script to systemd service file
mhitza edited a comment on issue #1435: URL: https://github.com/apache/lucene-solr/pull/1435#issuecomment-618450549 @janhoy just a quick ping, the PR is ready for a new review edit: just saw the Re-request review button after posting my comment, oh well :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] mhitza commented on issue #1435: SOLR-14410: Switch from SysV init script to systemd service file
mhitza commented on issue #1435: URL: https://github.com/apache/lucene-solr/pull/1435#issuecomment-618450549 @janhoy just a quick ping, the PR is ready for a new review This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14413) allow timeAllowed and cursorMark parameters
[ https://issues.apache.org/jira/browse/SOLR-14413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090672#comment-17090672 ] Mike Drob commented on SOLR-14413: -- That's awesome, thanks for verifying! Definitely update the docs with that info, please! The one concern I have at this point is about the return of zero results. Typically returning the same cursor indicates that we've reached the end of the results. Is there a way to distinguish the real end of the results from the case where we do not get any results in the time allowed? I know that we have the {{partialResults}} header there, but could there be a case where the opposite is true? We return partialResults:true, but there actually are no more results? Again, probably documenting the permutations here is sufficient. Also, can we add tests that explicitly demonstrate a partial cursor mark working? > allow timeAllowed and cursorMark parameters > --- > > Key: SOLR-14413 > URL: https://issues.apache.org/jira/browse/SOLR-14413 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. 
Issues are Public) > Components: search >Reporter: John Gallagher >Priority: Minor > Attachments: SOLR-14413.patch, timeallowed_cursormarks_results.txt > > Time Spent: 10m > Remaining Estimate: 0h > > Ever since cursorMarks were introduced in SOLR-5463 in 2014, cursorMark and > timeAllowed parameters were not allowed in combination ("Can not search using > both cursorMark and timeAllowed") > , from [QueryComponent.java|#L359]]: > > {code:java} > > if (null != rb.getCursorMark() && 0 < timeAllowed) { > // fundamentally incompatible > throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Can not > search using both " + CursorMarkParams.CURSOR_MARK_PARAM + " and " + > CommonParams.TIME_ALLOWED); > } {code} > While theoretically impure to use them in combination, it is often desirable > to support cursormarks-style deep paging and attempt to protect Solr nodes > from runaway queries using timeAllowed, in the hopes that most of the time, > the query completes in the allotted time, and there is no conflict. > > However if the query takes too long, it may be preferable to end the query > and protect the Solr node and provide the user with a somewhat inaccurate > sorted list. As noted in SOLR-6930, SOLR-5986 and others, timeAllowed is > frequently used to prevent runaway load. In fact, cursorMark and > shards.tolerant are allowed in combination, so any argument in favor of > purity would be a bit muddied in my opinion. > > This was discussed once in the mailing list that I can find: > [https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201506.mbox/%3c5591740b.4080...@elyograg.org%3E] > It did not look like there was strong support for preventing the combination. > > I have tested cursorMark and timeAllowed combination together, and even when > partial results are returned because the timeAllowed is exceeded, the > cursorMark response value is still valid and reasonable. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dsmiley commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
dsmiley commented on issue #1351: URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-618441539 Okay; I get your point about noise. I think it's also true that smaller (realistic) data sets may expose how well code can scale down and make different / better choices than it should make for larger sizes; and that's not noise. So both large and small data sets matter for benchmarking. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14429) Convert XXX.txt files to proper XXX.md
[ https://issues.apache.org/jira/browse/SOLR-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090663#comment-17090663 ] Tomoko Uchida commented on SOLR-14429: -- The branch passed {{nightly-smoke}}. {code} [smoker] SUCCESS! [0:43:25.703442] {code} > Convert XXX.txt files to proper XXX.md > -- > > Key: SOLR-14429 > URL: https://issues.apache.org/jira/browse/SOLR-14429 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > "README.txt" files are (partially) written in markdown and can be converted > to proper markdown files. This change was suggested on LUCENE-9321. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] romseygeek commented on issue #1444: LUCENE-9338: Clean up type safety in SimpleBindings
romseygeek commented on issue #1444: URL: https://github.com/apache/lucene-solr/pull/1444#issuecomment-618403151 Tricksy, tricksy... I've updated the cycle detection logic to handle multiple levels of recursion, and as a bonus we get a nicer error message as well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
jpountz commented on issue #1351: URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-618405559 > Should we infer that you don't think a 1M doc corpus is realistic in many production settings of Lucene? It's certainly realistic, but I think that the point still holds that these collections are not very useful for benchmarking as they tend to be more noisy and can easily miss improvements as even a linear scan is fast on a small collection? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9344) Convert XXX.txt files to proper XXX.md
[ https://issues.apache.org/jira/browse/LUCENE-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090629#comment-17090629 ] Tomoko Uchida commented on LUCENE-9344: --- The branch passed {{nightly-smoke}}. {code:java} [smoker] SUCCESS! [0:44:59.564581] {code}
[GitHub] [lucene-solr] jpountz commented on issue #1444: LUCENE-9338: Clean up type safety in SimpleBindings
jpountz commented on issue #1444: URL: https://github.com/apache/lucene-solr/pull/1444#issuecomment-618396995 I managed to defeat the new validation logic with this test: ``` public void testCoRecursion42() throws Exception { SimpleBindings bindings = new SimpleBindings(); bindings.add("cycle2", JavascriptCompiler.compile("cycle1")); bindings.add("cycle1", JavascriptCompiler.compile("cycle0")); bindings.add("cycle0", JavascriptCompiler.compile("cycle1")); IllegalArgumentException expected = expectThrows(IllegalArgumentException.class, () -> { bindings.validate(); }); assertTrue(expected.getMessage().contains("Cycle detected")); } ``` It depends on HashMap iteration order, so it might not reproduce for you, but the issue is that `cycle2` gets validated first. And as you recursively create expressions for bindings for `cycle2`, there is an infinite recursive loop, but it only includes `cycle0` and `cycle1` so we might need to track the names of the expressions in a set as we recursively resolve bindings to catch such cases too?
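The set-based tracking proposed in the comment above can be sketched in isolation. The `CycleCheck` class, the map-of-names shape, and the method names below are hypothetical stand-ins for `SimpleBindings`, not the actual PR code; the point is only the technique: carry the names on the current resolution path in a set, so a co-recursive cycle is caught even when the entry point (`cycle2` in the test) is not itself part of the cycle.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CycleCheck {
    // Hypothetical stand-in for SimpleBindings: each binding name maps to the
    // names of the other bindings its expression references.
    public static void validate(Map<String, List<String>> bindings) {
        for (String name : bindings.keySet()) {
            resolve(name, bindings, new LinkedHashSet<>());
        }
    }

    // Track the names on the current resolution path; if we re-enter a name
    // already on the path, we found a cycle, even if the starting binding is
    // not part of it.
    private static void resolve(String name, Map<String, List<String>> bindings,
                                Set<String> path) {
        if (path.add(name) == false) {
            throw new IllegalArgumentException("Cycle detected: " + path);
        }
        for (String dep : bindings.getOrDefault(name, Collections.emptyList())) {
            resolve(dep, bindings, path);
        }
        path.remove(name);
    }

    public static void main(String[] args) {
        // Same shape as the testCoRecursion42 case quoted above.
        Map<String, List<String>> bindings = new HashMap<>();
        bindings.put("cycle2", List.of("cycle1"));
        bindings.put("cycle1", List.of("cycle0"));
        bindings.put("cycle0", List.of("cycle1"));
        try {
            validate(bindings);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // a "Cycle detected: ..." message
        }
    }
}
```

Note that `path.remove(name)` on the way out is what keeps diamond-shaped (non-cyclic) dependency graphs valid while still rejecting true cycles.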
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090609#comment-17090609 ] Colvin Cowie commented on SOLR-14428: - Thanks, we'll just stick on 8.3.1 for the time being. Though I will look at moving to CaffeineCache in general since I see the other caches are being removed anyway. Cheers
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1451: LUCENE-9345: Separate MergeSchedulder from IndexWriter
jpountz commented on a change in pull request #1451: URL: https://github.com/apache/lucene-solr/pull/1451#discussion_r413791931 ## File path: lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java ## @@ -516,18 +519,18 @@ public synchronized void merge(IndexWriter writer, MergeTrigger trigger) throws if (verbose()) { message("now merge"); - message(" index: " + writer.segString()); + message(" index(source): " + mergeSource.toString()); Review comment: did you mean to add `(source)` after `index` or is it a side-effect of a search/replace?
[jira] [Comment Edited] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090564#comment-17090564 ] Andrzej Bialecki edited comment on SOLR-14428 at 4/23/20, 12:22 PM: [~cjcowie] as a temporary workaround you can switch to using {{CaffeineCache}} and see if it behaves differently. Also, configuring the queryResultCache using {{maxRamMB}} instead of {{maxSize}} should cap the max RAM usage. Of course, these are just stopgaps, they don't address the underlying issue. [~romseygeek] Solr doesn't yet support any cache admission policy / gating, unfortunately. That would be a nice improvement. There's a robust admission policy in CaffeineCache but it serves a slightly different purpose - it considers usage patterns when deciding what items to evict first. It would be nice to also be able to automatically avoid caching objects that are eg. too large, or cheap to compute and large. I'll look into this in a few days (need to wrap up other stuff). was (Author: ab): [~cjcowie] as a temporary workaround you can switch to using {{CaffeineCache}} and see if it behaves differently. Also, configuring the queryResultCache using {{maxRamMB}} instead of {{maxSize}} should cap the max RAM usage. Of course, these are just stopgaps, they don't address the underlying issue. I'll look into this in a few days (need to wrap up other stuff).
[jira] [Assigned] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki reassigned SOLR-14428: --- Assignee: Andrzej Bialecki
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090564#comment-17090564 ] Andrzej Bialecki commented on SOLR-14428: - [~cjcowie] as a temporary workaround you can switch to using {{CaffeineCache}} and see if it behaves differently. Also, configuring the queryResultCache using {{maxRamMB}} instead of {{maxSize}} should cap the max RAM usage. Of course, these are just stopgaps, they don't address the underlying issue. I'll look into this in a few days (need to wrap up other stuff).
[jira] [Commented] (SOLR-12690) Regularize LoggerFactory declarations
[ https://issues.apache.org/jira/browse/SOLR-12690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090546#comment-17090546 ] Erick Erickson commented on SOLR-12690: --- Fixed, thanks for catching! > Regularize LoggerFactory declarations > - > > Key: SOLR-12690 > URL: https://issues.apache.org/jira/browse/SOLR-12690 > Project: Solr > Issue Type: Improvement >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Minor > Fix For: 7.5, 8.0 > > Attachments: SOLR-12690.patch, SOLR-12690.patch > > > LoggerFactory declarations have several different forms, they should all be: > private static final Logger log = > LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); > * lowercase log > * private static > * non hard-coded class lookup. > I'm going to regularize all of these, I think there are about 80 currently, > we've been nibbling away at this but I'll try to do it in one go. > [~cpoerschke] I think there's another Jira about this that I can't find just > now, ring any bells? > Once that's done, is there a good way to make violations of this fail > precommit?
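The regularized declaration in SOLR-12690 hinges on `MethodHandles.lookup().lookupClass()`. The small self-contained demo below (slf4j is not on the classpath here, so the actual `Logger` line appears only as a comment; the class name `LoggerNameDemo` is invented for illustration) shows why that lookup makes the declaration safe to copy between classes:

```java
import java.lang.invoke.MethodHandles;

public class LoggerNameDemo {
    // The standardized form from SOLR-12690 (needs slf4j, shown as a comment):
    //   private static final Logger log =
    //       LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
    // MethodHandles.lookup().lookupClass() resolves the enclosing class at
    // runtime, so a copy/pasted declaration never carries over another
    // class's name the way a hard-coded getLogger(SomeOtherClass.class) can.
    public static Class<?> enclosing() {
        return MethodHandles.lookup().lookupClass();
    }

    public static void main(String[] args) {
        System.out.println(enclosing().getSimpleName()); // prints: LoggerNameDemo
    }
}
```

This is also why the pattern lends itself to mechanical precommit checking: the declaration text is identical in every class.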
[jira] [Commented] (SOLR-12690) Regularize LoggerFactory declarations
[ https://issues.apache.org/jira/browse/SOLR-12690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090545#comment-17090545 ] ASF subversion and git services commented on SOLR-12690: Commit eb8d3d3a0f2e039a64e74b296eb64da2ae530800 in lucene-solr's branch refs/heads/branch_8x from Erick Erickson [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=eb8d3d3 ] SOLR-12690: Regularize LoggerFactory declarations. Fixing an incorrect change
[jira] [Commented] (SOLR-12690) Regularize LoggerFactory declarations
[ https://issues.apache.org/jira/browse/SOLR-12690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090544#comment-17090544 ] ASF subversion and git services commented on SOLR-12690: Commit 4eb755db18f6a605bf62e1c8f029093ad8d6ca7b in lucene-solr's branch refs/heads/master from Erick Erickson [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4eb755d ] SOLR-12690: Regularize LoggerFactory declarations. Fixing an incorrect change
[GitHub] [lucene-solr] jpountz commented on issue #1440: LUCENE-9330: Make SortFields responsible for index sorting and serialization
jpountz commented on issue #1440: URL: https://github.com/apache/lucene-solr/pull/1440#issuecomment-618356399 > This is trickier for the segment info format because both reading and writing are handled by the same class. I think so far we've only done this for Postings formats, and not for other parts of the Codec? We've done it in the past already, see e.g. https://github.com/apache/lucene-solr/commit/23b002a0fdf2f6025f1eb026c0afca247fb21ed0. LuceneXXSegmentInfoFormat is changed to throw an UOE in the `write` method, then a LuceneXXRWSegmentInfoFormat is created that extends LuceneXXSegmentInfoFormat and adds back the `write` implementation.
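The read-only/RW split described above can be sketched in miniature. The class names below echo the LuceneXX naming but are illustrative stand-ins, not the actual codec classes: the shipped back-compat format keeps `read` working and throws `UnsupportedOperationException` from `write`, while a test-only subclass restores the write path.

```java
public class RwFormatDemo {
    // Shipped back-compat format: reading old segments still works, but the
    // write method throws UOE so new indices can't be created in the old format.
    public static class OldSegmentInfoFormat {
        public String read() { return "segment-info"; }
        public void write(String info) {
            throw new UnsupportedOperationException("Old formats can't be used for writing");
        }
    }

    // Test-only subclass (the "LuceneXXRWSegmentInfoFormat" role): extends the
    // read-only format and adds the write implementation back, so backward
    // compatibility tests can still produce old-format data.
    public static class RwOldSegmentInfoFormat extends OldSegmentInfoFormat {
        public String written;
        @Override
        public void write(String info) { written = info; }
    }

    public static void main(String[] args) {
        try {
            new OldSegmentInfoFormat().write("si");
        } catch (UnsupportedOperationException e) {
            System.out.println("read-only format: " + e.getMessage());
        }
        RwOldSegmentInfoFormat rw = new RwOldSegmentInfoFormat();
        rw.write("si");
        System.out.println("RW subclass wrote: " + rw.written);
    }
}
```

The design keeps the production jar strictly read-only for old formats while the RW subclass lives only in test code.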
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1440: LUCENE-9330: Make SortFields responsible for index sorting and serialization
jpountz commented on a change in pull request #1440: URL: https://github.com/apache/lucene-solr/pull/1440#discussion_r413746031 ## File path: lucene/core/src/java/org/apache/lucene/index/DefaultIndexingChain.java ## @@ -527,45 +589,63 @@ private void indexPoint(PerField fp, IndexableField field) throws IOException { fp.pointValuesWriter.addPackedValue(docState.docID, field.binaryValue()); } - private void validateIndexSortDVType(Sort indexSort, String fieldName, DocValuesType dvType) { + private void validateIndexSortDVType(Sort indexSort, String fieldToValidate, DocValuesType dvType) throws IOException { for (SortField sortField : indexSort.getSort()) { - if (sortField.getField().equals(fieldName)) { -switch (dvType) { - case NUMERIC: -if (sortField.getType().equals(SortField.Type.INT) == false && - sortField.getType().equals(SortField.Type.LONG) == false && - sortField.getType().equals(SortField.Type.FLOAT) == false && - sortField.getType().equals(SortField.Type.DOUBLE) == false) { - throw new IllegalArgumentException("invalid doc value type:" + dvType + " for sortField:" + sortField); -} -break; + IndexSorter sorter = sortField.getIndexSorter(); + if (sorter == null) { +throw new IllegalStateException("Cannot sort index with sort order " + sortField); + } + sorter.getDocComparator(new DocValuesLeafReader() { +@Override +public NumericDocValues getNumericDocValues(String field) { + if (Objects.equals(field, fieldToValidate) && dvType != DocValuesType.NUMERIC) { +throw new IllegalArgumentException("SortField " + sortField + " expected field [" + field + "] to be NUMERIC but it is [" + dvType + "]"); + } + return DocValues.emptyNumeric(); +} - case BINARY: -throw new IllegalArgumentException("invalid doc value type:" + dvType + " for sortField:" + sortField); +@Override +public BinaryDocValues getBinaryDocValues(String field) { + if (Objects.equals(field, fieldToValidate) && dvType != DocValuesType.BINARY) { +throw new IllegalArgumentException("SortField " + 
sortField + " expected field [" + field + "] to be BINARY but it is [" + dvType + "]"); + } + return DocValues.emptyBinary(); +} - case SORTED: -if (sortField.getType().equals(SortField.Type.STRING) == false) { - throw new IllegalArgumentException("invalid doc value type:" + dvType + " for sortField:" + sortField); -} -break; +@Override +public SortedDocValues getSortedDocValues(String field) { + if (Objects.equals(field, fieldToValidate) && dvType != DocValuesType.SORTED) { +throw new IllegalArgumentException("SortField " + sortField + " expected field [" + field + "] to be SORTED but it is [" + dvType + "]"); + } + return DocValues.emptySorted(); +} - case SORTED_NUMERIC: -if (sortField instanceof SortedNumericSortField == false) { - throw new IllegalArgumentException("invalid doc value type:" + dvType + " for sortField:" + sortField); -} -break; +@Override +public SortedNumericDocValues getSortedNumericDocValues(String field) { + if (Objects.equals(field, fieldToValidate) && dvType != DocValuesType.SORTED_NUMERIC) { +throw new IllegalArgumentException("SortField " + sortField + " expected field [" + field + "] to be SORTED_NUMERIC but it is [" + dvType + "]"); + } + return DocValues.emptySortedNumeric(0); +} - case SORTED_SET: -if (sortField instanceof SortedSetSortField == false) { - throw new IllegalArgumentException("invalid doc value type:" + dvType + " for sortField:" + sortField); -} -break; +@Override +public SortedSetDocValues getSortedSetDocValues(String field) { + if (Objects.equals(field, fieldToValidate) && dvType != DocValuesType.SORTED_SET) { +throw new IllegalArgumentException("SortField " + sortField + " expected field [" + field + "] to be SORTED_SET but it is [" + dvType + "]"); + } + return DocValues.emptySortedSet(); +} - default: -throw new IllegalArgumentException("invalid doc value type:" + dvType + " for sortField:" + sortField); +@Override +public FieldInfos getFieldInfos() { + throw new UnsupportedOperationException(); } -break; - } + 
+@Override +public int maxDoc() { + return 0; Review comment: +1
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090534#comment-17090534 ] Alan Woodward commented on SOLR-14428: -- Hi [~cjcowie], thanks for opening this and for the thorough investigation! I'm afraid I don't really know much about how the Solr query caches choose which queries to cache - [~ab] might have a better idea of how to fix this?
[jira] [Commented] (SOLR-14429) Convert XXX.txt files to proper XXX.md
[ https://issues.apache.org/jira/browse/SOLR-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090523#comment-17090523 ] Tomoko Uchida commented on SOLR-14429: -- bq. Is Lucene excluded here? Yes. I have modified only files under {{solr/}} folder here, to create a CHANGES entry for Solr. For Lucene LUCENE-9344 (and a PR) has been opened.
[jira] [Updated] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colvin Cowie updated SOLR-14428: Description: I sent this to the mailing list I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors while running our normal tests. After profiling it was clear that the majority of the heap was allocated through FuzzyQuery. LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the FuzzyQuery's constructor. I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries from random UUID strings for 5 minutes {code} FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" {code} When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while the memory usage has increased drastically on 8.5.0 and 8.5.1. Comparison of heap usage while running the attached test against Solr 8.3.1 and 8.5.1 with a single (empty) shard and 4GB heap: !image-2020-04-23-09-18-06-070.png! And with 4 shards on 8.4.1 and 8.5.0: !screenshot-2.png! I'm guessing that the memory might be being leaked if the FuzzyQuery objects are referenced from the cache, while the FuzzyTermsEnum would not have been. Query Result Cache on 8.5.1: !screenshot-3.png! ~316mb in the cache QRC on 8.3.1 !screenshot-4.png! <1mb With an empty cache, running this query _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory allocation {noformat} 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 {noformat} ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520 was: I sent this to the mailing list I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors while running our normal tests. After profiling it was clear that the majority of the heap was allocated through FuzzyQuery. LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the FuzzyQuery's constructor. 
I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries from random UUID strings for 5 minutes {code} FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" {code} When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while the memory usage has increased drastically on 8.5.0 and 8.5.1. Comparison of heap usage while running the attached test against Solr 8.3.1 and 8.5.1 with a single (empty) shard and 4GB heap: !image-2020-04-23-09-18-06-070.png! And with 4 shards on 8.4.1 and 8.5.0: !screenshot-2.png! I'm guessing that the memory might be being leaked if the FuzzyQuery objects are referenced from the cache, while the FuzzyTermsEnum would not have been. Query Result Cache on 8.5.1: !screenshot-3.png! ~316mb in the cache QRC on 8.3.1 !screenshot-4.png! <1mb With an empty cache, running this query _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory allocation {noformat} 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 {noformat} > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Priority: Major > Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, > screenshot-2.png, screenshot-3.png, screenshot-4.png > > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. 
> I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.3.1 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. > Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! > <1mb
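The attached FuzzyHammer.java is not reproduced in this thread, but the query-generation step it describes can be sketched as follows (the class and method names here are illustrative, not taken from the attachment):

```java
import java.util.UUID;

/** Minimal sketch of the query generator described in the issue: a fuzzy
 *  query over a random 32-character hex term with maximum edit distance 2. */
class FuzzyQueryStrings {

    /** Builds a query string like "field_s:e41848af85d24ac197c71db6888e17bc~2". */
    static String randomFuzzyQuery(String fieldName, int maxEdits) {
        // A random UUID, with the dashes stripped, yields 32 hex characters.
        String term = UUID.randomUUID().toString().replace("-", "");
        return fieldName + ":" + term + "~" + maxEdits;
    }

    public static void main(String[] args) {
        // Firing many of these in a loop against a Solr instance is what
        // drives the FuzzyQuery automata allocation discussed above.
        System.out.println(randomFuzzyQuery("field_s", 2));
    }
}
```

Each such query builds its Levenshtein automata eagerly in the FuzzyQuery constructor after LUCENE-9068, which is why a stream of unique terms inflates the query result cache.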
[jira] [Commented] (SOLR-12845) Add a default cluster policy
[ https://issues.apache.org/jira/browse/SOLR-12845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090488#comment-17090488 ] ASF subversion and git services commented on SOLR-12845: Commit 789c97be5fb66b61210cff9dafb89daabec9fe39 in lucene-solr's branch refs/heads/branch_8x from Andrzej Bialecki [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=789c97b ] SOLR-12845: Properly clear default policy between tests. > Add a default cluster policy > > > Key: SOLR-12845 > URL: https://issues.apache.org/jira/browse/SOLR-12845 > Project: Solr > Issue Type: Improvement > Components: AutoScaling >Reporter: Shalin Shekhar Mangar >Assignee: Andrzej Bialecki >Priority: Major > Fix For: 8.6 > > Attachments: SOLR-12845.patch, SOLR-12845.patch > > > [~varunthacker] commented on SOLR-12739: > bq. We should also ship with some default policies - "Don't allow more than > one replica of a shard on the same JVM" , "Distribute cores across the > cluster evenly" , "Distribute replicas per collection across the nodes" > This issue is about adding these defaults. I propose the following as default > cluster policy: > {code} > # Each shard cannot have more than one replica on the same node if possible > {"replica": "<2", "shard": "#EACH", "node": "#ANY", "strict":false} > # Each collections replicas should be equally distributed amongst nodes > {"replica": "#EQUAL", "node": "#ANY", "strict":false} > # All cores should be equally distributed amongst nodes > {"cores": "#EQUAL", "node": "#ANY", "strict":false} > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
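As a rough sketch (not part of the patch), the proposed default rules could be wrapped in the payload shape used by Solr's autoscaling API, assuming the {{set-cluster-policy}} command; the rule strings themselves are copied from the proposal above:

```java
/** Builds a set-cluster-policy payload from the three proposed default rules.
 *  The surrounding {"set-cluster-policy": [...]} shape is assumed from
 *  Solr's autoscaling API; the rules are those quoted in the issue. */
class DefaultClusterPolicy {

    static String setClusterPolicyPayload() {
        String rules = String.join(",\n  ",
            // Each shard should not have more than one replica per node
            "{\"replica\": \"<2\", \"shard\": \"#EACH\", \"node\": \"#ANY\", \"strict\": false}",
            // Each collection's replicas should be distributed equally across nodes
            "{\"replica\": \"#EQUAL\", \"node\": \"#ANY\", \"strict\": false}",
            // All cores should be distributed equally across nodes
            "{\"cores\": \"#EQUAL\", \"node\": \"#ANY\", \"strict\": false}");
        return "{\"set-cluster-policy\": [\n  " + rules + "\n]}";
    }

    public static void main(String[] args) {
        // This JSON would be POSTed to the autoscaling endpoint.
        System.out.println(setClusterPolicyPayload());
    }
}
```

Note that all three rules carry {{"strict": false}}, so they act as preferences rather than hard constraints when the cluster cannot satisfy them.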
[jira] [Commented] (SOLR-14429) Convert XXX.txt files to proper XXX.md
[ https://issues.apache.org/jira/browse/SOLR-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090489#comment-17090489 ] Uwe Schindler commented on SOLR-14429: -- Is Lucene excluded here? I was not aware that the site docs were already fixed to have mdtext as extension. I have no preference on it, but md seems more common than mdtext. > Convert XXX.txt files to proper XXX.md > -- > > Key: SOLR-14429 > URL: https://issues.apache.org/jira/browse/SOLR-14429 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > "README.txt" files are (partially) written in markdown and can be converted > to proper markdown files. This change was suggested on LUCENE-9321. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-12845) Add a default cluster policy
[ https://issues.apache.org/jira/browse/SOLR-12845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090486#comment-17090486 ] ASF subversion and git services commented on SOLR-12845: Commit 2a7ba5a48e065a5bb064a9c62562e73a0c3fb62e in lucene-solr's branch refs/heads/master from Andrzej Bialecki [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2a7ba5a ] SOLR-12845: Properly clear default policy between tests. > Add a default cluster policy > > > Key: SOLR-12845 > URL: https://issues.apache.org/jira/browse/SOLR-12845 > Project: Solr > Issue Type: Improvement > Components: AutoScaling >Reporter: Shalin Shekhar Mangar >Assignee: Andrzej Bialecki >Priority: Major > Fix For: 8.6 > > Attachments: SOLR-12845.patch, SOLR-12845.patch > > > [~varunthacker] commented on SOLR-12739: > bq. We should also ship with some default policies - "Don't allow more than > one replica of a shard on the same JVM" , "Distribute cores across the > cluster evenly" , "Distribute replicas per collection across the nodes" > This issue is about adding these defaults. I propose the following as default > cluster policy: > {code} > # Each shard cannot have more than one replica on the same node if possible > {"replica": "<2", "shard": "#EACH", "node": "#ANY", "strict":false} > # Each collections replicas should be equally distributed amongst nodes > {"replica": "#EQUAL", "node": "#ANY", "strict":false} > # All cores should be equally distributed amongst nodes > {"cores": "#EQUAL", "node": "#ANY", "strict":false} > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] romseygeek commented on issue #1440: LUCENE-9330: Make SortFields responsible for index sorting and serialization
romseygeek commented on issue #1440: URL: https://github.com/apache/lucene-solr/pull/1440#issuecomment-618322909 > Should we remove write support from Lucene70SegmentInfoFormat and have a RW version under test-framework like we do for other components, so that users can't use it in their codecs but we can still run the segment info format test case? This is trickier for the segment info format because both reading and writing are handled by the same class. I think so far we've only done this for Postings formats, and not for other parts of the Codec? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
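The read-only vs read-write split being discussed can be sketched like this (illustrative names, not Lucene's actual classes): the shipped format class only reads the old format, while an RW subclass under test-framework restores write support for the format's test case:

```java
/** Sketch of a legacy format whose shipped class is read-only. */
class LegacySegmentInfoFormatSketch {
    String read(String input) {
        // Reading old segments must keep working for index back-compat.
        return "seginfo:" + input;
    }
    String write(String info) {
        // Users must not write the old format from their codecs.
        throw new UnsupportedOperationException("old format is read-only");
    }
}

/** Would live under test-framework: re-enables writing so the format
 *  test case can still round-trip data through the legacy format. */
class RWSegmentInfoFormatSketch extends LegacySegmentInfoFormatSketch {
    @Override
    String write(String info) {
        return "written:" + info;
    }
}
```

The wrinkle raised in the comment is visible here: because read and write live in one class, the split has to be done by subclassing rather than by simply dropping the writer, which is why so far only postings formats have received this treatment.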
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090482#comment-17090482 ] Colvin Cowie commented on SOLR-14428: - Hi [~romseygeek], what are your thoughts on this? Thanks > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Priority: Major > Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, > screenshot-2.png, screenshot-3.png, screenshot-4.png > > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. > Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! 
> <1mb > With an empty cache, running this query > _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory > allocation > {noformat} > 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 > 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] s1monw opened a new pull request #1451: LUCENE-9345: Separate MergeScheduler from IndexWriter
s1monw opened a new pull request #1451: URL: https://github.com/apache/lucene-solr/pull/1451 This change extracts the methods that are used by MergeScheduler into a MergeSource interface. This allows IndexWriter to better ensure locking, hide internal methods and removes the tight coupling between the two complex classes. This will also improve future testing. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-9345) Separate IndexWriter from MergeScheduler
Simon Willnauer created LUCENE-9345: --- Summary: Separate IndexWriter from MergeScheduler Key: LUCENE-9345 URL: https://issues.apache.org/jira/browse/LUCENE-9345 Project: Lucene - Core Issue Type: Improvement Affects Versions: master (9.0) Reporter: Simon Willnauer MergeScheduler is tightly coupled with IndexWriter which causes IW to expose unnecessary methods. For instance only the scheduler should call IW#getNextMerge() but it's a public method. With some refactorings we can nicely separate the two. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
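The refactoring described here is an interface extraction: the scheduler is given a narrow MergeSource view instead of the whole IndexWriter. A minimal sketch of the pattern (names modeled on the issue, not Lucene's actual API):

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Narrow view of the writer that a merge scheduler actually needs;
 *  stands in for the MergeSource extracted from IndexWriter. */
interface MergeSourceSketch {
    /** Returns the next pending merge, or null if none remain. */
    String getNextMerge();
    /** Runs one merge. */
    void merge(String plan);
}

/** The scheduler depends only on the interface, so it can no longer reach
 *  IndexWriter internals, and tests can hand it a stub. */
class SchedulerSketch {
    void mergeAll(MergeSourceSketch source) {
        String plan;
        while ((plan = source.getNextMerge()) != null) {
            source.merge(plan);
        }
    }
}

/** Test double: what "improves future testing" in the issue refers to. */
class StubWriter implements MergeSourceSketch {
    final Queue<String> pending = new ArrayDeque<>();
    final Queue<String> done = new ArrayDeque<>();
    public String getNextMerge() { return pending.poll(); }
    public void merge(String plan) { done.add(plan); }
}
```

With this shape, a method like getNextMerge() no longer needs to be public on IndexWriter itself; only the MergeSource handed to the scheduler exposes it.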
[GitHub] [lucene-solr] romseygeek commented on a change in pull request #1440: LUCENE-9330: Make SortFields responsible for index sorting and serialization
romseygeek commented on a change in pull request #1440: URL: https://github.com/apache/lucene-solr/pull/1440#discussion_r413692946 ## File path: lucene/core/src/java/org/apache/lucene/index/SortedDocValuesWriter.java ## @@ -79,11 +78,6 @@ public void addValue(int docID, BytesRef value) { lastDocID = docID; } - @Override - public void finish(int maxDoc) { -updateBytesUsed(); - } Review comment: It was always called either immediately before `flush` or `getDocComparator`, so it seemed to make sense to just fold it directly into those methods. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] romseygeek commented on a change in pull request #1440: LUCENE-9330: Make SortFields responsible for index sorting and serialization
romseygeek commented on a change in pull request #1440: URL: https://github.com/apache/lucene-solr/pull/1440#discussion_r413689133

## File path: lucene/core/src/java/org/apache/lucene/index/DefaultIndexingChain.java

##
@@ -527,45 +589,63 @@ private void indexPoint(PerField fp, IndexableField field) throws IOException {
     fp.pointValuesWriter.addPackedValue(docState.docID, field.binaryValue());
   }
 
-  private void validateIndexSortDVType(Sort indexSort, String fieldName, DocValuesType dvType) {
+  private void validateIndexSortDVType(Sort indexSort, String fieldToValidate, DocValuesType dvType) throws IOException {
     for (SortField sortField : indexSort.getSort()) {
-      if (sortField.getField().equals(fieldName)) {
-        switch (dvType) {
-          case NUMERIC:
-            if (sortField.getType().equals(SortField.Type.INT) == false &&
-                sortField.getType().equals(SortField.Type.LONG) == false &&
-                sortField.getType().equals(SortField.Type.FLOAT) == false &&
-                sortField.getType().equals(SortField.Type.DOUBLE) == false) {
-              throw new IllegalArgumentException("invalid doc value type:" + dvType + " for sortField:" + sortField);
-            }
-            break;
+      IndexSorter sorter = sortField.getIndexSorter();
+      if (sorter == null) {
+        throw new IllegalStateException("Cannot sort index with sort order " + sortField);
+      }
+      sorter.getDocComparator(new DocValuesLeafReader() {
+        @Override
+        public NumericDocValues getNumericDocValues(String field) {
+          if (Objects.equals(field, fieldToValidate) && dvType != DocValuesType.NUMERIC) {
+            throw new IllegalArgumentException("SortField " + sortField + " expected field [" + field + "] to be NUMERIC but it is [" + dvType + "]");
+          }
+          return DocValues.emptyNumeric();
+        }
 
-          case BINARY:
-            throw new IllegalArgumentException("invalid doc value type:" + dvType + " for sortField:" + sortField);
+        @Override
+        public BinaryDocValues getBinaryDocValues(String field) {
+          if (Objects.equals(field, fieldToValidate) && dvType != DocValuesType.BINARY) {
+            throw new IllegalArgumentException("SortField " + sortField + " expected field [" + field + "] to be BINARY but it is [" + dvType + "]");
+          }
+          return DocValues.emptyBinary();
+        }
 
-          case SORTED:
-            if (sortField.getType().equals(SortField.Type.STRING) == false) {
-              throw new IllegalArgumentException("invalid doc value type:" + dvType + " for sortField:" + sortField);
-            }
-            break;
+        @Override
+        public SortedDocValues getSortedDocValues(String field) {
+          if (Objects.equals(field, fieldToValidate) && dvType != DocValuesType.SORTED) {
+            throw new IllegalArgumentException("SortField " + sortField + " expected field [" + field + "] to be SORTED but it is [" + dvType + "]");
+          }
+          return DocValues.emptySorted();
+        }
 
-          case SORTED_NUMERIC:
-            if (sortField instanceof SortedNumericSortField == false) {
-              throw new IllegalArgumentException("invalid doc value type:" + dvType + " for sortField:" + sortField);
-            }
-            break;
+        @Override
+        public SortedNumericDocValues getSortedNumericDocValues(String field) {
+          if (Objects.equals(field, fieldToValidate) && dvType != DocValuesType.SORTED_NUMERIC) {
+            throw new IllegalArgumentException("SortField " + sortField + " expected field [" + field + "] to be SORTED_NUMERIC but it is [" + dvType + "]");
+          }
+          return DocValues.emptySortedNumeric(0);
+        }
 
-          case SORTED_SET:
-            if (sortField instanceof SortedSetSortField == false) {
-              throw new IllegalArgumentException("invalid doc value type:" + dvType + " for sortField:" + sortField);
-            }
-            break;
+        @Override
+        public SortedSetDocValues getSortedSetDocValues(String field) {
+          if (Objects.equals(field, fieldToValidate) && dvType != DocValuesType.SORTED_SET) {
+            throw new IllegalArgumentException("SortField " + sortField + " expected field [" + field + "] to be SORTED_SET but it is [" + dvType + "]");
+          }
+          return DocValues.emptySortedSet();
+        }
 
-          default:
-            throw new IllegalArgumentException("invalid doc value type:" + dvType + " for sortField:" + sortField);
+        @Override
+        public FieldInfos getFieldInfos() {
+          throw new UnsupportedOperationException();
        }
-            break;
-      }
 
+        @Override
+        public int maxDoc() {
+          return 0;

Review comment: `IndexSorter.getDocComparator(Reader)` calls `maxDoc()` on the reader to allocate its comparison arrays. We
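The diff above replaces a type switch with callbacks on a stub reader: the sorter only calls the doc-values getter for its own type, so a mismatched getter can simply throw. A heavily simplified, self-contained sketch of that pattern (hypothetical names, not Lucene's real `IndexSorter`/`DocValuesLeafReader` API) might look like:

```java
import java.util.Objects;

// Hypothetical, simplified illustration of validation-by-stub-reader.
// All names below are invented for the sketch.
enum DocValuesType { NUMERIC, BINARY }

interface StubDocValuesReader {
    void getNumericDocValues(String field);
    void getBinaryDocValues(String field);
}

class SortValidator {
    // sorterWantsNumeric stands in for "which getter the concrete sorter calls"
    static void validate(String fieldToValidate, DocValuesType dvType, boolean sorterWantsNumeric) {
        StubDocValuesReader stub = new StubDocValuesReader() {
            @Override public void getNumericDocValues(String field) {
                if (Objects.equals(field, fieldToValidate) && dvType != DocValuesType.NUMERIC) {
                    throw new IllegalArgumentException(
                        "expected [" + field + "] to be NUMERIC but it is [" + dvType + "]");
                }
            }
            @Override public void getBinaryDocValues(String field) {
                if (Objects.equals(field, fieldToValidate) && dvType != DocValuesType.BINARY) {
                    throw new IllegalArgumentException(
                        "expected [" + field + "] to be BINARY but it is [" + dvType + "]");
                }
            }
        };
        // each sorter only touches the getter for its own type, so a mismatch throws
        if (sorterWantsNumeric) {
            stub.getNumericDocValues(fieldToValidate);
        } else {
            stub.getBinaryDocValues(fieldToValidate);
        }
    }
}
```

The appeal of the design is that adding a new sort type needs no change to the validator: the new sorter's own getter call does the checking.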
[jira] [Commented] (LUCENE-9321) Port documentation task to gradle
[ https://issues.apache.org/jira/browse/LUCENE-9321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090455#comment-17090455 ] Tomoko Uchida commented on LUCENE-9321: --- I opened LUCENE-9344 and SOLR-14429 with draft patches that convert ".txt" files to ".md". Note: Solr has a lot of "README.txt" files, and some of them are not actually markdown but plain text; I converted all of them to .md for consistency. I would like to merge it before this issue (because the gradle task also refers to the .md files); could you review it, please? > Port documentation task to gradle > - > > Key: LUCENE-9321 > URL: https://issues.apache.org/jira/browse/LUCENE-9321 > Project: Lucene - Core > Issue Type: Sub-task > Components: general/build >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Major > > This is a placeholder issue for porting ant "documentation" task to gradle. > The generated documents should be able to be published on lucene.apache.org > web site on "as-is" basis. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9344) Convert XXX.txt files to proper XXX.md
[ https://issues.apache.org/jira/browse/LUCENE-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090440#comment-17090440 ] Tomoko Uchida edited comment on LUCENE-9344 at 4/23/20, 9:50 AM: - I opened [https://github.com/apache/lucene-solr/pull/1449] - Changed README.txt, MIGRATE.txt, etc. to .md and partially fix its markdown formatting. - LICENCE.txt and NOTICE.txt were not modified. This also modifies build.xml so that distribution package built by {{ant package-tgz}} includes the all .md files. {{ant documentation}} also works fine. TODO: run smoke test was (Author: tomoko uchida): I opened [https://github.com/apache/lucene-solr/pull/1449] - Changed README.txt, MIGRATE.txt, etc. to .md and partially fix its markdown formatting. - LICENCE.txt and NOTICE.txt were not modified. This also modifies build.xml so that distribution package built by {{ant package-tgz}} includes the all .md files TODO: run smoke test > Convert XXX.txt files to proper XXX.md > --- > > Key: LUCENE-9344 > URL: https://issues.apache.org/jira/browse/LUCENE-9344 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: master (9.0) >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > Text files that are (partially) written in markdown (such as "README.txt") > can be converted to proper markdown files. This change was suggested on > LUCENE-9321. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14429) Convert XXX.txt files to proper XXX.md
[ https://issues.apache.org/jira/browse/SOLR-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090448#comment-17090448 ] Tomoko Uchida commented on SOLR-14429: -- I opened [https://github.com/apache/lucene-solr/pull/1450] * Converted all README.txt to README.md and partially fixed their formatting (as proper markdown). I also fixed pointers to the files. * LICENCE.txt and NOTICE.txt were not modified. This also modifies build.xml so that the distribution package built by {{ant create-package}} includes all the .md files TODO: run smoke test > Convert XXX.txt files to proper XXX.md > -- > > Key: SOLR-14429 > URL: https://issues.apache.org/jira/browse/SOLR-14429 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > "README.txt" files are (partially) written in markdown and can be converted > to proper markdown files. This change was suggested on LUCENE-9321.
[GitHub] [lucene-solr] romseygeek commented on a change in pull request #1444: LUCENE-9338: Clean up type safety in SimpleBindings
romseygeek commented on a change in pull request #1444: URL: https://github.com/apache/lucene-solr/pull/1444#discussion_r413670977

## File path: lucene/expressions/src/java/org/apache/lucene/expressions/ExpressionValueSource.java

##
@@ -42,13 +42,17 @@
     this.expression = Objects.requireNonNull(expression);
     variables = new DoubleValuesSource[expression.variables.length];
     boolean needsScores = false;
-    for (int i = 0; i < variables.length; i++) {
-      DoubleValuesSource source = bindings.getDoubleValuesSource(expression.variables[i]);
-      if (source == null) {
-        throw new RuntimeException("Internal error. Variable (" + expression.variables[i] + ") does not exist.");
+    try {
+      for (int i = 0; i < variables.length; i++) {
+        DoubleValuesSource source = bindings.getDoubleValuesSource(expression.variables[i]);
+        if (source == null) {
+          throw new RuntimeException("Internal error. Variable (" + expression.variables[i] + ") does not exist.");
+        }
+        needsScores |= source.needsScores();
+        variables[i] = source;
       }
-      needsScores |= source.needsScores();
-      variables[i] = source;
+    } catch (StackOverflowError e) {

Review comment: I've reworked this so that instead of a `Supplier` we store a `Function`, and supply a special `Bindings` implementation in `validate()` that checks for cycles. Definitely much nicer, thanks for nudging me in the right direction! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
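The cycle check described in the review comment can be sketched generically: resolve each expression's variables with an explicit path and fail fast on a cycle, rather than recursing until a `StackOverflowError`. This is a hypothetical illustration, not SimpleBindings' actual code; all names are invented:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: expressions map to the variables they reference;
// validate() walks the dependency graph and throws on a cycle.
class BindingsSketch {
    private final Map<String, List<String>> deps = new HashMap<>();

    void addExpression(String name, String... variables) {
        deps.put(name, Arrays.asList(variables));
    }

    void validate() {
        for (String name : deps.keySet()) {
            resolve(name, new ArrayDeque<>());
        }
    }

    private void resolve(String name, Deque<String> path) {
        if (path.contains(name)) {
            // fail fast instead of recursing forever
            throw new IllegalArgumentException("Cycle detected: " + path + " -> " + name);
        }
        List<String> vars = deps.get(name);
        if (vars == null) {
            return; // a plain field reference, nothing further to resolve
        }
        path.push(name);
        for (String v : vars) {
            resolve(v, path);
        }
        path.pop();
    }
}
```

Validating eagerly at binding time, as the PR does, surfaces the error with a clear message instead of a crash deep inside evaluation.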
[jira] [Commented] (LUCENE-9344) Convert XXX.txt files to proper XXX.md
[ https://issues.apache.org/jira/browse/LUCENE-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090440#comment-17090440 ] Tomoko Uchida commented on LUCENE-9344: --- I opened [https://github.com/apache/lucene-solr/pull/1449] - Changed README.txt, MIGRATE.txt, etc. to .md and partially fix its markdown formatting. - LICENCE.txt and NOTICE.txt were not modified. This also modifies build.xml so that distribution package built by {{ant package-tgz}} includes the all .md files TODO: run smoke test > Convert XXX.txt files to proper XXX.md > --- > > Key: LUCENE-9344 > URL: https://issues.apache.org/jira/browse/LUCENE-9344 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: master (9.0) >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > Text files that are (partially) written in markdown (such as "README.txt") > can be converted to proper markdown files. This change was suggested on > LUCENE-9321. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1440: LUCENE-9330: Make SortFields responsible for index sorting and serialization
jpountz commented on a change in pull request #1440: URL: https://github.com/apache/lucene-solr/pull/1440#discussion_r413639432

## File path: lucene/backward-codecs/src/java/org/apache/lucene/codecs/lucene70/package.html

##
@@ -0,0 +1,25 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements. See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License. You may obtain a copy of the License at
+*
+* http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+
+
+
+
+
+Lucene 7.0 file format.
+
+

Review comment: you could use a package-info.java instead. We use the html version in backward-codecs because there are otherwise conflicts if the package also exists in core, but it looks like you removed the package from core here so we could use a package-info.java in backward-codecs?

## File path: lucene/core/src/java/org/apache/lucene/index/DefaultIndexingChain.java

##
@@ -527,45 +589,63 @@ private void indexPoint(PerField fp, IndexableField field) throws IOException {
     fp.pointValuesWriter.addPackedValue(docState.docID, field.binaryValue());
   }
 
-  private void validateIndexSortDVType(Sort indexSort, String fieldName, DocValuesType dvType) {
+  private void validateIndexSortDVType(Sort indexSort, String fieldToValidate, DocValuesType dvType) throws IOException {
     for (SortField sortField : indexSort.getSort()) {
-      if (sortField.getField().equals(fieldName)) {
-        switch (dvType) {
-          case NUMERIC:
-            if (sortField.getType().equals(SortField.Type.INT) == false &&
-                sortField.getType().equals(SortField.Type.LONG) == false &&
-                sortField.getType().equals(SortField.Type.FLOAT) == false &&
-                sortField.getType().equals(SortField.Type.DOUBLE) == false) {
-              throw new IllegalArgumentException("invalid doc value type:" + dvType + " for sortField:" + sortField);
-            }
-            break;
+      IndexSorter sorter = sortField.getIndexSorter();
+      if (sorter == null) {
+        throw new IllegalStateException("Cannot sort index with sort order " + sortField);
+      }
+      sorter.getDocComparator(new DocValuesLeafReader() {
+        @Override
+        public NumericDocValues getNumericDocValues(String field) {
+          if (Objects.equals(field, fieldToValidate) && dvType != DocValuesType.NUMERIC) {
+            throw new IllegalArgumentException("SortField " + sortField + " expected field [" + field + "] to be NUMERIC but it is [" + dvType + "]");
+          }
+          return DocValues.emptyNumeric();
+        }
 
-          case BINARY:
-            throw new IllegalArgumentException("invalid doc value type:" + dvType + " for sortField:" + sortField);
+        @Override
+        public BinaryDocValues getBinaryDocValues(String field) {
+          if (Objects.equals(field, fieldToValidate) && dvType != DocValuesType.BINARY) {
+            throw new IllegalArgumentException("SortField " + sortField + " expected field [" + field + "] to be BINARY but it is [" + dvType + "]");
+          }
+          return DocValues.emptyBinary();
+        }
 
-          case SORTED:
-            if (sortField.getType().equals(SortField.Type.STRING) == false) {
-              throw new IllegalArgumentException("invalid doc value type:" + dvType + " for sortField:" + sortField);
-            }
-            break;
+        @Override
+        public SortedDocValues getSortedDocValues(String field) {
+          if (Objects.equals(field, fieldToValidate) && dvType != DocValuesType.SORTED) {
+            throw new IllegalArgumentException("SortField " + sortField + " expected field [" + field + "] to be SORTED but it is [" + dvType + "]");
+          }
+          return DocValues.emptySorted();
+        }
 
-          case SORTED_NUMERIC:
-            if (sortField instanceof SortedNumericSortField == false) {
-              throw new IllegalArgumentException("invalid doc value type:" + dvType + " for sortField:" + sortField);
-            }
-            break;
+        @Override
+        public SortedNumericDocValues getSortedNumericDocValues(String field) {
+          if (Objects.equals(field, fieldToValidate) && dvType != DocValuesType.SORTED_NUMERIC
[GitHub] [lucene-solr] mocobeta opened a new pull request #1450: SOLR-14429: Convert XXX.txt files to proper XXX.md
mocobeta opened a new pull request #1450: URL: https://github.com/apache/lucene-solr/pull/1450 # Description Converted all README.txt to README.md and partially fixed their formatting (as proper markdown). I also fixed pointers to the files. See https://issues.apache.org/jira/browse/SOLR-14429 # Tests - Distribution package built by `ant create-package` includes all the .md files - TODO: run smoke test
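A bulk rename of this kind can be sketched in shell. This is a hypothetical illustration only (the actual PR was prepared against the repo and also fixed cross-references; in a git checkout, `git mv` would preserve history instead of plain `mv`):

```shell
# Rename every README.txt under the given directory to README.md,
# keeping each file in its original location.
rename_readmes() {
  find "$1" -name 'README.txt' -print0 | while IFS= read -r -d '' f; do
    mv "$f" "${f%.txt}.md"
  done
}
```

The `-print0` / `read -d ''` pairing keeps the loop safe for paths containing spaces or newlines.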
[GitHub] [lucene-solr] mocobeta commented on a change in pull request #1449: LUCENE-9344: Convert XXX.txt files to proper XXX.md
mocobeta commented on a change in pull request #1449: URL: https://github.com/apache/lucene-solr/pull/1449#discussion_r413662279 ## File path: lucene/SYSTEM_REQUIREMENTS.md ## @@ -14,5 +14,5 @@ implementing Lucene (document size, number of documents, and number of hits retrieved to name a few). The benchmarks page has some information related to performance on particular platforms. -*To build Apache Lucene from source, refer to the `BUILD.txt` file in +*To build Apache Lucene from the source, refer to the `BUILD.txt` file in Review comment: My linter complained, so I fixed the phrasing as it suggested; this can be reverted.
[GitHub] [lucene-solr] mocobeta commented on a change in pull request #1449: LUCENE-9344: Convert XXX.txt files to proper XXX.md
mocobeta commented on a change in pull request #1449: URL: https://github.com/apache/lucene-solr/pull/1449#discussion_r413661856 ## File path: lucene/JRE_VERSION_MIGRATION.md ## @@ -19,16 +19,16 @@ For reference, JRE major versions with their corresponding Unicode versions: * Java 8, Unicode 6.2 * Java 9, Unicode 8.0 -In general, whether or not you need to re-index largely depends upon the data that +In general, whether you need to re-index largely depends upon the data that Review comment: My linter complained, so fixed the phrasing as it suggested; can be reverted. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] mocobeta commented on a change in pull request #1449: LUCENE-9344: Convert XXX.txt files to proper XXX.md
mocobeta commented on a change in pull request #1449: URL: https://github.com/apache/lucene-solr/pull/1449#discussion_r413662067 ## File path: lucene/JRE_VERSION_MIGRATION.md ## @@ -19,16 +19,16 @@ For reference, JRE major versions with their corresponding Unicode versions: * Java 8, Unicode 6.2 * Java 9, Unicode 8.0 -In general, whether or not you need to re-index largely depends upon the data that +In general, whether you need to re-index largely depends upon the data that you are searching, and what was changed in any given Unicode version. For example, -if you are completely sure that your content is limited to the "Basic Latin" range +if you are completely sure your content is limited to the "Basic Latin" range of Unicode, you can safely ignore this. ## Special Notes: LUCENE 2.9 TO 3.0, JAVA 1.4 TO JAVA 5 TRANSITION * `StandardAnalyzer` will return the same results under Java 5 as it did under Java 1.4. This is because it is largely independent of the runtime JRE for -Unicode support, (with the exception of lowercasing). However, no changes to +Unicode support, (except for lowercasing). However, no changes to Review comment: My linter complained, so fixed the phrasing as it suggested; can be reverted. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] mocobeta commented on a change in pull request #1449: LUCENE-9344: Convert XXX.txt files to proper XXX.md
mocobeta commented on a change in pull request #1449: URL: https://github.com/apache/lucene-solr/pull/1449#discussion_r413660560 ## File path: lucene/BUILD.md ## @@ -66,7 +66,7 @@ system. NOTE: the ~ character represents your user account home directory. -Step 3) Run ant +## Step 4) Run ant Review comment: Actually it's step 4. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] mocobeta opened a new pull request #1449: LUCENE-9344: Convert XXX.txt files to proper XXX.md
mocobeta opened a new pull request #1449: URL: https://github.com/apache/lucene-solr/pull/1449 # Description Changed README.txt, MIGRATE.txt, etc. to `.md` and partially fixed their markdown formatting. LICENCE.txt and NOTICE.txt were not modified. See https://issues.apache.org/jira/browse/LUCENE-9344 # Tests - Distribution package built by `ant package-tgz` includes all the .md files - TODO: run smoke test
[jira] [Updated] (SOLR-12182) Can not switch urlScheme in 7x if there are any cores in the cluster
[ https://issues.apache.org/jira/browse/SOLR-12182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geza Nagy updated SOLR-12182: - Attachment: SOLR-12182_20200423.patch > Can not switch urlScheme in 7x if there are any cores in the cluster > > > Key: SOLR-12182 > URL: https://issues.apache.org/jira/browse/SOLR-12182 > Project: Solr > Issue Type: Bug >Affects Versions: 7.0, 7.1, 7.2 >Reporter: Anshum Gupta >Priority: Major > Attachments: SOLR-12182.patch, SOLR-12182_20200423.patch > > > I was trying to enable TLS on a cluster that was already in use i.e. had > existing collections and ended up with down cores, that wouldn't come up and > the following core init errors in the logs: > *org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: > replica with coreNodeName core_node4 exists but with a different name or > base_url.* > What is happening here is that the core/replica is defined in the > clusterstate with the urlScheme as part of it's base URL e.g. > *"base_url":"http:hostname:port/solr"*. > Switching the urlScheme in Solr breaks this convention as the host now uses > HTTPS instead. > Actually, I ran into this with an older version because I was running with > *legacyCloud=false* and then realized that we switched that to the default > behavior only in 7x i.e while most users did not hit this issue with older > versions, unless they overrode the legacyCloud value explicitly, users > running 7x are bound to run into this more often. > Switching the value of legacyCloud to true, bouncing the cluster so that the > clusterstate gets flushed, and then setting it back to false is a workaround > but a bit risky one if you don't know if you have any old cores lying around. > Ideally, I think we shouldn't prepend the urlScheme to the base_url value and > use the urlScheme on the fly to construct it. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-12182) Can not switch urlScheme in 7x if there are any cores in the cluster
[ https://issues.apache.org/jira/browse/SOLR-12182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090432#comment-17090432 ] Geza Nagy commented on SOLR-12182: -- Hi, I'm working on this and have uploaded a patch. I've put my changes into the ZKSyncTool class and attached the sh script for starting it. Originally it was made to synchronize the security json, to ensure its content in ZK. I extended it to look for and correct wrong base URLs, replica by replica. It collects the information from env variables; I guess it should be modified to read system properties or other sources instead. > Can not switch urlScheme in 7x if there are any cores in the cluster > > > Key: SOLR-12182 > URL: https://issues.apache.org/jira/browse/SOLR-12182 > Project: Solr > Issue Type: Bug >Affects Versions: 7.0, 7.1, 7.2 >Reporter: Anshum Gupta >Priority: Major > Attachments: SOLR-12182.patch > > > I was trying to enable TLS on a cluster that was already in use i.e. had > existing collections and ended up with down cores, that wouldn't come up and > the following core init errors in the logs: > *org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: > replica with coreNodeName core_node4 exists but with a different name or > base_url.* > What is happening here is that the core/replica is defined in the > clusterstate with the urlScheme as part of it's base URL e.g. > *"base_url":"http:hostname:port/solr"*. > Switching the urlScheme in Solr breaks this convention as the host now uses > HTTPS instead. > Actually, I ran into this with an older version because I was running with > *legacyCloud=false* and then realized that we switched that to the default > behavior only in 7x i.e while most users did not hit this issue with older > versions, unless they overrode the legacyCloud value explicitly, users > running 7x are bound to run into this more often. 
> Switching the value of legacyCloud to true, bouncing the cluster so that the > clusterstate gets flushed, and then setting it back to false is a workaround > but a bit risky one if you don't know if you have any old cores lying around. > Ideally, I think we shouldn't prepend the urlScheme to the base_url value and > use the urlScheme on the fly to construct it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
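The per-replica base_url correction the patch describes comes down to rewriting the scheme prefix of each stored URL. This is a hypothetical sketch only (the real tool reads replica state from ZooKeeper and updates it there; `BaseUrlFixer` and `withScheme` are invented names):

```java
// Hypothetical sketch: rewrite the scheme of a stored base_url
// (e.g. "http://host:8983/solr" -> "https://host:8983/solr").
class BaseUrlFixer {
    static String withScheme(String baseUrl, String targetScheme) {
        int i = baseUrl.indexOf("://");
        if (i < 0) {
            return baseUrl; // leave entries without a scheme untouched
        }
        return targetScheme + baseUrl.substring(i);
    }
}
```

Rewriting only the prefix keeps host, port, and context path intact, which matches the ticket's observation that everything but the scheme is still valid after enabling TLS.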
[jira] [Updated] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colvin Cowie updated SOLR-14428: Description: I sent this to the mailing list I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors while running our normal tests. After profiling it was clear that the majority of the heap was allocated through FuzzyQuery. LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the FuzzyQuery's constructor. I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries from random UUID strings for 5 minutes {code} FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" {code} When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while the memory usage has increased drastically on 8.5.0 and 8.5.1. Comparison of heap usage while running the attached test against Solr 8.3.1 and 8.5.1 with a single (empty) shard and 4GB heap: !image-2020-04-23-09-18-06-070.png! And with 4 shards on 8.4.1 and 8.5.0: !screenshot-2.png! I'm guessing that the memory might be being leaked if the FuzzyQuery objects are referenced from the cache, while the FuzzyTermsEnum would not have been. Query Result Cache on 8.5.1: !screenshot-3.png! ~316mb in the cache QRC on 8.3.1 !screenshot-4.png! <1mb was: I sent this to the mailing list I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors while running our normal tests. After profiling it was clear that the majority of the heap was allocated through FuzzyQuery. LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the FuzzyQuery's constructor. 
I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries from random UUID strings for 5 minutes {code} FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" {code} When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while the memory usage has increased drastically on 8.5.0 and 8.5.1. Comparison of heap usage while running the attached test against Solr 8.3.1 and 8.5.1 with a single (empty) shard and 4GB heap: !image-2020-04-23-09-18-06-070.png! And with 4 shards on 8.4.1 and 8.5.0: !screenshot-2.png! I'm guessing that the memory might be being leaked if the FuzzyQuery objects are referenced from the cache, while the FuzzyTermsEnum would not have been. Query Result Cache on 8.5.1: !screenshot-3.png! ~316mb in the cache > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Priority: Major > Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, > screenshot-2.png, screenshot-3.png, screenshot-4.png > > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. 
> Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. > Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! > <1mb -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
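The suspected leak mechanism above, eager construction plus query caching, can be illustrated abstractly. This is a hypothetical sketch, not Lucene's FuzzyQuery code: when the expensive structure is built in the constructor, every cached query instance retains it; when it is built lazily, a cached query stays cheap until it is actually executed:

```java
import java.util.function.Supplier;

// Hypothetical illustration of eager vs. lazy construction under caching.
class EagerFuzzy {
    static int allocations = 0;
    final int[] automaton;  // stands in for the compiled Levenshtein automata
    EagerFuzzy() {
        allocations++;
        automaton = new int[1 << 16]; // built up front, retained for the query's lifetime
    }
}

class LazyFuzzy {
    static int allocations = 0;
    // nothing large exists until the query is actually rewritten/executed
    final Supplier<int[]> automaton = () -> {
        allocations++;
        return new int[1 << 16];
    };
}
```

If a result cache holds thousands of `EagerFuzzy`-style instances, their heap cost is the automata times the cache size, consistent with the ~316mb cache observed on 8.5.1 versus <1mb on 8.3.1.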
[jira] [Created] (SOLR-14429) Convert XXX.txt files to proper XXX.md
Tomoko Uchida created SOLR-14429: Summary: Convert XXX.txt files to proper XXX.md Key: SOLR-14429 URL: https://issues.apache.org/jira/browse/SOLR-14429 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Reporter: Tomoko Uchida Assignee: Tomoko Uchida "README.txt" files are (partially) written in markdown and can be converted to proper markdown files. This change was suggested on LUCENE-9321. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-9344) Convert XXX.txt files to proper XXX.md
Tomoko Uchida created LUCENE-9344: - Summary: Convert XXX.txt files to proper XXX.md Key: LUCENE-9344 URL: https://issues.apache.org/jira/browse/LUCENE-9344 Project: Lucene - Core Issue Type: Improvement Affects Versions: master (9.0) Reporter: Tomoko Uchida Assignee: Tomoko Uchida Text files that are (partially) written in markdown (such as "README.txt") can be converted to proper markdown files. This change was suggested on LUCENE-9321. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colvin Cowie updated SOLR-14428: Attachment: screenshot-4.png > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Priority: Major > Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, > screenshot-2.png, screenshot-3.png, screenshot-4.png > > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. > Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colvin Cowie updated SOLR-14428: Attachment: screenshot-3.png > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Priority: Major > Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, > screenshot-2.png, screenshot-3.png > > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colvin Cowie updated SOLR-14428:
    Description:
I sent this to the mailing list.
I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors while running our normal tests. After profiling, it was clear that the majority of the heap was allocated through FuzzyQuery.
LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the FuzzyQuery's constructor.
I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries built from random UUID strings for 5 minutes:
{code}
FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2"
{code}
When running against a vanilla Solr 8.3.1 and 8.4.1 there is no problem, while memory usage has increased drastically on 8.5.0 and 8.5.1.
Comparison of heap usage while running the attached test against Solr 8.3.1 and 8.5.1 with a single (empty) shard and a 4GB heap:
!image-2020-04-23-09-18-06-070.png!
And with 4 shards on 8.4.1 and 8.5.0:
!screenshot-2.png!
I'm guessing that the memory might be leaked if the FuzzyQuery objects are referenced from the cache, while the FuzzyTermsEnum would not have been.
Query Result Cache on 8.5.1:
!screenshot-3.png!
~316 MB in the cache

    was:
I sent this to the mailing list.
I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors while running our normal tests. After profiling, it was clear that the majority of the heap was allocated through FuzzyQuery.
LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the FuzzyQuery's constructor.
I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries built from random UUID strings for 5 minutes:
{code}
FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2"
{code}
When running against a vanilla Solr 8.3.1 and 8.4.1 there is no problem, while memory usage has increased drastically on 8.5.0 and 8.5.1.
Comparison of heap usage while running the attached test against Solr 8.3.1 and 8.5.1 with a single (empty) shard and a 4GB heap:
!image-2020-04-23-09-18-06-070.png!
And with 4 shards on 8.4.1 and 8.5.0:
!screenshot-2.png!
I'm guessing that the memory might be leaked if the FuzzyQuery objects are referenced from the cache, while the FuzzyTermsEnum would not have been.
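The query pattern quoted in the issue can be sketched as a standalone generator. The class and method names below (`FuzzyHammerSketch`, `randomFuzzyQuery`) are illustrative, not taken from the attached FuzzyHammer.java:

```java
import java.util.UUID;

public class FuzzyHammerSketch {
    // Builds one query string of the shape described in the issue: a random
    // UUID with its hyphens stripped, queried fuzzily with edit distance 2.
    static String randomFuzzyQuery(String fieldName) {
        return fieldName + ":" + UUID.randomUUID().toString().replace("-", "") + "~2";
    }

    public static void main(String[] args) {
        // Each call yields a term that almost certainly misses the index, so
        // every query forces a fresh Levenshtein automaton to be built.
        System.out.println(randomFuzzyQuery("id"));
    }
}
```

Because each generated term is unique, every query produces a distinct FuzzyQuery instance, which (per the report) is where the automata now live after LUCENE-9068.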
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1444: LUCENE-9338: Clean up type safety in SimpleBindings
jpountz commented on a change in pull request #1444: URL: https://github.com/apache/lucene-solr/pull/1444#discussion_r412947903

## File path: lucene/expressions/src/java/org/apache/lucene/expressions/ExpressionValueSource.java ##

@@ -42,13 +42,17 @@
     this.expression = Objects.requireNonNull(expression);
     variables = new DoubleValuesSource[expression.variables.length];
     boolean needsScores = false;
-    for (int i = 0; i < variables.length; i++) {
-      DoubleValuesSource source = bindings.getDoubleValuesSource(expression.variables[i]);
-      if (source == null) {
-        throw new RuntimeException("Internal error. Variable (" + expression.variables[i] + ") does not exist.");
+    try {
+      for (int i = 0; i < variables.length; i++) {
+        DoubleValuesSource source = bindings.getDoubleValuesSource(expression.variables[i]);
+        if (source == null) {
+          throw new RuntimeException("Internal error. Variable (" + expression.variables[i] + ") does not exist.");
+        }
+        needsScores |= source.needsScores();
+        variables[i] = source;
       }
-      needsScores |= source.needsScores();
-      variables[i] = source;
+    } catch (StackOverflowError e) {

Review comment: Hmm, this is a pre-existing issue, but catching stack overflows is usually a bad idea as it might leave objects in an inconsistent state. I wonder if it could be checked differently. Also, is it OK to move the catch from validate() to here?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
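One way the cycle could be "checked differently", as the review comment suggests, is an explicit depth counter instead of catching StackOverflowError. The sketch below uses hypothetical names (`DepthGuardSketch`, `resolve`, `MAX_DEPTH`) and a plain name-to-name alias map, not the actual SimpleBindings code; the point is only that a counter turns a would-be stack overflow on cyclic input into an ordinary, catchable exception:

```java
import java.util.HashMap;
import java.util.Map;

public class DepthGuardSketch {
    // Assumed limit for illustration; a real patch might choose differently.
    static final int MAX_DEPTH = 128;

    // Hypothetical alias resolver: follows name -> name bindings until a
    // terminal name is found. The explicit counter fails deterministically on
    // cyclic input, leaving no half-initialized state behind.
    static String resolve(Map<String, String> aliases, String name, int depth) {
        if (depth > MAX_DEPTH) {
            throw new IllegalArgumentException("Recursion too deep while resolving: " + name);
        }
        String next = aliases.get(name);
        return next == null ? name : resolve(aliases, next, depth + 1);
    }

    public static void main(String[] args) {
        Map<String, String> aliases = new HashMap<>();
        aliases.put("popularity", "boost"); // popularity -> boost (terminal)
        System.out.println(resolve(aliases, "popularity", 0));
    }
}
```

Unlike StackOverflowError, the IllegalArgumentException here can be caught safely at any level, since no constructor is left partially executed by an unpredictable unwinding point.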
[jira] [Commented] (SOLR-12690) Regularize LoggerFactory declarations
[ https://issues.apache.org/jira/browse/SOLR-12690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090409#comment-17090409 ] Andrzej Bialecki commented on SOLR-12690: - Good catch David! Indeed, it should be {{TLOG}} . > Regularize LoggerFactory declarations > - > > Key: SOLR-12690 > URL: https://issues.apache.org/jira/browse/SOLR-12690 > Project: Solr > Issue Type: Improvement >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Minor > Fix For: 7.5, 8.0 > > Attachments: SOLR-12690.patch, SOLR-12690.patch > > > LoggerFactory declarations have several different forms, they should all be: > private static final Logger log = > LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); > * lowercase log > * private static > * non hard-coded class lookup. > I'm going to regularize all of these, I think there are about 80 currently, > we've been nibbling away at this but I'll try to do it in one go. > [~cpoerschke] I think there's another Jira about this that I can't find just > now, ring any bells? > Once that's done, is there a good way to make violations of this fail > precommit? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
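The declaration recommended in the issue works because MethodHandles.lookup() captures the class it is written in, so the identical line can be pasted into any class without editing a class name. A minimal stdlib-only demonstration of that mechanism (LookupDemo is an illustrative name; slf4j itself is omitted here):

```java
import java.lang.invoke.MethodHandles;

public class LookupDemo {
    // Resolves to LookupDemo.class at this call site. The same line pasted
    // into another class would resolve to that class instead, which is why
    // the pattern avoids hard-coded class names in logger declarations.
    static final Class<?> DECLARING = MethodHandles.lookup().lookupClass();

    public static void main(String[] args) {
        System.out.println(DECLARING.getSimpleName());
    }
}
```

In the real pattern, DECLARING is simply passed to LoggerFactory.getLogger(...), giving each class a correctly named logger from one copy-pasteable line.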
[jira] [Updated] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colvin Cowie updated SOLR-14428:
    Attachment: (was: screenshot-1.png)