[GitHub] [lucene] mikemccand merged pull request #116: Explicit flush
mikemccand merged pull request #116: URL: https://github.com/apache/lucene/pull/116 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mikemccand commented on pull request #116: Explicit flush
mikemccand commented on pull request #116: URL: https://github.com/apache/lucene/pull/116#issuecomment-829720860 Thank you @balmukundblr!
[GitHub] [lucene] mocobeta commented on pull request #115: LUCENE-4198: add format description for term impacts to javadocs
mocobeta commented on pull request #115: URL: https://github.com/apache/lucene/pull/115#issuecomment-829706157 @jpountz Could you take a look at this?
[jira] [Updated] (LUCENE-9936) update gradle build to support gpg signing of tgz/zip distributions
[ https://issues.apache.org/jira/browse/LUCENE-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris M. Hostetter updated LUCENE-9936: --- Attachment: LUCENE-9936.patch Status: Open (was: Open) Patch updated with additional lessons learned in SOLR-15361 > update gradle build to support gpg signing of tgz/zip distributions > --- > > Key: LUCENE-9936 > URL: https://issues.apache.org/jira/browse/LUCENE-9936 > Project: Lucene - Core > Issue Type: Task >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Attachments: LUCENE-9936.patch, LUCENE-9936.patch > > > the gradle build does not currently have any support for gpg signing the > distributions we produce. > this is necessary for releases. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [lucene-solr] janhoy merged pull request #2487: SOLR-15383 Solr Zookeeper status page shows green even when a ZK is down
janhoy merged pull request #2487: URL: https://github.com/apache/lucene-solr/pull/2487
[GitHub] [lucene-solr] janhoy opened a new pull request #2487: SOLR-15383 Solr Zookeeper status page shows green even when a ZK is down
janhoy opened a new pull request #2487: URL: https://github.com/apache/lucene-solr/pull/2487 This is a backport to 8.9 of https://github.com/apache/solr/pull/103 See https://issues.apache.org/jira/browse/SOLR-15383
[GitHub] [lucene] mikemccand commented on pull request #116: Explicit flush
mikemccand commented on pull request #116: URL: https://github.com/apache/lucene/pull/116#issuecomment-829452800 Thanks @balmukundblr -- looks great -- I'll try to push today.
[GitHub] [lucene-solr] balmukundblr commented on pull request #2349: Added FlushIndexTask to flush documents at index thread level.
balmukundblr commented on pull request #2349: URL: https://github.com/apache/lucene-solr/pull/2349#issuecomment-829439874 > Thanks @balmukundblr this looks great! Could you please open a new PR on the new Lucene GitHub repo? https://github.com/apache/lucene > > Thanks! As you suggested, I've raised a PR (https://github.com/apache/lucene/pull/116).
[GitHub] [lucene] balmukundblr opened a new pull request #116: Explicit flush
balmukundblr opened a new pull request #116: URL: https://github.com/apache/lucene/pull/116

# Description

The CloseIndex call takes a long time to complete. Once the AddDoc task completes, the benchmark algo calls the ForceMerge/CloseIndex task, which eventually allows all pending flushes to complete. Because flushes during the CloseIndex call are sequential, the call takes longer and delays overall index completion. While indexing 1 million documents with reuters21578 (plain-text documents derived from the reuters21578 corpus), we observed that the CloseIndex call takes around 35% of the total time.

# Solution

Developed a new FlushIndexTask, which uses the Lucene flushNextBuffer() API to flush documents at the index-thread level without impacting any other index threads. Adding this task to the algo file immediately after the AddDoc task ensures that all docs are flushed before the ForceMerge/CloseIndex task is called. With this solution in place, the CloseIndex task time was reduced significantly, and the total indexing time also improved.

# Tests

Since we are using an existing Lucene API, flushNextBuffer(), it already has test cases.

- Passed existing tests

# Checklist

Please review the following and check all that apply:

- [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [ ] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [x] I have developed this patch against the `main` branch.
- [x] I have run `./gradlew check`.
- [ ] I have added tests for my changes.
[jira] [Commented] (LUCENE-9944) Implement alternative drill sideways faceting with provided CollectorManager
[ https://issues.apache.org/jira/browse/LUCENE-9944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335643#comment-17335643 ] Greg Miller commented on LUCENE-9944: - Here is a draft version of this change. It's rough (mostly a copy/paste job) but illustrates the desired functionality. If folks think this is a good idea, I'll clean up the change, add some testing and put together a proper PR. https://github.com/apache/lucene/compare/main...gsmiller:LUCENE-9944/draft > Implement alternative drill sideways faceting with provided CollectorManager > > > Key: LUCENE-9944 > URL: https://issues.apache.org/jira/browse/LUCENE-9944 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: main (9.0) >Reporter: Greg Miller >Priority: Minor > > Today, if a user of {{DrillSideways}} wants to provide their own > {{CollectorManager}} when invoking {{search}}, they get this alternate, > "concurrent" implementation that creates N copies of the provided > {{DrillDownQuery}} (where N is the number of drill-down dimensions) and runs > them all concurrently. This is a very different implementation than the one a > user would get if providing a {{Collector}} instead. Additionally, an > {{ExecutorService}} must be provided when constructing a {{DrillSideways}} > instance if the user wants to bring their own {{CollectorManager}} > (otherwise, they'll get an unfriendly NPE when calling {{search}}). > I propose adding an implementation to {{DrillSideways}} that will run the > "non-concurrent" algorithm in the case that a user wants to provide their own > {{CollectorManager}} but doesn't want to provide an {{ExecutorService}} (and > doesn't want the concurrent algorithm).
[jira] [Created] (LUCENE-9944) Implement alternative drill sideways faceting with provided CollectorManager
Greg Miller created LUCENE-9944: --- Summary: Implement alternative drill sideways faceting with provided CollectorManager Key: LUCENE-9944 URL: https://issues.apache.org/jira/browse/LUCENE-9944 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Affects Versions: main (9.0) Reporter: Greg Miller Today, if a user of {{DrillSideways}} wants to provide their own {{CollectorManager}} when invoking {{search}}, they get this alternate, "concurrent" implementation that creates N copies of the provided {{DrillDownQuery}} (where N is the number of drill-down dimensions) and runs them all concurrently. This is a very different implementation than the one a user would get if providing a {{Collector}} instead. Additionally, an {{ExecutorService}} must be provided when constructing a {{DrillSideways}} instance if the user wants to bring their own {{CollectorManager}} (otherwise, they'll get an unfriendly NPE when calling {{search}}). I propose adding an implementation to {{DrillSideways}} that will run the "non-concurrent" algorithm in the case that a user wants to provide their own {{CollectorManager}} but doesn't want to provide an {{ExecutorService}} (and doesn't want the concurrent algorithm).
[GitHub] [lucene-solr] janhoy opened a new pull request #2486: SOLR-15384 Zookeeper Status handler /admin/zookeeper/status not queryable from SolrJ
janhoy opened a new pull request #2486: URL: https://github.com/apache/lucene-solr/pull/2486 Backport of https://github.com/apache/solr/pull/105 targeting 8x (8.9)
[GitHub] [lucene] neoremind commented on a change in pull request #91: LUCENE-9932: Performance improvement for BKD index building
neoremind commented on a change in pull request #91: URL: https://github.com/apache/lucene/pull/91#discussion_r623150413 ## File path: lucene/core/src/java/org/apache/lucene/util/bkd/MutablePointsReaderUtils.java ## @@ -35,63 +37,60 @@ MutablePointsReaderUtils() {} - /** Sort the given {@link MutablePointValues} based on its packed value then doc ID. */ + /** + * Sort the given {@link MutablePointValues} based on its packed value, note that doc ID is not + * taken into sorting algorithm, since if they are already in ascending order, stable sort is able + * to maintain the ordering of doc ID. + */ Review comment: What if the incoming doc IDs are not in ascending order? Should we then use MSBRadixSorter with stable reorder to sort? If so, we would need an if-else clause to choose between two different sorters.
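The question above turns on a property of stable sorts that is easy to check in isolation. The following is a minimal sketch, not the PR's code: it assumes each point is a hypothetical `{value, docId}` pair and uses the JDK's stable object sort in place of the radix sorter under review. It shows that sorting by value alone yields value-then-doc-ID order only when the doc IDs arrive ascending.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Illustrative only; each row is a hypothetical {value, docId} pair. */
class StableSortDocIdSketch {

  /** Sorts by value only. Arrays.sort on object arrays is documented to be stable. */
  static int[][] sortByValueOnly(int[][] points) {
    int[][] copy = points.clone();
    Arrays.sort(copy, Comparator.comparingInt((int[] p) -> p[0]));
    return copy;
  }

  /** True if rows are ordered by value and, within equal values, by doc ID. */
  static boolean isSortedByValueThenDocId(int[][] points) {
    for (int i = 1; i < points.length; i++) {
      int[] a = points[i - 1];
      int[] b = points[i];
      if (a[0] > b[0] || (a[0] == b[0] && a[1] > b[1])) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Doc IDs arrive in ascending order: the stable sort keeps ties in doc ID order.
    int[][] ascending = {{7, 0}, {3, 1}, {7, 2}, {3, 3}};
    System.out.println(isSortedByValueThenDocId(sortByValueOnly(ascending))); // prints true

    // Doc IDs arrive out of order: ties keep their (unsorted) input order.
    int[][] shuffled = {{7, 2}, {3, 3}, {7, 0}, {3, 1}};
    System.out.println(isSortedByValueThenDocId(sortByValueOnly(shuffled))); // prints false
  }
}
```

The second case is the one the review comment worries about: without an ascending-doc-ID guarantee, a value-only stable sort is not equivalent to sorting by value and then doc ID.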
[GitHub] [lucene] neoremind commented on a change in pull request #91: LUCENE-9932: Performance improvement for BKD index building
neoremind commented on a change in pull request #91: URL: https://github.com/apache/lucene/pull/91#discussion_r623147194 ## File path: lucene/core/src/java/org/apache/lucene/util/bkd/MutablePointsReaderUtils.java ## @@ -35,63 +37,60 @@ MutablePointsReaderUtils() {} - /** Sort the given {@link MutablePointValues} based on its packed value then doc ID. */ + /** + * Sort the given {@link MutablePointValues} based on its packed value, note that doc ID is not + * taken into sorting algorithm, since if they are already in ascending order, stable sort is able + * to maintain the ordering of doc ID. + */ Review comment: Sure, I can check here. I searched the codebase: the only user is `BKDWriter`, and the others are test cases (I actually updated `TestMutablePointsReaderUtils` to ignore doc ID when comparing). Could we make it a requirement that the input has doc IDs in ascending order?
[GitHub] [lucene] neoremind commented on a change in pull request #91: LUCENE-9932: Performance improvement for BKD index building
neoremind commented on a change in pull request #91: URL: https://github.com/apache/lucene/pull/91#discussion_r623143156 ## File path: lucene/core/src/java/org/apache/lucene/util/StableMSBRadixSorter.java ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.util; + +/** + * Stable radix sorter for variable-length strings. + * + * @lucene.internal + */ +public abstract class StableMSBRadixSorter extends MSBRadixSorter { Review comment: The `InPlaceMergeSorter` works like a closure: it needs runtime variables such as `reader` and `config` to be constructed, so this class cannot override `getFallbackSorter` itself. How about making `getFallbackSorter` throw an UnsupportedOperationException here explicitly, as a hint to the caller to override it?
[GitHub] [lucene] neoremind commented on a change in pull request #91: LUCENE-9932: Performance improvement for BKD index building
neoremind commented on a change in pull request #91: URL: https://github.com/apache/lucene/pull/91#discussion_r623139405 ## File path: lucene/core/src/test/org/apache/lucene/util/bkd/TestBKD.java ## @@ -1698,6 +1698,16 @@ public void getValue(int i, BytesRef packedValue) { public byte getByteAt(int i, int k) { throw new UnsupportedOperationException(); } + + @Override + public void assign(int from, int to) { +// do nothing + } + + @Override + public void finalizeAssign(int from, int to) { +// do nothing + } Review comment: comment addressed.
[GitHub] [lucene] neoremind commented on a change in pull request #91: LUCENE-9932: Performance improvement for BKD index building
neoremind commented on a change in pull request #91: URL: https://github.com/apache/lucene/pull/91#discussion_r623139120 ## File path: lucene/core/src/java/org/apache/lucene/util/StableMSBRadixSorter.java ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.util; + +/** + * Stable radix sorter for variable-length strings. + * + * @lucene.internal + */ +public abstract class StableMSBRadixSorter extends MSBRadixSorter { + + protected boolean useStableSort; + + public StableMSBRadixSorter(int maxLength) { +super(maxLength); + } + + /** Assign the from-th value to to-th position in another array which used temporarily. */ + protected void assign(int from, int to) { +throw new UnsupportedOperationException(); + } + + /** Finalize assign operation, to switch array. */ + protected void finalizeAssign(int from, int to) { +throw new UnsupportedOperationException(); + } Review comment: comment addressed.
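One plausible reading of the two hooks quoted in the diff above is a stable bucket reorder through a scratch array. The sketch below is an assumption-laden illustration, not the PR's implementation: the class `StableReorderSketch`, its fields, and `stableReorder` are hypothetical, the keys stand in for one byte of a packed value, and `finalizeAssign` is assumed to bulk-copy the reordered range back rather than swap array references.

```java
import java.util.Arrays;

/** Illustrative only: a single stable counting-sort pass built on assign/finalizeAssign. */
class StableReorderSketch {
  private final int[] keys; // bucket key per entry (stand-in for one byte of a packed value)
  private final int[] docIds;
  private final int[] tmpKeys; // scratch arrays that receive the reordered entries
  private final int[] tmpDocIds;

  StableReorderSketch(int[] keys, int[] docIds) {
    this.keys = keys.clone();
    this.docIds = docIds.clone();
    this.tmpKeys = new int[keys.length];
    this.tmpDocIds = new int[docIds.length];
  }

  /** Copies entry {@code from} into slot {@code to} of the scratch arrays. */
  void assign(int from, int to) {
    tmpKeys[to] = keys[from];
    tmpDocIds[to] = docIds[from];
  }

  /** Assumed behaviour: bulk-copies the reordered range [from, to) back from scratch. */
  void finalizeAssign(int from, int to) {
    System.arraycopy(tmpKeys, from, keys, from, to - from);
    System.arraycopy(tmpDocIds, from, docIds, from, to - from);
  }

  /** Stable reorder of [from, to) by key; every key must lie in [0, buckets). */
  void stableReorder(int from, int to, int buckets) {
    int[] start = new int[buckets + 1];
    for (int i = from; i < to; i++) {
      start[keys[i] + 1]++; // histogram, shifted by one slot
    }
    for (int b = 0; b < buckets; b++) {
      start[b + 1] += start[b]; // prefix sums: start[b] = first slot of bucket b
    }
    for (int i = from; i < to; i++) {
      assign(i, from + start[keys[i]]++); // left-to-right scan keeps ties in input order
    }
    finalizeAssign(from, to);
  }

  int[] keys() {
    return keys.clone();
  }

  int[] docIds() {
    return docIds.clone();
  }

  public static void main(String[] args) {
    StableReorderSketch s =
        new StableReorderSketch(new int[] {2, 0, 2, 1, 0}, new int[] {0, 1, 2, 3, 4});
    s.stableReorder(0, 5, 3);
    System.out.println(Arrays.toString(s.keys())); // prints [0, 0, 1, 2, 2]
    System.out.println(Arrays.toString(s.docIds())); // prints [1, 4, 3, 0, 2]
  }
}
```

Entries with equal keys keep their input order (doc IDs 1 before 4, and 0 before 2), which is the property the stable sorter relies on.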
[GitHub] [lucene] neoremind commented on a change in pull request #91: LUCENE-9932: Performance improvement for BKD index building
neoremind commented on a change in pull request #91: URL: https://github.com/apache/lucene/pull/91#discussion_r623138722 ## File path: lucene/core/src/java/org/apache/lucene/codecs/MutablePointValues.java ## @@ -41,4 +41,10 @@ protected MutablePointValues() {} /** Swap the i-th and j-th values. */ public abstract void swap(int i, int j); + + /** Assign the from-th value to to-th position in another array which used temporarily. */ + public abstract void assign(int from, int to); + + /** Finalize assign operation, to switch array. */ + public abstract void finalizeAssign(int from, int to); Review comment: I have updated the two method names. In terms of performance, `memcpy` should be faster than many `memset` operations, so I thought maybe the current implementation makes sense, what do you think? ## File path: lucene/core/src/java/org/apache/lucene/util/StableMSBRadixSorter.java ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.util; + +/** + * Stable radix sorter for variable-length strings. 
+ * + * @lucene.internal + */ +public abstract class StableMSBRadixSorter extends MSBRadixSorter { + + protected boolean useStableSort; Review comment: removed
[jira] [Resolved] (LUCENE-9572) Allow TypeAsSynonymFilter to propagate selected flags and Ignore some types
[ https://issues.apache.org/jira/browse/LUCENE-9572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gus Heck resolved LUCENE-9572. -- Fix Version/s: 8.9 Resolution: Implemented > Allow TypeAsSynonymFilter to propagate selected flags and Ignore some types > --- > > Key: LUCENE-9572 > URL: https://issues.apache.org/jira/browse/LUCENE-9572 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis, modules/test-framework >Reporter: Gus Heck >Assignee: Gus Heck >Priority: Major > Fix For: 8.9 > > Time Spent: 5h 50m > Remaining Estimate: 0h > > (Breaking this off of SOLR-14597 for independent review) > TypeAsSynonymFilter converts types attributes to a synonym. In some cases the > original token may have already had flags set on it and it may be useful to > propagate some or all of those flags to the synonym we are generating. This > ticket provides that ability and allows the user to specify a bitmask to > specify which flags are retained. > Additionally there may be some set of types that should not be converted to > synonyms, and this change allows the user to specify a comma separated list > of types to ignore (most common case will be to ignore a common default type > of 'word' I suspect)
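The two options described in the ticket above reduce to a bitwise AND and a set lookup. The following is a minimal sketch under stated assumptions: it does not use the Lucene filter itself, and every name in it (`SynonymFlagMaskSketch`, `propagateFlags`, `parseIgnoredTypes`, `emitsSynonym`) is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative only; these helpers are hypothetical, not the TypeAsSynonymFilter API. */
class SynonymFlagMaskSketch {

  /** Flags carried over to the synonym: bits set on the token that the mask also allows. */
  static int propagateFlags(int tokenFlags, int mask) {
    return tokenFlags & mask;
  }

  /** Parses a comma-separated list of token types that should not produce a synonym. */
  static Set<String> parseIgnoredTypes(String commaSeparated) {
    Set<String> ignored = new HashSet<>();
    for (String part : commaSeparated.split(",")) {
      String type = part.trim();
      if (!type.isEmpty()) {
        ignored.add(type);
      }
    }
    return ignored;
  }

  /** A synonym is emitted only for types that are not on the ignore list. */
  static boolean emitsSynonym(String tokenType, Set<String> ignored) {
    return !ignored.contains(tokenType);
  }

  public static void main(String[] args) {
    // Token has bits 0, 1 and 3 set; the mask retains bits 1 and 2, so only bit 1 survives.
    System.out.println(propagateFlags(0b1011, 0b0110)); // prints 2

    Set<String> ignored = parseIgnoredTypes("word, <NUM>");
    System.out.println(emitsSynonym("word", ignored)); // prints false
    System.out.println(emitsSynonym("<ALPHANUM>", ignored)); // prints true
  }
}
```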
[GitHub] [lucene] mocobeta commented on a change in pull request #115: LUCENE-4198: add format description for term impacts to javadocs
mocobeta commented on a change in pull request #115: URL: https://github.com/apache/lucene/pull/115#discussion_r623054572 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90PostingsFormat.java ## @@ -299,7 +306,7 @@ * positions. Some payloads and offsets will be separated out into .pos file, for performance * reasons. * - * PayFile(.pay): --> Header,+ * PayFile(.pay): --> Header, Review comment: It is not related to LUCENE-4198 at all... but TermPayloads seems to be optional as well as TermOffsets. https://github.com/apache/lucene/blob/a9a3f6529dac48b9e83a03343b5dda3dc492d955/lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90PostingsWriter.java#L328-L333
[GitHub] [lucene] mocobeta commented on a change in pull request #115: LUCENE-4198: add format description for term impacts to javadocs
mocobeta commented on a change in pull request #115: URL: https://github.com/apache/lucene/pull/115#discussion_r623054572 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90PostingsFormat.java ## @@ -299,7 +306,7 @@ * positions. Some payloads and offsets will be separated out into .pos file, for performance * reasons. * - * PayFile(.pay): --> Header,+ * PayFile(.pay): --> Header, Review comment: It is not related to LUCENE-4198 at all... but TermPayloads seems to be optional as well as TermOffsets. https://github.com/apache/lucene/blob/a9a3f6529dac48b9e83a03343b5dda3dc492d955/lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90PostingsWriter.java#L300-L313
[GitHub] [lucene] mocobeta commented on a change in pull request #115: LUCENE-4198: add format description for term impacts to javadocs
mocobeta commented on a change in pull request #115: URL: https://github.com/apache/lucene/pull/115#discussion_r623050451 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90PostingsFormat.java ## @@ -235,6 +238,10 @@ * and stored as a difference sequence. * PayByteUpto indicates the start offset of the current payload. It is equivalent to * the sum of the payload lengths in the current block up to PosBlockOffset + * ImpactLength is the total length of CompetitiveFreqDelta and CompetitiveNormDelta + * pairs. CompetitiveFreqDelta and CompetitiveNormDelta are used to safely skip score + * calculation for uncompetitive documents; See {@link + * org.apache.lucene.codecs.CompetitiveImpactAccumulator} for more details. Review comment: Also added a brief description of the impacts.
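The javadoc quoted above mentions competitive (freq, norm) pairs without saying what makes a pair competitive. As a rough sketch, assuming a scoring function that never decreases with term frequency and never increases with the norm, a pair is competitive when no other pair has a frequency at least as high together with a norm at least as low. The class and method names below are hypothetical, and norms are compared as plain signed longs for simplicity; this is not Lucene's accumulator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Illustrative only; each impact is a hypothetical {freq, norm} pair. */
class CompetitiveImpactsSketch {

  /** Returns the impacts that no other impact dominates, ordered by ascending freq. */
  static List<long[]> competitive(List<long[]> impacts) {
    List<long[]> sorted = new ArrayList<>(impacts);
    // Highest freq first; among equal freqs, lowest norm first.
    sorted.sort(
        Comparator.comparingLong((long[] p) -> p[0])
            .reversed()
            .thenComparingLong(p -> p[1]));

    List<long[]> kept = new ArrayList<>();
    long minNorm = 0;
    for (long[] p : sorted) {
      // Everything seen so far has freq >= p's freq, so p survives only with a smaller norm.
      if (kept.isEmpty() || p[1] < minNorm) {
        kept.add(p);
        minNorm = p[1];
      }
    }
    Collections.reverse(kept);
    return kept;
  }

  public static void main(String[] args) {
    List<long[]> impacts =
        Arrays.asList(
            new long[] {1, 10},
            new long[] {3, 10},
            new long[] {2, 5},
            new long[] {5, 20},
            new long[] {2, 8});
    for (long[] p : competitive(impacts)) {
      System.out.println(Arrays.toString(p));
    }
    // prints [2, 5], then [3, 10], then [5, 20]
  }
}
```

Here {1, 10} is dropped because {3, 10} has a higher freq at the same norm, and {2, 8} is dropped because {2, 5} has the same freq at a lower norm. Storing only the surviving pairs per block is what lets a reader bound the best possible score of that block and skip it when the bound is not competitive.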
[GitHub] [lucene] mocobeta opened a new pull request #115: LUCENE-4198: add format description for term impacts to javadocs
mocobeta opened a new pull request #115: URL: https://github.com/apache/lucene/pull/115 https://issues.apache.org/jira/browse/LUCENE-4198 introduced term impacts to PostingsFormat; this fixed the format documentation ("SkipDatum" part) to reflect the changes. **latest javadocs (for .doc file)** ![postingsformat_latest_javadoc](https://user-images.githubusercontent.com/1825333/116557310-49200f80-a939-11eb-88a3-470c9b7c3162.png) **modified javadocs with this patch (for .doc file)** ![postingsformat_updated_javadoc](https://user-images.githubusercontent.com/1825333/116557363-55a46800-a939-11eb-91bc-cd02ec86aa44.png)
[GitHub] [lucene] msokolov opened a new pull request #114: LUCENE-9905: PerFieldVectorFormat
msokolov opened a new pull request #114: URL: https://github.com/apache/lucene/pull/114 This emulates the approach taken for per-field customization of postings and doc-values formats and applies that to numeric vectors, i.e. VectorFormat. It registers discoverable services for Lucene90HnswVectorFormat and a new AssertingVectorFormat, to enable testing akin to what we have for the other formats. The asserting format doesn't assert very much, but it's a start.
[GitHub] [lucene] mayya-sharipova merged pull request #103: Fix regression to account payloads while merging
mayya-sharipova merged pull request #103: URL: https://github.com/apache/lucene/pull/103
[GitHub] [lucene] mayya-sharipova commented on a change in pull request #103: Fix regression to account payloads while merging
mayya-sharipova commented on a change in pull request #103: URL: https://github.com/apache/lucene/pull/103#discussion_r623002647 ## File path: lucene/core/src/test/org/apache/lucene/index/TestTermVectors.java ## @@ -16,70 +16,22 @@ */ package org.apache.lucene.index; +import static com.carrotsearch.randomizedtesting.RandomizedTest.randomIntBetween; + import java.io.IOException; import org.apache.lucene.analysis.MockAnalyzer; -import org.apache.lucene.analysis.MockTokenizer; +import org.apache.lucene.analysis.TokenStream; import org.apache.lucene.document.Document; import org.apache.lucene.document.Field; import org.apache.lucene.document.FieldType; import org.apache.lucene.document.TextField; import org.apache.lucene.store.Directory; -import org.apache.lucene.util.English; +import org.apache.lucene.util.BytesRef; import org.apache.lucene.util.IOUtils; import org.apache.lucene.util.LuceneTestCase; import org.apache.lucene.util.TestUtil; -import org.junit.AfterClass; -import org.junit.BeforeClass; public class TestTermVectors extends LuceneTestCase { - private static IndexReader reader; - private static Directory directory; - - @BeforeClass - public static void beforeClass() throws Exception { -directory = newDirectory(); -RandomIndexWriter writer = -new RandomIndexWriter( -random(), -directory, -newIndexWriterConfig(new MockAnalyzer(random(), MockTokenizer.SIMPLE, true)) -.setMergePolicy(newLogMergePolicy())); -// writer.setNoCFSRatio(1.0); -// writer.infoStream = System.out; -for (int i = 0; i < 1000; i++) { - Document doc = new Document(); - FieldType ft = new FieldType(TextField.TYPE_STORED); - int mod3 = i % 3; - int mod2 = i % 2; - if (mod2 == 0 && mod3 == 0) { -ft.setStoreTermVectors(true); -ft.setStoreTermVectorOffsets(true); -ft.setStoreTermVectorPositions(true); - } else if (mod2 == 0) { -ft.setStoreTermVectors(true); -ft.setStoreTermVectorPositions(true); - } else if (mod3 == 0) { -ft.setStoreTermVectors(true); -ft.setStoreTermVectorOffsets(true); - } else 
{ -ft.setStoreTermVectors(true); - } - doc.add(new Field("field", English.intToEnglish(i), ft)); - // test no term vectors too - doc.add(new TextField("noTV", English.intToEnglish(i), Field.Store.YES)); - writer.addDocument(doc); -} -reader = writer.getReader(); -writer.close(); - } - - @AfterClass - public static void afterClass() throws Exception { -reader.close(); -directory.close(); -reader = null; -directory = null; - } Review comment: Indeed, it was unused. Each test was using different variables for directory and different names fields. ## File path: lucene/core/src/test/org/apache/lucene/index/TestTermVectors.java ## @@ -166,4 +118,97 @@ public void testFullMergeAddIndexesReader() throws Exception { verifyIndex(target); IOUtils.close(target, input[0], input[1]); } + + /** + * Assert that a merged segment has payloads set up in fieldInfo, if at least 1 segment has + * payloads for this field. + */ + public void testMergeWithPayloads() throws Exception { + +final FieldType ft1 = new FieldType(TextField.TYPE_NOT_STORED); +ft1.setStoreTermVectors(true); +ft1.setStoreTermVectorOffsets(true); +ft1.setStoreTermVectorPositions(true); +ft1.setStoreTermVectorPayloads(true); +ft1.freeze(); + +Directory dir = newDirectory(); +final int numDocsInSegment = 10; +IndexWriterConfig indexWriterConfig = +new IndexWriterConfig(new MockAnalyzer(random())).setMaxBufferedDocs(numDocsInSegment); +IndexWriter writer = new IndexWriter(dir, indexWriterConfig); + +boolean hasPayloads1 = random().nextBoolean(); +boolean hasPayloads2 = !hasPayloads1; Review comment: Addressed in 70bdf18a7ba81f4c80b2663d72793c2c38707da8 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[jira] [Commented] (LUCENE-9204) Move span queries to the queries module
[ https://issues.apache.org/jira/browse/LUCENE-9204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335319#comment-17335319 ] Adrien Grand commented on LUCENE-9204: -- +1 I think this code would be best duplicated anyway, so that boolean queries can evolve the way they work without risking breaking spans, or vice versa. > Move span queries to the queries module > --- > > Key: LUCENE-9204 > URL: https://issues.apache.org/jira/browse/LUCENE-9204 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Alan Woodward >Assignee: Alan Woodward >Priority: Major > > We have a slightly odd situation currently, with two parallel query > structures for building complex positional queries: the long-standing span > queries, in core; and interval queries, in the queries module. Given that > interval queries solve at least some of the problems we've had with Spans, I > think we should be pushing users more towards these implementations. It's > counter-intuitive to do that when Spans are in core though. I've opened this > issue to discuss moving the spans package as a whole to the queries module. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on a change in pull request #91: LUCENE-9932: Performance improvement for BKD index building
jpountz commented on a change in pull request #91: URL: https://github.com/apache/lucene/pull/91#discussion_r622900018 ## File path: lucene/core/src/java/org/apache/lucene/util/StableMSBRadixSorter.java ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.util; + +/** + * Stable radix sorter for variable-length strings. + * + * @lucene.internal + */ +public abstract class StableMSBRadixSorter extends MSBRadixSorter { + + protected boolean useStableSort; Review comment: unused? ## File path: lucene/core/src/java/org/apache/lucene/codecs/MutablePointValues.java ## @@ -41,4 +41,10 @@ protected MutablePointValues() {} /** Swap the i-th and j-th values. */ public abstract void swap(int i, int j); + + /** Assign the from-th value to to-th position in another array which used temporarily. */ + public abstract void assign(int from, int to); + + /** Finalize assign operation, to switch array. */ + public abstract void finalizeAssign(int from, int to); Review comment: `TimSorter` uses `save` and `restore` as names, maybe we should try to reuse this terminology for consistency? 
By the way, maybe we could even reuse the semantics of these methods where `save` copies a whole range while `restore` restores a single element at a time. This would only require slight modifications to `StableMSBRadixSorter#reorder` to copy the whole range first and then move items from the `temp` array to their expected index in the original array? ## File path: lucene/core/src/test/org/apache/lucene/util/bkd/TestBKD.java ## @@ -1698,6 +1698,16 @@ public void getValue(int i, BytesRef packedValue) { public byte getByteAt(int i, int k) { throw new UnsupportedOperationException(); } + + @Override + public void assign(int from, int to) { +// do nothing + } + + @Override + public void finalizeAssign(int from, int to) { +// do nothing + } Review comment: can you throw an UnsupportedOperationException in both these methods instead, like we do for other methods? ## File path: lucene/core/src/java/org/apache/lucene/util/StableMSBRadixSorter.java ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.util; + +/** + * Stable radix sorter for variable-length strings. 
+ * + * @lucene.internal + */ +public abstract class StableMSBRadixSorter extends MSBRadixSorter { Review comment: Since the name has "stable", can you override `getFallbackSorter` to return an `InPlaceMergeSorter` so that the sort is guaranteed to be stable in all cases? ## File path: lucene/core/src/java/org/apache/lucene/util/StableMSBRadixSorter.java ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed unde
[jira] [Commented] (LUCENE-9204) Move span queries to the queries module
[ https://issues.apache.org/jira/browse/LUCENE-9204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335265#comment-17335265 ] Alan Woodward commented on LUCENE-9204: --- So there are a few things that need to be done here, because Spans reuse some of the two-phase conjunction and disjunction logic. It can all be done as a single PR but it might make sense to split things out into smaller tickets? Or multiple PRs attached to this ticket maybe. * Duplicate ConjunctionDISI and add a package-private version in the spans package. We should be able to make the standard impl package-private at this point as well * Duplicate DisiWrapper/DisiPriorityQueue/DisjunctionDISI and add a package-private version in the spans package. This will make three copies of this code (there's one for intervals as well) which suggests that we may be able to generify it a bit, but it's also on the hot path for queries so maybe we just live with the duplication. * Actually move all the spans code and move those tests that rely on spans into queries/tests. Some of these can be turned into base test classes - for example the Matches test class has a lot of methods which could be moved to test-framework and then the span test cases can be added to a new suite in queries. > Move span queries to the queries module > --- > > Key: LUCENE-9204 > URL: https://issues.apache.org/jira/browse/LUCENE-9204 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Alan Woodward >Assignee: Alan Woodward >Priority: Major > > We have a slightly odd situation currently, with two parallel query > structures for building complex positional queries: the long-standing span > queries, in core; and interval queries, in the queries module. Given that > interval queries solve at least some of the problems we've had with Spans, I > think we should be pushing users more towards these implementations. It's > counter-intuitive to do that when Spans are in core though. 
I've opened this > issue to discuss moving the spans package as a whole to the queries module. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9940) The order of disjuncts in DisjunctionMaxQuery affects equals() impl
[ https://issues.apache.org/jira/browse/LUCENE-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335255#comment-17335255 ] ASF subversion and git services commented on LUCENE-9940: - Commit f7a3587091f8ce05ef08a56523571239b383b217 in lucene's branch refs/heads/main from Alan Woodward [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=f7a3587 ] LUCENE-9940: DisjunctionMaxQuery shouldn't depend on disjunct order for equals checks (#110) DisjunctionMaxQuery stores its disjuncts in a Query[], and uses Arrays.equals() for comparisons in its equals() implementation. This means that the order in which disjuncts are added to the query matters for equality checks. This commit changes DMQ to instead store its disjuncts in a Multiset, meaning that ordering no longer matters. The getDisjuncts() method now returns a Collection rather than a List, and some tests are changed to use query equality checks rather than iterating over disjuncts and expecting a particular order. > The order of disjuncts in DisjunctionMaxQuery affects equals() impl > --- > > Key: LUCENE-9940 > URL: https://issues.apache.org/jira/browse/LUCENE-9940 > Project: Lucene - Core > Issue Type: Bug >Reporter: Alan Woodward >Assignee: Alan Woodward >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > DisjunctionMaxQuery stores its disjuncts in a Java array, and its equals() > implementation uses Arrays.equals() when checking equality. This means that > two queries with the same disjuncts but added in a different order will > compare as different, even though their results will be identical. We should > replace the array with a Set. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
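The difference between the two equality strategies described in this commit message can be illustrated with a minimal, self-contained sketch. It is not the Lucene code: plain strings stand in for Query objects, the class and method names (`DisjunctEqualityDemo`, `arrayEquals`, `multisetEquals`) are hypothetical, and the multiset is approximated with a count map from the JDK rather than Lucene's own Multiset class.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DisjunctEqualityDemo {

  /** Order-sensitive comparison, analogous to calling Arrays.equals() on a Query[]. */
  public static boolean arrayEquals(String[] a, String[] b) {
    return Arrays.equals(a, b);
  }

  /** Builds an element -> count map, so duplicates are kept but order is ignored. */
  public static Map<String, Integer> toMultiset(List<String> items) {
    Map<String, Integer> counts = new HashMap<>();
    for (String item : items) {
      counts.merge(item, 1, Integer::sum);
    }
    return counts;
  }

  /** Order-insensitive comparison that still distinguishes differing duplicate counts. */
  public static boolean multisetEquals(List<String> a, List<String> b) {
    return toMultiset(a).equals(toMultiset(b));
  }

  public static void main(String[] args) {
    // Hypothetical disjuncts; strings stand in for Query instances.
    String[] q1 = {"title:lucene", "body:lucene"};
    String[] q2 = {"body:lucene", "title:lucene"};

    System.out.println(arrayEquals(q1, q2)); // false: same disjuncts, different order
    System.out.println(multisetEquals(Arrays.asList(q1), Arrays.asList(q2))); // true
  }
}
```

A count map rather than a plain Set is used in the sketch because a multiset keeps track of repeated disjuncts, which a Set would collapse.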
[jira] [Resolved] (LUCENE-9940) The order of disjuncts in DisjunctionMaxQuery affects equals() impl
[ https://issues.apache.org/jira/browse/LUCENE-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Woodward resolved LUCENE-9940. --- Fix Version/s: main (9.0) Resolution: Fixed > The order of disjuncts in DisjunctionMaxQuery affects equals() impl > --- > > Key: LUCENE-9940 > URL: https://issues.apache.org/jira/browse/LUCENE-9940 > Project: Lucene - Core > Issue Type: Bug >Reporter: Alan Woodward >Assignee: Alan Woodward >Priority: Major > Fix For: main (9.0) > > Time Spent: 20m > Remaining Estimate: 0h > > DisjunctionMaxQuery stores its disjuncts in a Java array, and its equals() > implementation uses Arrays.equals() when checking equality. This means that > two queries with the same disjuncts but added in a different order will > compare as different, even though their results will be identical. We should > replace the array with a Set. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] romseygeek merged pull request #110: LUCENE-9940: DisjunctionMaxQuery shouldn't depend on disjunct order for equals checks
romseygeek merged pull request #110: URL: https://github.com/apache/lucene/pull/110 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on a change in pull request #103: Fix regression to account payloads while merging
jpountz commented on a change in pull request #103: URL: https://github.com/apache/lucene/pull/103#discussion_r622844633 ## File path: lucene/core/src/test/org/apache/lucene/index/TestTermVectors.java ## @@ -166,4 +118,97 @@ public void testFullMergeAddIndexesReader() throws Exception { verifyIndex(target); IOUtils.close(target, input[0], input[1]); } + + /** + * Assert that a merged segment has payloads set up in fieldInfo, if at least 1 segment has + * payloads for this field. + */ + public void testMergeWithPayloads() throws Exception { + +final FieldType ft1 = new FieldType(TextField.TYPE_NOT_STORED); +ft1.setStoreTermVectors(true); +ft1.setStoreTermVectorOffsets(true); +ft1.setStoreTermVectorPositions(true); +ft1.setStoreTermVectorPayloads(true); +ft1.freeze(); + +Directory dir = newDirectory(); +final int numDocsInSegment = 10; +IndexWriterConfig indexWriterConfig = +new IndexWriterConfig(new MockAnalyzer(random())).setMaxBufferedDocs(numDocsInSegment); +IndexWriter writer = new IndexWriter(dir, indexWriterConfig); + +boolean hasPayloads1 = random().nextBoolean(); +boolean hasPayloads2 = !hasPayloads1; Review comment: Can you test both cases every time instead of relying on randomization? 
## File path: lucene/core/src/test/org/apache/lucene/index/TestTermVectors.java ## @@ -16,70 +16,22 @@ */ package org.apache.lucene.index; +import static com.carrotsearch.randomizedtesting.RandomizedTest.randomIntBetween; + import java.io.IOException; import org.apache.lucene.analysis.MockAnalyzer; -import org.apache.lucene.analysis.MockTokenizer; +import org.apache.lucene.analysis.TokenStream; import org.apache.lucene.document.Document; import org.apache.lucene.document.Field; import org.apache.lucene.document.FieldType; import org.apache.lucene.document.TextField; import org.apache.lucene.store.Directory; -import org.apache.lucene.util.English; +import org.apache.lucene.util.BytesRef; import org.apache.lucene.util.IOUtils; import org.apache.lucene.util.LuceneTestCase; import org.apache.lucene.util.TestUtil; -import org.junit.AfterClass; -import org.junit.BeforeClass; public class TestTermVectors extends LuceneTestCase { - private static IndexReader reader; - private static Directory directory; - - @BeforeClass - public static void beforeClass() throws Exception { -directory = newDirectory(); -RandomIndexWriter writer = -new RandomIndexWriter( -random(), -directory, -newIndexWriterConfig(new MockAnalyzer(random(), MockTokenizer.SIMPLE, true)) -.setMergePolicy(newLogMergePolicy())); -// writer.setNoCFSRatio(1.0); -// writer.infoStream = System.out; -for (int i = 0; i < 1000; i++) { - Document doc = new Document(); - FieldType ft = new FieldType(TextField.TYPE_STORED); - int mod3 = i % 3; - int mod2 = i % 2; - if (mod2 == 0 && mod3 == 0) { -ft.setStoreTermVectors(true); -ft.setStoreTermVectorOffsets(true); -ft.setStoreTermVectorPositions(true); - } else if (mod2 == 0) { -ft.setStoreTermVectors(true); -ft.setStoreTermVectorPositions(true); - } else if (mod3 == 0) { -ft.setStoreTermVectors(true); -ft.setStoreTermVectorOffsets(true); - } else { -ft.setStoreTermVectors(true); - } - doc.add(new Field("field", English.intToEnglish(i), ft)); - // test no term vectors too - 
doc.add(new TextField("noTV", English.intToEnglish(i), Field.Store.YES)); - writer.addDocument(doc); -} -reader = writer.getReader(); -writer.close(); - } - - @AfterClass - public static void afterClass() throws Exception { -reader.close(); -directory.close(); -reader = null; -directory = null; - } Review comment: oh, this was unused? ## File path: lucene/core/src/test/org/apache/lucene/index/TestTermVectors.java ## @@ -166,4 +118,97 @@ public void testFullMergeAddIndexesReader() throws Exception { verifyIndex(target); IOUtils.close(target, input[0], input[1]); } + + /** + * Assert that a merged segment has payloads set up in fieldInfo, if at least 1 segment has + * payloads for this field. + */ + public void testMergeWithPayloads() throws Exception { + +final FieldType ft1 = new FieldType(TextField.TYPE_NOT_STORED); +ft1.setStoreTermVectors(true); +ft1.setStoreTermVectorOffsets(true); +ft1.setStoreTermVectorPositions(true); +ft1.setStoreTermVectorPayloads(true); +ft1.freeze(); + +Directory dir = newDirectory(); +final int numDocsInSegment = 10; +IndexWriterConfig indexWriterConfig = +new IndexWriterConfig(new MockAnalyzer(random())).setMaxBufferedDocs(numDocsInSegment); +IndexWriter writer = new IndexWriter(dir, indexWriterConfig); + +boolean hasPayloads1 = random().nextBo
[jira] [Resolved] (LUCENE-9930) UkrainianMorfologikAnalyzer reloads its Dictionary for every new TokenStreamComponents instance
[ https://issues.apache.org/jira/browse/LUCENE-9930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Woodward resolved LUCENE-9930. --- Fix Version/s: main (9.0) Resolution: Fixed > UkrainianMorfologikAnalyzer reloads its Dictionary for every new > TokenStreamComponents instance > --- > > Key: LUCENE-9930 > URL: https://issues.apache.org/jira/browse/LUCENE-9930 > Project: Lucene - Core > Issue Type: Bug >Reporter: Alan Woodward >Assignee: Alan Woodward >Priority: Major > Fix For: main (9.0) > > Time Spent: 20m > Remaining Estimate: 0h > > Large static data structures should be loaded in Analyzer constructors and > shared between threads, but the UkrainianMorfologikAnalyzer is loading its > dictionary in `createComponents`, which means it is reloaded and stored on > every new analysis thread. If you have a large dictionary and highly > concurrent indexing then this can lead to you running out of memory as > multiple copies of the dictionary are held in thread locals. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
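The pattern described in this issue (load large static data once in the constructor and share it, rather than loading it in `createComponents`) can be sketched without any Lucene dependency. Everything below is a hypothetical stand-in: `Dictionary`, `ReloadingAnalyzer`, and `SharingAnalyzer` are illustrative names, not Lucene classes, and repeated `createComponents()` calls simulate what would happen once per analysis thread.

```java
public class SharedDictionaryDemo {

  /** Hypothetical stand-in for a large, read-only dictionary. */
  static final class Dictionary {
    final int[] data = new int[1024]; // placeholder for the real payload
  }

  /** Problematic shape: every createComponents() call loads a fresh copy. */
  static final class ReloadingAnalyzer {
    int loads = 0;

    Dictionary createComponents() {
      loads++; // one load per call, i.e. per analysis thread in the scenario above
      return new Dictionary();
    }
  }

  /** Preferred shape: the dictionary is loaded once and shared by all callers. */
  static final class SharingAnalyzer {
    int loads = 0;
    private final Dictionary dictionary;

    SharingAnalyzer() {
      loads++; // the single load happens at construction time
      this.dictionary = new Dictionary();
    }

    Dictionary createComponents() {
      return dictionary; // safe to share only because the data is never mutated
    }
  }

  /** Number of dictionary loads after simulating the given number of analysis threads. */
  public static int reloadingLoads(int threads) {
    ReloadingAnalyzer analyzer = new ReloadingAnalyzer();
    for (int i = 0; i < threads; i++) {
      analyzer.createComponents();
    }
    return analyzer.loads;
  }

  public static int sharingLoads(int threads) {
    SharingAnalyzer analyzer = new SharingAnalyzer();
    for (int i = 0; i < threads; i++) {
      analyzer.createComponents();
    }
    return analyzer.loads;
  }

  public static void main(String[] args) {
    System.out.println(reloadingLoads(4)); // 4: one copy per simulated thread
    System.out.println(sharingLoads(4)); // 1: a single shared copy
  }
}
```

With the reloading shape, memory held grows with the number of concurrent analysis threads; with the sharing shape it stays constant, which is the behavior the issue asks for.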
[GitHub] [lucene] dweiss commented on a change in pull request #108: LUCENE-9897 Change dependency checking mechanism to use gradle checksum verification
dweiss commented on a change in pull request #108: URL: https://github.com/apache/lucene/pull/108#discussion_r622807200 ## File path: gradle/verification-metadata.xml ## @@ -0,0 +1,2198 @@ [XML markup lost in extraction; the surviving fragments are an element with attributes xmlns="https://schema.gradle.org/dependency-verification" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://schema.gradle.org/dependency-verification https://schema.gradle.org/dependency-verification/dependency-verification-1.0.xsd", followed by two settings with values true and false] Review comment: This link points at disabling transitive dependencies for a particular dependency - it's not related to checksumming. We do want those transitive dependencies and palantir's plugin makes sure they're consistent. > Gradle provides an API which allows disabling dependency verification on some specific configurations. Right. I see how it's done in the documentation and indeed it may be difficult to hook into all the configurations to just pick the subset that we want checksums for. We could leave all the checksums but things would have to work. If they don't then I'd stick to the custom solution we rolled out until switching without so much pain is possible? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] iverase commented on pull request #107: LUCENE-9047: Move the Directory APIs to be little endian (take 2)
iverase commented on pull request #107: URL: https://github.com/apache/lucene/pull/107#issuecomment-828998218 Thanks @rmuir! I will wait until Monday; if there is no more feedback, I will proceed with merging the current patch. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org