[GitHub] [lucene] zhaih commented on pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily
zhaih commented on pull request #163: URL: https://github.com/apache/lucene/pull/163#issuecomment-853538280 Thank you @mikemccand, @dweiss, and @bruno-roustant for reviewing this PR! Since this PR is mainly an optimization for adversarial cases, we don't want to sacrifice the performance of our normal use cases. I will take some time to benchmark these changes (this one as well as a few others not yet presented) first and see what numbers we get, to decide how to move forward with this PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7321) Character Mapping
[ https://issues.apache.org/jira/browse/LUCENE-7321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17356136#comment-17356136 ] Marcus Eagan commented on LUCENE-7321: -- Hi [~iprovalo], I'm curious whether you have been maintaining this patch through version `8` for your company. If so, do you want to revive this discussion? > Character Mapping > - > > Key: LUCENE-7321 > URL: https://issues.apache.org/jira/browse/LUCENE-7321 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Affects Versions: 4.6.1, 5.4.1, 6.0, 6.0.1 >Reporter: Ivan Provalov >Priority: Minor > Labels: patch > Fix For: 6.0.1 > > Attachments: CharacterMappingComponent.pdf, LUCENE-7321.patch > > > One of the challenges in search is recall of an item with a common typing > variant. These cases can be as simple as lower/upper case in most languages, > accented characters, or more complex morphological phenomena like prefix > omission, or constructing a character with some combining mark. This > component addresses cases that are not covered by the ASCII folding > component, or that are too complex to implement with other tools. The idea is that a > linguist could provide the mappings in a tab-delimited file, which can then > be used directly by Solr. > The mappings are maintained in the tab-delimited file, which could be just a > copy-paste from an Excel spreadsheet. This gives linguists the opportunity > to create the mappings, and the developer can then include them in the Solr > configuration. There are a few cases, when the mappings grow complex, where > some additional debugging may be required. The mappings can map any > sequence of characters to any other sequence of characters. > Some of the cases I discuss in detail in the document are handling the voiced vowels > for Japanese; common typing substitutions for Korean, Russian, and Polish; > transliteration for Polish and Arabic; prefix removal for Arabic; and suffix folding > for Japanese. 
In the appendix, I give an example of implementing a Russian > lightweight stemmer using this component. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
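The tab-delimited mapping idea described in the issue can be sketched in plain Java. This is an illustrative sketch only: the `parseMappings`/`apply` names and the greedy longest-match strategy are assumptions for illustration, not the actual LUCENE-7321 patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a character-mapping component driven by a tab-delimited file:
// each line maps a source character sequence to a target sequence.
public class CharacterMappingSketch {

    // Parse "source<TAB>target" lines into an ordered map.
    static Map<String, String> parseMappings(String tsv) {
        Map<String, String> mappings = new LinkedHashMap<>();
        for (String line : tsv.split("\n")) {
            if (line.isEmpty()) continue;
            String[] parts = line.split("\t", 2);
            mappings.put(parts[0], parts[1]);
        }
        return mappings;
    }

    // Apply mappings greedily, preferring the longest match at each position.
    static String apply(String input, Map<String, String> mappings) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            String bestKey = null;
            for (String key : mappings.keySet()) {
                if (input.startsWith(key, i)
                        && (bestKey == null || key.length() > bestKey.length())) {
                    bestKey = key;
                }
            }
            if (bestKey != null) {
                out.append(mappings.get(bestKey));
                i += bestKey.length();
            } else {
                out.append(input.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> m = parseMappings("ß\tss\nœ\toe");
        System.out.println(apply("straße cœur", m)); // prints: strasse coeur
    }
}
```

A real analysis-chain integration would implement this as a `CharFilter` or `TokenFilter`; the sketch only shows the mapping-table mechanics.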
[jira] [Commented] (LUCENE-9976) WANDScorer assertion error in ensureConsistent
[ https://issues.apache.org/jira/browse/LUCENE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17356092#comment-17356092 ] Zach Chen commented on LUCENE-9976: --- Hi Dawid and Michael! I tried again with the command line above with 1000 iterations, but it still didn't reproduce for me for some reason. {code:java} xichen@Xis-MacBook-Pro lucene % ./gradlew test -Ptests.iters=1000 --tests TestExpressionSorts.testQueries -Dtests.seed=FF571CE915A0955 -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true -Dtests.asserts=true -p lucene/expressions/ Starting a Gradle Daemon, 7 busy and 18 incompatible Daemons could not be reused, use --status for details > Task :randomizationInfo Running tests with randomization seed: tests.seed=FF571CE915A0955 > Task :lucene:expressions:test :lucene:expressions:test (SUCCESS): 1000 test(s) The slowest tests (exceeding 500 ms) during this run: 6.62s TestExpressionSorts.testQueries {seed=[FF571CE915A0955:159F353910AC3564]} (:lucene:expressions) 6.56s TestExpressionSorts.testQueries {seed=[FF571CE915A0955:993EFB36FB8A23F3]} (:lucene:expressions) 6.22s TestExpressionSorts.testQueries {seed=[FF571CE915A0955:C9E931CFB8A6C82E]} (:lucene:expressions) 6.21s TestExpressionSorts.testQueries {seed=[FF571CE915A0955:2854FA7396FAF62F]} (:lucene:expressions) 5.84s TestExpressionSorts.testQueries {seed=[FF571CE915A0955:5515E173B4FD16BA]} (:lucene:expressions) 5.65s TestExpressionSorts.testQueries {seed=[FF571CE915A0955:A8C1890BB457C90F]} (:lucene:expressions) 5.62s TestExpressionSorts.testQueries {seed=[FF571CE915A0955:A44F7F3F8B79B2DB]} (:lucene:expressions) 5.57s TestExpressionSorts.testQueries {seed=[FF571CE915A0955:328FA3364F99C839]} (:lucene:expressions) 5.56s TestExpressionSorts.testQueries {seed=[FF571CE915A0955:9D8BCE5B3371B6E2]} (:lucene:expressions) 5.55s TestExpressionSorts.testQueries {seed=[FF571CE915A0955:2E635F6265446CED]} (:lucene:expressions) The slowest suites (exceeding 1s) during this run: 2662.21s 
TestExpressionSorts (:lucene:expressions) BUILD SUCCESSFUL in 45m 1s{code} > WANDScorer assertion error in ensureConsistent > -- > > Key: LUCENE-9976 > URL: https://issues.apache.org/jira/browse/LUCENE-9976 > Project: Lucene - Core > Issue Type: Bug >Reporter: Dawid Weiss >Priority: Major > > Build fails and is reproducible: > https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-main/283/console > {code} > ./gradlew test --tests TestExpressionSorts.testQueries > -Dtests.seed=FF571CE915A0955 -Dtests.multiplier=2 -Dtests.nightly=true > -Dtests.slow=true -Dtests.asserts=true -p lucene/expressions/ > {code}
[jira] [Resolved] (LUCENE-9823) SynonymQuery rewrite can change field boost calculation
[ https://issues.apache.org/jira/browse/LUCENE-9823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julie Tibshirani resolved LUCENE-9823. -- Fix Version/s: main (9.0) Resolution: Fixed > SynonymQuery rewrite can change field boost calculation > --- > > Key: LUCENE-9823 > URL: https://issues.apache.org/jira/browse/LUCENE-9823 > Project: Lucene - Core > Issue Type: Bug >Reporter: Julie Tibshirani >Priority: Minor > Labels: newdev > Fix For: main (9.0) > > Time Spent: 50m > Remaining Estimate: 0h > > SynonymQuery accepts a boost per term, which acts as a multiplier on the term > frequency in the document. When rewriting a SynonymQuery with a single term, > we create a BoostQuery wrapping a TermQuery. This changes the meaning of the > boost: it now multiplies the final TermQuery score instead of multiplying the > term frequency before it's passed to the score calculation. > This is a small point, but maybe it's worth avoiding rewriting a single-term > SynonymQuery unless the boost is 1.0. > The same consideration affects CombinedFieldQuery in sandbox.
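Why the rewrite changes scoring can be seen with a toy example. The saturation function below is a simplified stand-in for Lucene's actual similarity (an assumption for illustration, not Lucene code): because tf saturation is nonlinear, boosting the term frequency before scoring is not the same as multiplying the final score.

```java
// Toy illustration of the LUCENE-9823 point: with a nonlinear tf
// saturation, scaling tf before scoring (SynonymQuery's per-term boost)
// differs from scaling the final score (BoostQuery over TermQuery).
public class BoostSemanticsSketch {
    static final double K = 1.2; // saturation constant, BM25-style assumption

    // Simplified tf saturation: score grows sublinearly with term frequency.
    static double score(double tf) {
        return tf / (tf + K);
    }

    public static void main(String[] args) {
        double tf = 3.0, boost = 0.5;
        // SynonymQuery semantics: boost scales tf before saturation.
        double synonymStyle = score(boost * tf);
        // BoostQuery-over-TermQuery semantics: boost scales the final score.
        double boostQueryStyle = boost * score(tf);
        // The two disagree whenever boost != 1.0, which is why the
        // single-term rewrite is only safe when the boost is exactly 1.0.
        System.out.println(synonymStyle + " vs " + boostQueryStyle);
    }
}
```

With boost = 1.0 the two expressions trivially coincide, matching the issue's suggestion to rewrite only in that case.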
[jira] [Commented] (LUCENE-9823) SynonymQuery rewrite can change field boost calculation
[ https://issues.apache.org/jira/browse/LUCENE-9823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17356058#comment-17356058 ] ASF subversion and git services commented on LUCENE-9823: - Commit 89034ad8cf8019c62a0a4ed1e477cd52e1277e60 in lucene's branch refs/heads/main from Naoto MINAMI [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=89034ad ] LUCENE-9823: Prevent unsafe rewrites for SynonymQuery and CombinedFieldQuery. (#160) Before, rewriting could slightly change the scoring when weights were specified. We now rewrite less aggressively to avoid changing the query's behavior.
[GitHub] [lucene] jtibshirani merged pull request #160: LUCENE-9823: Fix not to rewrite boosted single term SynonymQuery
jtibshirani merged pull request #160: URL: https://github.com/apache/lucene/pull/160
[GitHub] [lucene] jtibshirani commented on a change in pull request #166: LUCENE-9905: Move HNSW build parameters to codec
jtibshirani commented on a change in pull request #166: URL: https://github.com/apache/lucene/pull/166#discussion_r644395730 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorFormat.java ## @@ -76,14 +79,55 @@ static final int VERSION_START = 0; static final int VERSION_CURRENT = VERSION_START; - /** Sole constructor */ + static final String BEAM_WIDTH_KEY = + Lucene90HnswVectorFormat.class.getSimpleName() + ".beam_width"; + static final String MAX_CONN_KEY = Lucene90HnswVectorFormat.class.getSimpleName() + ".max_conn"; + + /** + * Controls how many of the nearest neighbor candidates are connected to the new node. See {@link + * HnswGraph} for details. + */ + private final int maxConn; + + /** + * The number of candidate neighbors to track while searching the graph for each newly inserted + * node. See {@link HnswGraph} for details. + */ + private final int beamWidth; + public Lucene90HnswVectorFormat() { super("Lucene90HnswVectorFormat"); +this.maxConn = HnswGraphBuilder.DEFAULT_MAX_CONN; +this.beamWidth = HnswGraphBuilder.DEFAULT_BEAM_WIDTH; + } + + public Lucene90HnswVectorFormat(int maxConn, int beamWidth) { +super("Lucene90HnswVectorFormat"); +this.maxConn = maxConn; +this.beamWidth = beamWidth; } @Override public VectorWriter fieldsWriter(SegmentWriteState state) throws IOException { -return new Lucene90HnswVectorWriter(state); +SegmentInfo segmentInfo = state.segmentInfo; +putFormatAttribute(segmentInfo, MAX_CONN_KEY, String.valueOf(maxConn)); +putFormatAttribute(segmentInfo, BEAM_WIDTH_KEY, String.valueOf(beamWidth)); +return new Lucene90HnswVectorWriter(state, maxConn, beamWidth); + } + + private void putFormatAttribute(SegmentInfo si, String key, String value) { +String previousValue = si.putAttribute(key, value); +if (previousValue != null && previousValue.equals(value) == false) { Review comment: I'm not sure that writing and validating these format attributes is necessary, since we don't use them when reading. 
It just seemed nice (and low cost) to have the construction parameters available in the segment infos for debugging.
[GitHub] [lucene] jtibshirani opened a new pull request #166: LUCENE-9905: Move HNSW build parameters to codec
jtibshirani opened a new pull request #166: URL: https://github.com/apache/lucene/pull/166 Previously, the max connections and beam width parameters could be configured as field type attributes. This PR moves them to be parameters on Lucene90HnswVectorFormat, to avoid exposing details of the vector format implementation in the API.
[GitHub] [lucene] chlorochrule commented on pull request #160: LUCENE-9823: Fix not to rewrite boosted single term SynonymQuery
chlorochrule commented on pull request #160: URL: https://github.com/apache/lucene/pull/160#issuecomment-853248112 Thanks for explaining and reviewing again! I fixed it :)
[GitHub] [lucene] zhaih commented on a change in pull request #157: LUCENE-9963 Fix issue with FlattenGraphFilter throwing exceptions from holes
zhaih commented on a change in pull request #157: URL: https://github.com/apache/lucene/pull/157#discussion_r644172488 ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/core/FlattenGraphFilter.java ## @@ -255,6 +260,32 @@ private boolean releaseBufferedToken() { return false; } + /** + * Free output nodes before the given outputs. Free input nodes before the minimum input node for + * this output. + * + * @param output target output node + */ + private void freeBefore(OutputNode output) { +// We've released all of the tokens that end at the current output, +// so free all output nodes before this. Input nodes are more complex. +// The second shingled tokens with alternate paths can appear later in the output graph than +// some of their alternate path tokens. +// Because of this case we can only free from the minimum because the minimum node will have +// come from before the second shingled token. +// This means we have to hold onto input nodes whose tokens get stacked on previous nodes until +// we've completely passed those inputs. +// Related tests testShingledGap, testShingledGapWithHoles +outputFrom++; +int freeBefore = Collections.min(output.inputNodes); +// This will catch a node being freed early if it's input to the next output. +// Could a freed early node be input to a later output? +assert outputNodes.get(outputFrom).inputNodes.stream().filter(n -> freeBefore < n).count() > 0 +: "FreeBefore " + output.inputNodes.get(0) + " will free in use nodes"; Review comment: Isn't this still the old assertion that needs to be changed?
[GitHub] [lucene] zhaih commented on a change in pull request #157: LUCENE-9963 Fix issue with FlattenGraphFilter throwing exceptions from holes
zhaih commented on a change in pull request #157: URL: https://github.com/apache/lucene/pull/157#discussion_r644165757 ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/core/FlattenGraphFilter.java ## @@ -362,6 +394,48 @@ public boolean incrementToken() throws IOException { } } + private OutputNode recoverFromHole(InputNode src, int startOffset) { +// This means the "from" node of this token was never seen as a "to" node, +// which should only happen if we just crossed a hole. This is a challenging +// case for us because we normally rely on the full dependencies expressed +// by the arcs to assign outgoing node IDs. It would be better if tokens +// were never dropped but instead just marked deleted with a new +// TermDeletedAttribute (boolean valued) ... but until that future, we have +// a hack here to forcefully jump the output node ID: +assert src.outputNode == -1; +src.node = inputFrom; + +int maxOutIndex = outputNodes.getMaxPos(); +OutputNode outSrc = outputNodes.get(maxOutIndex); +// There are two types of holes, neighbor holes and consumed holes. A neighbor hole is between +// two tokens, it looks like a->*hole*->b. +// A consumed hole is between the start of a long token and the next token that is "under" the path +// of the long token. +// It looks like :___abc__ +// || +// |V +// *hole*->b->c +// A consumed hole should have the outputsrc node of the short token after the hole be the out +// dest +// of the long token as that's how we'd resolve it if the missing token were present. +// Neighbor holes should start a new output node and continue as if the hole didn't +// exist. +// Related tests testAltPathLastStepHoleFollowedByHole, testAltPathFirstStepHole, Review comment: Thank you for linking the tests here!
[GitHub] [lucene] zhaih commented on a change in pull request #157: LUCENE-9963 Fix issue with FlattenGraphFilter throwing exceptions from holes
zhaih commented on a change in pull request #157: URL: https://github.com/apache/lucene/pull/157#discussion_r644165438 ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/core/FlattenGraphFilter.java ## @@ -362,6 +378,40 @@ public boolean incrementToken() throws IOException { } } + private OutputNode recoverFromHole(InputNode src, int startOffset) { +// This means the "from" node of this token was never seen as a "to" node, +// which should only happen if we just crossed a hole. This is a challenging +// case for us because we normally rely on the full dependencies expressed +// by the arcs to assign outgoing node IDs. It would be better if tokens +// were never dropped but instead just marked deleted with a new +// TermDeletedAttribute (boolean valued) ... but until that future, we have +// a hack here to forcefully jump the output node ID: +assert src.outputNode == -1; +src.node = inputFrom; + +int maxOutIndex = outputNodes.getMaxPos(); +OutputNode outSrc = outputNodes.get(maxOutIndex); +// There are two types of holes, neighbor holes and consumed holes. A neighbor hole is between Review comment: Thank you, that helps a lot!
[GitHub] [lucene] jtibshirani commented on a change in pull request #160: LUCENE-9823: Fix not to rewrite boosted single term SynonymQuery
jtibshirani commented on a change in pull request #160: URL: https://github.com/apache/lucene/pull/160#discussion_r644158197 ## File path: lucene/core/src/test/org/apache/lucene/search/TestSynonymQuery.java ## @@ -466,4 +467,26 @@ public void testRandomTopDocs() throws IOException { reader.close(); dir.close(); } + + public void testRewrite() throws IOException { +// zero length SynonymQuery is rewritten +SynonymQuery q = new SynonymQuery.Builder("f").build(); Review comment: A small comment: in most other rewrite tests (like `TestBoostQuery#testRewrite`) we check the higher-level call `IndexSearcher#rewrite`. It'd be nice to do this to be consistent and to exercise the full rewrite logic. ## File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java ## @@ -237,14 +236,6 @@ public Query rewrite(IndexReader reader) throws IOException { if (fieldTerms.length == 1) { Review comment: I like the simplification below of removing the rewrite to synonym query (which is not perfectly accurate). I think we also need to fix or remove this check `if (fieldTerms.length == 1) { ... }`, since it's only accurate when the field weight is 1.0f.
[GitHub] [lucene] zhaih commented on a change in pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily
zhaih commented on a change in pull request #163: URL: https://github.com/apache/lucene/pull/163#discussion_r644146024 ## File path: lucene/core/src/java/org/apache/lucene/util/automaton/StateSet.java ## @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.lucene.util.automaton; + +import com.carrotsearch.hppc.BitMixer; +import com.carrotsearch.hppc.IntIntHashMap; +import com.carrotsearch.hppc.cursors.IntCursor; +import java.util.Arrays; + +/** A thin wrapper of {@link com.carrotsearch.hppc.IntIntHashMap} */ +final class StateSet extends IntSet { + + private final IntIntHashMap inner; + private int hashCode; + private boolean changed; + private int[] arrayCache = new int[0]; + + StateSet(int capacity) { +inner = new IntIntHashMap(capacity); + } + + // Adds this state to the set + void incr(int num) { +if (inner.addTo(num, 1) == 1) { + changed = true; +} + } + + // Removes this state from the set, if count decrs to 0 + void decr(int num) { +assert inner.containsKey(num); +int keyIndex = inner.indexOf(num); +int count = inner.indexGet(keyIndex) - 1; +if (count == 0) { + inner.remove(num); + changed = true; +} else { + inner.indexReplace(keyIndex, count); +} + } + + void computeHash() { +if (changed == false) { + return; +} +hashCode = inner.size(); +for (IntCursor cursor : inner.keys()) { + hashCode += BitMixer.mix(cursor.value); +} + } + + /** + * Create a snapshot of this int set associated with a given state. The snapshot will not retain + * any frequency information about the elements of this set, only existence. + * + * It is the caller's responsibility to ensure that the hashCode and data are up to date via + * the {@link #computeHash()} method before calling this method. + * + * @param state the state to associate with the frozen set. + * @return A new FrozenIntSet with the same values as this set. + */ + FrozenIntSet freeze(int state) { +if (changed == false) { + assert arrayCache != null; Review comment: @mikemccand We actually might fall inside this `if`. Before we call `freeze`, we perform a `get` on the `newStates` hashmap using this `StateSet`; there we might call `getArray()` if there's a hash code collision, and `changed` would be set to `false` there. 
[GitHub] [lucene] zhaih commented on a change in pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily
zhaih commented on a change in pull request #163: URL: https://github.com/apache/lucene/pull/163#discussion_r644142000 ## File path: lucene/core/src/java/org/apache/lucene/util/automaton/StateSet.java ## @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.lucene.util.automaton; + +import com.carrotsearch.hppc.BitMixer; +import com.carrotsearch.hppc.IntIntHashMap; +import com.carrotsearch.hppc.cursors.IntCursor; +import java.util.Arrays; + +/** A thin wrapper of {@link com.carrotsearch.hppc.IntIntHashMap} */ +final class StateSet extends IntSet { + + private final IntIntHashMap inner; + private int hashCode; + private boolean changed; + private int[] arrayCache = new int[0]; + + StateSet(int capacity) { +inner = new IntIntHashMap(capacity); + } + + // Adds this state to the set + void incr(int num) { +if (inner.addTo(num, 1) == 1) { + changed = true; Review comment: +1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[GitHub] [lucene] zhaih commented on a change in pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily
zhaih commented on a change in pull request #163: URL: https://github.com/apache/lucene/pull/163#discussion_r644141734 ## File path: lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java ## @@ -676,7 +677,7 @@ public static Automaton determinize(Automaton a, int maxDeterminizedStates) { // a.writeDot("/l/la/lucene/core/detin.dot"); // Same initial values and state will always have the same hashCode -FrozenIntSet initialset = new FrozenIntSet(new int[] {0}, 683, 0); +FrozenIntSet initialset = new FrozenIntSet(new int[] {0}, BitMixer.mix(0) + 1, 0); Review comment: Just to keep the hash code the same as the one used in `StateSet`. 0 should be the hash code used for a zero-length array, I think?
[GitHub] [lucene] zhaih commented on a change in pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily
zhaih commented on a change in pull request #163: URL: https://github.com/apache/lucene/pull/163#discussion_r644138606 ## File path: lucene/core/build.gradle ## @@ -20,6 +20,8 @@ apply plugin: 'java-library' description = 'Lucene core library' dependencies { + implementation 'com.carrotsearch:hppc' Review comment: @bruno-roustant Thank you for the advice! Unfortunately I tried WormMap yesterday (with hppc 0.9.0.RC2) and I didn't see benefits from the adversarial test case. Just to educate myself, is removal also a fast operation?
[jira] [Commented] (LUCENE-9983) Stop sorting determinize powersets unnecessarily
[ https://issues.apache.org/jira/browse/LUCENE-9983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355852#comment-17355852 ] Haoyu Zhai commented on LUCENE-9983: +1 to having a set of regexps so that we can benchmark them; I'm also a little worried the PR might make the normal cases worse. [~broustant] That is a good idea. I tried using a 128-entry array as a map for the first 128 states, and it doesn't help the adversarial cases (I also pulled out some stats and found that in adversarial cases there are actually many more states than that). But I think we might see some benefits for the normal cases once we have the benchmark set up. > Stop sorting determinize powersets unnecessarily > > > Key: LUCENE-9983 > URL: https://issues.apache.org/jira/browse/LUCENE-9983 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > > Spinoff from LUCENE-9981. > Today, our {{Operations.determinize}} implementation builds powersets of all > subsets of NFA states that "belong" in the same determinized state, using > [this algorithm|https://en.wikipedia.org/wiki/Powerset_construction]. > To hold each powerset, we use a malleable {{SortedIntSet}} and periodically > freeze it to a {{FrozenIntSet}}, also sorted. We pay a high price to keep > these growing maps of int key, int value sorted by key, e.g. upgrading to a > {{TreeMap}} once the map is large enough (> 30 entries). > But I think sorting is entirely unnecessary here! Really all we need is the > ability to add/delete keys from the map, and hashCode / equals (by key only – > ignoring value!), and to freeze the map (a small optimization that we could > skip initially). We only use these maps to lookup in the (growing) > determinized automaton whether this powerset has already been seen. > Maybe we could simply poach the {{IntIntScatterMap}} implementation from > [HPPC|https://github.com/carrotsearch/hppc]? 
And then change its > {{hashCode}}/{{equals}} to only use keys (not values). > This change should be a big speedup for the kinds of (admittedly adversarial) > regexps we saw on LUCENE-9981. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
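The key-only hashing idea above can be sketched in plain Java. This is a hypothetical, standalone analogue of the proposed approach, not the actual Lucene or HPPC class: a multiset of NFA state ids whose hashCode/equals depend only on which states are present — never on their counts and never on any ordering, so no sorting or TreeMap upgrade is needed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: an unsorted multiset of NFA state ids whose identity
// (hashCode/equals) depends only on the set of keys, ignoring counts.
final class PowersetKey {
    private final Map<Integer, Integer> counts = new HashMap<>();

    // Increment the reference count for a state, adding it if absent.
    void incr(int state) {
        counts.merge(state, 1, Integer::sum);
    }

    // Decrement the count; the state leaves the set when it reaches zero.
    void decr(int state) {
        counts.computeIfPresent(state, (k, c) -> c == 1 ? null : c - 1);
    }

    // Order-independent hash over keys only (a sum of per-key hashes),
    // so two sets built in different insertion orders hash identically.
    @Override
    public int hashCode() {
        int h = counts.size();
        for (int key : counts.keySet()) {
            h += Integer.hashCode(key);
        }
        return h;
    }

    // Equality compares key sets only, never the per-key counts.
    @Override
    public boolean equals(Object o) {
        return o instanceof PowersetKey
            && counts.keySet().equals(((PowersetKey) o).counts.keySet());
    }
}
```

Because the hash is an order-independent sum over keys, two powersets built by different insertion orders still produce the same lookup key for the growing determinized automaton.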
[jira] [Commented] (LUCENE-9905) Revise approach to specifying NN algorithm
[ https://issues.apache.org/jira/browse/LUCENE-9905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355808#comment-17355808 ] ASF subversion and git services commented on LUCENE-9905: - Commit eecd1971fa748c2593e8a452484af5ba5d598915 in lucene's branch refs/heads/main from Julie Tibshirani [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=eecd197 ] LUCENE-9905: Allow Lucene90Codec to be configured with a per-field vector format (#164) Previously only AssertingCodec could handle a per-field vector format. This PR also strengthens the checks in TestPerFieldVectorFormat. > Revise approach to specifying NN algorithm > -- > > Key: LUCENE-9905 > URL: https://issues.apache.org/jira/browse/LUCENE-9905 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: main (9.0) >Reporter: Julie Tibshirani >Priority: Blocker > Time Spent: 2h 20m > Remaining Estimate: 0h > > In LUCENE-9322 we decided that the new vectors API shouldn’t assume a > particular nearest-neighbor search data structure and algorithm. This > flexibility is important since NN search is a developing area and we'd like > to be able to experiment and evolve the algorithm. Right now we only have one > algorithm (HNSW), but we want to maintain the ability to use another. > Currently the algorithm to use is specified through {{SearchStrategy}}, for > example {{SearchStrategy.EUCLIDEAN_HNSW}}. So a single format implementation > is expected to handle multiple algorithms. Instead we could have one format > implementation per algorithm. Our current implementation would be > HNSW-specific like {{HnswVectorFormat}}, and to experiment with another > algorithm you could create a new implementation like {{ClusterVectorFormat}}. > This would be better aligned with the codec framework, and help avoid > exposing algorithm details in the API. 
> A concrete proposal (note many of these names will change when LUCENE-9855 is > addressed): > # Rename {{Lucene90VectorFormat}} to {{Lucene90HnswVectorFormat}}. Also add > HNSW to the names of {{Lucene90VectorWriter}} and {{Lucene90VectorReader}}. > # Remove references to HNSW in {{SearchStrategy}}, so there is just > {{SearchStrategy.EUCLIDEAN}}, etc. Rename {{SearchStrategy}} to something > like {{SimilarityFunction}}. > # Remove {{FieldType}} attributes related to HNSW parameters (maxConn and > beamWidth). Instead make these arguments to {{Lucene90HnswVectorFormat}}. > # Introduce {{PerFieldVectorFormat}} to allow a different NN approach or > parameters to be configured per-field \(?\) > One note: the current HNSW-based format includes logic for storing a numeric > vector per document, as well as constructing + storing an HNSW graph. When > adding another implementation, it’d be nice to be able to reuse logic for > reading/writing numeric vectors. I don’t think we need to design for this > right now, but we can keep it in mind for the future? > This issue is based on a thread [~jpountz] started: > [https://mail-archives.apache.org/mod_mbox/lucene-dev/202103.mbox/%3CCAPsWd%2BOuQv5y2Vw39%3DXdOuqXGtDbM4qXx5-pmYiB1X4jPEdiFQ%40mail.gmail.com%3E] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
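Point 4 above (a per-field vector format) can be illustrated with a small self-contained sketch. All names here are hypothetical — this shows the general per-field dispatch pattern already used elsewhere in the codec framework (e.g. per-field postings formats), not the real PerFieldVectorFormat API:

```java
import java.util.function.Function;

// Hypothetical stand-in for a concrete vector format implementation
// (e.g. an HNSW-based one vs. a cluster-based one).
interface VectorFormat {
    String name();
}

// Per-field dispatch: a codec-level format that delegates to a concrete
// format chosen by field name, so different fields can use different
// NN approaches or parameters.
final class PerFieldVectorFormatSketch implements VectorFormat {
    private final Function<String, VectorFormat> formatForField;

    PerFieldVectorFormatSketch(Function<String, VectorFormat> formatForField) {
        this.formatForField = formatForField;
    }

    @Override
    public String name() {
        return "PerFieldSketch";
    }

    // Resolve the concrete format for one field.
    VectorFormat getFormatForField(String field) {
        return formatForField.apply(field);
    }
}
```

A caller would configure it with a function mapping field names to formats, falling back to a default for unlisted fields; that keeps algorithm details out of the field-level API, which is the point of the proposal.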
[GitHub] [lucene] jtibshirani merged pull request #164: LUCENE-9905: Allow Lucene90Codec to be configured with a per-field vector format
jtibshirani merged pull request #164: URL: https://github.com/apache/lucene/pull/164 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jtibshirani commented on pull request #164: LUCENE-9905: Allow Lucene90Codec to be configured with a per-field vector format
jtibshirani commented on pull request #164: URL: https://github.com/apache/lucene/pull/164#issuecomment-853136935 Thanks for reviewing! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9987) JVM 11.0.6 crash while trying to read term vectors in CheckIndex?
[ https://issues.apache.org/jira/browse/LUCENE-9987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355764#comment-17355764 ] Michael McCandless commented on LUCENE-9987: OK good to know, thanks everyone! > JVM 11.0.6 crash while trying to read term vectors in CheckIndex? > - > > Key: LUCENE-9987 > URL: https://issues.apache.org/jira/browse/LUCENE-9987 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Priority: Major > Attachments: hs_err_pid529873.log, hs_err_pid536810.log > > > [This build|https://jenkins.thetaphi.de/job/Lucene-main-Linux/30482/] failed > with JVM crash: > {noformat} > Current thread (0x7f68780d24e0): JavaThread > "TEST-TestExpressionSorts.testQueries-seed#[25E6600265A5C2D4]" > [_thread_in_Java, id=530195, stack(0x7f68ae1f2000,0x7f68ae2f3000)] > Stack: [0x7f68ae1f2000,0x7f68ae2f3000], sp=0x7f68ae2ef800, free > space=1014k > Native frames: (J=compiled Java code, A=aot compiled Java code, > j=interpreted, Vv=VM code, C=native code) > J 4987% c2 > org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingTermVectorsReader.positionIndex(IILorg/apache/lucene/util/LongValues;[I)[[I > (136 bytes) @ 0x7f69347af02e [0x7f69347ae480+0x0bae] > J 4952 c1 > org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingTermVectorsReader.get(I)Lorg/apache/lucene/index/Fields; > (2695 bytes) @ 0x7f692db07754 [0x7f692db025c0+0x5194] > J 4895 c1 > org.apache.lucene.codecs.asserting.AssertingTermVectorsFormat$AssertingTermVectorsReader.get(I)Lorg/apache/lucene/index/Fields; > (26 bytes) @ 0x7f692daace2c [0x7f692daacd20+0x010c] > j > org.apache.lucene.index.CheckIndex.testTermVectors(Lorg/apache/lucene/index/CodecReader;Ljava/io/PrintStream;ZZZ)Lorg/apache/lucene/index/CheckIndex$Status$TermVectorStatus;+96 > j > org.apache.lucene.index.CheckIndex.checkIndex(Ljava/util/List;)Lorg/apache/lucene/index/CheckIndex$Status;+1718 > j > 
org.apache.lucene.util.TestUtil.checkIndex(Lorg/apache/lucene/store/Directory;ZZLjava/io/ByteArrayOutputStream;)Lorg/apache/lucene/index/CheckIndex$Status;+67 > j org.apache.lucene.store.MockDirectoryWrapper.close()V+276 > j org.apache.lucene.expressions.TestExpressionSorts.tearDown()V+11 > v ~StubRoutines::call_stub > V [libjvm.so+0x887569] JavaCalls::call_helper(JavaValue*, methodHandle > const&, JavaCallArguments*, Thread*)+0x3b9 > V [libjvm.so+0xcb1a2d] invoke(InstanceKlass*, methodHandle const&, Handle, > bool, objArrayHandle, BasicType, objArrayHandle, bool, Thread*) [clone > .constprop.80]+0x43d > V [libjvm.so+0xcb2a62] Reflection::invoke_method(oopDesc*, Handle, > objArrayHandle, Thread*)+0x102 > V [libjvm.so+0x93b0cc] JVM_InvokeMethod+0xfc > j > jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+0 > java.base@11.0.6 > j > jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+100 > java.base@11.0.6 > j > jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+6 > java.base@11.0.6 > j > java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+59 > java.base@11.0.6 > j > com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)V+69 > j com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate()V+69 > j org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate()V+20 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] chlorochrule commented on pull request #160: LUCENE-9823: Fix not to rewrite boosted single term SynonymQuery
chlorochrule commented on pull request #160: URL: https://github.com/apache/lucene/pull/160#issuecomment-853065004 Thanks for reviewing, @jtibshirani! I added `TestSynonymQuery#testRewrite` and fixed the same problem in `CombinedFieldQuery`. I may not fully understand the meaning of: > The same consideration affects CombinedFieldQuery in sandbox. If the fix in bc85a2a is incorrect, please explain what the same consideration is. Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] janhoy commented on pull request #2503: Re-introduce ant precommit github action in 8x branch
janhoy commented on pull request #2503: URL: https://github.com/apache/lucene-solr/pull/2503#issuecomment-853050898 Thanks, now let's see if this works... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] janhoy merged pull request #2503: Re-introduce ant precommit github action in 8x branch
janhoy merged pull request #2503: URL: https://github.com/apache/lucene-solr/pull/2503 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9987) JVM 11.0.6 crash while trying to read term vectors in CheckIndex?
[ https://issues.apache.org/jira/browse/LUCENE-9987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355745#comment-17355745 ] Uwe Schindler commented on LUCENE-9987: --- Again: JDK 11 is unstable and was used to drill down into the failures. JDK 16 is quite stable, but I would still not run in production with ZGC or Shenandoah. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9987) JVM 11.0.6 crash while trying to read term vectors in CheckIndex?
[ https://issues.apache.org/jira/browse/LUCENE-9987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355737#comment-17355737 ] Uwe Schindler commented on LUCENE-9987: --- ZGC in Java 11 is a very early release, so all we know from this is that it is unusable with stable Java. I have this variant running to allow Oracle to analyse these crashes. It got better with later versions, so the checks helped. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-9962) DrillSideways users should be able to opt-out of "drill down" facet collecting
[ https://issues.apache.org/jira/browse/LUCENE-9962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Miller resolved LUCENE-9962. - Fix Version/s: main (9.0) Resolution: Fixed > DrillSideways users should be able to opt-out of "drill down" facet collecting > -- > > Key: LUCENE-9962 > URL: https://issues.apache.org/jira/browse/LUCENE-9962 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: main (9.0) >Reporter: Greg Miller >Priority: Minor > Fix For: main (9.0) > > Time Spent: 1h 10m > Remaining Estimate: 0h > > The {{DrillSideways}} search methods will _always_ populate a > {{FacetsCollector}} for the "drill down" dimensions in addition to the "drill > sideways" dimensions. For most cases, this makes sense, but it would be nice > if users had a way to opt out of this collection. It's possible a user may > not care to do any faceting on "drill down" dims, or may have custom needs > for facet collecting on the "drill downs." For the latter case, the user > might want to provide a {{Collector}}/{{CollectorManager}} that does facet > collecting with some custom logic (e.g., behind a > {{MultiCollector}}/{{MultiCollectorManager}}), in which case the population > of an additional {{FacetsCollector}} in {{DrillSideways}} is wasteful. > The {{DrillSidewaysScorer}} already supports a {{null}} > {{drillDownCollector}} gracefully, so this change should mostly just involve > creating a {{protected}} method in {{DrillSideways}} for the purpose of > creating a "drill down" {{FacetsCollector}}, which users can override to > return {{null}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
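The proposed opt-out — a protected factory method that subclasses override to return null — is a common pattern that can be sketched in isolation (hypothetical class names, not the actual DrillSideways code):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for FacetsCollector.
class CollectorStub {
    final List<Integer> hits = new ArrayList<>();
    void collect(int doc) { hits.add(doc); }
}

// The pattern: the base class calls a protected factory; a subclass
// returning null opts out of the extra collection entirely.
class SidewaysSearchSketch {
    // Override this to return null to skip drill-down collecting.
    protected CollectorStub createDrillDownCollector() {
        return new CollectorStub();
    }

    // Returns the drill-down collector actually used, or null if opted out.
    CollectorStub search(int[] docs) {
        CollectorStub drillDown = createDrillDownCollector();
        for (int doc : docs) {
            if (drillDown != null) {
                drillDown.collect(doc); // gracefully skipped when opted out
            }
        }
        return drillDown;
    }
}
```

The null check inside the loop mirrors the graceful null handling the issue attributes to DrillSidewaysScorer: once the factory returns null, no extra collector is allocated or populated.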
[jira] [Updated] (LUCENE-9979) Implement negation of facet path in DrillDownQuery
[ https://issues.apache.org/jira/browse/LUCENE-9979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Miller updated LUCENE-9979: Component/s: (was: core/search) modules/facet > Implement negation of facet path in DrillDownQuery > -- > > Key: LUCENE-9979 > URL: https://issues.apache.org/jira/browse/LUCENE-9979 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Reporter: Nicola Buso >Priority: Major > Labels: faceted-search > Attachments: 0001-Implement-negate-facet-path-in-DrillDownQuery.patch > > > Suppose the following facet values tree: > Facet > - V1 > -- V1.1 > -- V1.2 > -- V1.3 > -- V1.4 > -- (not topK values) > - V2 > -- V2.1 > -- V2.2 > -- V2.3 > -- V2.4 > -- (not topK values) > Use case: > 1 - select V1 => all V1.x are selected > 2 - de-select V1.1 > The implementation of the negation of value V1.1 is missing in > DrillDownQuery, it would be nice to implement it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-9944) Implement alternative drill sideways faceting with provided CollectorManager
[ https://issues.apache.org/jira/browse/LUCENE-9944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Miller resolved LUCENE-9944. - Fix Version/s: main (9.0) Resolution: Fixed > Implement alternative drill sideways faceting with provided CollectorManager > > > Key: LUCENE-9944 > URL: https://issues.apache.org/jira/browse/LUCENE-9944 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: main (9.0) >Reporter: Greg Miller >Priority: Minor > Fix For: main (9.0) > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Today, if a user of {{DrillSideways}} wants to provide their own > {{CollectorManager}} when invoking {{search}}, they get this alternate, > "concurrent" implementation that creates N copies of the provided > {{DrillDownQuery}} (where N is the number of drill-down dimensions) and runs > them all concurrently. This is a very different implementation than the one a > user would get if providing a {{Collector}} instead. Additionally, an > {{ExecutorService}} must be provided when constructing a {{DrillSideways}} > instance if the user wants to bring their own {{CollectorManager}} > (otherwise, they'll get an unfriendly NPE when calling {{search}}). > I propose adding an implementation to {{DrillSideways}} that will run the > "non-concurrent" algorithm in the case that a user wants to provide their own > {{CollectorManager}} but doesn't want to provide an {{ExecutorService}} (and > doesn't want the concurrent algorithm). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
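The dispatch described above — running the sequential algorithm when a CollectorManager is supplied without an ExecutorService, instead of failing with an NPE — can be sketched generically (hypothetical names, not the real DrillSideways API):

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.function.Supplier;

// Hypothetical sketch of the dispatch: with an executor, use the concurrent
// path; without one, run the same manager-style reduction sequentially
// rather than throwing a NullPointerException.
final class SidewaysDispatchSketch {
    private final ExecutorService executor; // may be null

    SidewaysDispatchSketch(ExecutorService executor) {
        this.executor = executor;
    }

    // Each Supplier stands in for one unit of collection work to reduce.
    int search(List<Supplier<Integer>> perSliceWork) {
        if (executor == null) {
            // Sequential algorithm: same reduction, no thread pool required.
            int total = 0;
            for (Supplier<Integer> work : perSliceWork) {
                total += work.get();
            }
            return total;
        }
        // Concurrent path elided; the point is that a null executor
        // now selects a valid code path instead of failing.
        throw new UnsupportedOperationException("concurrent path not sketched");
    }
}
```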
[jira] [Commented] (LUCENE-9944) Implement alternative drill sideways faceting with provided CollectorManager
[ https://issues.apache.org/jira/browse/LUCENE-9944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355718#comment-17355718 ] ASF subversion and git services commented on LUCENE-9944: - Commit 8b60641bcac14663a75f8efe5667c506347acda5 in lucene's branch refs/heads/main from Greg Miller [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=8b60641 ] LUCENE-9944: Allow DrillSideways users to pass a CollectorManager without requiring an ExecutorService (and concurrent DrillSideways approach). (#142) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gsmiller merged pull request #142: LUCENE-9944: Allow DrillSideways users to pass a CollectorManager without requiring an ExecutorService (and concurrent DrillSideways approach).
gsmiller merged pull request #142: URL: https://github.com/apache/lucene/pull/142 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on a change in pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily
dweiss commented on a change in pull request #163: URL: https://github.com/apache/lucene/pull/163#discussion_r643959663 ## File path: lucene/core/src/java/org/apache/lucene/util/automaton/StateSet.java ## @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.lucene.util.automaton; + +import com.carrotsearch.hppc.BitMixer; +import com.carrotsearch.hppc.IntIntHashMap; +import com.carrotsearch.hppc.cursors.IntCursor; +import java.util.Arrays; + +/** A thin wrapper of {@link com.carrotsearch.hppc.IntIntHashMap} */ +final class StateSet extends IntSet { + + private final IntIntHashMap inner; + private int hashCode; + private boolean changed; + private int[] arrayCache = new int[0]; + + StateSet(int capacity) { +inner = new IntIntHashMap(capacity); + } + + // Adds this state to the set + void incr(int num) { +if (inner.addTo(num, 1) == 1) { + changed = true; +} + } + + // Removes this state from the set, if count decrs to 0 + void decr(int num) { +assert inner.containsKey(num); +int keyIndex = inner.indexOf(num); +int count = inner.indexGet(keyIndex) - 1; +if (count == 0) { + inner.remove(num); + changed = true; +} else { + inner.indexReplace(keyIndex, count); +} + } + + void computeHash() { +if (changed == false) { + return; +} +hashCode = inner.size(); +for (IntCursor cursor : inner.keys()) { + hashCode += BitMixer.mix(cursor.value); +} + } + + /** + * Create a snapshot of this int set associated with a given state. The snapshot will not retain + * any frequency information about the elements of this set, only existence. + * + * It is the caller's responsibility to ensure that the hashCode and data are up to date via + * the {@link #computeHash()} method before calling this method. + * + * @param state the state to associate with the frozen set. + * @return A new FrozenIntSet with the same values as this set. + */ + FrozenIntSet freeze(int state) { +if (changed == false) { + assert arrayCache != null; Review comment: bq. why hasn't this been "published in a public revision" yet :) Don't know. Life gets in the way. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gsmiller commented on a change in pull request #142: LUCENE-9944: Allow DrillSideways users to pass a CollectorManager without requiring an ExecutorService (and concurrent DrillSidew
gsmiller commented on a change in pull request #142: URL: https://github.com/apache/lucene/pull/142#discussion_r643959302 ## File path: lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysQuery.java ## @@ -131,8 +185,24 @@ public boolean isCacheable(LeafReaderContext ctx) { public BulkScorer bulkScorer(LeafReaderContext context) throws IOException { Scorer baseScorer = baseWeight.scorer(context); +int drillDownCount = drillDowns.length; + +// TODO: If the caller provided a FacetsCollectorManager instead of directly providing +// FacetsCollectors, we assume this will be invoked during a concurrent search. Ideally +// we'd only create new FacetsCollectors for each "leaf slice" that will be concurrently +// searched, as opposed to each actual leaf, but we don't have that information at this +// level so we always provide a new FacetsCollector. There might be a better way to +// refactor this logic. Review comment: Thanks for this suggestion! I realized I never responded. I suspect it wouldn't make all that much of a practical difference if `DrillSidewaysQuery` creates new `FacetsCollector`s for each `BulkScorer` it instantiates (as this implementation currently does), but it feels just a bit "cleaner" if it only created new FCs for each `LeafSlice` since that's the granularity of concurrency within `IndexSearcher`. It would take a bit of refactoring to get this working since the `FacetsCollector`s are a bit of a (package-private) implementation detail of `DrillSidewaysQuery` right now, but I'm going to try to experiment with this a bit more to see what it looks like. Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9987) JVM 11.0.6 crash while trying to read term vectors in CheckIndex?
[ https://issues.apache.org/jira/browse/LUCENE-9987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355714#comment-17355714 ] Dawid Weiss commented on LUCENE-9987: - See build failure history here: https://jenkins.thetaphi.de/job/Lucene-main-Linux/ > JVM 11.0.6 crash while trying to read term vectors in CheckIndex? > - > > Key: LUCENE-9987 > URL: https://issues.apache.org/jira/browse/LUCENE-9987 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Priority: Major > Attachments: hs_err_pid529873.log, hs_err_pid536810.log > > > [This build|https://jenkins.thetaphi.de/job/Lucene-main-Linux/30482/] failed > with JVM crash: > {noformat} > Current thread (0x7f68780d24e0): JavaThread > "TEST-TestExpressionSorts.testQueries-seed#[25E6600265A5C2D4]" > [_thread_in_Java, id=530195, stack(0x7f68ae1f2000,0x7f68ae2f3000)] > Stack: [0x7f68ae1f2000,0x7f68ae2f3000], sp=0x7f68ae2ef800, free > space=1014k > Native frames: (J=compiled Java code, A=aot compiled Java code, > j=interpreted, Vv=VM code, C=native code) > J 4987% c2 > org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingTermVectorsReader.positionIndex(IILorg/apache/lucene/util/LongValues;[I)[[I > (136 bytes) @ 0x7f69347af02e [0x7f69347ae480+0x0bae] > J 4952 c1 > org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingTermVectorsReader.get(I)Lorg/apache/lucene/index/Fields; > (2695 bytes) @ 0x7f692db07754 [0x7f692db025c0+0x5194] > J 4895 c1 > org.apache.lucene.codecs.asserting.AssertingTermVectorsFormat$AssertingTermVectorsReader.get(I)Lorg/apache/lucene/index/Fields; > (26 bytes) @ 0x7f692daace2c [0x7f692daacd20+0x010c] > j > org.apache.lucene.index.CheckIndex.testTermVectors(Lorg/apache/lucene/index/CodecReader;Ljava/io/PrintStream;ZZZ)Lorg/apache/lucene/index/CheckIndex$Status$TermVectorStatus;+96 > j > org.apache.lucene.index.CheckIndex.checkIndex(Ljava/util/List;)Lorg/apache/lucene/index/CheckIndex$Status;+1718 > j > 
org.apache.lucene.util.TestUtil.checkIndex(Lorg/apache/lucene/store/Directory;ZZLjava/io/ByteArrayOutputStream;)Lorg/apache/lucene/index/CheckIndex$Status;+67 > j org.apache.lucene.store.MockDirectoryWrapper.close()V+276 > j org.apache.lucene.expressions.TestExpressionSorts.tearDown()V+11 > v ~StubRoutines::call_stub > V [libjvm.so+0x887569] JavaCalls::call_helper(JavaValue*, methodHandle > const&, JavaCallArguments*, Thread*)+0x3b9 > V [libjvm.so+0xcb1a2d] invoke(InstanceKlass*, methodHandle const&, Handle, > bool, objArrayHandle, BasicType, objArrayHandle, bool, Thread*) [clone > .constprop.80]+0x43d > V [libjvm.so+0xcb2a62] Reflection::invoke_method(oopDesc*, Handle, > objArrayHandle, Thread*)+0x102 > V [libjvm.so+0x93b0cc] JVM_InvokeMethod+0xfc > j > jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+0 > java.base@11.0.6 > j > jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+100 > java.base@11.0.6 > j > jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+6 > java.base@11.0.6 > j > java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+59 > java.base@11.0.6 > j > com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)V+69 > j com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate()V+69 > j org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate()V+20 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9987) JVM 11.0.6 crash while trying to read term vectors in CheckIndex?
[ https://issues.apache.org/jira/browse/LUCENE-9987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355713#comment-17355713 ] Dawid Weiss commented on LUCENE-9987: - This is happening all the time on jenkins. Always with ZGC.
[GitHub] [lucene] gsmiller commented on a change in pull request #142: LUCENE-9944: Allow DrillSideways users to pass a CollectorManager without requiring an ExecutorService (and concurrent DrillSidew
gsmiller commented on a change in pull request #142: URL: https://github.com/apache/lucene/pull/142#discussion_r643949418 ## File path: lucene/facet/src/java/org/apache/lucene/facet/DrillSideways.java ## @@ -233,11 +251,32 @@ public ScoreMode scoreMode() { } searcher.search(dsq, hitCollector); +FacetsCollector drillDownCollector; +if (drillDownCollectorManager != null) { + drillDownCollector = drillDownCollectorManager.reduce(dsq.managedDrillDownCollectors); +} else { + drillDownCollector = null; +} + +FacetsCollector[] drillSidewaysCollectors = new FacetsCollector[numDims]; +int numSlices = dsq.managedDrillSidewaysCollectors.size(); + +for (int dim = 0; dim < numDims; dim++) { + List facetsCollectorsForDim = new ArrayList<>(numSlices); + + for (int slice = 0; slice < numSlices; slice++) { + facetsCollectorsForDim.add(dsq.managedDrillSidewaysCollectors.get(slice)[dim]); + } + + drillSidewaysCollectors[dim] = + drillSidewaysFacetsCollectorManagers[dim].reduce(facetsCollectorsForDim); +} + return new DrillSidewaysResult( buildFacetsResult( drillDownCollector, drillSidewaysCollectors, -drillDownDims.keySet().toArray(new String[drillDownDims.size()])), +drillDownDims.keySet().toArray(new String[0])), Review comment: I agree; it's very strange!
[jira] [Commented] (LUCENE-9987) JVM 11.0.6 crash while trying to read term vectors in CheckIndex?
[ https://issues.apache.org/jira/browse/LUCENE-9987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355694#comment-17355694 ] Robert Muir commented on LUCENE-9987: - ZGC though. Workaround: don't use ZGC
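Robert's workaround amounts to switching collectors on the test JVM. A hedged sketch of what that looks like in practice (the `-XX` flags are standard HotSpot options; the `tests.jvmargs` property name is an assumption about how the Lucene Gradle build passes JVM arguments to test forks):

```shell
# ZGC is still experimental on JDK 11 and must be unlocked explicitly;
# builds crashing under it can be re-run on G1 (the JDK 11 default):
#
#   ZGC (the crashing configuration): -XX:+UnlockExperimentalVMOptions -XX:+UseZGC
#   Workaround (default collector):   -XX:+UseG1GC
#
# Passed to the test JVM via the Gradle build, e.g. (property name assumed):
./gradlew -p lucene/expressions test --tests TestExpressionSorts \
    -Dtests.jvmargs="-XX:+UseG1GC"
```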
[GitHub] [lucene] msokolov commented on a change in pull request #157: LUCENE-9963 Fix issue with FlattenGraphFilter throwing exceptions from holes
msokolov commented on a change in pull request #157: URL: https://github.com/apache/lucene/pull/157#discussion_r643925065 ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/core/FlattenGraphFilter.java ## @@ -273,6 +260,32 @@ private boolean releaseBufferedToken() { return false; } + /** + * Free output nodes before the given outputs. Free inputs nodes before the minimum input node for + * this output. + * + * @param output target output node + */ + private void freeBefore(OutputNode output) { +// We've released all of the tokens that end at the current output, +// so free all output nodes before this. Input nodes are more complex. +// The second shingled tokens with alternate paths can appear later in the output graph than Review comment: "than than" ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/core/FlattenGraphFilter.java ## @@ -273,6 +260,32 @@ private boolean releaseBufferedToken() { return false; } + /** + * Free output nodes before the given outputs. Free inputs nodes before the minimum input node for Review comment: "Free input nodes" I think ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/core/FlattenGraphFilter.java ## @@ -310,8 +323,11 @@ public boolean incrementToken() throws IOException { int outMax = outputNodes.getMaxPos(); // If positionIncrement > 1 this node should be at the end of the flattened graph if (positionIncrement > 1 && src.outputNode < outMax) { -// We crossed a gap that we need to account for. This node exists from a length >1 path -// jumping to get here. 
+// If there was a hole at the end of an alternate path then the input and output nodes Review comment: minor, but: if you use block comments, then our autoformatter won't apply its annoying line-break rules :) ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/core/FlattenGraphFilter.java ## @@ -273,6 +260,32 @@ private boolean releaseBufferedToken() { return false; } + /** + * Free output nodes before the given outputs. Free inputs nodes before the minimum input node for + * this output. + * + * @param output target output node + */ + private void freeBefore(OutputNode output) { +// We've released all of the tokens that end at the current output, +// so free all output nodes before this. Input nodes are more complex. +// The second shingled tokens with alternate paths can appear later in the output graph than +// than some of their alternate path tokens. +// Because of this case we can only free from the minimum because the minimum node will have +// come from before the second shingled token. +// This means we have to hold onto input nodes who's tokens get stacked on previous nodes until Review comment: "whose" ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/core/FlattenGraphFilter.java ## @@ -392,12 +408,20 @@ private OutputNode recoverFromHole(InputNode src, int startOffset) { int maxOutIndex = outputNodes.getMaxPos(); OutputNode outSrc = outputNodes.get(maxOutIndex); // There are two types of holes, neighbor holes and consumed holes. A neighbor hole is between -// two tokens. A consumed hole is -// between the start a long token and the next token that is "under" the path of the long token. -// A consumed hole should have the outputsrc node of the short token be the out dest +// two tokens, it looks like a->*hole*->b. +// A consumed hole is between the start a long token and the next token that is "under" the path +// of the long token. 
+// It looks like :___abc__ +// || Review comment: Have you run the formatter? I think it might mess these pictures up unless you use block comments
[jira] [Commented] (LUCENE-9976) WANDScorer assertion error in ensureConsistent
[ https://issues.apache.org/jira/browse/LUCENE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355687#comment-17355687 ] Michael McCandless commented on LUCENE-9976: Spooky that the seed only sometimes reproduces! Are there threads involved in these tests? If not, maybe there is some sneaky "randomness bug" in the test, though I thought we buttoned down the forbidden APIs here. > WANDScorer assertion error in ensureConsistent > -- > > Key: LUCENE-9976 > URL: https://issues.apache.org/jira/browse/LUCENE-9976 > Project: Lucene - Core > Issue Type: Bug >Reporter: Dawid Weiss >Priority: Major > > Build fails and is reproducible: > https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-main/283/console > {code} > ./gradlew test --tests TestExpressionSorts > -Dtests.seed=FF571CE915A0955 -Dtests.multiplier=2 -Dtests.nightly=true > -Dtests.slow=true -Dtests.asserts=true -p lucene/expressions/ > {code}
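The question about threads gets at why a fixed seed can fail to reproduce: a single thread drawing from a seeded `Random` is fully deterministic, but once several threads interleave calls on shared random state, the per-thread draws vary from run to run. A small JDK-only illustration (plain `java.util.Random`, not the randomizedtesting framework these tests actually use; the seed constant is borrowed from the repro line above purely as an example value):

```java
import java.util.Arrays;
import java.util.Random;

public class SeedRepro {
    // Draw a short sequence of pseudo-random ints from a freshly seeded Random.
    static int[] draw(long seed) {
        Random r = new Random(seed);
        int[] out = new int[5];
        for (int i = 0; i < out.length; i++) {
            out[i] = r.nextInt(100);
        }
        return out;
    }

    public static void main(String[] args) {
        // Same seed, single thread: the sequence is identical every time.
        long seed = 0xFF571CE915A0955L;
        System.out.println(Arrays.equals(draw(seed), draw(seed))); // prints true
        // But if multiple threads pull from ONE shared Random, the OS scheduler
        // decides which thread gets which value, so the same seed no longer
        // reproduces the same per-thread observations -- one reason a seeded
        // failure can be flaky when tests spawn threads.
    }
}
```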
[jira] [Commented] (LUCENE-9986) Create a simple "real world" regexp benchmark
[ https://issues.apache.org/jira/browse/LUCENE-9986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355684#comment-17355684 ] Michael Sokolov commented on LUCENE-9986: - [This SO post|https://stackoverflow.com/questions/15819919/where-can-i-find-unit-tests-for-regular-expressions-in-multiple-languages] links to many test suites in various open source projects. Not sure which would be best/best licensed for copying here? > Create a simple "real world" regexp benchmark > - > > Key: LUCENE-9986 > URL: https://issues.apache.org/jira/browse/LUCENE-9986 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Priority: Major > > For issues like LUCENE-9983, where we are struggling to decide which > low-level optimizations to make for our (complicated!) {{determinize}} > method, it would really help to have a large, real-world corpus of regexps to > evaluate performance metrics of our automata operations, like CPU and HEAP > required to parse the regexp and determinize. > Does anyone know of such an existing, hopefully compatibly licensed, corpus? > Probably we would add these benchmarks to {{luceneutil}}.
[jira] [Updated] (LUCENE-9987) JVM 11.0.6 crash while trying to read term vectors in CheckIndex?
[ https://issues.apache.org/jira/browse/LUCENE-9987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-9987: --- Attachment: hs_err_pid536810.log
[jira] [Updated] (LUCENE-9987) JVM 11.0.6 crash while trying to read term vectors in CheckIndex?
[ https://issues.apache.org/jira/browse/LUCENE-9987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-9987: --- Attachment: hs_err_pid529873.log
[jira] [Created] (LUCENE-9987) JVM 11.0.6 crash while trying to read term vectors in CheckIndex?
Michael McCandless created LUCENE-9987: -- Summary: JVM 11.0.6 crash while trying to read term vectors in CheckIndex? Key: LUCENE-9987 URL: https://issues.apache.org/jira/browse/LUCENE-9987 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless [This build|https://jenkins.thetaphi.de/job/Lucene-main-Linux/30482/] failed with JVM crash.
[GitHub] [lucene] msokolov commented on a change in pull request #164: LUCENE-9905: Allow Lucene90Codec to be configured with a per-field vector format
msokolov commented on a change in pull request #164: URL: https://github.com/apache/lucene/pull/164#discussion_r643912033

## File path: lucene/core/src/test/org/apache/lucene/codecs/perfield/TestPerFieldVectorFormat.java
## @@ -52,53 +52,54 @@
   protected Codec getCodec() {
     return codec;
   }

-  // just a simple trivial test
   public void testTwoFieldsTwoFormats() throws IOException {
     Analyzer analyzer = new MockAnalyzer(random());
     try (Directory directory = newDirectory()) {
       // we don't use RandomIndexWriter because it might add more values than we expect
       IndexWriterConfig iwc = newIndexWriterConfig(analyzer);
-      final VectorFormat fast = TestUtil.getDefaultVectorFormat();
-      final VectorFormat slow = VectorFormat.forName("Asserting");
+      VectorFormat defaultFormat = TestUtil.getDefaultVectorFormat();
+      VectorFormat emptyFormat = VectorFormat.EMPTY;
       iwc.setCodec(
           new AssertingCodec() {
             @Override
             public VectorFormat getVectorFormatForField(String field) {
-              if ("v1".equals(field)) {
-                return fast;
+              if ("empty".equals(field)) {
+                return emptyFormat;
               } else {
-                return slow;
+                return defaultFormat;
               }
             }
           });
+
       try (IndexWriter iwriter = new IndexWriter(directory, iwc)) {
         Document doc = new Document();
         doc.add(newTextField("id", "1", Field.Store.YES));
-        doc.add(new VectorField("v1", new float[] {1, 2, 3}));
+        doc.add(new VectorField("field", new float[] {1, 2, 3}));
         iwriter.addDocument(doc);
-        doc = new Document();
+        iwriter.commit();
+
+        // Check that we use the empty vector format, which doesn't support writes
+        doc.clear();
         doc.add(newTextField("id", "2", Field.Store.YES));
-        doc.add(new VectorField("v2", new float[] {4, 5, 6}));
-        iwriter.addDocument(doc);
+        doc.add(new VectorField("empty", new float[] {4, 5, 6}));
+        expectThrows(
+            RuntimeException.class,
+            () -> {
+              iwriter.addDocument(doc);
+              iwriter.commit();
+            });
       }

-      // Now search the index:
+      // Now search for the field that was successfully indexed
       try (IndexReader ireader = DirectoryReader.open(directory)) {
         TopDocs hits1 =
             ireader
                 .leaves()
                 .get(0)
                 .reader()
-                .searchNearestVectors("v1", new float[] {1, 2, 3}, 10, 1);
+                .searchNearestVectors("field", new float[] {1, 2, 3}, 10, 1);
         assertEquals(1, hits1.scoreDocs.length);
-        TopDocs hits2 =

Review comment: weird, what was this doing here :)

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] msokolov commented on a change in pull request #142: LUCENE-9944: Allow DrillSideways users to pass a CollectorManager without requiring an ExecutorService (and concurrent DrillSidew
msokolov commented on a change in pull request #142: URL: https://github.com/apache/lucene/pull/142#discussion_r643904768

## File path: lucene/facet/src/java/org/apache/lucene/facet/DrillSideways.java
## @@ -233,11 +251,32 @@ public ScoreMode scoreMode() {
     searcher.search(dsq, hitCollector);

+    FacetsCollector drillDownCollector;
+    if (drillDownCollectorManager != null) {
+      drillDownCollector = drillDownCollectorManager.reduce(dsq.managedDrillDownCollectors);
+    } else {
+      drillDownCollector = null;
+    }
+
+    FacetsCollector[] drillSidewaysCollectors = new FacetsCollector[numDims];
+    int numSlices = dsq.managedDrillSidewaysCollectors.size();
+
+    for (int dim = 0; dim < numDims; dim++) {
+      List<FacetsCollector> facetsCollectorsForDim = new ArrayList<>(numSlices);
+
+      for (int slice = 0; slice < numSlices; slice++) {
+        facetsCollectorsForDim.add(dsq.managedDrillSidewaysCollectors.get(slice)[dim]);
+      }
+
+      drillSidewaysCollectors[dim] =
+          drillSidewaysFacetsCollectorManagers[dim].reduce(facetsCollectorsForDim);
+    }
+
     return new DrillSidewaysResult(
         buildFacetsResult(
             drillDownCollector,
             drillSidewaysCollectors,
-            drillDownDims.keySet().toArray(new String[drillDownDims.size()])),
+            drillDownDims.keySet().toArray(new String[0])),

Review comment: This is the weirdest Java idiom. I've never understood why creating a useless zero-length array is the accepted way to handle type-safety?? Yet apparently it is.
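For readers puzzled by the same idiom, here is a minimal, self-contained sketch (illustrative code, not Lucene's) of why `toArray(new String[0])` is the conventional form: the empty array is only a type witness that tells `toArray(T[])` which component type to allocate, and modern JVMs are commonly measured to make this form no slower than pre-sizing the array.

```java
import java.util.ArrayList;
import java.util.List;

public class ToArrayIdiom {
  public static void main(String[] args) {
    List<String> dims = new ArrayList<>();
    dims.add("color");
    dims.add("size");

    // The zero-length array is never used for storage; toArray(T[]) only
    // reads its component type and allocates a correctly sized String[].
    String[] asArray = dims.toArray(new String[0]);

    System.out.println(asArray.length);  // prints 2
    System.out.println(asArray[0]);      // prints color
  }
}
```

Because generic arrays cannot be created at runtime (type erasure), a plain `toArray()` can only return `Object[]`; the array argument is the accepted way to recover the element type.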
[GitHub] [lucene] mikemccand commented on a change in pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily
mikemccand commented on a change in pull request #163: URL: https://github.com/apache/lucene/pull/163#discussion_r643866949

## File path: lucene/core/build.gradle
## @@ -20,6 +20,8 @@
 apply plugin: 'java-library'

 description = 'Lucene core library'

 dependencies {
+  implementation 'com.carrotsearch:hppc'

Review comment:
> @bruno-roustant came up with some clever new hashing improvements recently - these are not published as a public revision but you can get them from the repository and compile it locally. See this for details:
>
> https://issues.carrot2.org/browse/HPPC-176

Whoa, this new "worm" hashing looks great! That is frequently a great tradeoff (slower put, faster get)? Hmm, why hasn't this been "published in a public revision" yet :)

## File path: lucene/core/src/java/org/apache/lucene/util/automaton/StateSet.java
## @@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.automaton;
+
+import com.carrotsearch.hppc.BitMixer;
+import com.carrotsearch.hppc.IntIntHashMap;
+import com.carrotsearch.hppc.cursors.IntCursor;
+import java.util.Arrays;
+
+/** A thin wrapper of {@link com.carrotsearch.hppc.IntIntHashMap} */
+final class StateSet extends IntSet {
+
+  private final IntIntHashMap inner;
+  private int hashCode;
+  private boolean changed;
+  private int[] arrayCache = new int[0];
+
+  StateSet(int capacity) {
+    inner = new IntIntHashMap(capacity);
+  }
+
+  // Adds this state to the set
+  void incr(int num) {
+    if (inner.addTo(num, 1) == 1) {
+      changed = true;
+    }
+  }
+
+  // Removes this state from the set, if count decrs to 0
+  void decr(int num) {
+    assert inner.containsKey(num);
+    int keyIndex = inner.indexOf(num);
+    int count = inner.indexGet(keyIndex) - 1;
+    if (count == 0) {
+      inner.remove(num);
+      changed = true;
+    } else {
+      inner.indexReplace(keyIndex, count);
+    }
+  }
+
+  void computeHash() {
+    if (changed == false) {
+      return;
+    }
+    hashCode = inner.size();
+    for (IntCursor cursor : inner.keys()) {
+      hashCode += BitMixer.mix(cursor.value);
+    }
+  }
+
+  /**
+   * Create a snapshot of this int set associated with a given state. The snapshot will not retain
+   * any frequency information about the elements of this set, only existence.
+   *
+   * It is the caller's responsibility to ensure that the hashCode and data are up to date via
+   * the {@link #computeHash()} method before calling this method.
+   *
+   * @param state the state to associate with the frozen set.
+   * @return A new FrozenIntSet with the same values as this set.
+   */
+  FrozenIntSet freeze(int state) {
+    if (changed == false) {
+      assert arrayCache != null;

Review comment: Hmm, do we actually fall inside this `if`? I would think we shouldn't ever hit this -- we shouldn't call `freeze` unless something had in fact changed? Or, if we are, something else might be wrong?

Or maybe I am simply confused ;)
[jira] [Commented] (LUCENE-9983) Stop sorting determinize powersets unnecessarily
[ https://issues.apache.org/jira/browse/LUCENE-9983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355644#comment-17355644 ] Michael McCandless commented on LUCENE-9983: OK I opened LUCENE-9986. > Stop sorting determinize powersets unnecessarily > > > Key: LUCENE-9983 > URL: https://issues.apache.org/jira/browse/LUCENE-9983 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > > Spinoff from LUCENE-9981. > Today, our {{Operations.determinize}} implementation builds powersets of all > subsets of NFA states that "belong" in the same determinized state, using > [this algorithm|https://en.wikipedia.org/wiki/Powerset_construction]. > To hold each powerset, we use a malleable {{SortedIntSet}} and periodically > freeze it to a {{FrozenIntSet}}, also sorted. We pay a high price to keep > these growing maps of int key, int value sorted by key, e.g. upgrading to a > {{TreeMap}} once the map is large enough (> 30 entries). > But I think sorting is entirely unnecessary here! Really all we need is the > ability to add/delete keys from the map, and hashCode / equals (by key only – > ignoring value!), and to freeze the map (a small optimization that we could > skip initially). We only use these maps to look up in the (growing) > determinized automaton whether this powerset has already been seen. > Maybe we could simply poach the {{IntIntScatterMap}} implementation from > [HPPC|https://github.com/carrotsearch/hppc]? And then change its > {{hashCode}}/{{equals}} to only use keys (not values). > This change should be a big speedup for the kinds of (admittedly adversarial) > regexps we saw on LUCENE-9981. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
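To make the "hashCode/equals by key only" idea from the issue description concrete, here is a hypothetical sketch using plain JDK collections (the class name and API are made up for illustration; the real implementation wraps HPPC's IntIntHashMap): a reference-counted state set whose equality ignores the counts, so two powersets reached along different paths compare equal without any sorting.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: CountedStateSet is a made-up name, not Lucene's StateSet.
final class CountedStateSet {
  private final Map<Integer, Integer> counts = new HashMap<>();

  void incr(int state) {
    counts.merge(state, 1, Integer::sum);
  }

  void decr(int state) {
    // Remove the key entirely when its reference count drops to zero.
    counts.computeIfPresent(state, (k, c) -> c == 1 ? null : c - 1);
  }

  // Order-independent hash over keys only (HPPC would use BitMixer.mix).
  @Override
  public int hashCode() {
    int h = counts.size();
    for (int key : counts.keySet()) {
      h += Integer.hashCode(key);
    }
    return h;
  }

  // Equality by key set only: reference counts are bookkeeping, not identity.
  @Override
  public boolean equals(Object other) {
    return other instanceof CountedStateSet
        && counts.keySet().equals(((CountedStateSet) other).counts.keySet());
  }

  public static void main(String[] args) {
    CountedStateSet a = new CountedStateSet();
    a.incr(1); a.incr(1); a.incr(2);  // state 1 referenced twice
    CountedStateSet b = new CountedStateSet();
    b.incr(1); b.incr(2);
    System.out.println(a.equals(b));                  // prints true
    System.out.println(a.hashCode() == b.hashCode()); // prints true
  }
}
```

Summing per-key hashes makes the hash independent of insertion order, which is exactly why no sorted representation is needed for the "have we seen this powerset?" lookup.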
[jira] [Created] (LUCENE-9986) Create a simple "real world" regexp benchmark
Michael McCandless created LUCENE-9986: -- Summary: Create a simple "real world" regexp benchmark Key: LUCENE-9986 URL: https://issues.apache.org/jira/browse/LUCENE-9986 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless For issues like LUCENE-9983, where we are struggling to decide which low-level optimizations to make for our (complicated!) {{determinize}} method, it would really help to have a large, real-world corpus of regexps to evaluate performance metrics of our automata operations, like CPU and HEAP required to parse the regexp and determinize. Does anyone know of such an existing, hopefully compatibly licensed, corpus? Probably we would add these benchmarks to {{luceneutil}}.
[jira] [Commented] (LUCENE-9983) Stop sorting determinize powersets unnecessarily
[ https://issues.apache.org/jira/browse/LUCENE-9983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355642#comment-17355642 ] Michael McCandless commented on LUCENE-9983: In the sort of "opposite extreme" case, where someone calls det on an already "happens to be determinized" NFA (I think we already catch if someone tries to det an Automaton that we already previously det'd, and skip it?), I think we would see much more balanced {{incr}}/{{decr}} versus {{freeze}}? {quote}The algorithmic complexity is one thing but if these sets are short (and they will be, right?) then it's a small constant.{quote} Yeah, +1, they will "usually" be very short sets, I think, in the non-adversarial cases. I think we are badly missing a "representative" set of "real-world" regexps to use as a benchmarking corpus, to make decisions about optimizations like this. I love that this adversarial regexp goes much faster with [~zhai7631]'s PR, but I'm worried that it might then make the more normal, real-world, non-adversarial cases slower. Does anyone know of an existing "corpus" of "real-world" regexps by any chance ;) I will open a dedicated issue for this.
[GitHub] [lucene] glawson0 commented on pull request #157: LUCENE-9963 Fix issue with FlattenGraphFilter throwing exceptions from holes
glawson0 commented on pull request #157: URL: https://github.com/apache/lucene/pull/157#issuecomment-852856814 I've fleshed out the comments for the 4 change areas, explaining each area and what tests exercise them. Do they help? Are there areas you feel aren't fully explained, or that I could be clearer on?
[GitHub] [lucene] glawson0 commented on a change in pull request #157: LUCENE-9963 Fix issue with FlattenGraphFilter throwing exceptions from holes
glawson0 commented on a change in pull request #157: URL: https://github.com/apache/lucene/pull/157#discussion_r643770929

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/core/FlattenGraphFilter.java
## @@ -362,6 +378,40 @@ public boolean incrementToken() throws IOException {

+  private OutputNode recoverFromHole(InputNode src, int startOffset) {
+    // This means the "from" node of this token was never seen as a "to" node,
+    // which should only happen if we just crossed a hole. This is a challenging
+    // case for us because we normally rely on the full dependencies expressed
+    // by the arcs to assign outgoing node IDs. It would be better if tokens
+    // were never dropped but instead just marked deleted with a new
+    // TermDeletedAttribute (boolean valued) ... but until that future, we have
+    // a hack here to forcefully jump the output node ID:
+    assert src.outputNode == -1;
+    src.node = inputFrom;
+
+    int maxOutIndex = outputNodes.getMaxPos();
+    OutputNode outSrc = outputNodes.get(maxOutIndex);
+    // There are two types of holes, neighbor holes and consumed holes. A neighbor hole is between

Review comment: I've added some ASCII graphs into the comment. Do those help? They're a little weird since I have tokens in node positions, which isn't quite right.
[jira] [Commented] (LUCENE-9983) Stop sorting determinize powersets unnecessarily
[ https://issues.apache.org/jira/browse/LUCENE-9983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355600#comment-17355600 ] Bruno Roustant commented on LUCENE-9983: How many states are manipulated? If the states are numbered from 0 to N, and we keep most of the states during the computation, or N is not too high, then should we use an array instead of a map, with array[state] being the "reference count"? We wouldn't have to sort the set of states for the equality check because it would directly be the array order (skipping states with 0 references).
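Bruno's array-based alternative above can be sketched as follows. This is a hypothetical illustration (the class name and methods are invented, and this is not what Lucene ultimately adopted): a dense `int[]` of reference counts replaces the sorted map, and because the array index itself fixes the iteration order, live states come out already ordered with no sorting step.

```java
import java.util.Arrays;

// Illustrative sketch; ArrayStateSet is a made-up name, not Lucene code.
final class ArrayStateSet {
  private final int[] refCount;  // refCount[state] = references to that state

  ArrayStateSet(int numStates) {
    refCount = new int[numStates];
  }

  void incr(int state) {
    refCount[state]++;
  }

  void decr(int state) {
    assert refCount[state] > 0;
    refCount[state]--;
  }

  // States with a nonzero count, emitted in natural (index) order, so an
  // equality check between two sets needs no sorting at all.
  int[] liveStates() {
    int n = 0;
    for (int c : refCount) {
      if (c > 0) n++;
    }
    int[] out = new int[n];
    int i = 0;
    for (int state = 0; state < refCount.length; state++) {
      if (refCount[state] > 0) out[i++] = state;
    }
    return out;
  }

  public static void main(String[] args) {
    ArrayStateSet s = new ArrayStateSet(8);
    s.incr(5); s.incr(2); s.incr(5); s.decr(5);
    System.out.println(Arrays.toString(s.liveStates()));  // prints [2, 5]
  }
}
```

The trade-off the comment alludes to: this is O(N) space per set and O(N) per scan, so it only wins when N is small or most states stay live during the computation.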
[GitHub] [lucene-solr] janhoy opened a new pull request #2503: Re-introduce ant precommit github action in 8x branch
janhoy opened a new pull request #2503: URL: https://github.com/apache/lucene-solr/pull/2503 This PR re-introduces the `ant precommit` github action for branch_8x, which was removed when "wiping" master branch after the split.
[jira] [Comment Edited] (LUCENE-9985) Upgrade Jetty to 9.4.41
[ https://issues.apache.org/jira/browse/LUCENE-9985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355533#comment-17355533 ] Jan Høydahl edited comment on LUCENE-9985 at 6/2/21, 7:49 AM: -- I tag this change in Lucene's CHANGES under 8.9 section, since SOLR-15316 backport will also upgrade Lucene Replicator's jetty version. This PR will only be merged to lucene/main and will thus not need a separate backport, since the lucene CHANGES entry is also part of the solr-lucene backport, see https://github.com/apache/lucene-solr/pull/2502 was (Author: janhoy): I tag this change in Lucene's CHANGES under 8.9 section, since SOLR-15316 backport will also upgrade Lucene Replicator's jetty version. This PR will only be merged to lucene/main and will thus not need a separate backport, even if CHANGES entry is for 8.9. Any objections? > Upgrade Jetty to 9.4.41 > --- > > Key: LUCENE-9985 > URL: https://issues.apache.org/jira/browse/LUCENE-9985 > Project: Lucene - Core > Issue Type: Task >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > As Solr is upgrading jetty dependency in 8.9 (shared with lucene), Lucene > main should also do the same
[GitHub] [lucene-solr] janhoy opened a new pull request #2502: SOLR-15316 Update Jetty to 9.4.41 (backport 8x)
janhoy opened a new pull request #2502: URL: https://github.com/apache/lucene-solr/pull/2502 See https://issues.apache.org/jira/browse/SOLR-15316 This is a backport of SOLR-15316 with mostly ivy changes. But in this 8x branch, the upgrade also affects lucene-replicator module. So I filed LUCENE-9985 to make sure lucene 9 (main) does not downgrade jetty again for that module :) Therefore I also added the LUCENE-9985 changes entry to this PR, since Lucene 8.9 is the first that has jetty 9.4.41 for the replicator...
[jira] [Commented] (LUCENE-9985) Upgrade Jetty to 9.4.41
[ https://issues.apache.org/jira/browse/LUCENE-9985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355533#comment-17355533 ] Jan Høydahl commented on LUCENE-9985: - I tag this change in Lucene's CHANGES under 8.9 section, since SOLR-15316 backport will also upgrade Lucene Replicator's jetty version. This PR will only be merged to lucene/main and will thus not need a separate backport, even if CHANGES entry is for 8.9. Any objections?
[GitHub] [lucene] janhoy opened a new pull request #165: LUCENE-9985 Upgrade Jetty to 9.4.41
janhoy opened a new pull request #165: URL: https://github.com/apache/lucene/pull/165 See https://issues.apache.org/jira/browse/LUCENE-9985
[jira] [Created] (LUCENE-9985) Upgrade Jetty to 9.4.41
Jan Høydahl created LUCENE-9985: --- Summary: Upgrade Jetty to 9.4.41 Key: LUCENE-9985 URL: https://issues.apache.org/jira/browse/LUCENE-9985 Project: Lucene - Core Issue Type: Task Reporter: Jan Høydahl Assignee: Jan Høydahl As Solr is upgrading jetty dependency in 8.9 (shared with lucene), Lucene main should also do the same
[jira] [Commented] (LUCENE-9976) WANDScorer assertion error in ensureConsistent
[ https://issues.apache.org/jira/browse/LUCENE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355511#comment-17355511 ] Dawid Weiss commented on LUCENE-9976: - Hi [~zacharymorn]! Hmm... I optimistically assumed it's going to reproduce on that seed because it did it the first time I re-run it... but indeed, it's not reproducible. I do have a good ratio of failures with tests.iters though: {code} gradlew test -Ptests.iters=10 --tests TestExpressionSorts.testQueries -Dtests.seed=FF571CE915A0955 -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true -Dtests.asserts=true -p lucene/expressions/ {code} results in (sample): {code} 10 tests completed, 2 failed > Task :lucene:expressions:test FAILED ERROR: The following test(s) have failed: - org.apache.lucene.expressions.TestExpressionSorts.testQueries {seed=[FF571CE915A0955:537BBD158B33BCFB]} (:lucene:expressions) Test output: C:\Work\apache\lucene\main\lucene\expressions\build\test-results\test\outputs\OUTPUT-org.apache.lucene.expressions.TestExpressionSorts.txt Reproduce with: gradlew :lucene:expressions:test --tests "org.apache.lucene.expressions.TestExpressionSorts.testQueries {seed=[FF571CE915A0955:537BBD158B33BCFB]}" -Ptests.jvms=12 -Ptests.jvmargs=-XX:TieredStopAtLevel=1 -Ptests.seed=FF571CE915A0955 -Ptests.iters=10 -Ptests.multiplier=2 -Ptests.nightly=true -Ptests.file.encoding=ISO-8859-1 - org.apache.lucene.expressions.TestExpressionSorts.testQueries {seed=[FF571CE915A0955:C25B270A7CC74D2E]} (:lucene:expressions) Test output: C:\Work\apache\lucene\main\lucene\expressions\build\test-results\test\outputs\OUTPUT-org.apache.lucene.expressions.TestExpressionSorts.txt Reproduce with: gradlew :lucene:expressions:test --tests "org.apache.lucene.expressions.TestExpressionSorts.testQueries {seed=[FF571CE915A0955:C25B270A7CC74D2E]}" -Ptests.jvms=12 -Ptests.jvmargs=-XX:TieredStopAtLevel=1 -Ptests.seed=FF571CE915A0955 -Ptests.iters=10 -Ptests.multiplier=2 -Ptests.nightly=true 
-Ptests.file.encoding=ISO-8859-1 {code} So something is definitely going on there. :( > WANDScorer assertion error in ensureConsistent > -- > > Key: LUCENE-9976 > URL: https://issues.apache.org/jira/browse/LUCENE-9976 > Project: Lucene - Core > Issue Type: Bug >Reporter: Dawid Weiss >Priority: Major > > Build fails and is reproducible: > https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-main/283/console > {code} > ./gradlew test --tests TestExpressionSorts.testQueries > -Dtests.seed=FF571CE915A0955 -Dtests.multiplier=2 -Dtests.nightly=true > -Dtests.slow=true -Dtests.asserts=true -p lucene/expressions/ > {code}
[GitHub] [lucene] glawson0 commented on a change in pull request #157: LUCENE-9963 Fix issue with FlattenGraphFilter throwing exceptions from holes
glawson0 commented on a change in pull request #157: URL: https://github.com/apache/lucene/pull/157#discussion_r643674911

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/core/FlattenGraphFilter.java
## @@ -193,14 +194,25 @@ private boolean releaseBufferedToken() {
     }

     if (inputNode.tokens.size() == 0) {
       assert inputNode.nextOut == 0;
-      assert output.nextOut == 0;
       // Hole dest nodes should never be merged since 1) we always
       // assign them to a new output position, and 2) since they never
       // have arriving tokens they cannot be pushed:
-      assert output.inputNodes.size() == 1 : output.inputNodes.size();
+      // skip hole sources, but don't free until every input is checked
+      if (output.inputNodes.size() > 1) {
+        output.inputNodes.remove(output.nextOut);
+        if (output.nextOut < output.inputNodes.size()) {
+          continue;
+        }
+      }
+
       outputFrom++;
-      inputNodes.freeBefore(output.inputNodes.get(0));
+      int freeBefore = Collections.min(output.inputNodes);
+      assert outputNodes.get(outputFrom).inputNodes.stream().filter(n -> freeBefore < n).count()

Review comment: You're correct. The test as written here just checks if at least one node is OK instead of all nodes, then prints the wrong message.