date:20210526

[GitHub] [lucene] sqshq edited a comment on pull request #149: LUCENE-9971: SortedSetDocValuesFacetCounts throws exception in case of unseen dimension (unlike other Facet implementations)

2021-05-26 Thread GitBox



sqshq edited a comment on pull request #149:
URL: https://github.com/apache/lucene/pull/149#issuecomment-849309776


   @gsmiller I've got your point, and certainly don't mind making the Taxonomy 
part consistent for this edge case as well. 
   Please take a look: 
https://github.com/apache/lucene/pull/149/commits/fd01e65c45b16cf94098aa2b199bd34494d192be


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] sqshq commented on pull request #149: LUCENE-9971: SortedSetDocValuesFacetCounts throws exception in case of unseen dimension (unlike other Facet implementations)

2021-05-26 Thread GitBox



sqshq commented on pull request #149:
URL: https://github.com/apache/lucene/pull/149#issuecomment-849309776


   @gsmiller I got your point, and certainly don't mind making the Taxonomy 
part consistent for this edge case as well. 
   Please take a look: 
https://github.com/apache/lucene/pull/149/commits/fd01e65c45b16cf94098aa2b199bd34494d192be



[GitHub] [lucene] zacharymorn commented on pull request #128: LUCENE-9662: CheckIndex should be concurrent - parallelizing index parts check within each segment

2021-05-26 Thread GitBox



zacharymorn commented on pull request #128:
URL: https://github.com/apache/lucene/pull/128#issuecomment-849250243


   Here's the latest console output I got from running against a small local 
index:
   
   ```
   > Task :lucene:core:CheckIndex.main()
   -threadCount currently only supports up to 11 threads. Value higher than 
that will be capped.
   
   NOTE: testing will be more thorough if you run java with 
'-ea:org.apache.lucene...', so assertions are enabled
   
   Opening index @ 
/Users/xichen/IdeaProjects/benchmarks/data/tmp/index-msmacropassages/
   
   Checking index with async threadCount: 11
   0.00% total deletions; 8841823 documents; 0 deleteions
   Segments file=segments_2 numSegments=1 version=9.0.0 
id=arx9mawlpijzmvgvi7ehv0ret
   1 of 1: name=_a maxDoc=8841823
   version=9.0.0
   id=arx9mawlpijzmvgvi7ehv0req
   codec=Lucene90
   compound=false
   numFiles=12
   size (MB)=2,658.426
   diagnostics = {source=merge, java.vendor=AdoptOpenJDK, 
os.version=10.15.5, mergeMaxNumSegments=1, java.version=11.0.9, 
java.vm.version=11.0.9+11, lucene.version=9.0.0, timestamp=1622049007198, 
os=Mac OS X, java.runtime.version=11.0.9+11, mergeFactor=10, os.arch=x86_64}
   no deletions
   test: open reader.OK [took 0.051 sec]
   test: check integrity.OK [took 12.599 sec]
   test: field infos.OK [2 fields] [took 0.000 sec]
   test: check live docs.OK [took 0.000 sec]
   test: term vectorsOK [0 total term vector count; avg 0.0 
term/freq vector fields per doc] [took 0.000 sec]
   test: docvalues...OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 
SORTED; 0 SORTED_NUMERIC; 0 SORTED_SET] [took 0.000 sec]
   test: points..OK [0 fields, 0 points] [took 0.000 sec]
   test: vectors.OK [0 fields, 0 vectors] [took 0.000 sec]
   test: field norms.OK [1 fields] [took 0.382 sec]
   test: stored fields...OK [17683646 total field count; avg 2.0 fields 
per doc] [took 25.261 sec]
   test: terms, freq, prox...OK [11795964 terms; 363490228 terms/docs 
pairs; 514795921 tokens] [took 82.775 sec]
   
   No problems were detected with this index.
   
   Took 95.669 sec total.
   ```
   
   I'll search for the nightly benchmark index to test next.
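
For readers following the output above, here is a minimal sketch of the general idea of running the per-segment part checks on a shared thread pool while still printing results in a fixed order. It is not the implementation from PR #128: the class name, the method names, and the way the cap of 11 threads is applied are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; none of these names come from CheckIndex itself.
class ConcurrentPartCheckSketch {
  // Assumed cap, mirroring the "up to 11 threads" note in the console output.
  static final int MAX_THREADS = 11;

  static int cappedThreadCount(int requested) {
    return Math.max(1, Math.min(requested, MAX_THREADS));
  }

  // Submits every named check to one pool, then collects results in submission order.
  static List<String> runChecks(Map<String, Callable<String>> checks, int requestedThreads) {
    ExecutorService pool = Executors.newFixedThreadPool(cappedThreadCount(requestedThreads));
    try {
      Map<String, Future<String>> pending = new LinkedHashMap<>();
      for (Map.Entry<String, Callable<String>> e : checks.entrySet()) {
        pending.put(e.getKey(), pool.submit(e.getValue()));
      }
      List<String> report = new ArrayList<>();
      for (Map.Entry<String, Future<String>> e : pending.entrySet()) {
        // get() blocks until that check is done, so the report order stays stable.
        report.add("test: " + e.getKey() + "..." + e.getValue().get());
      }
      return report;
    } catch (InterruptedException | ExecutionException ex) {
      throw new IllegalStateException("check failed", ex);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    Map<String, Callable<String>> checks = new LinkedHashMap<>();
    checks.put("field infos", () -> "OK");
    checks.put("stored fields", () -> "OK");
    checks.put("terms, freq, prox", () -> "OK");
    runChecks(checks, 16).forEach(System.out::println);
  }
}
```

Collecting the futures in submission order is one way to keep the report readable even when the checks finish out of order; the caller is expected to pass an ordered map such as `LinkedHashMap`.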



[jira] [Commented] (LUCENE-9448) Make an equivalent to Ant's "run" target for Luke module

2021-05-26 Thread Tomoko Uchida (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352206#comment-17352206
 ] 

Tomoko Uchida commented on LUCENE-9448:
---

A question: how will Luke be distributed as of Lucene 9.0.0?

Its MANIFEST says: 
{code}

Main-Class: org.apache.lucene.luke.app.desktop.LukeMain
Class-Path: lucene-codecs-9.0.0-SNAPSHOT.jar lucene-backward-codecs-9.0.
 0-SNAPSHOT.jar lucene-analysis-icu-9.0.0-SNAPSHOT.jar lucene-analysis-k
 uromoji-9.0.0-SNAPSHOT.jar lucene-analysis-morfologik-9.0.0-SNAPSHOT.ja
 r lucene-analysis-nori-9.0.0-SNAPSHOT.jar lucene-analysis-opennlp-9.0.0
 -SNAPSHOT.jar lucene-analysis-phonetic-9.0.0-SNAPSHOT.jar lucene-analys
 is-smartcn-9.0.0-SNAPSHOT.jar lucene-analysis-stempel-9.0.0-SNAPSHOT.ja
 r lucene-suggest-9.0.0-SNAPSHOT.jar lucene-analysis-common-9.0.0-SNAPSH
 OT.jar lucene-queryparser-9.0.0-SNAPSHOT.jar lucene-highlighter-9.0.0-S
 NAPSHOT.jar lucene-sandbox-9.0.0-SNAPSHOT.jar lucene-queries-9.0.0-SNAP
 SHOT.jar lucene-misc-9.0.0-SNAPSHOT.jar lucene-memory-9.0.0-SNAPSHOT.ja
 r lucene-core-9.0.0-SNAPSHOT.jar log4j-core-2.13.2.jar icu4j-68.2.jar c
 ommons-codec-1.13.jar log4j-api-2.13.2.jar opennlp-tools-1.9.1.jar morf
 ologik-polish-2.1.5.jar morfologik-stemming-2.1.5.jar morfologik-fsa-2.
 1.5.jar morfologik-ukrainian-search-4.9.1.jar

{code}

In brief, does this mean we will distribute Luke as a standalone app 
("lucene/luke/build/distributions/lucene-luke-9.0.0-SNAPSHOT-standalone.tgz") 
that is separate from the Lucene binary package, or am I missing something?
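
As a side note on reading the MANIFEST above: the `Class-Path` value is wrapped across continuation lines (each starting with a single space), and its entries are resolved relative to the location of the jar itself. The sketch below uses the JDK's `java.util.jar.Manifest` to re-join the wrapped value and list the entries. The class name and the shortened sample manifest are assumptions made for illustration, not part of the Luke build.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper for inspecting a manifest; not part of Lucene or Luke.
class ManifestClassPathSketch {
  // Parses manifest text and returns the Class-Path entries with wrapped lines re-joined.
  static List<String> classPathEntries(String manifestText) {
    try {
      Manifest manifest =
          new Manifest(new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
      String classPath = manifest.getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
      if (classPath == null || classPath.trim().isEmpty()) {
        return List.of();
      }
      // Entries are separated by whitespace.
      return Arrays.asList(classPath.trim().split("\\s+"));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Shortened sample: only three of the entries from the manifest quoted above.
    String sample =
        "Manifest-Version: 1.0\n"
            + "Main-Class: org.apache.lucene.luke.app.desktop.LukeMain\n"
            + "Class-Path: lucene-codecs-9.0.0-SNAPSHOT.jar lucene-backward-codecs-9.0.\n"
            + " 0-SNAPSHOT.jar lucene-core-9.0.0-SNAPSHOT.jar\n";
    classPathEntries(sample).forEach(System.out::println);
  }
}
```

Because every entry is a bare file name, the launcher jar would expect all of those jars to sit in the same directory, which fits the idea of a self-contained distribution.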

> Make an equivalent to Ant's "run" target for Luke module
> 
>
> Key: LUCENE-9448
> URL: https://issues.apache.org/jira/browse/LUCENE-9448
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Tomoko Uchida
>Priority: Minor
> Fix For: main (9.0)
>
> Attachments: LUCENE-9448.patch, LUCENE-9448.patch
>
>
> With Ant build, Luke Swing app can be launched by "ant run" after checking 
> out the source code. "ant run" allows developers to immediately see the 
> effects of UI changes without creating the whole zip/tgz package (originally, 
> it was suggested when integrating Luke to Lucene).
> In Gradle, {{:lucene:luke:run}} task would be easily implemented with 
> {{JavaExec}}, I think.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [lucene] gsmiller commented on pull request #149: LUCENE-9971: SortedSetDocValuesFacetCounts throws exception in case of unseen dimension (unlike other Facet implementations)

2021-05-26 Thread GitBox



gsmiller commented on pull request #149:
URL: https://github.com/apache/lucene/pull/149#issuecomment-849160773


   Thanks for diving into this @sqshq! Yeah, I think I agree with all this. I 
suppose the subtle difference here is that in taxonomy-based facet counting, 
`FacetsConfig` would never have been told about a custom index field (because 
presumably it's not a valid dimension). You could get in a weird scenario 
though where you're using some custom index field for all your taxonomy 
faceting (and have never indexed anything in `$facets`). In other words, you've 
specified one or more custom index fields to hold the doc values via 
`FacetsConfig` and your index doesn't contain the `$facets` field at all. In 
this (somewhat unusual) case, if calling code were to ask for a dimension that 
was never indexed (same scenario you're describing in SSDVFC), you'd still get 
an `IllegalArgumentException` in `TaxonomyFacets#verifyDim` (because it would 
try to look up the configured index field for the dimension, wouldn't find it, 
fall back to `$facets` and then also not find that). That said, I don't think 
 fixing that as well necessarily needs to be part of this change. It would be 
nice though if that could also consistently return `null`.
   
   Well, +1 for making the change you proposed, particularly since it's 
consistent with the Javadoc as you point out as well.
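
The lookup chain described in this comment can be modelled with a small toy class. This is only a sketch under stated assumptions: `DimLookupSketch` and its methods are hypothetical names, and the real logic lives in `FacetsConfig` and `TaxonomyFacets#verifyDim`, which are not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the lookup described above; names are hypothetical, not Lucene's API.
class DimLookupSketch {
  static final String DEFAULT_INDEX_FIELD = "$facets";

  private final Map<String, String> dimToIndexField = new HashMap<>();
  private final String countedIndexField;

  // countedIndexField is the index field this (toy) facets instance counts from.
  DimLookupSketch(String countedIndexField) {
    this.countedIndexField = countedIndexField;
  }

  void configure(String dim, String indexField) {
    dimToIndexField.put(dim, indexField);
  }

  // Dimensions that were never configured fall back to the default field.
  String indexFieldFor(String dim) {
    return dimToIndexField.getOrDefault(dim, DEFAULT_INDEX_FIELD);
  }

  // Strict variant: a field mismatch is reported as an error.
  void verifyDimStrict(String dim) {
    if (!indexFieldFor(dim).equals(countedIndexField)) {
      throw new IllegalArgumentException(
          "dimension \"" + dim + "\" was not indexed into field \"" + countedIndexField + "\"");
    }
  }

  // Lenient variant: lets a caller decide to return null instead of throwing.
  boolean isCountedHere(String dim) {
    return indexFieldFor(dim).equals(countedIndexField);
  }

  public static void main(String[] args) {
    DimLookupSketch facets = new DimLookupSketch("$custom");
    facets.configure("author", "$custom");
    System.out.println("author counted here: " + facets.isCountedHere("author"));
    System.out.println("unseen counted here: " + facets.isCountedHere("unseen"));
    try {
      facets.verifyDimStrict("unseen");
    } catch (IllegalArgumentException e) {
      System.out.println("strict check: " + e.getMessage());
    }
  }
}
```

In this model the unseen dimension falls back to `$facets`, which does not match the custom field, so the strict check throws even though nothing is misconfigured from the caller's point of view.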



[jira] [Resolved] (LUCENE-9204) Move span queries to the queries module

2021-05-26 Thread Alan Woodward (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-9204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward resolved LUCENE-9204.
---
Fix Version/s: main (9.0)
   Resolution: Fixed

> Move span queries to the queries module
> ---
>
> Key: LUCENE-9204
> URL: https://issues.apache.org/jira/browse/LUCENE-9204
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: main (9.0)
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> We have a slightly odd situation currently, with two parallel query 
> structures for building complex positional queries: the long-standing span 
> queries, in core; and interval queries, in the queries module.  Given that 
> interval queries solve at least some of the problems we've had with Spans, I 
> think we should be pushing users more towards these implementations.  It's 
> counter-intuitive to do that when Spans are in core though.  I've opened this 
> issue to discuss moving the spans package as a whole to the queries module.




[jira] [Resolved] (LUCENE-9545) Remove Analyzer.get/setVersion()

2021-05-26 Thread Alan Woodward (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward resolved LUCENE-9545.
---
Fix Version/s: main (9.0)
   Resolution: Fixed

> Remove Analyzer.get/setVersion()
> 
>
> Key: LUCENE-9545
> URL: https://issues.apache.org/jira/browse/LUCENE-9545
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: main (9.0)
>
>
> In days of yore, some lucene Analyzers would change their behaviour depending 
> on a version constant, so you could say 'use this analyzer in the way that it 
> would have worked in lucene 2.1'.  However, we have no Analyzers that make 
> use of this in the 9x or 8x lines, and I think it's pretty confusing 
> behaviour anyway.  We have factories to configure analyzers, and 
> version-specific behaviour can reside there if we really need it.  We should 
> just remove this functionality from Analyzer altogether.




[jira] [Commented] (LUCENE-9949) Flaky test in TestCachePurging.testBackgroundPurges

2021-05-26 Thread Gautam Worah (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352068#comment-17352068
 ] 

Gautam Worah commented on LUCENE-9949:
--

It does not. I retried it with `./gradlew check 
-Ptests.seed=4DDD0C97DB8E50CC:5D664A5C8BA9CD8C ` and `./gradlew test --tests 
TestCachePurging.testBackgroundPurges -Dtests.seed=4DDD0C97DB8E50CC 
-Dtests.slow=true -Dtests.badapples=true -Dtests.locale=shi-Tfng`, but both 
commands passed successfully.
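
A general observation that may explain why the seed alone does not reproduce the failure (this is an assumption, not something confirmed in this thread): a fixed seed pins down pseudo-random choices, but not thread scheduling or clock readings. The sketch below illustrates that distinction. It assumes the seed string is a colon-separated chain of hexadecimal 64-bit values; the class and method names are hypothetical and are not part of the randomizedtesting library.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch; not the randomizedtesting API.
class SeedReplaySketch {
  // Splits a "MASTER:METHOD" style chain into its hexadecimal components.
  static long[] parseSeedChain(String chain) {
    String[] parts = chain.split(":");
    long[] seeds = new long[parts.length];
    for (int i = 0; i < parts.length; i++) {
      seeds[i] = Long.parseUnsignedLong(parts[i], 16);
    }
    return seeds;
  }

  // A generator created from the same seed always yields the same sequence.
  static int[] firstDraws(long seed, int count) {
    Random random = new Random(seed);
    int[] draws = new int[count];
    for (int i = 0; i < count; i++) {
      draws[i] = random.nextInt(1000);
    }
    return draws;
  }

  public static void main(String[] args) {
    long[] seeds = parseSeedChain("4DDD0C97DB8E50CC:5D664A5C8BA9CD8C");
    boolean sameDraws = Arrays.equals(firstDraws(seeds[0], 5), firstDraws(seeds[0], 5));
    System.out.println("seeded draws repeat: " + sameDraws);
    // Clock readings are outside the seed's control and differ from run to run.
    System.out.println("clock reading: " + System.nanoTime());
  }
}
```

If the test outcome depends on whether a background thread has already run, repeating the run many times may be more informative than replaying one seed.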

> Flaky test in TestCachePurging.testBackgroundPurges
> ---
>
> Key: LUCENE-9949
> URL: https://issues.apache.org/jira/browse/LUCENE-9949
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/other
>Affects Versions: main (9.0)
> Environment: Ubuntu 18.04.5 LTS
> Java 11
>Reporter: Gautam Worah
>Priority: Minor
>
> While executing the `./gradlew check` command on an unrelated change, my 
> tests crashed on this test case with the following error log:
> > 
> org.apache.lucene.monitor.TestCachePurging > testBackgroundPurges FAILED
>  java.lang.AssertionError: expected:<-1> but was:<21196529334563>
>  at __randomizedtesting.SeedInfo.seed([4DDD0C97DB8E50CC:5D664A5C8BA9CD8C]:0)
>  at org.junit.Assert.fail(Assert.java:89)
>  at org.junit.Assert.failNotEquals(Assert.java:835)
>  at org.junit.Assert.assertEquals(Assert.java:647)
>  at org.junit.Assert.assertEquals(Assert.java:633)
>  at 
> org.apache.lucene.monitor.TestCachePurging.testBackgroundPurges(TestCachePurging.java:142)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>  at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base/java.lang.reflect.Method.invoke(Method.java:567)
>  at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
>  at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
>  at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
>  at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
>  at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
>  at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>  at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
>  at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
>  at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
>  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>  at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>  at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
>  at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
>  at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
>  at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
>  at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
>  at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
>  at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
>  at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>  at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>  at 
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
>  at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
>  at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
>  at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>  at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>  at 
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
>  at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>  at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
>  at 
>

[jira] [Commented] (LUCENE-8143) Remove SpanBoostQuery

2021-05-26 Thread David Smiley (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352054#comment-17352054
 ] 

David Smiley commented on LUCENE-8143:
--

I agree.  If inner boosting is broken, SpanBoostQuery is trap-py and not 
providing value.

> Remove SpanBoostQuery
> -
>
> Key: LUCENE-8143
> URL: https://issues.apache.org/jira/browse/LUCENE-8143
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Assignee: Alan Woodward
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I initially added it so that span queries could still be boosted, but this 
> was actually a mistake: boosts are ignored on inner span queries, only the 
> boost of the top-level span query, the one that performs scoring, is not 
> ignored.




[jira] [Commented] (LUCENE-9949) Flaky test in TestCachePurging.testBackgroundPurges

2021-05-26 Thread Michael McCandless (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352045#comment-17352045
 ] 

Michael McCandless commented on LUCENE-9949:


Hmm does the failure reproduce [~gworah]?

> Flaky test in TestCachePurging.testBackgroundPurges
> ---
>
> Key: LUCENE-9949
> URL: https://issues.apache.org/jira/browse/LUCENE-9949
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/other
>Affects Versions: main (9.0)
> Environment: Ubuntu 18.04.5 LTS
> Java 11
>Reporter: Gautam Worah
>Priority: Minor
>
> While executing the `./gradlew check` command on an unrelated change, my 
> tests crashed on this test case with the following error log:
> > 
> org.apache.lucene.monitor.TestCachePurging > testBackgroundPurges FAILED
>  java.lang.AssertionError: expected:<-1> but was:<21196529334563>
>  at __randomizedtesting.SeedInfo.seed([4DDD0C97DB8E50CC:5D664A5C8BA9CD8C]:0)
>  at org.junit.Assert.fail(Assert.java:89)
>  at org.junit.Assert.failNotEquals(Assert.java:835)
>  at org.junit.Assert.assertEquals(Assert.java:647)
>  at org.junit.Assert.assertEquals(Assert.java:633)
>  at 
> org.apache.lucene.monitor.TestCachePurging.testBackgroundPurges(TestCachePurging.java:142)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>  at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base/java.lang.reflect.Method.invoke(Method.java:567)
>  at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
>  at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
>  at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
>  at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
>  at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
>  at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>  at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
>  at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
>  at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
>  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>  at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>  at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
>  at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
>  at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
>  at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
>  at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
>  at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
>  at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
>  at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>  at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>  at 
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
>  at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
>  at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
>  at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>  at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>  at 
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
>  at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>  at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
>  at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
>  at 
> org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
>  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>  at 
>

[GitHub] [lucene] sqshq commented on pull request #149: LUCENE-9971: SortedSetDocValuesFacetCounts throws exception in case of unseen dimension (unlike other Facet implementations)

2021-05-26 Thread GitBox



sqshq commented on pull request #149:
URL: https://github.com/apache/lucene/pull/149#issuecomment-848952461


   Hi Greg,
   You are right, [`TaxonomyFacets` fails with 
`IllegalArgumentException`](https://github.com/sqshq/lucene/blob/7f8b7ffbcad2265b047a5e2195f76cc924028063/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/TaxonomyFacets.java#L121)
 when we search for a field that has a different `indexFieldName`. Here, 
though, we do not even check that the `dimension` is present for the 
configured `indexFieldName`; the error relates only to the field mismatch.
   
   That behavior actually looks more or less consistent with the SSDVFF 
implementation. There, if we try to create a `SortedSetDocValuesReaderState` 
for a field that was never indexed with facets, [we fail with 
`IllegalArgumentException`](https://github.com/sqshq/lucene/blob/7f8b7ffbcad2265b047a5e2195f76cc924028063/lucene/facet/src/java/org/apache/lucene/facet/sortedset/DefaultSortedSetDocValuesReaderState.java#L73).
   
   Although those `IllegalArgumentException`s are thrown in slightly different 
situations, I think in both cases they point to a field-name 
misconfiguration (or a bug in the code) rather than a "user requested an unseen 
dimension" scenario. As far as I understand, in most cases the facet field 
name is a single value or a constant set of values, so most likely we do want 
to make sure that the user does not overlook such a mismatch. Also, this 
behavior can't easily be changed for SSDVFF, since the field check happens in 
the `SortedSetDocValuesReaderState` constructor, where we can't return null. 
   
   @gsmiller let me know what you think!
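
To make the distinction in this comment concrete, here is a minimal sketch of the two failure modes: a constructor that rejects a field that was never indexed with facets, and a query-time lookup that returns `null` for an unseen dimension. `ReaderStateSketch` and its members are hypothetical stand-ins, not the actual `SortedSetDocValuesReaderState` code.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in; not the Lucene class.
class ReaderStateSketch {
  private final String field;
  private final Map<String, List<String>> labelsByDim;

  // A constructor cannot return null, so an unknown field has to surface as an exception.
  ReaderStateSketch(
      Set<String> indexedFacetFields, String field, Map<String, List<String>> labelsByDim) {
    if (!indexedFacetFields.contains(field)) {
      throw new IllegalArgumentException("field \"" + field + "\" was not indexed with facets");
    }
    this.field = field;
    this.labelsByDim = labelsByDim;
  }

  String field() {
    return field;
  }

  // An unseen dimension is not a misconfiguration, so the lookup simply returns null.
  List<String> topChildren(String dim) {
    return labelsByDim.get(dim);
  }

  public static void main(String[] args) {
    ReaderStateSketch state =
        new ReaderStateSketch(
            Set.of("$facets"), "$facets", Map.of("author", List.of("Bob", "Lisa")));
    System.out.println("author: " + state.topChildren("author"));
    System.out.println("unseen: " + state.topChildren("unseen"));
    try {
      new ReaderStateSketch(Set.of("$facets"), "$typo", Map.of());
    } catch (IllegalArgumentException e) {
      System.out.println("constructor: " + e.getMessage());
    }
  }
}
```

The field check runs once, before any dimension is requested, which is why it reads as a configuration error rather than as an empty result.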



[jira] [Updated] (LUCENE-9949) Flaky test in TestCachePurging.testBackgroundPurges

2021-05-26 Thread Gautam Worah (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-9949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Worah updated LUCENE-9949:
-
Issue Type: Bug  (was: Improvement)


[GitHub] [lucene] dsmiley commented on pull request #154: LUCENE-9454: Remove version field on Analyzer

2021-05-26 Thread GitBox



dsmiley commented on pull request #154:
URL: https://github.com/apache/lucene/pull/154#issuecomment-848947225


   @mikemccand I'm curious why you say:
   > This was an overly complex approach to backwards compatibility!
   
   I think it's rather simple, and the PR was simple as well showing it was a 
simple idea.  If this approach is complex, what simple approach do you 
recommend instead?



[jira] [Commented] (LUCENE-9454) Upgrade hamcrest to version 2.2

2021-05-26 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351923#comment-17351923
 ] 

ASF subversion and git services commented on LUCENE-9454:
-

Commit 1e7d8146fff95522674c720099a52b9efd102881 in lucene's branch 
refs/heads/main from Alan Woodward
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=1e7d814 ]

LUCENE-9454: Remove version field on Analyzer (#154)

Version switching on Analyzer behaviour should be implemented
in the various component factories, rather than on a mutable
setting on Analyzer itself.
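
As an illustration of the approach named in this commit message, the sketch below resolves version-specific behaviour once, inside a factory, instead of through a mutable setter on the analyzer. Everything in it is hypothetical: the class name, the `matchVersion` parameter, and the two "legacy" and "current" behaviours are invented for the example and do not describe any real Lucene component.

```java
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical factory; the parameter name and both behaviours are invented for illustration.
class LowercaseFilterFactorySketch {
  static final String VERSION_PARAM = "matchVersion";

  private final int majorVersion;

  // The version is read once from the factory arguments and never changes afterwards.
  LowercaseFilterFactorySketch(Map<String, String> args) {
    this.majorVersion = Integer.parseInt(args.getOrDefault(VERSION_PARAM, "9"));
  }

  // Returns a fixed transformation chosen by the configured version.
  UnaryOperator<String> create() {
    if (majorVersion < 9) {
      // Pretend legacy behaviour: lowercase only.
      return s -> s.toLowerCase(Locale.ROOT);
    }
    // Pretend current behaviour: trim, then lowercase.
    return s -> s.trim().toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    UnaryOperator<String> current = new LowercaseFilterFactorySketch(Map.of()).create();
    UnaryOperator<String> legacy =
        new LowercaseFilterFactorySketch(Map.of(VERSION_PARAM, "8")).create();
    System.out.println("[" + current.apply(" Foo ") + "]");
    System.out.println("[" + legacy.apply(" Foo ") + "]");
  }
}
```

Since the component returned by `create()` carries no version setter, its behaviour cannot change after construction.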

> Upgrade hamcrest to version 2.2
> ---
>
> Key: LUCENE-9454
> URL: https://issues.apache.org/jira/browse/LUCENE-9454
> Project: Lucene - Core
>  Issue Type: Task
>Affects Versions: main (9.0)
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Trivial
>  Time Spent: 50m
>  Remaining Estimate: 0h
>





[GitHub] [lucene] romseygeek merged pull request #154: LUCENE-9454: Remove version field on Analyzer

2021-05-26 Thread GitBox



romseygeek merged pull request #154:
URL: https://github.com/apache/lucene/pull/154


   



[GitHub] [lucene] romseygeek commented on pull request #154: LUCENE-9454: Remove version field on Analyzer

2021-05-26 Thread GitBox



romseygeek commented on pull request #154:
URL: https://github.com/apache/lucene/pull/154#issuecomment-848901996


   > +1, this is 9.0 only?
   
   Yes, I should have made that clear, sorry!



[jira] [Commented] (LUCENE-9448) Make an equivalent to Ant's "run" target for Luke module

2021-05-26 Thread Tomoko Uchida (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351857#comment-17351857
 ] 

Tomoko Uchida commented on LUCENE-9448:
---

Ok thank you [~uschindler] for your feedback, too.

> Make an equivalent to Ant's "run" target for Luke module
> 
>
> Key: LUCENE-9448
> URL: https://issues.apache.org/jira/browse/LUCENE-9448
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Tomoko Uchida
>Priority: Minor
> Fix For: main (9.0)
>
> Attachments: LUCENE-9448.patch, LUCENE-9448.patch
>
>
> With Ant build, Luke Swing app can be launched by "ant run" after checking 
> out the source code. "ant run" allows developers to immediately see the 
> effects of UI changes without creating the whole zip/tgz package (originally, 
> it was suggested when integrating Luke to Lucene).
> In Gradle, {{:lucene:luke:run}} task would be easily implemented with 
> {{JavaExec}}, I think.




[jira] [Commented] (LUCENE-9448) Make an equivalent to Ant's "run" target for Luke module

2021-05-26 Thread Uwe Schindler (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351849#comment-17351849
 ] 

Uwe Schindler commented on LUCENE-9448:
---

No fat jars, please! Shading is sometimes fine to prevent 
incompatibilities, but not just to allow java -jar.

> Make an equivalent to Ant's "run" target for Luke module
> 
>
> Key: LUCENE-9448
> URL: https://issues.apache.org/jira/browse/LUCENE-9448
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Tomoko Uchida
>Priority: Minor
> Fix For: main (9.0)
>
> Attachments: LUCENE-9448.patch, LUCENE-9448.patch
>
>
> With Ant build, Luke Swing app can be launched by "ant run" after checking 
> out the source code. "ant run" allows developers to immediately see the 
> effects of UI changes without creating the whole zip/tgz package (originally, 
> it was suggested when integrating Luke to Lucene).
> In Gradle, {{:lucene:luke:run}} task would be easily implemented with 
> {{JavaExec}}, I think.




[jira] [Commented] (LUCENE-9448) Make an equivalent to Ant's "run" target for Luke module

2021-05-26 Thread Tomoko Uchida (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351845#comment-17351845
 ] 

Tomoko Uchida commented on LUCENE-9448:
---

Ok thank you [~dweiss] for your comment.

> Make an equivalent to Ant's "run" target for Luke module
> 
>
> Key: LUCENE-9448
> URL: https://issues.apache.org/jira/browse/LUCENE-9448
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Tomoko Uchida
>Priority: Minor
> Fix For: main (9.0)
>
> Attachments: LUCENE-9448.patch, LUCENE-9448.patch
>
>
> With Ant build, Luke Swing app can be launched by "ant run" after checking 
> out the source code. "ant run" allows developers to immediately see the 
> effects of UI changes without creating the whole zip/tgz package (originally, 
> it was suggested when integrating Luke to Lucene).
> In Gradle, {{:lucene:luke:run}} task would be easily implemented with 
> {{JavaExec}}, I think.




[jira] [Commented] (LUCENE-9448) Make an equivalent to Ant's "run" target for Luke module

2021-05-26 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351843#comment-17351843
 ] 

Dawid Weiss commented on LUCENE-9448:
-

I'd prefer an explicit JAR with a proper manifest. This is essentially 
the same functionally (you can run it with java -jar luke.jar), and you can see 
which dependencies it needs/uses. Fat jars make things more obscure than 
they need to be.
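
As an illustration of the "JAR + proper manifest" approach, here is a minimal Java sketch that builds such a manifest with the JDK's java.util.jar API. The main class name and the dependency JAR names are hypothetical placeholders, not the actual Luke values.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Sketch only: the main class and dependency JAR names are hypothetical placeholders.
final class LukeManifestSketch {

  /** Builds a manifest that names the entry point and lists the dependencies explicitly. */
  static Manifest thinJarManifest(String mainClass, List<String> dependencyJars) {
    Manifest manifest = new Manifest();
    Attributes attrs = manifest.getMainAttributes();
    // Manifest.write() documents that Manifest-Version must be set beforehand.
    attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
    attrs.put(Attributes.Name.MAIN_CLASS, mainClass);
    // Class-Path entries are separated by spaces.
    attrs.put(Attributes.Name.CLASS_PATH, String.join(" ", dependencyJars));
    return manifest;
  }

  public static void main(String[] args) throws IOException {
    Manifest manifest =
        thinJarManifest("org.example.luke.Main", List.of("lucene-core.jar", "other-dep.jar"));
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    manifest.write(out);
    System.out.print(new String(out.toByteArray(), StandardCharsets.UTF_8));
  }
}
```

With a manifest like this, java -jar resolves the Class-Path entries relative to the location of the JAR itself, so the dependencies remain visible as separate files next to it.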

> Make an equivalent to Ant's "run" target for Luke module
> 
>
> Key: LUCENE-9448
> URL: https://issues.apache.org/jira/browse/LUCENE-9448
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Tomoko Uchida
>Priority: Minor
> Fix For: main (9.0)
>
> Attachments: LUCENE-9448.patch, LUCENE-9448.patch
>
>
> With Ant build, Luke Swing app can be launched by "ant run" after checking 
> out the source code. "ant run" allows developers to immediately see the 
> effects of UI changes without creating the whole zip/tgz package (originally, 
> it was suggested when integrating Luke to Lucene).
> In Gradle, {{:lucene:luke:run}} task would be easily implemented with 
> {{JavaExec}}, I think.




[jira] [Commented] (LUCENE-9448) Make an equivalent to Ant's "run" target for Luke module

2021-05-26 Thread Tomoko Uchida (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351840#comment-17351840
 ] 

Tomoko Uchida commented on LUCENE-9448:
---

I cleaned up the obsolete sh and bat scripts. Now luke is a standalone JAR. I 
am wondering if it makes sense to provide luke as a fat JAR; there is a 
Gradle plugin to do so: https://imperceptiblethoughts.com/shadow/introduction/ 
(I have never used it, though.)


> Make an equivalent to Ant's "run" target for Luke module
> 
>
> Key: LUCENE-9448
> URL: https://issues.apache.org/jira/browse/LUCENE-9448
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Tomoko Uchida
>Priority: Minor
> Fix For: main (9.0)
>
> Attachments: LUCENE-9448.patch, LUCENE-9448.patch
>
>
> With Ant build, Luke Swing app can be launched by "ant run" after checking 
> out the source code. "ant run" allows developers to immediately see the 
> effects of UI changes without creating the whole zip/tgz package (originally, 
> it was suggested when integrating Luke to Lucene).
> In Gradle, {{:lucene:luke:run}} task would be easily implemented with 
> {{JavaExec}}, I think.




[jira] [Commented] (LUCENE-9448) Make an equivalent to Ant's "run" target for Luke module

2021-05-26 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351830#comment-17351830
 ] 

ASF subversion and git services commented on LUCENE-9448:
-

Commit 16104090fb0a6ebfca946635a2587419e6d8e466 in lucene's branch 
refs/heads/main from Tomoko Uchida
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=1610409 ]

LUCENE-9448: clean up unused start scripts for luke.


> Make an equivalent to Ant's "run" target for Luke module
> 
>
> Key: LUCENE-9448
> URL: https://issues.apache.org/jira/browse/LUCENE-9448
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Tomoko Uchida
>Priority: Minor
> Fix For: main (9.0)
>
> Attachments: LUCENE-9448.patch, LUCENE-9448.patch
>
>
> With Ant build, Luke Swing app can be launched by "ant run" after checking 
> out the source code. "ant run" allows developers to immediately see the 
> effects of UI changes without creating the whole zip/tgz package (originally, 
> it was suggested when integrating Luke to Lucene).
> In Gradle, {{:lucene:luke:run}} task would be easily implemented with 
> {{JavaExec}}, I think.




[GitHub] [lucene] janhoy commented on pull request #136: LUCENE-9589 Swedish Minimal Stemmer

2021-05-26 Thread GitBox



janhoy commented on pull request #136:
URL: https://github.com/apache/lucene/pull/136#issuecomment-848803076


   Since release 8.9 is in feature freeze I now target this at 9.0.0. I moved 
CHANGES entry and @since tags. Will commit later this week.



[GitHub] [lucene] janhoy commented on a change in pull request #136: LUCENE-9589 Swedish Minimal Stemmer

2021-05-26 Thread GitBox



janhoy commented on a change in pull request #136:
URL: https://github.com/apache/lucene/pull/136#discussion_r639745969



##
File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/sv/SwedishMinimalStemmer.java
##
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.analysis.sv;
+
+/*
+ * This algorithm is updated based on code located at:
+ * http://members.unine.ch/jacques.savoy/clef/
+ *
+ * Full copyright for that code follows:
+ */
+
+/*
+ * Copyright (c) 2005, Jacques Savoy

Review comment:
   Hmm, I see a ton of other stemmers with exactly the same headers. So 
I'll leave them as is in this PR and rather do a separate copyright cleanup.





[GitHub] [lucene-solr] gerlowskija opened a new pull request #2501: SOLR-15090: Allow backup storage in GCS

2021-05-26 Thread GitBox



gerlowskija opened a new pull request #2501:
URL: https://github.com/apache/lucene-solr/pull/2501


   



[jira] [Commented] (LUCENE-9975) Don't require artifact signing for local maven artifact publishing

2021-05-26 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351738#comment-17351738
 ] 

Dawid Weiss commented on LUCENE-9975:
-

Oh, sorry for being dim. You're right! I'll clean this up.

> Don't require artifact signing for local maven artifact publishing
> --
>
> Key: LUCENE-9975
> URL: https://issues.apache.org/jira/browse/LUCENE-9975
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Attachments: image-2021-05-26-11-58-45-713.png
>
>





[GitHub] [lucene] gsmiller commented on pull request #142: LUCENE-9944: Allow DrillSideways users to pass a CollectorManager without requiring an ExecutorService (and concurrent DrillSideways approach

2021-05-26 Thread GitBox



gsmiller commented on pull request #142:
URL: https://github.com/apache/lucene/pull/142#issuecomment-848700105


   Thanks for the feedback @msokolov!



[GitHub] [lucene] gsmiller commented on a change in pull request #142: LUCENE-9944: Allow DrillSideways users to pass a CollectorManager without requiring an ExecutorService (and concurrent DrillSidew

2021-05-26 Thread GitBox



gsmiller commented on a change in pull request #142:
URL: https://github.com/apache/lucene/pull/142#discussion_r639646070



##
File path: lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysQuery.java
##
@@ -40,21 +41,65 @@
 // TODO change the way DrillSidewaysScorer is used, this query does not work
 // with filter caching
 class DrillSidewaysQuery extends Query {
+
   final Query baseQuery;
-  final Collector drillDownCollector;
-  final Collector[] drillSidewaysCollectors;
+
+  // The caller must either directly provide FacetsCollectors for the drill down and sideways
+  // collecting, or provide FacetsCollectorManagers used to create the FacetsCollectors. If
+  // this query will be executed concurrently, FacetsCollectorManagers should be used to ensure
+  // multiple threads aren't collecting into the same FacetsCollector.
+  final FacetsCollector drillDownCollector;
+  final FacetsCollector[] drillSidewaysCollectors;
+  final FacetsCollectorManager drillDownCollectorManager;
+  final FacetsCollectorManager[] drillSidewaysCollectorManagers;
+  final List managedDrillDownCollectors;
+  final List managedDrillSidewaysCollectors;
+
   final Query[] drillDownQueries;
+
   final boolean scoreSubDocsAtOnce;
 
+  /**
+   * Construct a new {@code DrillSidewaysQuery} that will directly use the provided {@link
+   * FacetsCollector}s. Use this if you're certain that the query will not be executed concurrently.

Review comment:
   Since this class is package-private and I think it makes sense to 
consolidate it to always use `FacetsCollectorManager` (as per your suggestion), 
I think I'll add some detailed documentation on the `DrillSideways` class to 
describe the trade-off of providing an `ExecutorService` or not. This is what 
triggers the "concurrent drill sideways" implementation. The short version, 
though, is that the concurrent approach to DS will do lots of duplicate work but 
may be slightly faster, whereas the "sequential" version doesn't do any 
duplicate query processing/computation. It gets extra confusing because both 
the "sequential" and "concurrent" approaches can still work with concurrent 
query execution if using a `CollectorManager` + `IndexSearcher` with an 
`Executor`. It's a little tricky to describe but I'll take a shot at it.
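
   To illustrate why a manager-style API avoids multiple threads collecting into the same collector, here is a simplified, self-contained Java sketch. The types below are hypothetical stand-ins and deliberately do not use Lucene's actual `CollectorManager` or `FacetsCollector` signatures; they only show the pattern of one collector per thread followed by a reduce step.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Simplified stand-in for the collector-manager pattern; NOT Lucene's actual interfaces.
final class CollectorManagerSketch {

  /** Hypothetical per-thread collector: counts the hits it sees. */
  static final class CountingCollector {
    int count; // written by a single worker thread only

    void collect(int docId) {
      count++;
    }
  }

  /** Hypothetical manager: hands out one collector per slice and merges them afterwards. */
  static final class CountingManager {
    CountingCollector newCollector() {
      return new CountingCollector();
    }

    int reduce(Collection<CountingCollector> collectors) {
      int total = 0;
      for (CountingCollector c : collectors) {
        total += c.count;
      }
      return total;
    }
  }

  /** Processes each slice on a worker thread with its own collector, then reduces. */
  static int countConcurrently(List<int[]> slices, int threads) {
    CountingManager manager = new CountingManager();
    ExecutorService executor = Executors.newFixedThreadPool(threads);
    try {
      List<CountingCollector> collectors = new ArrayList<>();
      List<Future<?>> pending = new ArrayList<>();
      for (int[] slice : slices) {
        CountingCollector collector = manager.newCollector();
        collectors.add(collector);
        pending.add(executor.submit(() -> {
          for (int docId : slice) {
            collector.collect(docId);
          }
        }));
      }
      for (Future<?> f : pending) {
        try {
          f.get(); // waits for the task; also makes its writes visible to this thread
        } catch (InterruptedException | ExecutionException e) {
          throw new IllegalStateException("slice failed", e);
        }
      }
      return manager.reduce(collectors);
    } finally {
      executor.shutdown();
    }
  }

  public static void main(String[] args) {
    List<int[]> slices = List.of(new int[] {1, 2, 3}, new int[] {4, 5}, new int[] {6});
    System.out.println(countConcurrently(slices, 2));
  }
}
```

   Because no collector is ever shared between threads, the collectors themselves need no synchronization; the only coordination point is the final reduce.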





[GitHub] [lucene] gsmiller commented on a change in pull request #142: LUCENE-9944: Allow DrillSideways users to pass a CollectorManager without requiring an ExecutorService (and concurrent DrillSidew

2021-05-26 Thread GitBox



gsmiller commented on a change in pull request #142:
URL: https://github.com/apache/lucene/pull/142#discussion_r639641681



##
File path: lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysQuery.java
##
@@ -40,21 +41,65 @@
 // TODO change the way DrillSidewaysScorer is used, this query does not work
 // with filter caching
 class DrillSidewaysQuery extends Query {
+
   final Query baseQuery;
-  final Collector drillDownCollector;
-  final Collector[] drillSidewaysCollectors;
+
+  // The caller must either directly provide FacetsCollectors for the drill down and sideways
+  // collecting, or provide FacetsCollectorManagers used to create the FacetsCollectors. If
+  // this query will be executed concurrently, FacetsCollectorManagers should be used to ensure

Review comment:
   I think this problem goes away by always using `FacetsCollectorManager`. 
I think that's a reasonable simplification here. Appreciate the suggestion!





[GitHub] [lucene] gsmiller commented on a change in pull request #142: LUCENE-9944: Allow DrillSideways users to pass a CollectorManager without requiring an ExecutorService (and concurrent DrillSidew

2021-05-26 Thread GitBox



gsmiller commented on a change in pull request #142:
URL: https://github.com/apache/lucene/pull/142#discussion_r639641131



##
File path: lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysQuery.java
##
@@ -142,7 +212,19 @@ public BulkScorer bulkScorer(LeafReaderContext context) throws IOException {
 new ConstantScoreScorer(drillDowns[dim], 0f, scoreMode, DocIdSetIterator.empty());
   }
 
-  dims[dim] = new DrillSidewaysScorer.DocsAndCost(scorer, drillSidewaysCollectors[dim]);
+  // If the caller directly provided FacetsCollectors to use for the sideways dimensions,

Review comment:
   Thanks for this feedback! Makes a lot of sense to me. I'll consolidate 
to always use `FacetsCollectorManager` in this class.





[jira] [Commented] (LUCENE-9975) Don't require artifact signing for local maven artifact publishing

2021-05-26 Thread Uwe Schindler (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351680#comment-17351680
 ] 

Uwe Schindler commented on LUCENE-9975:
---

{quote}I think this issue could be elegantly solved by creating two different 
publications: jars and unsignedJars, then the "local" maven publishing tasks 
wouldn't be signed, that's it. And nexus publications would have to be signed.
{quote}

+1

> Don't require artifact signing for local maven artifact publishing
> --
>
> Key: LUCENE-9975
> URL: https://issues.apache.org/jira/browse/LUCENE-9975
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Attachments: image-2021-05-26-11-58-45-713.png
>
>





[jira] [Comment Edited] (LUCENE-9975) Don't require artifact signing for local maven artifact publishing

2021-05-26 Thread Uwe Schindler (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351678#comment-17351678
 ] 

Uwe Schindler edited comment on LUCENE-9975 at 5/26/21, 9:59 AM:
-

There are no signature (*.asc) files. Those are not pinned; they are standard 
snapshot builds. Your code works correctly there.

The prerelease builds in the other issue have a manual version number (from 
jenkins build variables), but those just pass "-x signJarsPublication". They 
are published somewhere else: 
[https://nightlies.apache.org/solr/lucene-prereleases/]

See Jenkins config:
 !image-2021-05-26-11-58-45-713.png|width=565,height=528!


was (Author: thetaphi):
There are no signature (*.asc) files. Those are not pinned; they are standard 
snapshot builds. Your code works correctly there.

The prerelease builds in the other issue have a manual version number (from 
jenkins build variables), but those just pass "-x signJarsSomething". They are 
published somewhere else: https://nightlies.apache.org/solr/lucene-prereleases/

> Don't require artifact signing for local maven artifact publishing
> --
>
> Key: LUCENE-9975
> URL: https://issues.apache.org/jira/browse/LUCENE-9975
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Attachments: image-2021-05-26-11-58-45-713.png
>
>





[jira] [Comment Edited] (LUCENE-9975) Don't require artifact signing for local maven artifact publishing

2021-05-26 Thread Uwe Schindler (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351678#comment-17351678
 ] 

Uwe Schindler edited comment on LUCENE-9975 at 5/26/21, 9:55 AM:
-

There are no signature (*.asc) files. Those are not pinned; they are standard 
snapshot builds. Your code works correctly there.

The prerelease builds in the other issue have a manual version number (from 
jenkins build variables), but those just pass "-x signJarsSomething". They are 
published somewhere else: https://nightlies.apache.org/solr/lucene-prereleases/


was (Author: thetaphi):
There are no signature (*.asc) files. Those are not pinned; they are standard 
snapshot builds. Your code works correctly there.

The prerelease builds in the other issue have a manual version number (from 
jenkins build variables), but those just pass "-x signJarsSomething".

> Don't require artifact signing for local maven artifact publishing
> --
>
> Key: LUCENE-9975
> URL: https://issues.apache.org/jira/browse/LUCENE-9975
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
>





[jira] [Commented] (LUCENE-9975) Don't require artifact signing for local maven artifact publishing

2021-05-26 Thread Uwe Schindler (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351678#comment-17351678
 ] 

Uwe Schindler commented on LUCENE-9975:
---

There are no signature (*.asc) files. Those are not pinned; they are standard 
snapshot builds. Your code works correctly there.

The prerelease builds in the other issue have a manual version number (from 
jenkins build variables), but those just pass "-x signJarsSomething".

> Don't require artifact signing for local maven artifact publishing
> --
>
> Key: LUCENE-9975
> URL: https://issues.apache.org/jira/browse/LUCENE-9975
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
>





[jira] [Commented] (LUCENE-9545) Remove Analyzer.get/setVersion()

2021-05-26 Thread Alan Woodward (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351675#comment-17351675
 ] 

Alan Woodward commented on LUCENE-9545:
---

Nudging this one back to life again with a PR: 
https://github.com/apache/lucene/pull/154/files

> Remove Analyzer.get/setVersion()
> 
>
> Key: LUCENE-9545
> URL: https://issues.apache.org/jira/browse/LUCENE-9545
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
>
> In days of yore, some lucene Analyzers would change their behaviour depending 
> on a version constant, so you could say 'use this analyzer in the way that it 
> would have worked in lucene 2.1'.  However, we have no Analyzers that make 
> use of this in the 9x or 8x lines, and I think it's pretty confusing 
> behaviour anyway.  We have factories to configure analyzers, and 
> version-specific behaviour can reside there if we really need it.  We should 
> just remove this functionality from Analyzer altogether.




[GitHub] [lucene] romseygeek opened a new pull request #154: LUCENE-9454: Remove version field on Analyzer

2021-05-26 Thread GitBox



romseygeek opened a new pull request #154:
URL: https://github.com/apache/lucene/pull/154


   Version switching on Analyzer behaviour should be implemented
   in the various component factories, rather than on a mutable
   setting on Analyzer itself.
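
   The idea of keeping version-dependent behaviour in a factory rather than in a mutable Analyzer setting can be sketched as follows. This is a hypothetical, self-contained Java example: the class names, the `matchVersion` argument, and the version cut-off are assumptions made for illustration and are not Lucene's API.

```java
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch: names and the version cut-off are illustrative, not Lucene's API.
final class VersionedFactorySketch {

  /** Immutable factory: the version is fixed at construction and never changed afterwards. */
  static final class NormalizerFactory {
    private final int majorVersion;

    NormalizerFactory(Map<String, String> args) {
      // "matchVersion" is an assumed argument name for this sketch; it defaults to 9.
      this.majorVersion = Integer.parseInt(args.getOrDefault("matchVersion", "9"));
    }

    /** Picks the behaviour once, when the component is created. */
    UnaryOperator<String> create() {
      if (majorVersion < 9) {
        // Pretend legacy behaviour: lowercase only.
        return s -> s.toLowerCase(Locale.ROOT);
      }
      // Pretend current behaviour: trim, then lowercase.
      return s -> s.trim().toLowerCase(Locale.ROOT);
    }
  }

  public static void main(String[] args) {
    UnaryOperator<String> legacy = new NormalizerFactory(Map.of("matchVersion", "8")).create();
    UnaryOperator<String> current = new NormalizerFactory(Map.of()).create();
    System.out.println("[" + legacy.apply(" Foo ") + "]");
    System.out.println("[" + current.apply(" Foo ") + "]");
  }
}
```

   Since the version is a constructor argument of the factory, the component it creates has no setter that could change its behaviour after the fact.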



[jira] [Commented] (LUCENE-9975) Don't require artifact signing for local maven artifact publishing

2021-05-26 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351667#comment-17351667
 ] 

Dawid Weiss commented on LUCENE-9975:
-

I'm a bit confused. The build currently says we don't sign snapshots:
{code}
  signing {
    required { !version.endsWith("SNAPSHOT") }
    sign publishing.publications.jars
  }
{code}

but Apache Nexus clearly has signatures for pinned snapshot versions:
https://repository.apache.org/content/repositories/snapshots/org/apache/lucene/lucene-core/9.0.0-SNAPSHOT/

[~uschindler] - what's the version number override the jenkins build does 
before it publishes to apache nexus? 

I think this issue could be elegantly solved by creating two different 
publications: jars and unsignedJars, then the "local" maven publishing tasks 
wouldn't be signed, that's it. And nexus publications would have to be signed.
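
A small Java model of the two rules discussed above, as a sketch only: the first method mirrors the required { ... } predicate quoted from the build, and the second mirrors the proposed split into "jars" and "unsignedJars" publications. The Target enum and the method names are hypothetical and not part of the Lucene build.

```java
// Hypothetical model of the signing rules; not part of the actual Gradle build.
final class SigningRulesSketch {

  /** Where artifacts are being published to (assumed names for this sketch). */
  enum Target { LOCAL_MAVEN, NEXUS }

  /** Mirrors the quoted predicate: signing is required unless the version is a snapshot. */
  static boolean signingRequired(String version) {
    return !version.endsWith("SNAPSHOT");
  }

  /** Mirrors the proposal: local publishing uses the unsigned publication. */
  static String publicationFor(Target target) {
    return target == Target.LOCAL_MAVEN ? "unsignedJars" : "jars";
  }

  public static void main(String[] args) {
    System.out.println(signingRequired("9.0.0-SNAPSHOT"));
    System.out.println(signingRequired("9.0.0"));
    System.out.println(publicationFor(Target.LOCAL_MAVEN));
    System.out.println(publicationFor(Target.NEXUS));
  }
}
```

Separating the two decisions makes the intent explicit: whether a signature is required depends on the version, while which publication is used depends on where the artifacts go.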

> Don't require artifact signing for local maven artifact publishing
> --
>
> Key: LUCENE-9975
> URL: https://issues.apache.org/jira/browse/LUCENE-9975
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
>





[jira] [Assigned] (LUCENE-8143) Remove SpanBoostQuery

2021-05-26 Thread Alan Woodward (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward reassigned LUCENE-8143:
-

Assignee: Alan Woodward

> Remove SpanBoostQuery
> -
>
> Key: LUCENE-8143
> URL: https://issues.apache.org/jira/browse/LUCENE-8143
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Assignee: Alan Woodward
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I initially added it so that span queries could still be boosted, but this 
> was actually a mistake: boosts are ignored on inner span queries, only the 
> boost of the top-level span query, the one that performs scoring, is not 
> ignored.




[jira] [Commented] (LUCENE-8143) Remove SpanBoostQuery

2021-05-26 Thread Alan Woodward (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351662#comment-17351662
 ] 

Alan Woodward commented on LUCENE-8143:
---

Now that Spans are in the queries module, this seems like a good time to revisit 
this.  [~dsmiley] I think that removing it is the correct call here? If we change 
things so that boosts get applied differently, then existing queries are going 
to silently change their scores in unexpected ways, whereas if we remove it, 
users will get compilation errors when upgrading and we can point them to a 
vanilla BoostQuery for top-level boosts.

> Remove SpanBoostQuery
> -
>
> Key: LUCENE-8143
> URL: https://issues.apache.org/jira/browse/LUCENE-8143
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I initially added it so that span queries could still be boosted, but this 
> was actually a mistake: boosts are ignored on inner span queries, only the 
> boost of the top-level span query, the one that performs scoring, is not 
> ignored.




[GitHub] [lucene] romseygeek merged pull request #152: LUCENE 9204: Move SpanQuery and subclasses to the queries module

2021-05-26 Thread GitBox



romseygeek merged pull request #152:
URL: https://github.com/apache/lucene/pull/152


   



[jira] [Updated] (LUCENE-9977) Gradle's RAT task has missing inputs, so it can't figure out when to run

2021-05-26 Thread Uwe Schindler (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-9977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-9977:
--
Issue Type: Bug  (was: Task)

> Gradle's RAT task has missing inputs, so it can't figure out when to run
> 
>
> Key: LUCENE-9977
> URL: https://issues.apache.org/jira/browse/LUCENE-9977
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: main (9.0)
>Reporter: Uwe Schindler
>Assignee: Dawid Weiss
>Priority: Major
>
> This also affects Solr!
> [~romseygeek] wrote:
> {quote}
> There’s a subject line I never thought I’d type :)
> Firstly: can I say how much I appreciate all the work that’s gone into the 
> gradle build? I’ve been doing lots of small PRs for the spans-to-queries work 
> and being able to run checks multiple times in an extremely efficient manner 
> has been a life saver.  Massive thanks to Dawid, and also to Robert for all 
> the work on speeding up tests.
> I think I may have found a bug in the input configuration for our license 
> header checks.  Thanks to the new build, I have been running `./gradlew 
> check` before pushing code, but it has let through files with missing headers 
> a few times, which were subsequently caught by the GitHub action running on 
> the PR.
> So I tried the following:
> - start a new git branch
> - run ./gradlew rat -> everything should pass
> - edit one of the files to remove the license header
> - run ./gradlew rat -> still passes!
> - run ./gradlew clean
> - run ./gradlew rat -> now I get an error
> This looks to me like the fileset that the rat task is looking at is not set 
> up correctly, but I don’t know enough gradle to actually work out what is 
> wrong and what the fix should be.
> {quote}




[jira] [Commented] (LUCENE-9977) Gradle's RAT task has missing inputs, so it can't figure out when to run

2021-05-26 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351656#comment-17351656
 ] 

Dawid Weiss commented on LUCENE-9977:
-

I'll take a look later, thanks Uwe.

> Gradle's RAT task has missing inputs, so it can't figure out when to run
> 
>
> Key: LUCENE-9977
> URL: https://issues.apache.org/jira/browse/LUCENE-9977
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: general/build
>Affects Versions: main (9.0)
>Reporter: Uwe Schindler
>Assignee: Dawid Weiss
>Priority: Major
>
> This also affects Solr!
> [~romseygeek] wrote:
> {quote}
> There’s a subject line I never thought I’d type :)
> Firstly: can I say how much I appreciate all the work that’s gone into the 
> gradle build? I’ve been doing lots of small PRs for the spans-to-queries work 
> and being able to run checks multiple times in an extremely efficient manner 
> has been a life saver.  Massive thanks to Dawid, and also to Robert for all 
> the work on speeding up tests.
> I think I may have found a bug in the input configuration for our license 
> header checks.  Thanks to the new build, I have been running `./gradlew 
> check` before pushing code, but it has let through files with missing headers 
> a few times, which were subsequently caught by the GitHub action running on 
> the PR.
> So I tried the following:
> - start a new git branch
> - run ./gradlew rat -> everything should pass
> - edit one of the files to remove the license header
> - run ./gradlew rat -> still passes!
> - run ./gradlew clean
> - run ./gradlew rat -> now I get an error
> This looks to me like the fileset that the rat task is looking at is not set 
> up correctly, but I don’t know enough gradle to actually work out what is 
> wrong and what the fix should be.
> {quote}




[jira] [Commented] (LUCENE-9977) Gradle's RAT task has missing inputs, so it can't figure out when to run

2021-05-26 Thread Uwe Schindler (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351658#comment-17351658
 ] 

Uwe Schindler commented on LUCENE-9977:
---

Solr issue: SOLR-15436

> Gradle's RAT task has missing inputs, so it can't figure out when to run
> 
>
> Key: LUCENE-9977
> URL: https://issues.apache.org/jira/browse/LUCENE-9977
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: general/build
>Affects Versions: main (9.0)
>Reporter: Uwe Schindler
>Assignee: Dawid Weiss
>Priority: Major
>
> This also affects Solr!
> [~romseygeek] wrote:
> {quote}
> There’s a subject line I never thought I’d type :)
> Firstly: can I say how much I appreciate all the work that’s gone into the 
> gradle build? I’ve been doing lots of small PRs for the spans-to-queries work 
> and being able to run checks multiple times in an extremely efficient manner 
> has been a life saver.  Massive thanks to Dawid, and also to Robert for all 
> the work on speeding up tests.
> I think I may have found a bug in the input configuration for our license 
> header checks.  Thanks to the new build, I have been running `./gradlew 
> check` before pushing code, but it has let through files with missing headers 
> a few times, which were subsequently caught by the GitHub action running on 
> the PR.
> So I tried the following:
> - start a new git branch
> - run ./gradlew rat -> everything should pass
> - edit one of the files to remove the license header
> - run ./gradlew rat -> still passes!
> - run ./gradlew clean
> - run ./gradlew rat -> now I get an error
> This looks to me like the fileset that the rat task is looking at is not set 
> up correctly, but I don’t know enough gradle to actually work out what is 
> wrong and what the fix should be.
> {quote}




[jira] [Updated] (LUCENE-9977) Gradle's RAT task has missing inputs, so it can't figure out when to run

2021-05-26 Thread Uwe Schindler (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-9977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-9977:
--
Component/s: general/build

> Gradle's RAT task has missing inputs, so it can't figure out when to run
> 
>
> Key: LUCENE-9977
> URL: https://issues.apache.org/jira/browse/LUCENE-9977
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: general/build
>Affects Versions: main (9.0)
>Reporter: Uwe Schindler
>Assignee: Dawid Weiss
>Priority: Major
>
> This also affects Solr!
> [~romseygeek] wrote:
> {quote}
> There’s a subject line I never thought I’d type :)
> Firstly: can I say how much I appreciate all the work that’s gone into the 
> gradle build? I’ve been doing lots of small PRs for the spans-to-queries work 
> and being able to run checks multiple times in an extremely efficient manner 
> has been a life saver.  Massive thanks to Dawid, and also to Robert for all 
> the work on speeding up tests.
> I think I may have found a bug in the input configuration for our license 
> header checks.  Thanks to the new build, I have been running `./gradlew 
> check` before pushing code, but it has let through files with missing headers 
> a few times, which were subsequently caught by the GitHub action running on 
> the PR.
> So I tried the following:
> - start a new git branch
> - run ./gradlew rat -> everything should pass
> - edit one of the files to remove the license header
> - run ./gradlew rat -> still passes!
> - run ./gradlew clean
> - run ./gradlew rat -> now I get an error
> This looks to me like the fileset that the rat task is looking at is not set 
> up correctly, but I don’t know enough gradle to actually work out what is 
> wrong and what the fix should be.
> {quote}




[jira] [Created] (LUCENE-9977) Gradle's RAT task has missing inputs, so it can't figure out when to run

2021-05-26 Thread Uwe Schindler (Jira)

Uwe Schindler created LUCENE-9977:
-

 Summary: Gradle's RAT task has missing inputs, so it can't figure 
out when to run
 Key: LUCENE-9977
 URL: https://issues.apache.org/jira/browse/LUCENE-9977
 Project: Lucene - Core
  Issue Type: Task
Affects Versions: main (9.0)
Reporter: Uwe Schindler
Assignee: Dawid Weiss


This also affects Solr!

[~romseygeek] wrote:
{quote}
There’s a subject line I never thought I’d type :)

Firstly: can I say how much I appreciate all the work that’s gone into the 
gradle build? I’ve been doing lots of small PRs for the spans-to-queries work 
and being able to run checks multiple times in an extremely efficient manner 
has been a life saver.  Massive thanks to Dawid, and also to Robert for all the 
work on speeding up tests.

I think I may have found a bug in the input configuration for our license header 
checks.  Thanks to the new build, I have been running `./gradlew check` before 
pushing code, but it has let through files with missing headers a few times, 
which were subsequently caught by the GitHub action running on the PR.

So I tried the following:
- start a new git branch
- run ./gradlew rat -> everything should pass
- edit one of the files to remove the license header
- run ./gradlew rat -> still passes!
- run ./gradlew clean
- run ./gradlew rat -> now I get an error

This looks to me like the fileset that the rat task is looking at is not set up 
correctly, but I don’t know enough gradle to actually work out what is wrong 
and what the fix should be.
{quote}




[jira] [Commented] (LUCENE-9977) Gradle's RAT task has missing inputs, so it can't figure out when to run

2021-05-26 Thread Uwe Schindler (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351654#comment-17351654
 ] 

Uwe Schindler commented on LUCENE-9977:
---

Hi,

I tried to fix the problem but gave up because of limited time.

The problem is that this task is global per project and not split into 
different ones; it's all mixed together. It works if I add an @InputDirectory 
with the projectDir, but this leads to strange exceptions, because it also 
tries to hash files from the build directory.

IMHO, the correct way to fix this is:
-   Create a generic RatTask that extends SourceTask (do NOT extend 
DefaultTask!). This brings a source directory and include/exclude automatically, 
so it's easy to configure. All you need is to use the converter when executing 
the task, which changes a FileCollection to an ANT fileset: 
this.getSource().addToAntBuilder(antTaskDeclaration, "fileset", 
FileCollection.AntType.FileSet)
-   Create a separate task for each affected sourceset and also one for the 
base project dir, each with correct includes/excludes

I gave up, as my time was limited and I was not able to quickly split the task 
into one for each sourceset.

Uwe
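
To illustrate why a task with missing inputs cannot figure out when to run, 
here is a toy model in plain Java. It is not Gradle's implementation and uses 
no Gradle API; the names (UpToDateSketch, ToyTask, input, runIfNeeded) are 
invented for this sketch. It only mirrors the general idea that a task is 
re-run when the state of its declared inputs has changed since the last run.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of up-to-date checking; every name here is hypothetical, not Gradle API. */
class UpToDateSketch {

  /** A task that can only "see" the files it declares as inputs. */
  public static final class ToyTask {
    private final List<String> declaredInputs = new ArrayList<>();
    private Map<String, String> lastSnapshot; // null until the task has run once

    public ToyTask input(String path) {
      declaredInputs.add(path);
      return this;
    }

    /** Captures the contents of the declared inputs only; undeclared files are invisible. */
    private Map<String, String> snapshot(Map<String, String> files) {
      Map<String, String> snap = new TreeMap<>();
      for (String path : declaredInputs) {
        snap.put(path, files.get(path));
      }
      return snap;
    }

    /** Returns true if the task executed, false if it was considered up to date. */
    public boolean runIfNeeded(Map<String, String> files) {
      Map<String, String> current = snapshot(files);
      if (current.equals(lastSnapshot)) {
        return false;
      }
      lastSnapshot = current;
      return true;
    }
  }

  public static void main(String[] args) {
    // An in-memory stand-in for the working tree: path -> file content
    Map<String, String> files = new HashMap<>();
    files.put("Foo.java", "/* license */ class Foo {}");

    ToyTask noInputs = new ToyTask();
    ToyTask withInputs = new ToyTask().input("Foo.java");

    // First run: both tasks execute
    noInputs.runIfNeeded(files);
    withInputs.runIfNeeded(files);

    // Remove the license header from the file
    files.put("Foo.java", "class Foo {}");

    System.out.println("no declared inputs, re-ran: " + noInputs.runIfNeeded(files));
    System.out.println("declared inputs, re-ran: " + withInputs.runIfNeeded(files));
  }
}
```

In this model the task without declared inputs is skipped after the edit, 
which matches the "still passes!" step in the report, while the task that 
declares the file runs again.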


> Gradle's RAT task has missing inputs, so it can't figure out when to run
> 
>
> Key: LUCENE-9977
> URL: https://issues.apache.org/jira/browse/LUCENE-9977
> Project: Lucene - Core
>  Issue Type: Task
>Affects Versions: main (9.0)
>Reporter: Uwe Schindler
>Assignee: Dawid Weiss
>Priority: Major
>
> This also affects Solr!
> [~romseygeek] wrote:
> {quote}
> There’s a subject line I never thought I’d type :)
> Firstly: can I say how much I appreciate all the work that’s gone into the 
> gradle build? I’ve been doing lots of small PRs for the spans-to-queries work 
> and being able to run checks multiple times in an extremely efficient manner 
> has been a life saver.  Massive thanks to Dawid, and also to Robert for all 
> the work on speeding up tests.
> I think I may have found a bug in the input configuration for our license 
> header checks.  Thanks to the new build, I have been running `./gradlew 
> check` before pushing code, but it has let through files with missing headers 
> a few times, which were subsequently caught by the GitHub action running on 
> the PR.
> So I tried the following:
> - start a new git branch
> - run ./gradlew rat -> everything should pass
> - edit one of the files to remove the license header
> - run ./gradlew rat -> still passes!
> - run ./gradlew clean
> - run ./gradlew rat -> now I get an error
> This looks to me like the fileset that the rat task is looking at is not set 
> up correctly, but I don’t know enough gradle to actually work out what is 
> wrong and what the fix should be.
> {quote}




[jira] [Commented] (LUCENE-9589) Swedish Minimal Stemmer

2021-05-26 Thread Jira



[ 
https://issues.apache.org/jira/browse/LUCENE-9589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351653#comment-17351653
 ] 

Jan Høydahl commented on LUCENE-9589:
-

I got feedback on the PR (thanks [~karl.wettin]); I intend to address the 
review comments and then target 9.0 in a few days.

> Swedish Minimal Stemmer
> ---
>
> Key: LUCENE-9589
> URL: https://issues.apache.org/jira/browse/LUCENE-9589
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Swedish has a {{SwedishLightStemmer}} but lacks a Minimal stemmer that would 
> only stem singular/plural.




[GitHub] [lucene] glawson0 commented on a change in pull request #146: LUCENE-9963 Add tests for alternate path failures in FlattenGraphFilter

2021-05-26 Thread GitBox



glawson0 commented on a change in pull request #146:
URL: https://github.com/apache/lucene/pull/146#discussion_r639529557



##
File path: 
lucene/analysis/common/src/test/org/apache/lucene/analysis/core/TestFlattenGraphFilter.java
##
@@ -314,5 +314,116 @@ public void testTwoLongParallelPaths() throws Exception {
 11);
   }
 
+  // The end node the long path is supposed to flatten over doesn't exist
+  @AwaitsFix(bugUrl = "https://issues.apache.org/jira/browse/LUCENE-9963")
+  public void testAltPathFirstStepHole() throws Exception {
+TokenStream in =
+new CannedTokenStream(
+0,
+3,
+new Token[] {token("abc", 1, 3, 0, 3), token("b", 1, 1, 1, 2), 
token("c", 1, 1, 2, 3)});
+
+TokenStream out = new FlattenGraphFilter(in);
+
+assertTokenStreamContents(

Review comment:
   I couldn't find any tools within Lucene for visualizing `dot` graphs. 
There are a few tests that print out `dot` graphs when `tests.verbose=true`. I 
assume the idea is for the user to invoke graphviz themselves. We could 
continue with this pattern with the random and specific case tests. Is that 
what you're thinking about?
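
   As a rough illustration of what such `dot` output could look like for the 
token graph in the test above, here is a small standalone Java sketch. It does 
not use any Lucene class; `Tok` and `toDot` are made-up names, and it assumes 
that the second and third arguments of the test's `token(...)` helper are the 
position increment and position length.

```java
import java.util.Arrays;
import java.util.List;

/** Standalone sketch; Tok and toDot are hypothetical helpers, not Lucene API. */
class TokenGraphDotSketch {

  /** Minimal stand-in for a token: term, position increment, position length. */
  public static final class Tok {
    final String term;
    final int posInc;
    final int posLength;

    public Tok(String term, int posInc, int posLength) {
      this.term = term;
      this.posInc = posInc;
      this.posLength = posLength;
    }
  }

  /** Renders each token as an edge from its start position to start + posLength. */
  public static String toDot(List<Tok> tokens) {
    StringBuilder sb = new StringBuilder("digraph tokens {\n");
    int pos = -1; // start at -1 so that a first position increment of 1 lands on node 0
    for (Tok t : tokens) {
      pos += t.posInc;
      sb.append("  ")
          .append(pos)
          .append(" -> ")
          .append(pos + t.posLength)
          .append(" [label=\"")
          .append(t.term)
          .append("\"];\n");
    }
    return sb.append("}\n").toString();
  }

  public static void main(String[] args) {
    // Same shape as the canned tokens in testAltPathFirstStepHole
    List<Tok> tokens =
        Arrays.asList(new Tok("abc", 1, 3), new Tok("b", 1, 1), new Tok("c", 1, 1));
    System.out.print(toDot(tokens));
  }
}
```

   The printed text can be piped into Graphviz (for example `dot -Tpng`). Under 
the assumption above, the graph has no edge leaving node 0 other than "abc", 
which is the hole that the test name refers to.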





[jira] [Created] (LUCENE-9976) WANDScorer assertion error in ensureConsistent

2021-05-26 Thread Dawid Weiss (Jira)

Dawid Weiss created LUCENE-9976:
---

 Summary: WANDScorer assertion error in ensureConsistent
 Key: LUCENE-9976
 URL: https://issues.apache.org/jira/browse/LUCENE-9976
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Dawid Weiss


Build fails and is reproducible:
https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-main/283/console

{code}
./gradlew test --tests TestExpressionSorts.testQueries 
-Dtests.seed=FF571CE915A0955 -Dtests.multiplier=2 -Dtests.nightly=true 
-Dtests.slow=true -Dtests.asserts=true -p lucene/expressions/
{code}




[jira] [Commented] (LUCENE-9974) The test-framework module should apply the test ruleset for forbidden APIs.

2021-05-26 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351627#comment-17351627
 ] 

ASF subversion and git services commented on LUCENE-9974:
-

Commit 5912e65434ab26cc0c9b12dbea4b4f59d4089308 in lucene's branch 
refs/heads/main from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=5912e65 ]

LUCENE-9974: The test-framework module should apply the test ruleset for 
forbidden APIs. (#153)



> The test-framework module should apply the test ruleset for forbidden APIs.
> ---
>
> Key: LUCENE-9974
> URL: https://issues.apache.org/jira/browse/LUCENE-9974
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>





[jira] [Resolved] (LUCENE-9974) The test-framework module should apply the test ruleset for forbidden APIs.

2021-05-26 Thread Dawid Weiss (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-9974.
-
Fix Version/s: main (9.0)
   Resolution: Fixed

> The test-framework module should apply the test ruleset for forbidden APIs.
> ---
>
> Key: LUCENE-9974
> URL: https://issues.apache.org/jira/browse/LUCENE-9974
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: main (9.0)
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>





[GitHub] [lucene] dweiss merged pull request #153: LUCENE-9974: The test-framework module should apply the test ruleset for forbidden APIs

2021-05-26 Thread GitBox



dweiss merged pull request #153:
URL: https://github.com/apache/lucene/pull/153


   



[GitHub] [lucene] zacharymorn commented on pull request #128: LUCENE-9662: [WIP] CheckIndex should be concurrent

2021-05-26 Thread GitBox



zacharymorn commented on pull request #128:
URL: https://github.com/apache/lucene/pull/128#issuecomment-848533196


   I just tried to run `CheckIndex#checkIndex` from the command line with an 
index built for another task, but it failed at the `Version` check (even 
without my latest changes):
   ```
   if (major > 255 || major < 0) {
   throw new IllegalArgumentException("Illegal major version: " + major);
   }
   ```
   I'll dig into that more tomorrow. 
   
   From the unit test `TestCheckIndex#testCheckIndexAllValid`, I'm able to get 
this log output:
   
   ```
   0.00% total deletions; 5 documents; 0 deleteions
   Segments file=segments_1 numSegments=1 version=9.0.0 
id=bqdz36uoui8ymz6pd2tp9wmet
   1 of 1: name=_0 maxDoc=5
   version=9.0.0
   id=bqdz36uoui8ymz6pd2tp9wmeq
   codec=Asserting(Lucene90)
   compound=false
   numFiles=26
   sort=!
   size (MB)=0.004
   diagnostics = {source=flush, os.arch=x86_64, 
java.runtime.version=11.0.9+11, os.version=10.15.5, java.vendor=AdoptOpenJDK, 
os=Mac OS X, timestamp=1622013701421, java.version=11.0.9, 
java.vm.version=11.0.9+11, lucene.version=9.0.0}
   no deletions
   test: open reader.OK [took 0.004 sec]
   test: check integrity.OK [took 0.001 sec]
   test: check live docs.OK [took 0.000 sec]
   
   test: field infos.OK [8 fields] [took 0.000 sec]
   
   test: field norms.OK [1 fields] [took 0.000 sec]
   
   test: stored fields...OK [8 total field count; avg 1.6 fields per 
doc] [took 0.001 sec]
   
   test: terms, freq, prox...OK [5 terms; 8 terms/docs pairs; 12 tokens] 
[took 0.004 sec]
   
   test: docvalues...OK [2 docvalues fields; 0 BINARY; 2 NUMERIC; 0 
SORTED; 0 SORTED_NUMERIC; 0 SORTED_SET] [took 0.002 sec]
   
   test: term vectorsOK [4 total term vector count; avg 1.0 
term/freq vector fields per doc] [took 0.006 sec]
   
   test: vectors..OK [2 fields, 8 vectors] [took 0.001 sec]
   
   test: index sort..OK [took 0.002 sec]
   
   test: check soft deletes.
   test: points..OK [1 fields, 4 points] [took 0.005 sec]
   
   
   No problems were detected with this index.
   
   Took 0.046 sec total.
   
   ```
   
   



[GitHub] [lucene] zacharymorn commented on a change in pull request #128: LUCENE-9662: [WIP] CheckIndex should be concurrent

2021-05-26 Thread GitBox



zacharymorn commented on a change in pull request #128:
URL: https://github.com/apache/lucene/pull/128#discussion_r639460784



##
File path: lucene/core/src/java/org/apache/lucene/index/CheckIndex.java
##
@@ -1058,16 +1261,14 @@ public Status checkIndex(List onlySegments) 
throws IOException {
 
   msg(
   infoStream,
+  segmentId + partId,
   String.format(
   Locale.ROOT,
   "OK [%d fields] [took %.3f sec]",
   status.totFields,
   nsToSec(System.nanoTime() - startNS)));
 } catch (Throwable e) {
-  if (failFast) {
-throw IOUtils.rethrowAlways(e);
-  }
-  msg(infoStream, "ERROR [" + String.valueOf(e.getMessage()) + "]");
+  msg(infoStream, segmentId + partId, "ERROR [" + 
String.valueOf(e.getMessage()) + "]");

Review comment:
   I've removed the use of `segmentId` and `partId` as part of 57f542f.





[GitHub] [lucene] zacharymorn commented on a change in pull request #128: LUCENE-9662: [WIP] CheckIndex should be concurrent

2021-05-26 Thread GitBox



zacharymorn commented on a change in pull request #128:
URL: https://github.com/apache/lucene/pull/128#discussion_r639460089



##
File path: lucene/core/src/java/org/apache/lucene/index/CheckIndex.java
##
@@ -731,74 +810,173 @@ public Status checkIndex(List onlySegments) 
throws IOException {
 }
 
 if (checksumsOnly == false) {
+  // This redundant assignment is done to make compiler happy
+  SegmentReader finalReader = reader;
+
   // Test Livedocs
-  segInfoStat.liveDocStatus = testLiveDocs(reader, infoStream, 
failFast);
+  CompletableFuture testliveDocs =
+  runAsyncSegmentPartCheck(
+  executorService,
+  () -> testLiveDocs(finalReader, infoStream, segmentId),
+  liveDocStatus -> segInfoStat.liveDocStatus = liveDocStatus);
 
   // Test Fieldinfos
-  segInfoStat.fieldInfoStatus = testFieldInfos(reader, infoStream, 
failFast);
+  CompletableFuture testFieldInfos =
+  runAsyncSegmentPartCheck(
+  executorService,
+  () -> testFieldInfos(finalReader, infoStream, segmentId),
+  fieldInfoStatus -> segInfoStat.fieldInfoStatus = 
fieldInfoStatus);
 
   // Test Field Norms
-  segInfoStat.fieldNormStatus = testFieldNorms(reader, infoStream, 
failFast);
+  CompletableFuture testFieldNorms =
+  runAsyncSegmentPartCheck(
+  executorService,
+  () -> testFieldNorms(finalReader, infoStream, segmentId),
+  fieldNormStatus -> segInfoStat.fieldNormStatus = 
fieldNormStatus);
 
   // Test the Term Index
-  segInfoStat.termIndexStatus =
-  testPostings(reader, infoStream, verbose, doSlowChecks, 
failFast);
+  CompletableFuture testTermIndex =
+  runAsyncSegmentPartCheck(
+  executorService,
+  () -> testPostings(finalReader, infoStream, segmentId, 
verbose, doSlowChecks),
+  termIndexStatus -> segInfoStat.termIndexStatus = 
termIndexStatus);
 
   // Test Stored Fields
-  segInfoStat.storedFieldStatus = testStoredFields(reader, infoStream, 
failFast);
+  CompletableFuture testStoredFields =
+  runAsyncSegmentPartCheck(
+  executorService,
+  () -> testStoredFields(finalReader, infoStream, segmentId),
+  storedFieldStatus -> segInfoStat.storedFieldStatus = 
storedFieldStatus);
 
   // Test Term Vectors
-  segInfoStat.termVectorStatus =
-  testTermVectors(reader, infoStream, verbose, doSlowChecks, 
failFast);
+  CompletableFuture testTermVectors =
+  runAsyncSegmentPartCheck(
+  executorService,
+  () -> testTermVectors(finalReader, infoStream, segmentId, 
verbose, doSlowChecks),
+  termVectorStatus -> segInfoStat.termVectorStatus = 
termVectorStatus);
 
   // Test Docvalues
-  segInfoStat.docValuesStatus = testDocValues(reader, infoStream, 
failFast);
+  CompletableFuture testDocValues =
+  runAsyncSegmentPartCheck(
+  executorService,
+  () -> testDocValues(finalReader, infoStream, segmentId),
+  docValuesStatus -> segInfoStat.docValuesStatus = 
docValuesStatus);
 
   // Test PointValues
-  segInfoStat.pointsStatus = testPoints(reader, infoStream, failFast);
+  CompletableFuture testPointvalues =
+  runAsyncSegmentPartCheck(
+  executorService,
+  () -> testPoints(finalReader, infoStream, segmentId),
+  pointsStatus -> segInfoStat.pointsStatus = pointsStatus);
 
   // Test VectorValues
-  segInfoStat.vectorValuesStatus = testVectors(reader, infoStream, 
failFast);
+  CompletableFuture testVectors =
+  runAsyncSegmentPartCheck(
+  executorService,
+  () -> testVectors(finalReader, infoStream, segmentId),
+  vectorValuesStatus -> segInfoStat.vectorValuesStatus = 
vectorValuesStatus);
 
   // Test index sort
-  segInfoStat.indexSortStatus = testSort(reader, indexSort, 
infoStream, failFast);
+  CompletableFuture testSort =
+  runAsyncSegmentPartCheck(
+  executorService,
+  () -> testSort(finalReader, indexSort, infoStream, 
segmentId),
+  indexSortStatus -> segInfoStat.indexSortStatus = 
indexSortStatus);
+
+  CompletableFuture testSoftDeletes = null;
+  final String softDeletesField = 
reader.getFieldInfos().getSoftDeletesField();
+  if (softDeletesField != null) {
+testSoftDeletes =
+runAsyncSegmentPartCheck(
+executorService,
+() ->
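
The diff above is cut off in this archive, and the body of 
`runAsyncSegmentPartCheck` is not shown. As a rough sketch of the pattern it 
appears to follow (run each part check on an executor, hand the result to a 
setter, then wait for all parts), the following uses only 
`java.util.concurrent`. The helper `runAsync`, the `Status` holder, and the 
two dummy checks are placeholders for illustration, not the PR's code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Sketch of running independent checks concurrently; all names are hypothetical. */
class AsyncPartCheckSketch {

  /** Placeholder for a per-segment status holder. */
  public static final class Status {
    public volatile String liveDocs;
    public volatile String fieldInfos;
  }

  /** Runs the check on the executor and passes its result to the setter. */
  static <T> CompletableFuture<Void> runAsync(
      ExecutorService executor, Supplier<T> check, Consumer<T> setter) {
    return CompletableFuture.supplyAsync(check, executor).thenAccept(setter);
  }

  public static Status checkAll(ExecutorService executor) {
    Status status = new Status();

    CompletableFuture<Void> liveDocs =
        runAsync(executor, () -> "live docs OK", s -> status.liveDocs = s);
    CompletableFuture<Void> fieldInfos =
        runAsync(executor, () -> "field infos OK", s -> status.fieldInfos = s);

    // Block until every part check has finished; a failed check surfaces here
    // as a CompletionException
    CompletableFuture.allOf(liveDocs, fieldInfos).join();
    return status;
  }

  public static void main(String[] args) {
    ExecutorService executor = Executors.newFixedThreadPool(2);
    try {
      Status status = checkAll(executor);
      System.out.println(status.liveDocs);
      System.out.println(status.fieldInfos);
    } finally {
      executor.shutdown();
    }
  }
}
```

Lambdas may only capture local variables that are effectively final, which is 
presumably why the diff copies `reader` into `finalReader` before building the 
futures.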

[GitHub] [lucene] zacharymorn commented on a change in pull request #128: LUCENE-9662: [WIP] CheckIndex should be concurrent

2021-05-26 Thread GitBox



zacharymorn commented on a change in pull request #128:
URL: https://github.com/apache/lucene/pull/128#discussion_r639459734



##
File path: lucene/core/src/java/org/apache/lucene/index/CheckIndex.java
##
@@ -731,74 +810,173 @@ public Status checkIndex(List onlySegments) 
throws IOException {
 }
 
 if (checksumsOnly == false) {
+  // This redundant assignment is done to make compiler happy
+  SegmentReader finalReader = reader;
+
   // Test Livedocs
-  segInfoStat.liveDocStatus = testLiveDocs(reader, infoStream, 
failFast);
+  CompletableFuture testliveDocs =
+  runAsyncSegmentPartCheck(
+  executorService,
+  () -> testLiveDocs(finalReader, infoStream, segmentId),
+  liveDocStatus -> segInfoStat.liveDocStatus = liveDocStatus);
 
   // Test Fieldinfos
-  segInfoStat.fieldInfoStatus = testFieldInfos(reader, infoStream, 
failFast);
+  CompletableFuture testFieldInfos =
+  runAsyncSegmentPartCheck(
+  executorService,
+  () -> testFieldInfos(finalReader, infoStream, segmentId),
+  fieldInfoStatus -> segInfoStat.fieldInfoStatus = 
fieldInfoStatus);
 
   // Test Field Norms
-  segInfoStat.fieldNormStatus = testFieldNorms(reader, infoStream, 
failFast);
+  CompletableFuture testFieldNorms =
+  runAsyncSegmentPartCheck(
+  executorService,
+  () -> testFieldNorms(finalReader, infoStream, segmentId),
+  fieldNormStatus -> segInfoStat.fieldNormStatus = 
fieldNormStatus);
 
   // Test the Term Index
-  segInfoStat.termIndexStatus =
-  testPostings(reader, infoStream, verbose, doSlowChecks, 
failFast);
+  CompletableFuture testTermIndex =
+  runAsyncSegmentPartCheck(
+  executorService,
+  () -> testPostings(finalReader, infoStream, segmentId, verbose, doSlowChecks),
+  termIndexStatus -> segInfoStat.termIndexStatus = termIndexStatus);
 
   // Test Stored Fields
-  segInfoStat.storedFieldStatus = testStoredFields(reader, infoStream, failFast);
+  CompletableFuture testStoredFields =
+  runAsyncSegmentPartCheck(
+  executorService,
+  () -> testStoredFields(finalReader, infoStream, segmentId),
+  storedFieldStatus -> segInfoStat.storedFieldStatus = storedFieldStatus);
 
   // Test Term Vectors
-  segInfoStat.termVectorStatus =
-  testTermVectors(reader, infoStream, verbose, doSlowChecks, failFast);
+  CompletableFuture testTermVectors =
+  runAsyncSegmentPartCheck(
+  executorService,
+  () -> testTermVectors(finalReader, infoStream, segmentId, verbose, doSlowChecks),
+  termVectorStatus -> segInfoStat.termVectorStatus = termVectorStatus);
 
   // Test Docvalues
-  segInfoStat.docValuesStatus = testDocValues(reader, infoStream, failFast);
+  CompletableFuture testDocValues =
+  runAsyncSegmentPartCheck(
+  executorService,
+  () -> testDocValues(finalReader, infoStream, segmentId),
+  docValuesStatus -> segInfoStat.docValuesStatus = docValuesStatus);
 
   // Test PointValues
-  segInfoStat.pointsStatus = testPoints(reader, infoStream, failFast);
+  CompletableFuture testPointvalues =
+  runAsyncSegmentPartCheck(
+  executorService,
+  () -> testPoints(finalReader, infoStream, segmentId),
+  pointsStatus -> segInfoStat.pointsStatus = pointsStatus);
 
   // Test VectorValues
-  segInfoStat.vectorValuesStatus = testVectors(reader, infoStream, failFast);
+  CompletableFuture testVectors =
+  runAsyncSegmentPartCheck(
+  executorService,
+  () -> testVectors(finalReader, infoStream, segmentId),
+  vectorValuesStatus -> segInfoStat.vectorValuesStatus = vectorValuesStatus);
 
   // Test index sort
-  segInfoStat.indexSortStatus = testSort(reader, indexSort, infoStream, failFast);
+  CompletableFuture testSort =
+  runAsyncSegmentPartCheck(
+  executorService,
+  () -> testSort(finalReader, indexSort, infoStream, segmentId),
+  indexSortStatus -> segInfoStat.indexSortStatus = indexSortStatus);
+
+  CompletableFuture testSoftDeletes = null;
+  final String softDeletesField = reader.getFieldInfos().getSoftDeletesField();
+  if (softDeletesField != null) {
+testSoftDeletes =
+runAsyncSegmentPartCheck(
+executorService,
+() ->

[GitHub] [lucene] zacharymorn commented on a change in pull request #128: LUCENE-9662: [WIP] CheckIndex should be concurrent

2021-05-26 Thread GitBox



zacharymorn commented on a change in pull request #128:
URL: https://github.com/apache/lucene/pull/128#discussion_r639457972



##
File path: lucene/core/src/java/org/apache/lucene/index/CheckIndex.java
##
@@ -700,29 +771,37 @@ public Status checkIndex(List onlySegments) throws IOException {
 
 if (reader.hasDeletions()) {
   if (reader.numDocs() != info.info.maxDoc() - info.getDelCount()) {
-throw new RuntimeException(
+throw new CheckIndexException(
+segmentId,
+"",

Review comment:
   I've removed the use of `segmentId` and `partId` as part of 
https://github.com/apache/lucene/pull/128/commits/57f542f48ee08ec2bd63520c43deb0734455bd28.
 





[GitHub] [lucene] zacharymorn commented on a change in pull request #128: LUCENE-9662: [WIP] CheckIndex should be concurrent

2021-05-26 Thread GitBox



zacharymorn commented on a change in pull request #128:
URL: https://github.com/apache/lucene/pull/128#discussion_r639455850



##
File path: lucene/core/src/java/org/apache/lucene/index/CheckIndex.java
##
@@ -926,17 +1100,19 @@ public Status checkIndex(List onlySegments) throws IOException {
* @lucene.experimental
*/
   public static Status.LiveDocStatus testLiveDocs(
-  CodecReader reader, PrintStream infoStream, boolean failFast) throws IOException {
+  CodecReader reader, PrintStream infoStream, String segmentId) {
 long startNS = System.nanoTime();
+String segmentPartId = segmentId + "[LiveDocs]";
 final Status.LiveDocStatus status = new Status.LiveDocStatus();
 
 try {
-  if (infoStream != null) infoStream.print("test: check live docs.");
+  if (infoStream != null) infoStream.print(segmentPartId + "test: check live docs.");

Review comment:
   I've implemented it here: 
https://github.com/apache/lucene/pull/128/commits/57f542f48ee08ec2bd63520c43deb0734455bd28
 . The per-part messages should be printed as soon as each concurrent check 
finishes, and without extra locking, since the shared `PrintStream` object 
already handles locking internally.
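
   A minimal sketch of the "print as soon as each concurrent check finishes" 
idea, assuming a helper shaped like the `runAsyncSegmentPartCheck` calls in the 
diff. The helper signature, the class name and the fake part checks below are 
hypothetical and are not taken from the PR; only the `java.util.concurrent` 
calls are real API.

```java
import java.io.PrintStream;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Supplier;

public class AsyncPartCheckSketch {

  // Hypothetical helper: run one part check on the executor and pass its
  // result to a callback as soon as that check completes.
  static <R> CompletableFuture<Void> runAsyncPartCheck(
      ExecutorService executor, Supplier<R> check, Consumer<R> onDone) {
    return CompletableFuture.supplyAsync(check, executor).thenAccept(onDone);
  }

  // Runs two stand-in part checks concurrently. Each one reports with a
  // single println call on the shared stream when it finishes.
  static void checkParts(PrintStream out, int threads) {
    ExecutorService executor = Executors.newFixedThreadPool(threads);
    try {
      CompletableFuture<Void> postings =
          runAsyncPartCheck(executor, () -> "[postings] OK", msg -> out.println(msg));
      CompletableFuture<Void> storedFields =
          runAsyncPartCheck(executor, () -> "[stored fields] OK", msg -> out.println(msg));
      // Wait for both parts before moving on to the next segment.
      CompletableFuture.allOf(postings, storedFields).join();
    } finally {
      executor.shutdown();
    }
  }

  public static void main(String[] args) {
    checkParts(System.out, 2);
  }
}
```

   The order of the two lines depends on which check finishes first, so 
callers should not rely on it.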





[GitHub] [lucene] zacharymorn edited a comment on pull request #128: LUCENE-9662: [WIP] CheckIndex should be concurrent

2021-05-26 Thread GitBox



zacharymorn edited a comment on pull request #128:
URL: https://github.com/apache/lucene/pull/128#issuecomment-848482687


   Thanks @mikemccand for the feedback comment, as well as running the index 
check and posting the results here! I was planning to do that next after adding 
some more tests, but you beat me to it.
   
   > More tests are always welcome! But I don't think that should block this 
change -- we can open a follow-on issue to get better direct unit tests for 
CheckIndex? Those would be fun tests to write: make a healthy index, then make 
a random single bit change to one of its files, and then see if CheckIndex 
catches it. Hmm I think we have such a test somewhere :) But not apparently in 
BaseTestCheckIndex...
   
   Sounds good. Yes, I think I came across them before, but I don't recall 
exactly where they are now. I will work on them in a follow-up PR.
   
   
   > I think the issue is that opts.threadCount is 0 if you don't explicitly 
set the thread count. Can we fix it to default to number of cores on the box, 
maybe capped at a maximum (4? 8?), when CheckIndex is invoked interactively 
from the command-line?
   
   Ah, sorry about this (embarrassing) bug! There was a default of 1 for 
threadCount set in the code, but when it was not provided via the command 
line, the default was overwritten by 0, causing this exception to be thrown. 
I'll fix it and cap at 4.
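
   A rough sketch of the intended defaulting, assuming that a value of 0 or 
less means "not set on the command line". The class, method and constant names 
are made up for illustration and are not the names used in CheckIndex.

```java
public class ThreadCountSketch {

  // Assumed cap for the default, following the "cap at 4" plan above.
  static final int MAX_DEFAULT_THREADS = 4;

  // Returns the explicit value when one was given; otherwise falls back to
  // the number of cores, capped at MAX_DEFAULT_THREADS and never below 1.
  static int resolveThreadCount(int requested, int availableCores) {
    if (requested > 0) {
      return requested;
    }
    return Math.max(1, Math.min(availableCores, MAX_DEFAULT_THREADS));
  }

  public static void main(String[] args) {
    int cores = Runtime.getRuntime().availableProcessors();
    System.out.println("threadCount = " + resolveThreadCount(0, cores));
  }
}
```

   Keeping the fallback in one place means an unset option can no longer 
silently replace the default with 0.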
   
   
   > I think we should try to remove the duplicate segment/partId (e.g.[Segment 
614][StoredFields]) in some lines? 
   
   > But the output is jumbled, I think because we are missing newlines 
somewhere, or maybe necessary locking?
   
   Yes, these repeated segment / part ids are caused by concurrent threads 
printing messages without newlines:
   
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java#L934.
 I also took a look at the implementation of `PrintStream#println`; it is 
synchronized on `this`, so it shouldn't require additional locking (assuming 
its subclasses behave similarly). 
   
   I think this is indeed a problem with the approach of using segment / part 
ids to organize messages, as it still requires the messages to be printed in a 
certain way. I'll switch over to the other approach, which uses a per-part buffer. 
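
   One way the per-part buffer could look, as a sketch only: each part writes 
its fragments to a private `PrintStream` backed by a `ByteArrayOutputStream`, 
and the finished text goes to the shared stream in a single `println` call. 
The class and method names are hypothetical, and the sketch relies on the 
standard `PrintStream` synchronizing each `println` call.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;

public class PerPartBufferSketch {

  // Stand-in for one part check. It prints fragments without newlines into
  // its own buffer, so nothing from another thread can land in between.
  static String checkPart(String part) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    PrintStream local = new PrintStream(bytes);
    local.print("test: " + part + "...");
    local.print("OK");
    local.flush();
    return bytes.toString();
  }

  // Runs every part on its own thread and emits each buffer as one line.
  static void checkSegment(PrintStream shared, List<String> parts) {
    List<Thread> threads = new ArrayList<>();
    for (String part : parts) {
      Thread t = new Thread(() -> shared.println(checkPart(part)));
      threads.add(t);
      t.start();
    }
    for (Thread t : threads) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new RuntimeException(e);
      }
    }
  }

  public static void main(String[] args) {
    checkSegment(System.out, List.of("postings", "stored fields", "doc values"));
  }
}
```

   The lines may come out in any order, but each line belongs to exactly one 
part.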
   
   > Hmm, also why are we calling it Segment 614 when its name is _h2? Hmm, is 
that the decimal translation of the base36 value?
   
   This was because `segmentName` was used for the id there, when it should 
actually use `info.info.name`. I'll fix that.
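
   For reference, a small sketch of the base-36 relationship mentioned in the 
question: `h2` read in radix 36 is 17 * 36 + 2 = 614. It assumes segment names 
are an underscore followed by a base-36 counter; the class and method names 
are only for illustration.

```java
public class SegmentNameSketch {

  // Parses the part after the leading '_' as a base-36 number.
  static long toCounter(String segmentName) {
    return Long.parseLong(segmentName.substring(1), Character.MAX_RADIX);
  }

  // Formats a counter back into the "_<base36>" form.
  static String toName(long counter) {
    return "_" + Long.toString(counter, Character.MAX_RADIX);
  }

  public static void main(String[] args) {
    System.out.println(toCounter("_h2")); // prints 614
    System.out.println(toName(614));      // prints _h2
  }
}
```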
   
   
   
   > Finally I ran CheckIndex -threadCount 128
   
   > It went a bit faster! (203 down to 176 seconds).
   
   Glad to see it actually improved the speed there :D ! I think 128 might be 
too large a threadCount for the current implementation though, as it only 
parallelizes the part checks (up to 11) within one segment at a time. I'll cap 
the threadCount at 11 and print a message to alert the user if a larger value 
is passed in.
   
   
   


