[GitHub] [lucene-solr] magibney closed pull request #892: LUCENE-8972: Add ICUTransformCharFilter, to support pre-tokenizer ICU text transformation

2022-04-07 Thread GitBox


magibney closed pull request #892: LUCENE-8972: Add ICUTransformCharFilter, to support pre-tokenizer ICU text transformation
URL: https://github.com/apache/lucene-solr/pull/892


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] magibney commented on pull request #892: LUCENE-8972: Add ICUTransformCharFilter, to support pre-tokenizer ICU text transformation

2022-04-07 Thread GitBox


magibney commented on PR #892:
URL: https://github.com/apache/lucene-solr/pull/892#issuecomment-1092433057

   superseded by: https://github.com/apache/lucene/pull/15





[GitHub] [lucene] gautamworah96 commented on pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch decide

2022-04-07 Thread GitBox


gautamworah96 commented on PR #762:
URL: https://github.com/apache/lucene/pull/762#issuecomment-1092327314

   Ooof, this new commit was quite a journey. The test case started failing 
sporadically after I added the two use cases (testing both the 
DirectoryTaxonomyReader (DTR) and the AlwaysRefreshDirectoryTaxonomyReader (ARDTR)). 
This led me to debug it and figure out that the test framework randomly adds 
files and folders to empty directories just to chaos-test the setup. It also 
randomly disables deletes in directories through VirusChecker and WindowsFS. 
Accounting for these factors finally made the test case work.
   
   We now also explicitly check for the older label "a" and ensure that the new 
commit label "b" can't be found.





[GitHub] [lucene] gautamworah96 commented on a diff in pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch d

2022-04-07 Thread GitBox


gautamworah96 commented on code in PR #762:
URL: https://github.com/apache/lucene/pull/762#discussion_r845651281


##
lucene/facet/src/test/org/apache/lucene/facet/taxonomy/directory/TestAlwaysRefreshDirectoryTaxonomyReader.java:
##
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.taxonomy.directory;
+
+import java.io.IOException;
+import java.nio.file.Path;
+import org.apache.lucene.facet.FacetTestCase;
+import org.apache.lucene.facet.FacetsCollector;
+import org.apache.lucene.facet.FacetsConfig;
+import org.apache.lucene.facet.taxonomy.FacetLabel;
+import org.apache.lucene.facet.taxonomy.SearcherTaxonomyManager;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.index.IndexWriterConfig;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.util.IOUtils;
+
+public class TestAlwaysRefreshDirectoryTaxonomyReader extends FacetTestCase {
+
+  /**
+   * Tests the behavior of the {@link AlwaysRefreshDirectoryTaxonomyReader} by testing if the
+   * associated {@link SearcherTaxonomyManager} can successfully refresh and serve queries if the
+   * underlying taxonomy index is changed to an older checkpoint. Ideally, each checkpoint should be
+   * self-sufficient and should allow serving search queries when {@link
+   * SearcherTaxonomyManager#maybeRefresh()} is called.
+   *
+   * It does not check whether the private taxoArrays were actually recreated or no. We are
+   * (correctly) hiding away that complexity away from the user.
+   */
+  public void testAlwaysRefreshDirectoryTaxonomyReader() throws IOException {
+    final Path taxoPath1 = createTempDir("dir1");
+    final Directory dir1 = newFSDirectory(taxoPath1);
+    final DirectoryTaxonomyWriter tw1 =
+        new DirectoryTaxonomyWriter(dir1, IndexWriterConfig.OpenMode.CREATE);
+    tw1.addCategory(new FacetLabel("a"));
+    tw1.commit(); // commit1
+
+    final Path taxoPath2 = createTempDir("commit1");
+    final Directory commit1 = newFSDirectory(taxoPath2);
+    // copy all index files from dir1
+    for (String file : dir1.listAll()) {
+      commit1.copyFrom(dir1, file, file, IOContext.READ);
+    }
+
+    tw1.addCategory(new FacetLabel("b"));
+    tw1.commit(); // commit2
+    tw1.close();
+
+    final DirectoryReader dr1 = DirectoryReader.open(dir1);
+    // using a DirectoryTaxonomyReader here will cause the test to fail and throw a AIOOB exception

Review Comment:
   Added in 
[09b8b51](https://github.com/apache/lucene/pull/762/commits/09b8b51fe6e21f3d852b29e763eb56034c99bedf)






[GitHub] [lucene] gautamworah96 commented on a diff in pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch d

2022-04-07 Thread GitBox


gautamworah96 commented on code in PR #762:
URL: https://github.com/apache/lucene/pull/762#discussion_r845651072


##
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/AlwaysRefreshDirectoryTaxonomyReader.java:
##
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.taxonomy.directory;
+
+import java.io.IOException;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.util.IOUtils;
+
+/**
+ * A modified DirectoryTaxonomyReader that always recreates a new {@link
+ * AlwaysRefreshDirectoryTaxonomyReader} instance when {@link
+ * AlwaysRefreshDirectoryTaxonomyReader#doOpenIfChanged()} is called. This enables us to easily go
+ * forward or backward in time by re-computing the ordinal space during each refresh.

Review Comment:
   Fixed in 
[09b8b51](https://github.com/apache/lucene/pull/762/commits/09b8b51fe6e21f3d852b29e763eb56034c99bedf)






[GitHub] [lucene] gautamworah96 commented on a diff in pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch d

2022-04-07 Thread GitBox


gautamworah96 commented on code in PR #762:
URL: https://github.com/apache/lucene/pull/762#discussion_r845651006


##
lucene/facet/src/test/org/apache/lucene/facet/taxonomy/directory/TestAlwaysRefreshDirectoryTaxonomyReader.java:
##
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.taxonomy.directory;
+
+import java.io.IOException;
+import java.nio.file.Path;
+import org.apache.lucene.facet.FacetTestCase;
+import org.apache.lucene.facet.FacetsCollector;
+import org.apache.lucene.facet.FacetsConfig;
+import org.apache.lucene.facet.taxonomy.FacetLabel;
+import org.apache.lucene.facet.taxonomy.SearcherTaxonomyManager;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.index.IndexWriterConfig;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.util.IOUtils;
+
+public class TestAlwaysRefreshDirectoryTaxonomyReader extends FacetTestCase {
+
+  /**
+   * Tests the behavior of the {@link AlwaysRefreshDirectoryTaxonomyReader} by testing if the
+   * associated {@link SearcherTaxonomyManager} can successfully refresh and serve queries if the
+   * underlying taxonomy index is changed to an older checkpoint. Ideally, each checkpoint should be
+   * self-sufficient and should allow serving search queries when {@link
+   * SearcherTaxonomyManager#maybeRefresh()} is called.
+   *
+   * It does not check whether the private taxoArrays were actually recreated or no. We are
+   * (correctly) hiding away that complexity away from the user.
+   */
+  public void testAlwaysRefreshDirectoryTaxonomyReader() throws IOException {
+    final Path taxoPath1 = createTempDir("dir1");
+    final Directory dir1 = newFSDirectory(taxoPath1);
+    final DirectoryTaxonomyWriter tw1 =
+        new DirectoryTaxonomyWriter(dir1, IndexWriterConfig.OpenMode.CREATE);
+    tw1.addCategory(new FacetLabel("a"));
+    tw1.commit(); // commit1
+
+    final Path taxoPath2 = createTempDir("commit1");
+    final Directory commit1 = newFSDirectory(taxoPath2);
+    // copy all index files from dir1
+    for (String file : dir1.listAll()) {
+      commit1.copyFrom(dir1, file, file, IOContext.READ);
+    }
+
+    tw1.addCategory(new FacetLabel("b"));
+    tw1.commit(); // commit2
+    tw1.close();
+
+    final DirectoryReader dr1 = DirectoryReader.open(dir1);
+    // using a DirectoryTaxonomyReader here will cause the test to fail and throw a AIOOB exception
+    // in maybeRefresh()
+    final DirectoryTaxonomyReader dtr1 = new AlwaysRefreshDirectoryTaxonomyReader(dir1);
+    final SearcherTaxonomyManager mgr = new SearcherTaxonomyManager(dr1, dtr1, null);
+
+    final FacetsConfig config = new FacetsConfig();
+    final SearcherTaxonomyManager.SearcherAndTaxonomy pair = mgr.acquire();
+    final FacetsCollector sfc = new FacetsCollector();
+    /**
+     * the call flow here initializes {@link DirectoryTaxonomyReader#taxoArrays}. These reused
+     * `taxoArrays` form the basis of the inconsistency *
+     */
+    getTaxonomyFacetCounts(pair.taxonomyReader, config, sfc);
+
+    // now try to go back to checkpoint 1 and refresh the SearcherTaxonomyManager
+
+    // delete all files from commit2
+    for (String file : dir1.listAll()) {
+      dir1.deleteFile(file);
+    }
+
+    // copy all index files from commit1
+    for (String file : commit1.listAll()) {
+      dir1.copyFrom(commit1, file, file, IOContext.READ);
+    }
+
+    mgr.maybeRefresh();
+    IOUtils.close(mgr, dtr1, dr1);

Review Comment:
   Added in commit 
[09b8b51](https://github.com/apache/lucene/pull/762/commits/09b8b51fe6e21f3d852b29e763eb56034c99bedf)
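
The AIOOB exception mentioned in the test comments can be illustrated with a deliberately simplified model. Everything below (`OrdinalArrays`, `refreshReusing`, `refreshAlways`, and the epoch and parent encoding) is hypothetical and is not Lucene's `DirectoryTaxonomyReader` code. It only assumes, as the PR title suggests, that cached per-ordinal arrays are reused when the epoch is unchanged, and that this reuse presumes the taxonomy only grows. Rolling the directory back to an older commit keeps the epoch but shrinks the ordinal space, so the reuse path fails while the always-rebuild path succeeds.

```java
import java.util.function.IntUnaryOperator;

/**
 * Hypothetical, simplified model (not Lucene's DirectoryTaxonomyReader): per-ordinal parent
 * data is cached and reused on refresh while the epoch is unchanged, assuming the taxonomy
 * only ever grows.
 */
final class OrdinalArrays {
  final int epoch;
  final int[] parents;

  private OrdinalArrays(int epoch, int[] parents) {
    this.epoch = epoch;
    this.parents = parents;
  }

  /** Computes the parent of every ordinal from scratch. */
  static OrdinalArrays build(int epoch, int size, IntUnaryOperator parentOf) {
    int[] parents = new int[size];
    for (int ord = 0; ord < size; ord++) {
      parents[ord] = parentOf.applyAsInt(ord);
    }
    return new OrdinalArrays(epoch, parents);
  }

  /** Epoch-driven refresh: same epoch means "copy the old prefix, compute only new ordinals". */
  OrdinalArrays refreshReusing(int newEpoch, int newSize, IntUnaryOperator parentOf) {
    if (newEpoch != epoch) {
      return build(newEpoch, newSize, parentOf);
    }
    int[] grown = new int[newSize];
    // If the index was rolled back to a commit with fewer ordinals, newSize < parents.length
    // and this copy throws an IndexOutOfBoundsException.
    System.arraycopy(parents, 0, grown, 0, parents.length);
    for (int ord = parents.length; ord < newSize; ord++) {
      grown[ord] = parentOf.applyAsInt(ord);
    }
    return new OrdinalArrays(newEpoch, grown);
  }

  /** "Always refresh": ignore the cached arrays and rebuild the ordinal space every time. */
  OrdinalArrays refreshAlways(int newEpoch, int newSize, IntUnaryOperator parentOf) {
    return build(newEpoch, newSize, parentOf);
  }
}
```

Rebuilding on every refresh trades some refresh cost for correctness when the index can move backward in time, which is the behavior the `AlwaysRefreshDirectoryTaxonomyReader` javadoc describes.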






[jira] [Comment Edited] (LUCENE-10292) AnalyzingInfixSuggester thread safety: lookup() fails during (re)build()

2022-04-07 Thread Chris M. Hostetter (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519232#comment-17519232
 ] 

Chris M. Hostetter edited comment on LUCENE-10292 at 4/8/22 12:11 AM:
--

{quote} ... except a typo in tests "testDurringReBuild" should probably be 
"testDuringRebuild"? 
{quote}
HA! .. yeah, thanks.  I renamed them all {{testLookupsDuringReBuild()}}

 
I'll plan to commit to main & backport to 9x tomorrow unless there is any other 
feedback.
 


was (Author: hossman):
{quote} ... except a typo in tests "testDurringReBuild" should probably be 
"testDuringRebuild"? 

HA! .. yeah, thanks.  I renamed them all {{testLookupsDuringReBuild()}}
{quote}
 
I'll plan to commit to main & backport to 9x tomorrow unless there is any other 
feedback.
 

> AnalyzingInfixSuggester thread safety: lookup() fails during (re)build()
> 
>
> Key: LUCENE-10292
> URL: https://issues.apache.org/jira/browse/LUCENE-10292
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: LUCENE-10292-1.patch, LUCENE-10292-2.patch, 
> LUCENE-10292-3.patch, LUCENE-10292.patch
>
>
> I'm filing this based on anecdotal information from a Solr user w/o 
> experiencing it first hand (and I don't have a test case to demonstrate it), 
> but based on a reading of the code the underlying problem seems 
> self-evident...
> With all other Lookup implementations I've examined, it is possible to call 
> {{lookup()}} regardless of whether another thread is concurrently calling 
> {{build()}}; in all cases I've seen, it is even possible to call 
> {{lookup()}} if {{build()}} has never been called: the result is just an 
> "empty" {{List}}.
> Typically this works because the {{build()}} method uses temporary 
> data structures until its "build logic" is complete, at which point it 
> atomically replaces the data structures used by the {{lookup()}} method. In 
> the case of {{AnalyzingInfixSuggester}}, however, the {{build()}} method 
> starts by closing & nulling out the {{protected SearcherManager 
> searcherMgr}} (which it only populates again once it has finished building 
> its index), and then the lookup method starts with...
> {code:java}
> if (searcherMgr == null) {
>   throw new IllegalStateException("suggester was not built");
> }
> {code}
> ... meaning it is unsafe to call {{AnalyzingInfixSuggester.lookup()}} in any 
> situation where another thread may be calling 
> {{AnalyzingInfixSuggester.build()}}.
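
The pattern described above (build into a temporary structure, then atomically replace what `lookup()` reads) can be sketched as follows. `SwappingSuggester` is a hypothetical, stdlib-only stand-in that keeps a sorted `List<String>` instead of a Lucene index; it is not the fix that was committed for LUCENE-10292.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.stream.Collectors;

/**
 * Hypothetical suggester illustrating "build off to the side, then swap atomically".
 * Not AnalyzingInfixSuggester's actual code.
 */
final class SwappingSuggester {
  // Readers always see either the previous complete index or the new complete one, never null.
  private final AtomicReference<List<String>> index =
      new AtomicReference<>(Collections.emptyList());

  /** Safe to call concurrently with build(), and before build() has ever been called. */
  List<String> lookup(String prefix, int num) {
    List<String> snapshot = index.get(); // read the reference once, then work on the snapshot
    return snapshot.stream()
        .filter(s -> s.startsWith(prefix))
        .limit(num)
        .collect(Collectors.toList());
  }

  /** Builds the new index in a temporary list, then publishes it in one atomic step. */
  void build(List<String> entries) {
    List<String> fresh = entries.stream().sorted().collect(Collectors.toList());
    index.set(Collections.unmodifiableList(fresh));
  }
}
```

For a real index-backed suggester, the old searcher still has to be closed after the swap without pulling it out from under in-flight lookups; `SearcherManager`'s `acquire()`/`release()` reference counting is one way to handle that, though the actual patch may take a different route.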



--
This message was sent by Atlassian Jira
(v8.20.1#820001)




[jira] [Updated] (LUCENE-10292) AnalyzingInfixSuggester thread safety: lookup() fails during (re)build()

2022-04-07 Thread Chris M. Hostetter (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated LUCENE-10292:

Attachment: LUCENE-10292-3.patch
Status: Open  (was: Open)

 







[jira] [Commented] (LUCENE-10292) AnalyzingInfixSuggester thread safety: lookup() fails during (re)build()

2022-04-07 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519219#comment-17519219
 ] 

Michael Sokolov commented on LUCENE-10292:
--

It's surprising it took so long for someone to stumble over this. Maybe because 
most people (as I used to) rebuild their suggesters offline as part of a batch 
build process? Anyway, I looked over the patch and didn't see any problems except 
a typo in tests: "testDurringReBuild" should probably be "testDuringRebuild"? 
Thanks for the nice tests!







[GitHub] [lucene] Yuti-G commented on pull request #778: LUCENE-10495: Fix return statement of siblingsLoaded() in TaxonomyFacets

2022-04-07 Thread GitBox


Yuti-G commented on PR #778:
URL: https://github.com/apache/lucene/pull/778#issuecomment-1092196648

   Thanks @gsmiller! I resolved the conflicts and added an entry to 
CHANGES.txt. 
   





[jira] [Updated] (LUCENE-10495) Fix return statement of siblingsLoaded() in TaxonomyFacets

2022-04-07 Thread Yuting Gan (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuting Gan updated LUCENE-10495:

Summary: Fix return statement of siblingsLoaded() in TaxonomyFacets  (was: 
Fix bug in TaxonomyFacets)

> Fix return statement of siblingsLoaded() in TaxonomyFacets
> --
>
> Key: LUCENE-10495
> URL: https://issues.apache.org/jira/browse/LUCENE-10495
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Yuting Gan
>Priority: Minor
> Attachments: Screen Shot 2022-03-30 at 8.02.15 PM.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Found a bug in TaxonomyFacets when trying to use the siblingsLoaded function: 
> siblingsLoaded() should return siblings != null, but it currently returns 
> children != null. 
>  
>  
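
A minimal, hypothetical stand-in (`LazyTaxonomyArrays`, not the real `TaxonomyFacets` class) shows why the old return statement was wrong: once only the children array has been loaded, `return children != null;` would also report the siblings as loaded.

```java
/**
 * Hypothetical stand-in for lazily loaded children/siblings arrays; only the two accessor
 * bodies mirror the change discussed in PR #778.
 */
final class LazyTaxonomyArrays {
  int[] children; // null until loaded
  int[] siblings; // null until loaded, independently of children

  boolean childrenLoaded() {
    return children != null;
  }

  boolean siblingsLoaded() {
    return siblings != null; // the buggy version returned children != null
  }
}
```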






[GitHub] [lucene] gsmiller commented on pull request #778: LUCENE-10495: Fix bug in TaxonomyFacets

2022-04-07 Thread GitBox


gsmiller commented on PR #778:
URL: https://github.com/apache/lucene/pull/778#issuecomment-1091998032

   @Yuti-G looks like a conflict needs to be resolved when you get a chance. 
Also, could you please add an entry to CHANGES.txt noting the bug fix? 
We'll backport this to 9.2, so it would make sense to add it in the 9.2 
section. Thanks again for finding and fixing this!





[GitHub] [lucene] gsmiller commented on a diff in pull request #778: LUCENE-10495: Fix bug in TaxonomyFacets

2022-04-07 Thread GitBox


gsmiller commented on code in PR #778:
URL: https://github.com/apache/lucene/pull/778#discussion_r845387164


##
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/TaxonomyFacets.java:
##
@@ -109,7 +109,7 @@ public boolean childrenLoaded() {
    * @lucene.experimental
    */
   public boolean siblingsLoaded() {
-    return children != null;
+    return siblings != null;

Review Comment:
   Thanks @Yuti-G . Looks good!






[jira] [Commented] (LUCENE-10507) Should it be more likely to search concurrently in tests?

2022-04-07 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519004#comment-17519004
 ] 

Adrien Grand commented on LUCENE-10507:
---

+1

> Should it be more likely to search concurrently in tests?
> -
>
> Key: LUCENE-10507
> URL: https://issues.apache.org/jira/browse/LUCENE-10507
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Luca Cavanna
>Priority: Minor
>
> As part of LUCENE-10002 we are migrating test usages of 
> IndexSearcher#search(Query, Collector) to use the corresponding search method 
> that takes a CollectorManager in place of a Collector. As part of such 
> changes, I've been paying attention to whether searchers are created through 
> LuceneTestCase#newSearcher and migrating to it when possible.
> This caused some recent test failures following test changes. These were in 
> most cases test issues, and they were quite rare because we only rarely 
> exercise the concurrent code path in tests.
> One recent failure uncovered LUCENE-10500, an actual bug that affected only 
> concurrent searches. It was uncovered by a test run that indexed a 
> considerable number of docs and was lucky enough to get an executor set on 
> its index searcher as well as multiple slices.
> LuceneTestCase#newIndexSearcher(IndexReader) uses threads only rarely, and 
> even when useThreads is true, the searcher may not get an executor set. Also, 
> it can often happen that even when an executor is set, the searcher holds 
> only one slice because not enough documents are indexed. Some nightly tests 
> index enough documents, and LuceneTestCase also lowers the slice limits, but 
> only 50% of the time and only when wrapWithAssertions is false. Also, I wonder 
> if the lower limits are low enough:
> {code:java}
> int maxDocPerSlice = 1 + random.nextInt(10);
> int maxSegmentsPerSlice = 1 + random.nextInt(20);
> {code}
> All in all, I wonder if we should make it more likely for real concurrent 
> searches to happen while testing across multiple slices. It seems like it 
> could be useful especially as we'd like users to use collector managers 
> instead of collectors (although that does not necessarily translate to 
> concurrent search).
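
To make the single-slice concern concrete, here is a simplified, hypothetical sketch of greedy slice grouping over per-segment doc counts: large segments get their own slice, and smaller ones are grouped until a doc limit or segment limit is hit. `SliceSketch` is not Lucene's `IndexSearcher` code, and Lucene's exact grouping rules may differ. With a large per-slice doc limit (250,000 here, chosen for illustration), a small test index collapses into one slice; with limits in the range quoted above (`1 + random.nextInt(10)` docs per slice), the same index splits into several.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical greedy slicing over per-segment doc counts; not Lucene's implementation. */
final class SliceSketch {
  static List<List<Integer>> slices(
      int[] segmentDocCounts, int maxDocPerSlice, int maxSegmentsPerSlice) {
    // Largest segments first, so oversized segments are handled before any grouping starts.
    List<Integer> sorted = new ArrayList<>();
    for (int c : segmentDocCounts) {
      sorted.add(c);
    }
    sorted.sort((x, y) -> Integer.compare(y, x));

    List<List<Integer>> result = new ArrayList<>();
    List<Integer> current = null;
    long docSum = 0;
    for (int docs : sorted) {
      if (docs > maxDocPerSlice) {
        result.add(List.of(docs)); // an oversized segment becomes a slice of its own
        continue;
      }
      if (current == null) {
        current = new ArrayList<>();
        result.add(current);
      }
      current.add(docs);
      docSum += docs;
      if (current.size() >= maxSegmentsPerSlice || docSum > maxDocPerSlice) {
        current = null; // close this slice; the next segment starts a new one
        docSum = 0;
      }
    }
    return result;
  }
}
```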






[jira] [Resolved] (LUCENE-10444) Support alternate aggregation functions in association facets

2022-04-07 Thread Greg Miller (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Miller resolved LUCENE-10444.
--
Fix Version/s: 9.2
   Resolution: Fixed

> Support alternate aggregation functions in association facets
> -
>
> Key: LUCENE-10444
> URL: https://issues.apache.org/jira/browse/LUCENE-10444
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Greg Miller
>Assignee: Greg Miller
>Priority: Minor
> Fix For: 9.2
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> We currently only support {{sum}} aggregations in the various association 
> facet implementations. I'd be really interested in extending the association 
> facet implementations to support other aggregations, starting with {{max}} 
> and {{min}} (in addition to {{sum}}). 
> I've been sketching up a prototype of this and I think I have a reasonable 
> way to introduce this idea. Will get a PR out for feedback soon.
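
One way to picture pluggable aggregation is an enum of binary functions folded over the association values recorded for a facet ordinal. `AssociationAggregation` and its methods are hypothetical names for illustration only; the API that LUCENE-10444 actually added may look different.

```java
import java.util.function.IntBinaryOperator;

/** Hypothetical pluggable aggregation for int association values; not Lucene's API. */
enum AssociationAggregation {
  SUM((a, b) -> a + b),
  MAX(Math::max),
  MIN(Math::min);

  private final IntBinaryOperator op;

  AssociationAggregation(IntBinaryOperator op) {
    this.op = op;
  }

  /** Folds one more association value into the running aggregate for an ordinal. */
  int aggregate(int existing, int newValue) {
    return op.applyAsInt(existing, newValue);
  }

  /** Aggregates all association values recorded for one ordinal. */
  int aggregateAll(int[] values) {
    if (values.length == 0) {
      throw new IllegalArgumentException("no values to aggregate");
    }
    int acc = values[0]; // seed with the first value so MIN/MAX need no identity element
    for (int i = 1; i < values.length; i++) {
      acc = aggregate(acc, values[i]);
    }
    return acc;
  }
}
```

Seeding the fold with the first value (instead of an identity such as 0) keeps `MIN` and `MAX` correct without needing a per-function identity value.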






[jira] [Commented] (LUCENE-10488) Optimize Facets#getTopDims across Facets implementations

2022-04-07 Thread Greg Miller (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17518939#comment-17518939
 ] 

Greg Miller commented on LUCENE-10488:
--

Very exciting. Thanks [~yutinggan]! Also, please note that the refactoring 
change I mentioned above for association facets is now merged (LUCENE-10444), 
so it should be easy now to move forward with optimizations there as well if 
you're interested (or if anyone else is interested). Thanks again!

> Optimize Facets#getTopDims across Facets implementations
> 
>
> Key: LUCENE-10488
> URL: https://issues.apache.org/jira/browse/LUCENE-10488
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Greg Miller
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> LUCENE-10325 added a new {{getTopDims}} API, allowing users to specify the 
> number of "top" dimensions they want. The default implementation just 
> delegates to {{getAllDims}} and returns the requested number of top dims, but 
> some Facets sub-classes can do this more efficiently. LUCENE-10325 demonstrated 
> this in {{SortedSetDocValuesFacetCounts}}, but we can take it further. There's 
> at least some opportunity to do better in:
> * {{ConcurrentSortedSetDocValuesFacetCounts}}
> * {{FastTaxonomyFacetCounts}}
> * {{TaxonomyFacetSumFloatAssociations}}
> * {{TaxonomyFacetSumIntAssociations}}
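
A sub-class that already knows each dimension's total could avoid sorting every dimension by keeping a bounded min-heap of size N, which costs O(D log N) for D dimensions instead of O(D log D). `TopDimsSketch` is a hypothetical, stdlib-only illustration of that idea, not the approach taken in any of the classes listed above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical top-N selection over per-dimension totals; not Lucene's Facets code. */
final class TopDimsSketch {
  /** Returns the names of the topN dims by value, highest first, ties broken by name. */
  static List<String> topDims(Map<String, Integer> dimValues, int topN) {
    // Min-heap ordered by (value asc, name desc): the root is the weakest dim kept so far.
    PriorityQueue<Map.Entry<String, Integer>> heap =
        new PriorityQueue<>(
            (a, b) -> {
              int cmp = Integer.compare(a.getValue(), b.getValue());
              return cmp != 0 ? cmp : b.getKey().compareTo(a.getKey());
            });
    for (Map.Entry<String, Integer> e : dimValues.entrySet()) {
      heap.offer(e);
      if (heap.size() > topN) {
        heap.poll(); // evict the weakest dim so the heap never exceeds topN entries
      }
    }
    List<String> result = new ArrayList<>();
    while (!heap.isEmpty()) {
      result.add(heap.poll().getKey());
    }
    Collections.reverse(result); // the heap drains weakest-first
    return result;
  }
}
```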






[jira] [Commented] (LUCENE-10507) Should it be more likely to search concurrently in tests?

2022-04-07 Thread Greg Miller (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17518935#comment-17518935
 ] 

Greg Miller commented on LUCENE-10507:
--

+1. I think this is a great idea!

> Should it be more likely to search concurrently in tests?
> -
>
> Key: LUCENE-10507
> URL: https://issues.apache.org/jira/browse/LUCENE-10507
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Luca Cavanna
>Priority: Minor
>
> As part of LUCENE-10002 we are migrating test usages of 
> IndexSearcher#search(Query, Collector) to use the corresponding search method 
> that takes a CollectorManager in place of a Collector. As part of such 
> changes, I've been paying attention to whether searchers are created through 
> LuceneTestCase#newSearcher and migrating to it when possible.
> This caused some recent test failures following test changes. In most cases 
> these were test issues, and the failures were quite rare because we only 
> rarely exercise the concurrent code path in tests.
> One recent failure uncovered LUCENE-10500, which was an actual bug that 
> affected concurrent searches only. It was found by a test run that indexed a 
> considerable number of docs and was lucky enough to get an executor set on 
> its index searcher as well as multiple slices.
> LuceneTestCase#newSearcher(IndexReader) uses threads only rarely, and even 
> when useThreads is true, the searcher may not get an executor set. Also, it 
> can often happen that even though an executor is set, the searcher will hold 
> only one slice, as not enough documents are indexed. Some nightly tests index 
> enough documents, and LuceneTestCase also lowers the slice limits, but only 
> 50% of the time and only when wrapWithAssertions is false. Also, I wonder if 
> the lower limits are low enough:
> {code:java}
> int maxDocPerSlice = 1 + random.nextInt(10);
> int maxSegmentsPerSlice = 1 + random.nextInt(20);
> {code}
> All in all, I wonder if we should make it more likely for real concurrent 
> searches to happen while testing across multiple slices. It seems like it 
> could be useful especially as we'd like users to use collector managers 
> instead of collectors (although that does not necessarily translate to 
> concurrent search).






[jira] [Commented] (LUCENE-10444) Support alternate aggregation functions in association facets

2022-04-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17518933#comment-17518933
 ] 

ASF subversion and git services commented on LUCENE-10444:
--

Commit 9e10ba02ec350f267458926035bb172ea82291b9 in lucene's branch 
refs/heads/branch_9x from Greg Miller
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=9e10ba02ec3 ]

LUCENE-10444: Support alternate aggregation functions in association facets 
(#719)



> Support alternate aggregation functions in association facets
> -
>
> Key: LUCENE-10444
> URL: https://issues.apache.org/jira/browse/LUCENE-10444
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Greg Miller
>Assignee: Greg Miller
>Priority: Minor
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> We currently only support {{sum}} aggregations in the various association 
> facet implementations. I'd be really interested in extending the association 
> facet implementations to support other aggregations, starting with {{max}} 
> and {{min}} (in addition to {{sum}}). 
> I've been sketching up a prototype of this and I think I have a reasonable 
> way to introduce this idea. Will get a PR out for feedback soon.
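Here is a hedged sketch of how alternate aggregation functions could plug into association facets. The `Aggregation` enum below is hypothetical and is not the API that PR #719 introduced. It illustrates one design detail that matters once min and max join sum: each function needs its own starting (identity) value. Seeding a min accumulator with 0, as a sum would be, returns 0 for any set of positive values.

```java
import java.util.List;

/**
 * Hypothetical aggregation functions for association facet values (not
 * Lucene's API). Each constant carries an identity value to start from.
 */
enum Aggregation {
  SUM(0) {
    @Override
    int aggregate(int existing, int next) {
      return existing + next;
    }
  },
  MAX(Integer.MIN_VALUE) {
    @Override
    int aggregate(int existing, int next) {
      return Math.max(existing, next);
    }
  },
  MIN(Integer.MAX_VALUE) {
    @Override
    int aggregate(int existing, int next) {
      return Math.min(existing, next);
    }
  };

  /** Accumulator value before any association value has been seen. */
  final int identity;

  Aggregation(int identity) {
    this.identity = identity;
  }

  /** Combines the running value with one more association value. */
  abstract int aggregate(int existing, int next);

  /** Folds every association value recorded for one facet label. */
  int reduce(List<Integer> assocValues) {
    int acc = identity;
    for (int v : assocValues) {
      acc = aggregate(acc, v);
    }
    return acc;
  }

  public static void main(String[] args) {
    List<Integer> assoc = List.of(3, 7, 2);
    for (Aggregation agg : values()) {
      System.out.println(agg + " -> " + agg.reduce(assoc));
    }
  }
}
```

For an empty list, the sketch returns the identity itself (for example Integer.MIN_VALUE for MAX). A real implementation would probably skip labels that received no values rather than report that sentinel.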






[GitHub] [lucene] gsmiller merged pull request #719: LUCENE-10444 BACKPORT: Support alternate aggregation functions in association facets

2022-04-07 Thread GitBox


gsmiller merged PR #719:
URL: https://github.com/apache/lucene/pull/719





[GitHub] [lucene] mocobeta opened a new pull request, #801: LUCENE-10493: Unify token Type enum in kuromoji and nori

2022-04-07 Thread GitBox


mocobeta opened a new pull request, #801:
URL: https://github.com/apache/lucene/pull/801

   Both `JapaneseTokenizer.Type` and `KoreanTokenizer.Type` enums are identical, 
and they should be moved to the `o.a.l.a.morph` package in analysis-common (so 
that `o.a.l.a.morph.Token` can carry the token's type as a property).
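Here is a minimal sketch of the proposed shape. It assumes, for illustration only, that both tokenizer-specific `Type` enums declare the constants KNOWN, UNKNOWN and USER. `MorphTokenType` and `MorphToken` are hypothetical names standing in for the shared enum and for `o.a.l.a.morph.Token`. They are not the classes in the PR.

```java
/**
 * Hypothetical shared token type; the constant names are an assumption about
 * what the two tokenizer-specific Type enums contain.
 */
enum MorphTokenType {
  KNOWN, // entry from the system dictionary
  UNKNOWN, // produced by unknown-word handling
  USER // entry from a user dictionary
}

/** Hypothetical common token class that carries its type as a property. */
final class MorphToken {
  private final String surface;
  private final MorphTokenType type;

  MorphToken(String surface, MorphTokenType type) {
    this.surface = surface;
    this.type = type;
  }

  String getSurface() {
    return surface;
  }

  MorphTokenType getType() {
    return type;
  }

  /** True for dictionary-backed tokens, false for guessed unknown words. */
  boolean isFromDictionary() {
    return type != MorphTokenType.UNKNOWN;
  }

  public static void main(String[] args) {
    // Either tokenizer could emit the same token class once the enum is shared.
    MorphToken t = new MorphToken("tokyo", MorphTokenType.KNOWN);
    System.out.println(t.getSurface() + " " + t.getType() + " " + t.isFromDictionary());
  }
}
```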





[jira] [Commented] (LUCENE-10493) Can we unify the viterbi search logic in the tokenizers of kuromoji and nori?

2022-04-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17518852#comment-17518852
 ] 

ASF subversion and git services commented on LUCENE-10493:
--

Commit 9aa8ec9d06a2b271559ec0a93e1405239bbb6af2 in lucene's branch 
refs/heads/main from Tomoko Uchida
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=9aa8ec9d06a ]

LUCENE-10493: Unify TokenInfoFST in kuromoji and nori (#795)



> Can we unify the viterbi search logic in the tokenizers of kuromoji and nori?
> -
>
> Key: LUCENE-10493
> URL: https://issues.apache.org/jira/browse/LUCENE-10493
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Tomoko Uchida
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We now have common dictionary interfaces for kuromoji and nori 
> ([LUCENE-10393]). A natural question would be: is it possible to unify the 
> Japanese/Korean tokenizers? 
> The core methods of the two tokenizers are `parse()` and `backtrace()`, which 
> calculate the minimum-cost path by Viterbi search. I'd set the goal of this 
> issue as factoring them out into a separate class (in analysis-common) that 
> is shared between JapaneseTokenizer and KoreanTokenizer. 
> The algorithm to solve the minimum cost path itself is of course 
> language-agnostic, so I think it should be theoretically possible; the most 
> difficult part here might be the N-best path calculation - which is supported 
> only by JapaneseTokenizer and not by KoreanTokenizer.
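Here is a language-agnostic illustration of the two phases. A forward pass records, for every position, the cheapest way to reach it along with a back-pointer. A backtrace then walks from the end of the input. This is a simplified sketch that uses made-up unigram word costs only. The real JapaneseTokenizer and KoreanTokenizer also add connection costs between adjacent entries and handle unknown words, and the Japanese one supports N-best paths. None of that is modeled here. `ViterbiSketch` and `segment` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class ViterbiSketch {
  /**
   * Returns the minimum-cost segmentation of text using only words in
   * wordCosts (a hypothetical dictionary mapping surface form to cost).
   */
  static List<String> segment(String text, Map<String, Integer> wordCosts) {
    int n = text.length();
    // best[i] = lowest total cost of any path covering text[0, i)
    int[] best = new int[n + 1];
    // backPos[i] = start of the last word on that best path (back-pointer)
    int[] backPos = new int[n + 1];
    Arrays.fill(best, Integer.MAX_VALUE);
    best[0] = 0;

    // Forward pass ("parse"): extend every reachable position by each word.
    for (int start = 0; start < n; start++) {
      if (best[start] == Integer.MAX_VALUE) {
        continue; // no path reaches this position
      }
      for (int end = start + 1; end <= n; end++) {
        Integer cost = wordCosts.get(text.substring(start, end));
        if (cost != null && best[start] + cost < best[end]) {
          best[end] = best[start] + cost;
          backPos[end] = start;
        }
      }
    }
    if (best[n] == Integer.MAX_VALUE) {
      throw new IllegalArgumentException("no segmentation covers: " + text);
    }

    // Backward pass ("backtrace"): follow back-pointers from the end.
    List<String> words = new ArrayList<>();
    for (int end = n; end > 0; end = backPos[end]) {
      words.add(text.substring(backPos[end], end));
    }
    Collections.reverse(words);
    return words;
  }

  public static void main(String[] args) {
    // Made-up costs: "ab"+"c" (3+2) beats "a"+"bc" (5+4) and "a"+"b"+"c" (5+5+2).
    Map<String, Integer> costs = Map.of("a", 5, "b", 5, "ab", 3, "c", 2, "bc", 4);
    System.out.println(segment("abc", costs)); // [ab, c]
  }
}
```

Nothing in the search depends on the language. Only the dictionary and the cost model do, which is what makes sharing the logic plausible.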






[jira] [Commented] (LUCENE-10493) Can we unify the viterbi search logic in the tokenizers of kuromoji and nori?

2022-04-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17518851#comment-17518851
 ] 

ASF subversion and git services commented on LUCENE-10493:
--

Commit 4d2b08554a1908d4ec90ed2cb91bab4f4b29b2d3 in lucene's branch 
refs/heads/main from Tomoko Uchida
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=4d2b08554a1 ]

LUCENE-10493: add 'backWordPos' array to JapaneseTokenizer.Position (#793)



> Can we unify the viterbi search logic in the tokenizers of kuromoji and nori?
> -
>
> Key: LUCENE-10493
> URL: https://issues.apache.org/jira/browse/LUCENE-10493
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Tomoko Uchida
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We now have common dictionary interfaces for kuromoji and nori 
> ([LUCENE-10393]). A natural question would be: is it possible to unify the 
> Japanese/Korean tokenizers? 
> The core methods of the two tokenizers are `parse()` and `backtrace()`, which 
> calculate the minimum-cost path by Viterbi search. I'd set the goal of this 
> issue as factoring them out into a separate class (in analysis-common) that 
> is shared between JapaneseTokenizer and KoreanTokenizer. 
> The algorithm to solve the minimum cost path itself is of course 
> language-agnostic, so I think it should be theoretically possible; the most 
> difficult part here might be the N-best path calculation - which is supported 
> only by JapaneseTokenizer and not by KoreanTokenizer.






[GitHub] [lucene] mocobeta merged pull request #795: LUCENE-10493: Unify TokenInfoFST in kuromoji and nori

2022-04-07 Thread GitBox


mocobeta merged PR #795:
URL: https://github.com/apache/lucene/pull/795





[GitHub] [lucene] mocobeta merged pull request #793: LUCENE-10493: add 'backWordPos' array to JapaneseTokenizer.Position

2022-04-07 Thread GitBox


mocobeta merged PR #793:
URL: https://github.com/apache/lucene/pull/793





[jira] [Created] (LUCENE-10507) Should it be more likely to search concurrently in tests?

2022-04-07 Thread Luca Cavanna (Jira)
Luca Cavanna created LUCENE-10507:
-

 Summary: Should it be more likely to search concurrently in tests?
 Key: LUCENE-10507
 URL: https://issues.apache.org/jira/browse/LUCENE-10507
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Luca Cavanna


As part of LUCENE-10002 we are migrating test usages of 
IndexSearcher#search(Query, Collector) to use the corresponding search method 
that takes a CollectorManager in place of a Collector. As part of such changes, 
I've been paying attention to whether searchers are created through 
LuceneTestCase#newSearcher and migrating to it when possible.

This caused some recent test failures following test changes. In most cases 
these were test issues, and the failures were quite rare because we only rarely 
exercise the concurrent code path in tests.

One recent failure uncovered LUCENE-10500, which was an actual bug that 
affected concurrent searches only. It was found by a test run that indexed a 
considerable number of docs and was lucky enough to get an executor set on its 
index searcher as well as multiple slices.

LuceneTestCase#newSearcher(IndexReader) uses threads only rarely, and even when 
useThreads is true, the searcher may not get an executor set. Also, it can 
often happen that even though an executor is set, the searcher will hold only 
one slice, as not enough documents are indexed. Some nightly tests index enough 
documents, and LuceneTestCase also lowers the slice limits, but only 50% of the 
time and only when wrapWithAssertions is false. Also, I wonder if the lower 
limits are low enough:


{code:java}
int maxDocPerSlice = 1 + random.nextInt(10);
int maxSegmentsPerSlice = 1 + random.nextInt(20);
{code}


All in all, I wonder if we should make it more likely for real concurrent 
searches to happen while testing across multiple slices. It seems like it could 
be useful especially as we'd like users to use collector managers instead of 
collectors (although that does not necessarily translate to concurrent search).
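A small test index often ends up with a single slice. The simplified Java sketch below shows the kind of grouping `IndexSearcher` performs. Segments are sorted by size, and a segment larger than `maxDocsPerSlice` gets a slice of its own. Smaller segments are packed together until a slice exceeds `maxDocsPerSlice` documents or reaches `maxSegmentsPerSlice` segments. These exact rules are our reading of the slicing logic, so treat them as an assumption. `SliceSketch` is hypothetical and works on plain segment sizes rather than Lucene's `LeafReaderContext`.

```java
import java.util.ArrayList;
import java.util.List;

final class SliceSketch {
  /** Groups segment sizes (maxDoc per segment) into slices, largest first. */
  static List<List<Integer>> slices(
      List<Integer> segmentSizes, int maxDocsPerSlice, int maxSegmentsPerSlice) {
    List<Integer> sorted = new ArrayList<>(segmentSizes);
    sorted.sort((a, b) -> Integer.compare(b, a)); // descending by size
    List<List<Integer>> slices = new ArrayList<>();
    List<Integer> group = null; // slice currently being filled, if any
    long docSum = 0;
    for (int size : sorted) {
      if (size > maxDocsPerSlice) {
        slices.add(List.of(size)); // a big segment gets its own slice
        continue;
      }
      if (group == null) {
        group = new ArrayList<>();
        slices.add(group);
      }
      group.add(size);
      docSum += size;
      if (group.size() >= maxSegmentsPerSlice || docSum > maxDocsPerSlice) {
        group = null; // the next small segment starts a fresh slice
        docSum = 0;
      }
    }
    return slices;
  }

  public static void main(String[] args) {
    // Limits within the ranges the test framework draws from.
    System.out.println(slices(List.of(2, 100, 3, 5), 10, 2)); // [[100], [5, 3], [2]]
  }
}
```

With large per-slice limits, a handful of tiny test segments collapse into one slice. In that case no concurrency happens even when an executor is set, which is the situation the issue describes.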







[GitHub] [lucene] mayya-sharipova opened a new pull request, #800: Make constructor for QueryOffsetRange public

2022-04-07 Thread GitBox


mayya-sharipova opened a new pull request, #800:
URL: https://github.com/apache/lucene/pull/800

   QueryOffsetRange is a public class and is used in other classes
   (e.g. FieldValueHighlighters needs it).
   Make its constructor public as well, so that it can be used from other packages.





[GitHub] [lucene] javanna opened a new pull request, #799: LUCENE-10506: change visibility of ProfilerCollector#deriveCollectorName to protected

2022-04-07 Thread GitBox


javanna opened a new pull request, #799:
URL: https://github.com/apache/lucene/pull/799

   This allows subclasses to extend how the inner collector name is derived.
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://github.com/apache/lucene/blob/main/CONTRIBUTING.md) and my 
code conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [ ] I have given Lucene maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `main` branch.
   - [x] I have run `./gradlew check`.
   - [x] I have added tests for my changes.
   





[jira] [Created] (LUCENE-10506) ProfilerCollector to support customizing how name is derived

2022-04-07 Thread Luca Cavanna (Jira)
Luca Cavanna created LUCENE-10506:
-

 Summary: ProfilerCollector to support customizing how name is 
derived
 Key: LUCENE-10506
 URL: https://issues.apache.org/jira/browse/LUCENE-10506
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/sandbox
Reporter: Luca Cavanna


ProfilerCollector (part of the sandbox) has a private method called 
deriveCollectorName that extracts the simple class name from the provided 
collector and sets it as the collector's name, which later becomes part of the 
profile results.

While the default behaviour is reasonable, there are cases where it would be 
useful to extend this logic, and perhaps not use class names, or enhance that 
with more context that the collectors could provide. This could be achieved by 
making the deriveCollectorName method protected.
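Here is a minimal sketch of the proposed extension point. It uses hypothetical stand-in types rather than Lucene's `Collector` and `ProfilerCollector`. The base wrapper derives a name from the wrapped collector's simple class name, and a subclass overrides the protected method to add caller-supplied context. The name is derived on demand rather than in the constructor. If an implementation computed it in the constructor instead, an override that reads subclass fields would see them still uninitialized.

```java
/** Hypothetical stand-in for a collector being profiled (not Lucene's Collector). */
interface DocCollector {
  void collect(int doc);
}

/** A trivial collector used to exercise the naming logic. */
final class CountingCollector implements DocCollector {
  int count;

  @Override
  public void collect(int doc) {
    count++;
  }
}

/** Hypothetical profiling wrapper; deriveName is the protected extension point. */
class ProfilingWrapper implements DocCollector {
  private final DocCollector delegate;

  ProfilingWrapper(DocCollector delegate) {
    this.delegate = delegate;
  }

  @Override
  public void collect(int doc) {
    delegate.collect(doc); // timing/bookkeeping would go here
  }

  /** Default naming: the wrapped collector's simple class name. */
  protected String deriveName(DocCollector collector) {
    return collector.getClass().getSimpleName();
  }

  /** Derived on demand, so overrides may safely read subclass fields. */
  final String getName() {
    return deriveName(delegate);
  }
}

/** A subclass that adds caller-supplied context to the derived name. */
final class LabeledProfilingWrapper extends ProfilingWrapper {
  private final String label;

  LabeledProfilingWrapper(DocCollector delegate, String label) {
    super(delegate);
    this.label = label;
  }

  @Override
  protected String deriveName(DocCollector collector) {
    return super.deriveName(collector) + "[" + label + "]";
  }

  public static void main(String[] args) {
    DocCollector counting = new CountingCollector();
    System.out.println(new ProfilingWrapper(counting).getName()); // CountingCollector
    System.out.println(new LabeledProfilingWrapper(counting, "top-hits").getName());
    // prints CountingCollector[top-hits]
  }
}
```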






[jira] [Updated] (LUCENE-10505) Cleanup oal.util.Constants to use java.lang.Runtime.Version and remove outdated versions

2022-04-07 Thread Uwe Schindler (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-10505:
---
Description: 
In oal.util.Constants we have some constants about the 64-bitness and the Java 
version and vendor info. In particular, there is also parsing of system 
properties to get the major and minor Java version.

We should change this in main and 9.x to use 
[https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Runtime.html#version()]
 and the corresponding {{Runtime.Version}} class. The {{Runtime.Version}} class 
also allows comparing versions safely. The good thing is that you also get 
minor bugfix info, so code could disable stuff exactly at specific versions 
that are buggy.

We should also clean up the constants. In 9.x and main we still have 
{{JRE_IS_MINIMUM_JAVA8}}! We should remove these constants (+ deprecate in 9.x 
and set to true) and change all code relying on them to execute code for Java 
8. Same for 11 and 17.
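Here is a small sketch of the {{Runtime.Version}} calls the description refers to: `Runtime.version()`, `Runtime.Version.parse`, `feature()`, `interim()`, `update()` and `compareTo`, all available in the JDK from Java 10 on. `JavaVersionCheck` and its helpers are hypothetical. The "buggy" range 17.0.2 to 17.0.4 is made up purely to show a version-range check.

```java
final class JavaVersionCheck {
  /** A possible replacement for JRE_IS_MINIMUM_JAVAx-style boolean constants. */
  static boolean isAtLeastFeature(Runtime.Version v, int feature) {
    return v.feature() >= feature;
  }

  /**
   * True if v lies in [fromInclusive, toExclusive). Runtime.Version compares
   * the numeric components element by element, so 17.0.10 sorts after 17.0.4.
   */
  static boolean inRange(Runtime.Version v, String fromInclusive, String toExclusive) {
    return v.compareTo(Runtime.Version.parse(fromInclusive)) >= 0
        && v.compareTo(Runtime.Version.parse(toExclusive)) < 0;
  }

  public static void main(String[] args) {
    Runtime.Version current = Runtime.version();
    System.out.println(
        "feature=" + current.feature()
            + " interim=" + current.interim()
            + " update=" + current.update());
    System.out.println("Java 11 or later: " + isAtLeastFeature(current, 11));
    // Hypothetical buggy range, purely for illustration.
    System.out.println("in hypothetical buggy range: " + inRange(current, "17.0.2", "17.0.4"));
  }
}
```

Components are compared numerically, so 17.0.10 correctly sorts after 17.0.4. A plain string comparison of the version property would get that wrong.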

  was:
In oal.util.Constants we have some constants about the 64-bitness and the Java 
version and vendor info. In particular, there is also parsing of system 
properties to get the major and minor Java version.

We should change this in main and 9.x to use 
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Runtime.html#version()
 and the version class. The version class also allows comparing versions 
safely. The good thing is that you also get minor bugfix info, so code could 
disable stuff exactly at specific versions that are buggy.

We should also clean up the constants. In 9.x and main we still have 
`JRE_IS_MINIMUM_JAVA8`! We should remove these constants (+ deprecate in 9.x 
and set to true) and change all code relying on them to execute code for Java 
8. Same for 11 and 17.


> Cleanup oal.util.Constants to use java.lang.Runtime.Version and remove 
> outdated versions
> 
>
> Key: LUCENE-10505
> URL: https://issues.apache.org/jira/browse/LUCENE-10505
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Affects Versions: 9.0, 9.1, 10.0 (main)
>Reporter: Uwe Schindler
>Priority: Major
>  Labels: cleanup
>
> In oal.util.Constants we have some constants about the 64-bitness and the 
> Java version and vendor info. In particular, there is also parsing of system 
> properties to get the major and minor Java version.
> We should change this in main and 9.x to use 
> [https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Runtime.html#version()]
>  and the corresponding {{Runtime.Version}} class. The {{Runtime.Version}} 
> class also allows comparing versions safely. The good thing is that you also 
> get minor bugfix info, so code could disable stuff exactly at specific 
> versions that are buggy.
> We should also clean up the constants. In 9.x and main we still have 
> {{JRE_IS_MINIMUM_JAVA8}}! We should remove these constants (+ deprecate in 9.x 
> and set to true) and change all code relying on them to execute code for Java 
> 8. Same for 11 and 17.






[jira] [Commented] (LUCENE-10436) Combine DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery into a single FieldExistsQuery?

2022-04-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17518661#comment-17518661
 ] 

ASF subversion and git services commented on LUCENE-10436:
--

Commit c7619544c5a39ee9ce7b6084328b38d21ef18709 in lucene's branch 
refs/heads/branch_9x from zacharymorn
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=c7619544c5a ]

LUCENE-10436: (Backport) Remove usage of DocValuesFieldExistsQuery, 
NormsFieldExistsQuery and KnnVectorFieldExistsQuery (#798)



> Combine DocValuesFieldExistsQuery, NormsFieldExistsQuery and 
> KnnVectorFieldExistsQuery into a single FieldExistsQuery?
> --
>
> Key: LUCENE-10436
> URL: https://issues.apache.org/jira/browse/LUCENE-10436
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Now that we require consistency across data structures, we could merge 
> DocValuesFieldExistsQuery, NormsFieldExistsQuery and 
> KnnVectorFieldExistsQuery together into a FieldExistsQuery that would require 
> that the field indexes either norms, doc values or vectors?
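Here is a hedged sketch of the dispatch that a single FieldExistsQuery implies. It inspects which structures a field indexes and reads one of them, failing if there is none. `FieldSchema`, `ExistsSource` and `pickSource` are hypothetical simplifications of per-field schema information. The precedence order (norms, then vectors, then doc values) is an assumption, not necessarily what the merged implementation does.

```java
/** Which per-field data structure an existence check would iterate. */
enum ExistsSource {
  NORMS,
  VECTORS,
  DOC_VALUES
}

/** Hypothetical, minimal per-field schema flags (not Lucene's FieldInfo). */
final class FieldSchema {
  final boolean hasNorms;
  final int vectorDimension; // 0 means the field has no vectors
  final boolean hasDocValues;

  FieldSchema(boolean hasNorms, int vectorDimension, boolean hasDocValues) {
    this.hasNorms = hasNorms;
    this.vectorDimension = vectorDimension;
    this.hasDocValues = hasDocValues;
  }
}

final class FieldExistsSketch {
  /** Picks the structure a single "field exists" query would read (assumed order). */
  static ExistsSource pickSource(String field, FieldSchema schema) {
    if (schema.hasNorms) {
      return ExistsSource.NORMS;
    }
    if (schema.vectorDimension > 0) {
      return ExistsSource.VECTORS;
    }
    if (schema.hasDocValues) {
      return ExistsSource.DOC_VALUES;
    }
    // Mirrors the requirement that the field indexes at least one of them.
    throw new IllegalArgumentException(
        "field \"" + field + "\" indexes neither norms, vectors nor doc values");
  }

  public static void main(String[] args) {
    System.out.println(pickSource("title", new FieldSchema(true, 0, false))); // NORMS
    System.out.println(pickSource("price", new FieldSchema(false, 0, true))); // DOC_VALUES
  }
}
```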






[GitHub] [lucene] zacharymorn merged pull request #798: LUCENE-10436: (Backport) Remove usage of DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery

2022-04-07 Thread GitBox


zacharymorn merged PR #798:
URL: https://github.com/apache/lucene/pull/798





[GitHub] [lucene] zacharymorn merged pull request #790: LUCENE-10436: Remove deprecated DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery

2022-04-07 Thread GitBox


zacharymorn merged PR #790:
URL: https://github.com/apache/lucene/pull/790





[GitHub] [lucene] zacharymorn commented on pull request #790: LUCENE-10436: Remove deprecated DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery

2022-04-07 Thread GitBox


zacharymorn commented on PR #790:
URL: https://github.com/apache/lucene/pull/790#issuecomment-1091228578

   Thanks @jpountz !





[jira] [Updated] (LUCENE-10505) Cleanup oal.util.Constants to use java.lang.Runtime.Version and remove outdated versions

2022-04-07 Thread Uwe Schindler (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-10505:
---
Labels: cleanup  (was: )

> Cleanup oal.util.Constants to use java.lang.Runtime.Version and remove 
> outdated versions
> 
>
> Key: LUCENE-10505
> URL: https://issues.apache.org/jira/browse/LUCENE-10505
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Affects Versions: 9.0, 9.1, 10.0 (main)
>Reporter: Uwe Schindler
>Priority: Major
>  Labels: cleanup
>
> In oal.util.Constants we have some constants about the 64-bitness and the 
> Java version and vendor info. In particular, there is also parsing of system 
> properties to get the major and minor Java version.
> We should change this in main and 9.x to use 
> https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Runtime.html#version()
>  and the version class. The version class also allows comparing versions 
> safely. The good thing is that you also get minor bugfix info, so code could 
> disable stuff exactly at specific versions that are buggy.
> We should also clean up the constants. In 9.x and main we still have 
> `JRE_IS_MINIMUM_JAVA8`! We should remove these constants (+ deprecate in 9.x 
> and set to true) and change all code relying on them to execute code for Java 
> 8. Same for 11 and 17.






[jira] [Updated] (LUCENE-10505) Cleanup oal.util.Constants to use java.lang.Runtime.Version and remove outdated versions

2022-04-07 Thread Uwe Schindler (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-10505:
---
Affects Version/s: 9.1
   9.0
   10.0 (main)

> Cleanup oal.util.Constants to use java.lang.Runtime.Version and remove 
> outdated versions
> 
>
> Key: LUCENE-10505
> URL: https://issues.apache.org/jira/browse/LUCENE-10505
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Affects Versions: 9.0, 9.1, 10.0 (main)
>Reporter: Uwe Schindler
>Priority: Major
>
> In oal.util.Constants we have some constants about the 64-bitness and the 
> Java version and vendor info. In particular, there is also parsing of system 
> properties to get the major and minor Java version.
> We should change this in main and 9.x to use 
> https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Runtime.html#version()
>  and the version class. The version class also allows comparing versions 
> safely. The good thing is that you also get minor bugfix info, so code could 
> disable stuff exactly at specific versions that are buggy.
> We should also clean up the constants. In 9.x and main we still have 
> `JRE_IS_MINIMUM_JAVA8`! We should remove these constants (+ deprecate in 9.x 
> and set to true) and change all code relying on them to execute code for Java 
> 8. Same for 11 and 17.






[jira] [Created] (LUCENE-10505) Cleanup oal.util.Constants to use java.lang.Runtime.Version and remove outdated versions

2022-04-07 Thread Uwe Schindler (Jira)
Uwe Schindler created LUCENE-10505:
--

 Summary: Cleanup oal.util.Constants to use 
java.lang.Runtime.Version and remove outdated versions
 Key: LUCENE-10505
 URL: https://issues.apache.org/jira/browse/LUCENE-10505
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Reporter: Uwe Schindler


In oal.util.Constants we have some constants about the 64-bitness and the Java 
version and vendor info. In particular, there is also parsing of system 
properties to get the major and minor Java version.

We should change this in main and 9.x to use 
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Runtime.html#version()
 and the version class. The version class also allows comparing versions 
safely. The good thing is that you also get minor bugfix info, so code could 
disable stuff exactly at specific versions that are buggy.

We should also clean up the constants. In 9.x and main we still have 
`JRE_IS_MINIMUM_JAVA8`! We should remove these constants (+ deprecate in 9.x 
and set to true) and change all code relying on them to execute code for Java 
8. Same for 11 and 17.






[GitHub] [lucene] zacharymorn commented on a diff in pull request #790: LUCENE-10436: Remove deprecated DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery

2022-04-07 Thread GitBox


zacharymorn commented on code in PR #790:
URL: https://github.com/apache/lucene/pull/790#discussion_r844802728


##
lucene/core/src/java/org/apache/lucene/search/UsageTrackingQueryCachingPolicy.java:
##
@@ -58,12 +58,6 @@ private static boolean shouldNeverCache(Query query) {
   return true;
 }
 
-if (query instanceof DocValuesFieldExistsQuery) {
-  // We do not bother caching DocValuesFieldExistsQuery queries since they are already plenty
-  // fast.
-  return true;
-}

Review Comment:
   Makes sense. I put it back in 
https://github.com/apache/lucene/pull/790/commits/1f428d50e78d27f4322e54d7170c66dff64d14a0






[GitHub] [lucene] jpountz commented on a diff in pull request #790: LUCENE-10436: Remove deprecated DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery

2022-04-07 Thread GitBox


jpountz commented on code in PR #790:
URL: https://github.com/apache/lucene/pull/790#discussion_r844759861


##
lucene/core/src/java/org/apache/lucene/search/UsageTrackingQueryCachingPolicy.java:
##
@@ -58,12 +58,6 @@ private static boolean shouldNeverCache(Query query) {
   return true;
 }
 
-if (query instanceof DocValuesFieldExistsQuery) {
-  // We do not bother caching DocValuesFieldExistsQuery queries since they are already plenty
-  // fast.
-  return true;
-}

Review Comment:
   I feel good about not having a benchmark for this. The reasoning is that if 
the index has a data structure that supports running the query very 
efficiently, then we should just use it and skip caching. And we have this for 
doc values, norms and vectors. In contrast, boolean queries for instance need 
to reconcile multiple queries together, which has overhead.
   
   So +1 to exclude FieldExistsQuery from caching entirely.
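   The reasoning above can be sketched as a toy caching policy. Queries that an index structure already answers efficiently are excluded outright. Compound queries become cache candidates once they have been seen often enough. Everything here is hypothetical: the query classes, the `minUses` threshold and the identity-keyed usage counts. It is not `UsageTrackingQueryCachingPolicy`, only an illustration of the decision being discussed.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical caching policy over a toy query model (not Lucene's classes). */
final class CachingPolicySketch {
  interface Q {}

  /** Answerable directly from one per-field structure (norms, doc values or vectors). */
  static final class FieldExists implements Q {
    final String field;

    FieldExists(String field) {
      this.field = field;
    }
  }

  /** Has to reconcile several sub-queries, which costs work on every execution. */
  static final class Bool implements Q {
    final List<Q> clauses;

    Bool(List<Q> clauses) {
      this.clauses = clauses;
    }
  }

  // Keyed by identity for brevity; a real policy would rely on query equality.
  private final Map<Q, Integer> uses = new IdentityHashMap<>();
  private final int minUses;

  CachingPolicySketch(int minUses) {
    this.minUses = minUses;
  }

  /** Queries backed by an efficient index structure are never worth caching. */
  static boolean shouldNeverCache(Q q) {
    return q instanceof FieldExists;
  }

  void onUse(Q q) {
    if (!shouldNeverCache(q)) {
      uses.merge(q, 1, Integer::sum);
    }
  }

  boolean shouldCache(Q q) {
    return !shouldNeverCache(q) && uses.getOrDefault(q, 0) >= minUses;
  }

  public static void main(String[] args) {
    CachingPolicySketch policy = new CachingPolicySketch(2); // hypothetical threshold
    Q exists = new FieldExists("price");
    Q bool = new Bool(List.of(exists, new FieldExists("title")));
    for (int i = 0; i < 3; i++) {
      policy.onUse(exists);
      policy.onUse(bool);
    }
    System.out.println(policy.shouldCache(exists)); // false
    System.out.println(policy.shouldCache(bool)); // true
  }
}
```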






[GitHub] [lucene] zacharymorn commented on pull request #790: LUCENE-10436: Remove deprecated DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery

2022-04-07 Thread GitBox


zacharymorn commented on PR #790:
URL: https://github.com/apache/lucene/pull/790#issuecomment-1091130105

   > Great. We should backport these changes but the actual removals to 9.x to 
address deprecation warnings.
   
   Thanks for the review! I've created the backporting PR for 9.x here 
https://github.com/apache/lucene/pull/798





[GitHub] [lucene] zacharymorn opened a new pull request, #798: LUCENE-10436: (Backport) Remove usage of DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery

2022-04-07 Thread GitBox


zacharymorn opened a new pull request, #798:
URL: https://github.com/apache/lucene/pull/798

   Backporting PR https://github.com/apache/lucene/pull/790 without removal of 
the deprecated queries.





[jira] [Commented] (LUCENE-10436) Combine DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery into a single FieldExistsQuery?

2022-04-07 Thread ASF subversion and git services (Jira)


[ https://issues.apache.org/jira/browse/LUCENE-10436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17518601#comment-17518601 ]

ASF subversion and git services commented on LUCENE-10436:
--

Commit a42326b9ef90a77910a7dcaf46997b53da6266b1 in lucene's branch 
refs/heads/branch_9x from zacharymorn
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=a42326b9ef9 ]

LUCENE-10436: Deprecate DocValuesFieldExistsQuery, NormsFieldExistsQuery and 
KnnVectorFieldExistsQuery with FieldExistsQuery (#767) (#791)



> Combine DocValuesFieldExistsQuery, NormsFieldExistsQuery and 
> KnnVectorFieldExistsQuery into a single FieldExistsQuery?
> --
>
> Key: LUCENE-10436
> URL: https://issues.apache.org/jira/browse/LUCENE-10436
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Now that we require consistency across data structures, we could merge 
> DocValuesFieldExistsQuery, NormsFieldExistsQuery and 
> KnnVectorFieldExistsQuery together into a FieldExistsQuery that would require 
> that the field indexes either norms, doc values or vectors?
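The proposal above can be sketched as a self-contained model of the rule that a single existence query works as long as the field indexes norms, doc values or vectors. This is a hypothetical illustration: `FieldFlags`, `Source` and `existenceSource` are made-up names rather than Lucene APIs, and the order in which the structures are preferred is an arbitrary assumption of the sketch.

```java
/**
 * Hypothetical model of the LUCENE-10436 proposal. FieldFlags is an
 * illustrative stand-in, not Lucene's FieldInfo.
 */
public class FieldExistsSketch {
  enum Source { NORMS, DOC_VALUES, VECTORS }

  record FieldFlags(String name, boolean hasNorms, boolean hasDocValues, boolean hasVectors) {}

  /** Picks a per-field structure to iterate; the preference order is an assumption. */
  static Source existenceSource(FieldFlags f) {
    if (f.hasNorms()) return Source.NORMS;
    if (f.hasDocValues()) return Source.DOC_VALUES;
    if (f.hasVectors()) return Source.VECTORS;
    // None of the three structures is indexed, so the query is rejected.
    throw new IllegalStateException(
        "field '" + f.name() + "' indexes no norms, doc values or vectors");
  }

  public static void main(String[] args) {
    System.out.println(existenceSource(new FieldFlags("title", true, false, false)));     // NORMS
    System.out.println(existenceSource(new FieldFlags("price", false, true, false)));     // DOC_VALUES
    System.out.println(existenceSource(new FieldFlags("embedding", false, false, true))); // VECTORS
  }
}
```

Failing loudly when none of the structures exists mirrors the "would require" wording in the issue: the combined query only makes sense for fields that index at least one of them.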



--
This message was sent by Atlassian Jira
(v8.20.1#820001)




[GitHub] [lucene] zacharymorn merged pull request #791: LUCENE-10436: (Backporting) Deprecate DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery with FieldExistsQuery

2022-04-07 Thread GitBox


zacharymorn merged PR #791:
URL: https://github.com/apache/lucene/pull/791





[GitHub] [lucene] zacharymorn commented on a diff in pull request #790: LUCENE-10436: Remove deprecated DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery

2022-04-07 Thread GitBox


zacharymorn commented on code in PR #790:
URL: https://github.com/apache/lucene/pull/790#discussion_r844733404


##
lucene/core/src/java/org/apache/lucene/search/UsageTrackingQueryCachingPolicy.java:
##
@@ -58,12 +58,6 @@ private static boolean shouldNeverCache(Query query) {
   return true;
 }
 
-if (query instanceof DocValuesFieldExistsQuery) {
-  // We do not bother caching DocValuesFieldExistsQuery queries since they are already plenty
-  // fast.
-  return true;
-}

Review Comment:
   Oh sorry, I should have added a nocommit for this. Given that `FieldExistsQuery` 
now supports norms and vectors in addition to doc values, would skipping caching 
for norms and vectors too hurt performance, if we were to add a similar 
`instanceof` check for `FieldExistsQuery`? I'm also wondering whether there is a 
luceneutil-like benchmark for these.






[GitHub] [lucene] zacharymorn commented on pull request #790: LUCENE-10436: Remove deprecated DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery

2022-04-07 Thread GitBox


zacharymorn commented on PR #790:
URL: https://github.com/apache/lucene/pull/790#issuecomment-1091119984

   > > For the change entry, I assume this should go into version 10.0.0?
   > 
   > Yes, we need a CHANGES entry under 10.0.0 and a new entry in 
`lucene/MIGRATE.txt` that recommends replacing `DocValuesFieldExistsQuery` and 
others with `FieldExistsQuery`.
   
   Sounds good. Added.

