[jira] [Updated] (SOLR-15178) Non-existent dependency listed in solr-core
[ https://issues.apache.org/jira/browse/SOLR-15178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bence Szabó updated SOLR-15178: --- Description: Solr-core has a dependency, org.apache.solr:*server*, which fails to download. For testing I created a test project here: [https://github.com/bszabo97/solr_master_dep_test] If I run the command {{gradle -q dependencies --configuration solrCore}} the dependency org.apache.solr:server shows up and fails, though if I run {{gradle -q dependencies --configuration solrCore8}} it doesn't show up at all. was: Solr-core has a dependency, org.apache.solr:*server*, which fails to download. For testing I created a test project here: https://github.com/bszabo97/solr_master_dep_test If I run the command {{gradle -q dependencies --configuration solrCore}} the dependency org.apache.solr:server shows up and fails, though if I run {{gradle -q dependencies --configuration solrCore8}} it doesn't shop up at all. > Non-existent dependency listed in solr-core > --- > > Key: SOLR-15178 > URL: https://issues.apache.org/jira/browse/SOLR-15178 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: master (9.0) >Reporter: Bence Szabó >Priority: Major > > Solr-core has a dependency, org.apache.solr:*server*, which fails to download. > For testing I created a test project here: > [https://github.com/bszabo97/solr_master_dep_test] > If I run the command {{gradle -q dependencies --configuration solrCore}} the > dependency org.apache.solr:server shows up and fails, though if I run > {{gradle -q dependencies --configuration solrCore8}} it doesn't show up at > all. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-15178) Non-existent dependency listed in solr-core
Bence Szabó created SOLR-15178: -- Summary: Non-existent dependency listed in solr-core Key: SOLR-15178 URL: https://issues.apache.org/jira/browse/SOLR-15178 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Affects Versions: master (9.0) Reporter: Bence Szabó Solr-core has a dependency, org.apache.solr:*server*, which fails to download. For testing I created a test project here: https://github.com/bszabo97/solr_master_dep_test If I run the command {{gradle -q dependencies --configuration solrCore}} the dependency org.apache.solr:server shows up and fails, though if I run {{gradle -q dependencies --configuration solrCore8}} it doesn't show up at all.
[jira] [Created] (LUCENE-9798) Fix looping bug when calculating full KNN results in KnnGraphTester
Nitiraj Rathore created LUCENE-9798: --- Summary: Fix looping bug when calculating full KNN results in KnnGraphTester Key: LUCENE-9798 URL: https://issues.apache.org/jira/browse/LUCENE-9798 Project: Lucene - Core Issue Type: Bug Components: core/other Reporter: Nitiraj Rathore There is a minor looping bug when generating full KNN results for comparison with HNSW in KnnGraphTester. Basically, the result in [this line|https://github.com/apache/lucene-solr/blob/a53e8e722884e5655206292590da67bb71efc34d/lucene/core/src/test/org/apache/lucene/util/hnsw/KnnGraphTester.java#L551] should be calculated after the while loop finishes. Without this fix, vector files up to ~2 GB work fine, but larger files may produce incorrect full KNN results.
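The shape of the bug can be illustrated with a simplified sketch (hypothetical code, not the actual KnnGraphTester): when input is consumed chunk by chunk in a while loop — as happens once a vector file is too large to process in one pass — the final result must be assembled after the loop exits, otherwise only the chunks seen so far are reported.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of chunked processing: the full result must be
// finalized AFTER the while loop, not inside it, or multi-chunk inputs
// (the >2 GB case in the issue) yield a partial result.
public class ChunkedResultSketch {
    static List<Integer> collectAll(int[][] chunks) {
        List<Integer> results = new ArrayList<>();
        int i = 0;
        while (i < chunks.length) {
            for (int v : chunks[i]) {
                results.add(v);   // accumulate per-chunk partial results
            }
            i++;
            // BUG (before the fix): producing the final result here would
            // cover only the chunks processed so far.
        }
        return results;           // FIX: finalize once the loop completes
    }

    public static void main(String[] args) {
        int[][] chunks = {{1, 2}, {3, 4}, {5}};
        System.out.println(collectAll(chunks).size()); // prints 5: every chunk counted
    }
}
```

With a single chunk both placements agree, which is why the bug only shows up on large inputs.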
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #2356: SOLR-15152: Export Tool should export nested docs cleanly in .json, .jsonl, and javabin
dsmiley commented on a change in pull request #2356: URL: https://github.com/apache/lucene-solr/pull/2356#discussion_r579854438 ## File path: solr/core/src/java/org/apache/solr/util/ExportTool.java ## @@ -319,6 +369,91 @@ private Object constructDateStr(Object field) { return field; } } + + static class JsonlSink extends DocsSink { +private CharArr charArr = new CharArr(1024 * 2); +JSONWriter jsonWriter = new JSONWriter(charArr, -1); +private Writer writer; + +public JsonlSink(Info info) { + this.info = info; +} + +@Override +public void start() throws IOException { + fos = new FileOutputStream(info.out); + if(info.out.endsWith(".jsonl.gz")) { +fos = new GZIPOutputStream(fos); + } + if (info.bufferSize > 0) { +fos = new BufferedOutputStream(fos, info.bufferSize); + } + writer = new OutputStreamWriter(fos, StandardCharsets.UTF_8); + +} + +@Override +public void end() throws IOException { + writer.flush(); + fos.flush(); + fos.close(); +} + +@Override +@SuppressWarnings({"unchecked", "rawtypes"}) +public synchronized void accept(SolrDocument doc) throws IOException { + charArr.reset(); + int mapSize = doc._size(); + if(doc.hasChildDocuments()) { +mapSize ++; + } + Map m = new LinkedHashMap(mapSize); + + doc.forEach((s, field) -> { +if (s.equals("_version_") || s.equals("_root_")) return; +if (field instanceof List) { + if (((List) field).size() == 1) { +field = ((List) field).get(0); + } +} +field = constructDateStr(field); +if (field instanceof List) { + List list = (List) field; + if (hasdate(list)) { +ArrayList listCopy = new ArrayList<>(list.size()); +for (Object o : list) listCopy.add(constructDateStr(o)); +field = listCopy; + } +} +m.put(s, field); + }); + if (doc.hasChildDocuments()) { +m.put("_childDocuments_", doc.getChildDocuments()); + } + jsonWriter.write(m); + writer.write(charArr.getArray(), charArr.getStart(), charArr.getEnd()); + writer.append('\n'); + super.accept(doc); +} + +private boolean hasdate(@SuppressWarnings({"rawtypes"})List list) { + 
boolean hasDate = false; + for (Object o : list) { +if(o instanceof Date){ + hasDate = true; + break; +} + } + return hasDate; +} + +private Object constructDateStr(Object field) { + if (field instanceof Date) { +field = DateTimeFormatter.ISO_INSTANT.format(Instant.ofEpochMilli(((Date) field).getTime())); Review comment: date.toInstant().toString() is equivalent. Oh yeah; this is copy-pasted code :-/ change or not as you wish. ## File path: solr/core/src/java/org/apache/solr/util/ExportTool.java ## @@ -129,23 +130,51 @@ public void setLimit(String maxDocsStr) { } public void setOutFormat(String out, String format) { - this.format = format; - if (format == null) format = "jsonl"; + if (format == null) { +format = "json"; + } if (!formats.contains(format)) { throw new IllegalArgumentException("format must be one of :" + formats); } + this.format = format; this.out = out; if (this.out == null) { -this.out = JAVABIN.equals(format) ? -coll + ".javabin" : -coll + ".json"; +this.out = coll + getOutExtension(); } } + +String getOutExtension() { + String extension = null; + switch (format) { +case JAVABIN: + extension = ".javabin"; + break; +case JSON: + extension = ".json"; + break; +case "jsonl": + extension = ".jsonl"; + break; + } + return extension; +} DocsSink getSink() { - return JAVABIN.equals(format) ? new JavabinSink(this) : new JsonSink(this); + DocsSink docSink = null; + switch (format) { +case JAVABIN: Review comment: I think I'd prefer that these "case" values be consistently represented -- either string literals or references to something. It's okay if they are all string literals... I'm not a believer in the school of thought that all literals must be constantly defined; it's hard to read such. ## File path: solr/solr-ref-guide/src/solr-control-script-reference.adoc ## @@ -876,8 +876,8 @@ Examples of this command: == Exporting Documents to a File -The `export` command will allow you to export documents from a collection in either JSON or Javabin format. 
-All documents can be exported, or only those that match a query. +The `export` command will allow you to export documents from a collection in JSON, https://jsonlines.org/[JSON Lines (jsonl)], or Javabin format.
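As a standalone illustration of what a JSONL sink does, the sketch below writes one JSON object per line, skips Solr's internal `_version_`/`_root_` fields, and unwraps single-valued lists, mirroring the `accept()` logic quoted above. The serialization is hand-rolled (strings and numbers only) to keep the example self-contained; the real sink uses noggit's `JSONWriter`, and all names here are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal JSONL sketch: one JSON object per document, newline-delimited.
public class JsonlSketch {
    static String toJsonl(List<Map<String, Object>> docs) {
        StringBuilder sb = new StringBuilder();
        for (Map<String, Object> doc : docs) {
            Map<String, Object> m = new LinkedHashMap<>();
            doc.forEach((k, v) -> {
                if (k.equals("_version_") || k.equals("_root_")) return; // internal fields dropped
                if (v instanceof List && ((List<?>) v).size() == 1) {
                    v = ((List<?>) v).get(0); // unwrap singleton multi-valued fields
                }
                m.put(k, v);
            });
            sb.append(writeObject(m)).append('\n'); // one record per line
        }
        return sb.toString();
    }

    // Toy serializer: numbers bare, everything else quoted (no escaping).
    private static String writeObject(Map<String, Object> m) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> e : m.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append('"').append(e.getKey()).append("\":");
            Object v = e.getValue();
            if (v instanceof Number) sb.append(v);
            else sb.append('"').append(v).append('"');
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", List.of(1));   // singleton list -> unwrapped to 1
        doc.put("name", "a");
        doc.put("_version_", 1690L); // internal field, dropped
        System.out.print(toJsonl(List.of(doc))); // prints {"id":1,"name":"a"}
    }
}
```

Because each document is a complete JSON value on its own line, a `.jsonl` export can be streamed and re-imported record by record, unlike a single enclosing JSON array.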
[jira] [Commented] (SOLR-15164) Task Management Interface
[ https://issues.apache.org/jira/browse/SOLR-15164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288171#comment-17288171 ] Atri Sharma commented on SOLR-15164: [~epugh] Yes, we could use sync requestID to track the long running task and invoke cancellation on the Async cancellable future associated with the ID. > Task Management Interface > - > > Key: SOLR-15164 > URL: https://issues.apache.org/jira/browse/SOLR-15164 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Atri Sharma >Assignee: Atri Sharma >Priority: Major > Attachments: RoughOutlineTaskManagementInterface.png > > Time Spent: 20m > Remaining Estimate: 0h > > This Jira talks about the task management interface capability in Solr. > > The task management interface allows existing tasks to declare that they are > cancellable and trackable using newly defined parameters. Once a task is > started with these parameters defined, task management interface allows the > following operations: > 1. List all active cancellable tasks currently running. > 2. Cancel a specific task. > 3. Query the status of a specific task. > > Query UUID can be autogenerated or a custom UUID can be specified when > starting the task. > > At the moment, this is supported for queries. > > Attached is an outline of the framework. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
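The idea in the comment above — keying each cancellable long-running task's future by its request ID so it can be listed, queried, or cancelled — could look roughly like this. All names are illustrative stand-ins, not Solr's actual task-management API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical task registry: request ID -> Future for the running task.
public class TaskRegistrySketch {
    private final Map<String, Future<?>> active = new ConcurrentHashMap<>();
    private final ExecutorService pool = Executors.newCachedThreadPool();

    /** Start a task and register it under the (possibly client-supplied) ID. */
    public void submit(String requestId, Runnable task) {
        active.put(requestId, pool.submit(task));
    }

    /** Cancel the task registered under the ID; true if it was cancelled. */
    public boolean cancel(String requestId) {
        Future<?> f = active.remove(requestId);
        return f != null && f.cancel(true); // interrupt if already running
    }

    /** Status query: unknown or finished tasks both report done. */
    public boolean isDone(String requestId) {
        Future<?> f = active.get(requestId);
        return f == null || f.isDone();
    }

    public void shutdown() { pool.shutdownNow(); }

    public static void main(String[] args) {
        TaskRegistrySketch registry = new TaskRegistrySketch();
        registry.submit("req-42", () -> {
            try { Thread.sleep(60_000); } catch (InterruptedException e) { /* cancelled */ }
        });
        System.out.println(registry.cancel("req-42"));  // true: task interrupted
        System.out.println(registry.cancel("unknown")); // false: no such task
        registry.shutdown();
    }
}
```

The task itself must cooperate with cancellation (here, by letting the interrupt end its sleep); `Future.cancel(true)` only delivers the interrupt.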
[jira] [Updated] (LUCENE-9722) Aborted merge can leak readers if the output is empty
[ https://issues.apache.org/jira/browse/LUCENE-9722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nhat Nguyen updated LUCENE-9722: Fix Version/s: (was: 8.0.1) Affects Version/s: (was: 8.7) 8.8 > Aborted merge can leak readers if the output is empty > - > > Key: LUCENE-9722 > URL: https://issues.apache.org/jira/browse/LUCENE-9722 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: master (9.0), 8.8 >Reporter: Nhat Nguyen >Assignee: Nhat Nguyen >Priority: Major > Fix For: master (9.0), 8.9 > > Time Spent: 1h > Remaining Estimate: 0h > > We fail to close the merged readers of an aborted merge if its output segment > contains no document. > This bug was discovered by a test in Elasticsearch > ([elastic/elasticsearch#67884|https://github.com/elastic/elasticsearch/issues/67884]). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
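A general pattern that avoids this kind of leak — releasing the merge's readers on every exit path, including an abort or an empty output segment — can be sketched as follows. The types here are hypothetical stand-ins, not Lucene's actual merge code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative close-on-every-path pattern for merge readers.
public class MergeCloseSketch {
    interface SegmentReaderHandle { void close(); }
    static class MergeAborted extends RuntimeException {}

    static void doMerge(List<SegmentReaderHandle> readers, boolean abort) {
        try {
            if (abort) {
                throw new MergeAborted(); // e.g. the writer is rolling back
            }
            // ... write the merged segment here ...
        } finally {
            // Close readers whether the merge succeeded, aborted, or produced
            // an empty segment -- the path that leaked in the reported bug.
            for (SegmentReaderHandle r : readers) {
                r.close();
            }
        }
    }

    public static void main(String[] args) {
        List<SegmentReaderHandle> readers = new ArrayList<>();
        int[] closed = {0};
        readers.add(() -> closed[0]++);
        readers.add(() -> closed[0]++);
        try {
            doMerge(readers, true);
        } catch (MergeAborted expected) {
            // abort propagates, but the finally block has already run
        }
        System.out.println(closed[0]); // prints 2: both readers closed despite the abort
    }
}
```

Putting the close in `finally` (rather than after the success path only) is what makes the abort and empty-output cases leak-free.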
[GitHub] [lucene-solr] gus-asf commented on pull request #2411: SOLR-13696 Simplify routed alias tests to avoid flakiness, improve debugging
gus-asf commented on pull request #2411: URL: https://github.com/apache/lucene-solr/pull/2411#issuecomment-782967326 @tflobbe Please review what I did to enable the uploading of config sets within tests. When I tried to re-enable this test, it was failing because the config set was untrusted and there was no auth (changes for https://issues.apache.org/jira/browse/SOLR-14663 ) ... certainly open to other suggestions, but it didn't seem reasonable to set up authentication just for a test. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (SOLR-13696) DimensionalRoutedAliasUpdateProcessorTest / RoutedAliasUpdateProcessorTest failures due commitWithin/openSearcher delays
[ https://issues.apache.org/jira/browse/SOLR-13696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288112#comment-17288112 ] Gus Heck commented on SOLR-13696: - Finally coming back to this. In retrospect I think this test was overzealous. "Commit within" is a feature that is really orthogonal to routed aliases and there's no good reason to believe that it would succeed or fail differently than a regular commit. Removing this aspect of the test simplifies the code, makes the test faster and probably costs us little or nothing in terms of safety. > DimensionalRoutedAliasUpdateProcessorTest / RoutedAliasUpdateProcessorTest > failures due commitWithin/openSearcher delays > > > Key: SOLR-13696 > URL: https://issues.apache.org/jira/browse/SOLR-13696 > Project: Solr > Issue Type: Test >Reporter: Chris M. Hostetter >Assignee: Gus Heck >Priority: Major > Attachments: thetaphi_Lucene-Solr-8.x-MacOSX_272.log.txt > > Time Spent: 10m > Remaining Estimate: 0h > > Recent jenkins failure... 
> Build: https://jenkins.thetaphi.de/job/Lucene-Solr-8.x-MacOSX/272/ > Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseParallelGC > {noformat} > Stack Trace: > java.lang.AssertionError: expected:<16> but was:<15> > at > __randomizedtesting.SeedInfo.seed([DB6DC28D5560B1D2:E295833E1541FDB9]:0) > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.solr.update.processor.DimensionalRoutedAliasUpdateProcessorTest.assertCatTimeInvariants(DimensionalRoutedAliasUpdateProcessorTest.java:677 > ) > at > org.apache.solr.update.processor.DimensionalRoutedAliasUpdateProcessorTest.testTimeCat(DimensionalRoutedAliasUpdateProcessorTest.java:282) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > {noformat} > Digging into the logs, the problem appears to be in the way the test > verifies/assumes docs have been committed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] gus-asf opened a new pull request #2411: SOLR-13696 Simplify routed alias tests to avoid flakiness, improve debugging
gus-asf opened a new pull request #2411: URL: https://github.com/apache/lucene-solr/pull/2411
[GitHub] [lucene-solr] mayya-sharipova commented on pull request #2186: LUCENE-9334 Consistency of field data structures
mayya-sharipova commented on pull request #2186: URL: https://github.com/apache/lucene-solr/pull/2186#issuecomment-782960621 @jpountz Adrien, thanks a lot for the initial review. Sorry for the delay, the patch turned out to be much more involved than I expected. There are still a few tests failing, but I will try to fix them soon. > Can you give a bit more details about how this PR works at a high level? E.g. how does it handle the case when two threads are concurrently trying to add a field with different schemas? --- **Segment level**: ensuring a field has the same schema across all the documents of the segment. - We use `FieldSchema` for this, which is an internal field of `PerField`. It represents the schema of the field in the current document. - With every new document we reset `FieldSchema`. As the document fields are processed, we update the schema with the options encountered in this document. Once the processing of the document is done, we compare the built `FieldSchema` of the current document with the corresponding `FieldInfo` (FieldInfo is built on the first document in which we encounter this field). Relevant code in `IndexingChain::processDocument`:
```java
...
if (pf.fieldGen != fieldGen) { // first time we see this field in this document
  pf.reset(docID);
}
...
for (int i = 0; i < fieldCount; i++) {
  PerField pf = fields[i];
  if (pf.fieldInfo == null) { // the first time we see this field in this segment
    initializeFieldInfo(pf);
  } else {
    pf.schema.assertSameSchema(pf.fieldInfo);
  }
}
```
--- **Index level**: ensuring a field has the same schema across all the documents of the index. - This check is done in `FieldInfos`. - When we encounter a new field in a segment, we try to initialize it in `IndexingChain::initializeFieldInfo` with the options from the current `FieldSchema`. This calls `FieldInfos.Builder::add` -> `FieldInfos.Builder::addField`.
- The first thing `FieldInfos.Builder::addField` does is call `globalFieldNumbers.addOrGet` with the given schema options. `globalFieldNumbers.addOrGet` is a synchronized method, and as I understood, `FieldNumbers` is shared across all indexing threads of the same index. - For a field it sees for the first time, `globalFieldNumbers.addOrGet` will initialize all its maps: `indexOptions`, `docValuesType`, etc. If the field already exists, it will check that the schema options given as parameters are the same as those stored in its maps for that field. - As `globalFieldNumbers.addOrGet` is a synchronized method, only a single thread will be able to initialize the schema options for a field. All other threads that deal with the same field must conform to the same field schema. @jpountz Is my assumption and logic correct here?
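The index-level check described above can be reduced to a toy sketch: a shared, synchronized registry maps field name to schema, the first thread to register a field wins, and later threads must present an identical schema. This is an illustrative stand-in for `FieldNumbers.addOrGet`, with a single `docValuesType` string playing the role of the full per-field option set — not Lucene's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a global, synchronized field-schema registry.
public class GlobalFieldSchemaSketch {
    private final Map<String, String> docValuesType = new HashMap<>();

    /** First caller initializes the field's schema; later callers must match it. */
    public synchronized void addOrGet(String field, String dvType) {
        String existing = docValuesType.putIfAbsent(field, dvType);
        if (existing != null && !existing.equals(dvType)) {
            throw new IllegalArgumentException(
                "cannot change field \"" + field + "\" from docValuesType="
                + existing + " to " + dvType);
        }
    }

    public static void main(String[] args) {
        GlobalFieldSchemaSketch global = new GlobalFieldSchemaSketch();
        global.addOrGet("price", "NUMERIC");
        global.addOrGet("price", "NUMERIC");    // same schema: fine
        try {
            global.addOrGet("price", "SORTED"); // conflicting schema: rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints the "cannot change field" error
        }
    }
}
```

The `synchronized` method is what makes the race benign: two threads adding the same new field serialize on the registry, so exactly one initializes the schema and the other is checked against it.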
[GitHub] [lucene-solr] mayya-sharipova commented on a change in pull request #2186: LUCENE-9334 Consistency of field data structures
mayya-sharipova commented on a change in pull request #2186: URL: https://github.com/apache/lucene-solr/pull/2186#discussion_r579885622 ## File path: lucene/core/src/java/org/apache/lucene/index/FieldInfo.java ## @@ -130,127 +167,255 @@ public boolean checkConsistency() { } } -if (pointDimensionCount < 0) { +if (docValuesType == null) { + throw new IllegalStateException("DocValuesType must not be null (field: '" + name + "')"); +} +if (dvGen != -1 && docValuesType == DocValuesType.NONE) { throw new IllegalStateException( - "pointDimensionCount must be >= 0; got " + pointDimensionCount); + "field '" + + name + + "' cannot have a docvalues update generation without having docvalues"); } +if (pointDimensionCount < 0) { + throw new IllegalStateException( + "pointDimensionCount must be >= 0; got " + + pointDimensionCount + + " (field: '" + + name + + "')"); +} if (pointIndexDimensionCount < 0) { throw new IllegalStateException( - "pointIndexDimensionCount must be >= 0; got " + pointIndexDimensionCount); + "pointIndexDimensionCount must be >= 0; got " + + pointIndexDimensionCount + + " (field: '" + + name + + "')"); } - if (pointNumBytes < 0) { - throw new IllegalStateException("pointNumBytes must be >= 0; got " + pointNumBytes); + throw new IllegalStateException( + "pointNumBytes must be >= 0; got " + pointNumBytes + " (field: '" + name + "')"); } if (pointDimensionCount != 0 && pointNumBytes == 0) { throw new IllegalStateException( - "pointNumBytes must be > 0 when pointDimensionCount=" + pointDimensionCount); + "pointNumBytes must be > 0 when pointDimensionCount=" + + pointDimensionCount + + " (field: '" + + name + + "')"); } - if (pointIndexDimensionCount != 0 && pointDimensionCount == 0) { throw new IllegalStateException( - "pointIndexDimensionCount must be 0 when pointDimensionCount=0"); + "pointIndexDimensionCount must be 0 when pointDimensionCount=0" + + " (field: '" + + name + + "')"); } - if (pointNumBytes != 0 && pointDimensionCount == 0) { throw new 
IllegalStateException( - "pointDimensionCount must be > 0 when pointNumBytes=" + pointNumBytes); + "pointDimensionCount must be > 0 when pointNumBytes=" + + pointNumBytes + + " (field: '" + + name + + "')"); } -if (dvGen != -1 && docValuesType == DocValuesType.NONE) { +if (vectorSearchStrategy == null) { throw new IllegalStateException( - "field '" - + name - + "' cannot have a docvalues update generation without having docvalues"); + "Vector search strategy must not be null (field: '" + name + "')"); } - if (vectorDimension < 0) { - throw new IllegalStateException("vectorDimension must be >=0; got " + vectorDimension); + throw new IllegalStateException( + "vectorDimension must be >=0; got " + vectorDimension + " (field: '" + name + "')"); } - if (vectorDimension == 0 && vectorSearchStrategy != VectorValues.SearchStrategy.NONE) { throw new IllegalStateException( - "vector search strategy must be NONE when dimension = 0; got " + vectorSearchStrategy); + "vector search strategy must be NONE when dimension = 0; got " + + vectorSearchStrategy + + " (field: '" + + name + + "')"); } - return true; } - // should only be called by FieldInfos#addOrUpdate - void update( - boolean storeTermVector, + void verifySameSchema( + IndexOptions indexOptions, boolean omitNorms, boolean storePayloads, - IndexOptions indexOptions, - Map attributes, + boolean storeTermVector, + DocValuesType docValuesType, + long dvGen, int dimensionCount, int indexDimensionCount, - int dimensionNumBytes) { -if (indexOptions == null) { - throw new NullPointerException("IndexOptions must not be null (field: \"" + name + "\")"); -} -// System.out.println("FI.update field=" + name + " indexed=" + indexed + " omitNorms=" + -// omitNorms + " this.omitNorms=" + this.omitNorms); -if (this.indexOptions != indexOptions) { - if (this.indexOptions == IndexOptions.NONE) { -this.indexOptions = indexOptions; - } else if (indexOptions != IndexOptions.NONE) { -throw new IllegalArgumentException( -"cannot change field 
\"" -+ name -+ "\" from index options=" -+ this.indexOptions -
[GitHub] [lucene-solr] mayya-sharipova commented on a change in pull request #2186: LUCENE-9334 Consistency of field data structures
mayya-sharipova commented on a change in pull request #2186: URL: https://github.com/apache/lucene-solr/pull/2186#discussion_r579885602 ## File path: lucene/core/src/java/org/apache/lucene/index/IndexingChain.java ## @@ -1313,4 +1259,110 @@ public void recycleIntBlocks(int[][] blocks, int offset, int length) { bytesUsed.addAndGet(-(length * (IntBlockPool.INT_BLOCK_SIZE * Integer.BYTES))); } } + + private static final class FieldSchema { Review comment: Addressed in 6e7540ebd0ef79536cffabcf0ddc7a592b792252
[jira] [Commented] (SOLR-14499) New Solr TLP site
[ https://issues.apache.org/jira/browse/SOLR-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288097#comment-17288097 ] Jan Høydahl commented on SOLR-14499: See LUCENE-9797 for the Lucene-site changes that should be made when the new Solr site is live, including redirects from old to new. > New Solr TLP site > - > > Key: SOLR-14499 > URL: https://issues.apache.org/jira/browse/SOLR-14499 > Project: Solr > Issue Type: Sub-task >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > > # Set up solr-site repo (start from lucene-site repo) > # Setup a temporary "work in progress" page on solr.apache.org > # Remove all lucene TLP, lucene-core and pylucene content, including > templates and css etc > # Move Solr index.html as main index file > # Simplify folder structure for Pelican > # Publish the new site
[jira] [Commented] (LUCENE-9797) Remove Solr from Lucene website
[ https://issues.apache.org/jira/browse/LUCENE-9797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288096#comment-17288096 ] Jan Høydahl commented on LUCENE-9797: - I have done this work in the branch {{main/lucene}} : [https://github.com/apache/lucene-site/tree/main/lucene] A separate staging site exists here: [https://lucene-new.staged.apache.org/] *NB:* This is built from the main/lucene branch and will not interfere with any changes done in the master or production branches. The idea is to use this new branch as the new "main" branch when it is approved. h3. Testing When you review, you can test redirects, which for now will go to the Solr staging site ([https://lucene-solrtlp.staged.apache.org/]). * [https://lucene-new.staged.apache.org/solr/] -> [https://lucene-solrtlp.staged.apache.org/] (will be to [https://solr.apache.org|https://solr.apache.org/]) * [https://lucene-new.staged.apache.org/solr/downloads.html] -> [https://lucene-solrtlp.staged.apache.org/downloads.html] * etc > Remove Solr from Lucene website > --- > > Key: LUCENE-9797 > URL: https://issues.apache.org/jira/browse/LUCENE-9797 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > > * Remove Solr from the website > * Add an informational text on the TLP page about the move > * Add a permanent redirect to the new site in .htaccess
[jira] [Created] (LUCENE-9797) Remove Solr from Lucene website
Jan Høydahl created LUCENE-9797: --- Summary: Remove Solr from Lucene website Key: LUCENE-9797 URL: https://issues.apache.org/jira/browse/LUCENE-9797 Project: Lucene - Core Issue Type: Sub-task Reporter: Jan Høydahl Assignee: Jan Høydahl * Remove Solr from the website * Add an informational text on the TLP page about the move * Add a permanent redirect to the new site in .htaccess
[jira] [Commented] (SOLR-14761) Order new 'solr' git repo
[ https://issues.apache.org/jira/browse/SOLR-14761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288078#comment-17288078 ] Jan Høydahl commented on SOLR-14761: False alarm, we need to wait for confirmation. > Order new 'solr' git repo > - > > Key: SOLR-14761 > URL: https://issues.apache.org/jira/browse/SOLR-14761 > Project: Solr > Issue Type: Sub-task >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > > # Order from INFRA using self-serve tool > [https://selfserve.apache.org|https://selfserve.apache.org/] > # Once the old lucene-solr repo is frozen (separate JIRA), clone it into the > new 'solr' repo and start modifying
[jira] [Reopened] (SOLR-14761) Order new 'solr' git repo
[ https://issues.apache.org/jira/browse/SOLR-14761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl reopened SOLR-14761: > Order new 'solr' git repo > - > > Key: SOLR-14761 > URL: https://issues.apache.org/jira/browse/SOLR-14761 > Project: Solr > Issue Type: Sub-task >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > > # Order from INFRA using self-serve tool > [https://selfserve.apache.org|https://selfserve.apache.org/] > # Once the old lucene-solr repo is frozen (separate JIRA), clone it into the > new 'solr' repo and start modifying
[jira] [Commented] (SOLR-14760) Order new Solr mailing lists
[ https://issues.apache.org/jira/browse/SOLR-14760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288071#comment-17288071 ] Anshum Gupta commented on SOLR-14760: - After speaking with Infra, some back and forth, and figuring out what works best for the community, the following steps for the user list seem like a good way forward (copied from my discussion on #asfinfra): - Migrate solr-u...@lucene.apache.org to us...@solr.apache.org - Update the community to change their mail client rules - Everything else remains the same for the Solr user community w.r.t. the mailing list They replied with what seems like a +1 on this approach, but I'm just clarifying before I request the migration. Will also send an update to the user list about this change once we have confirmation on the path. > Order new Solr mailing lists > > > Key: SOLR-14760 > URL: https://issues.apache.org/jira/browse/SOLR-14760 > Project: Solr > Issue Type: Sub-task >Reporter: Jan Høydahl >Assignee: Anshum Gupta >Priority: Major > > 1. Use self-service tool > [https://selfserve.apache.org|https://selfserve.apache.org/] to create > mailing lists > (x) user@solr > (x) dev@solr > (x) issues@solr > (x) build@solr > 2. Send out an email to the old lists, requesting people to sign up for the > new lists.
[jira] [Commented] (SOLR-15160) Update cloud-dev/cloud.sh to work with gradle
[ https://issues.apache.org/jira/browse/SOLR-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288067#comment-17288067 ] ASF subversion and git services commented on SOLR-15160: Commit e420e6c8f6f4bc80575901f4b9adbe77001b5aeb in lucene-solr's branch refs/heads/master from Gus Heck [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e420e6c ] SOLR-15160 update cloud.sh (#2393) > Update cloud-dev/cloud.sh to work with gradle > - > > Key: SOLR-15160 > URL: https://issues.apache.org/jira/browse/SOLR-15160 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: scripts and tools >Reporter: Gus Heck >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > > Now that the gradle build is a bit more mature, we can update this tool to > smooth the creation of testing clusters on the local machine for master.
[GitHub] [lucene-solr] gus-asf merged pull request #2393: SOLR-15160 update cloud.sh
gus-asf merged pull request #2393: URL: https://github.com/apache/lucene-solr/pull/2393 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #2356: SOLR-15152: Export Tool should export nested docs cleanly in .json, .jsonl, and javabin
dsmiley commented on a change in pull request #2356: URL: https://github.com/apache/lucene-solr/pull/2356#discussion_r579854153
## File path: solr/core/src/java/org/apache/solr/util/ExportTool.java
## @@ -319,6 +369,91 @@ private Object constructDateStr(Object field) { return field; } }
+
+  static class JsonlSink extends DocsSink {
+    private CharArr charArr = new CharArr(1024 * 2);
+    JSONWriter jsonWriter = new JSONWriter(charArr, -1);
+    private Writer writer;
+
+    public JsonlSink(Info info) {
+      this.info = info;
+    }
+
+    @Override
+    public void start() throws IOException {
+      fos = new FileOutputStream(info.out);
+      if (info.out.endsWith(".jsonl.gz")) {
+        fos = new GZIPOutputStream(fos);
+      }
+      if (info.bufferSize > 0) {
+        fos = new BufferedOutputStream(fos, info.bufferSize);
+      }
+      writer = new OutputStreamWriter(fos, StandardCharsets.UTF_8);
+    }
+
+    @Override
+    public void end() throws IOException {
+      writer.flush();
+      fos.flush();
+      fos.close();
+    }
+
+    @Override
+    @SuppressWarnings({"unchecked", "rawtypes"})
+    public synchronized void accept(SolrDocument doc) throws IOException {
+      charArr.reset();
+      int mapSize = doc._size();
+      if (doc.hasChildDocuments()) {
+        mapSize++;
+      }
+      Map m = new LinkedHashMap(mapSize);
+
+      doc.forEach((s, field) -> {
+        if (s.equals("_version_") || s.equals("_root_")) return;
Review comment: constants on IndexSchema
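The sink in the diff above layers a GZIPOutputStream (when the target ends in ".jsonl.gz") and a BufferedOutputStream under a UTF-8 writer, one JSON object per line. Below is a minimal standalone sketch of that same I/O layering using only the JDK — it is not the ExportTool code itself, and the class and method names are invented for illustration:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative sketch of the .jsonl / .jsonl.gz sink layering (not Solr code).
public class JsonlGzSketch {

    // Write one JSON document per line, gzip-compressed if the path asks for it.
    public static void write(String path, List<String> jsonDocs) throws IOException {
        OutputStream out = new FileOutputStream(path);
        if (path.endsWith(".jsonl.gz")) {
            out = new GZIPOutputStream(out);   // compression goes closest to the file
        }
        try (Writer w = new OutputStreamWriter(new BufferedOutputStream(out), StandardCharsets.UTF_8)) {
            for (String doc : jsonDocs) {
                w.write(doc);
                w.write('\n');                 // JSONL: newline-delimited objects
            }
        }
    }

    // Read the documents back, transparently decompressing .jsonl.gz files.
    public static List<String> read(String path) throws IOException {
        InputStream in = new FileInputStream(path);
        if (path.endsWith(".jsonl.gz")) {
            in = new GZIPInputStream(in);
        }
        try (BufferedReader r = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return r.lines().collect(Collectors.toList());
        }
    }
}
```

Closing the outermost writer closes the whole chain, which flushes the gzip trailer — the same reason the review's `end()` flushes the writer before closing `fos`.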
[jira] [Commented] (SOLR-14762) Fork the git repo into two new 'lucene' and 'solr'
[ https://issues.apache.org/jira/browse/SOLR-14762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288066#comment-17288066 ] Jan Høydahl commented on SOLR-14762: New solr git repo is created. > Fork the git repo into two new 'lucene' and 'solr' > -- > > Key: SOLR-14762 > URL: https://issues.apache.org/jira/browse/SOLR-14762 > Project: Solr > Issue Type: Sub-task >Reporter: Jan Høydahl >Priority: Major > > Existing git repo (and GitHub project) will be frozen and two new repos used > # Announce on all lists a date when the lucene-solr git repo will be frozen > This date should be e.g. 14 days in the future to allow in-flight commits and > PRs to be pushed > # At the freeze date, make a last commit adding a big announcement to > README.md about the location of the new repositories, then make both asf-git > and github R/O > # Clone 'lucene-solr' into new 'lucene' and 'solr' git repos > Then continue with separate LUCENE and SOLR jiras to prepare the new repos, > builds etc
[jira] [Resolved] (SOLR-14761) Order new 'solr' git repo
[ https://issues.apache.org/jira/browse/SOLR-14761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl resolved SOLR-14761. Resolution: Fixed > Order new 'solr' git repo > - > > Key: SOLR-14761 > URL: https://issues.apache.org/jira/browse/SOLR-14761 > Project: Solr > Issue Type: Sub-task >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > > # Order from INFRA using self-serve tool > [https://selfserve.apache.org|https://selfserve.apache.org/] > # Once old lucene-solr repo is frozen (separate JIRA), clone it into the > new 'solr' repo and start modifying
[jira] [Commented] (SOLR-14761) Order new 'solr' git repo
[ https://issues.apache.org/jira/browse/SOLR-14761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288065#comment-17288065 ] Jan Høydahl commented on SOLR-14761: New repo is created. Closing. > Order new 'solr' git repo > - > > Key: SOLR-14761 > URL: https://issues.apache.org/jira/browse/SOLR-14761 > Project: Solr > Issue Type: Sub-task >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > > # Order from INFRA using self-serve tool > [https://selfserve.apache.org|https://selfserve.apache.org/] > # Once old lucene-solr repo is frozen (separate JIRA), clone it into the > new 'solr' repo and start modifying
[GitHub] [lucene-solr-operator] thelabdude commented on a change in pull request #221: Work with basic-auth enabled SolrCloud clusters
thelabdude commented on a change in pull request #221: URL: https://github.com/apache/lucene-solr-operator/pull/221#discussion_r579845189
## File path: api/v1beta1/solrcloud_types.go
## @@ -1022,3 +1044,35 @@ type SolrTLSOptions struct {
	// +optional
	RestartOnTLSSecretUpdate bool `json:"restartOnTLSSecretUpdate,omitempty"`
}
+
+type SolrSecurityOptions struct {
+	// Secret containing credentials the operator should use for API requests to secure Solr pods.
+	// If you provide this secret, then the operator assumes you've also configured your own security.json file and
+	// uploaded it to Solr. The 'key' of the secret selector is the username. If you change the password for this
+	// user using the Solr security API, then you *must* update the secret with the new password or the operator will be
+	// locked out of Solr and API requests will fail, ultimately causing a CrashLoopBackOff for all pods if probe endpoints
+	// are secured.
+	//
+	// If you don't supply this secret, then the operator bootstraps a default security.json file and creates a
+	// corresponding secret containing the credentials for three users: admin, solr, and k8s-oper. All API requests
+	// from the operator are made as the 'k8s-oper' user, which is configured with minimal access. The 'solr' user has
+	// basic read access to Solr resources. Once the security.json is bootstrapped, the operator will not update it!
+	// You're expected to use the 'admin' user to access the Security API to make further changes. It's strictly a
+	// bootstrapping operation.
+	// +optional
+	BasicAuthSecret *corev1.SecretKeySelector `json:"basicAuthSecret,omitempty"`
Review comment: One idea I'm liking more and more is to have 2 secrets, one that is a `"kubernetes.io/basic-auth"` that holds the creds for the `k8s-oper` user and another `Opaque` that holds the bootstrapped `admin` user and the `security.json`. 
We can name the latter `-solrcloud-security-bootstrap` (or similar), which makes it clear it's only for bootstrapping security, while the former holds the creds for the user needed by the operator.
[jira] [Commented] (SOLR-15157) Refactor: separate Collection API commands from Overseer and message handling logic
[ https://issues.apache.org/jira/browse/SOLR-15157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288032#comment-17288032 ] ASF subversion and git services commented on SOLR-15157: Commit c472be5b8687037cf774098e3ac8f9acec48956d in lucene-solr's branch refs/heads/master from Ilan Ginzburg [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c472be5 ] SOLR-15157: fix wrong assumptions on stats returned by Overseer when cluster state updates are distributed (#2410) > Refactor: separate Collection API commands from Overseer and message handling > logic > --- > > Key: SOLR-15157 > URL: https://issues.apache.org/jira/browse/SOLR-15157 > Project: Solr > Issue Type: Sub-task > Components: SolrCloud >Affects Versions: master (9.0) >Reporter: Ilan Ginzburg >Assignee: Ilan Ginzburg >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > Collection API command execution happens in Overseer. The code dealing with > Overseer specific abstractions (Collection API queue management, executing > threads etc) is mixed with code implementing the Collection API commands. > The goal of this ticket is refactoring the Collection API code to abstract > anything that is related to how the Overseer executes the commands, in order > to enable a future ticket (SOLR-15146) to introduce a distributed execution > mode for the Collection API (and keeping the changes limited). > This ticket does not introduce any changes regarding how the Collection API > commands run in the Overseer. It is only refactoring the call chains to allow > a future separation.
[GitHub] [lucene-solr] murblanc merged pull request #2410: SOLR-15157: fix wrong assumptions on stats returned by Overseer when cluster state updates are distributed
murblanc merged pull request #2410: URL: https://github.com/apache/lucene-solr/pull/2410
[jira] [Commented] (LUCENE-9796) fix SortedDocValues to no longer extend BinaryDocValues
[ https://issues.apache.org/jira/browse/LUCENE-9796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288028#comment-17288028 ] Robert Muir commented on LUCENE-9796: - I attached a prototype of what I had in mind: it at least compiles :) Now it becomes explicit in the source code where the term dict lookups are happening as you see clearly the {{lookupOrd()}} call. > fix SortedDocValues to no longer extend BinaryDocValues > --- > > Key: LUCENE-9796 > URL: https://issues.apache.org/jira/browse/LUCENE-9796 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Robert Muir >Priority: Major > Attachments: LUCENE-9796_prototype.patch > > > SortedDocValues give ordinals and a way to dereference an ordinal as a byte[] > But currently they *extend* BinaryDocValues, which allows directly calling > {{binaryValue()}}. > This allows them to act as a "slow" BinaryDocValues, but it is a performance > trap, especially now that terms bytes may be block-compressed (LUCENE-9663). > I think this should be detangled to prevent performance traps like > LUCENE-9795: SortedDocValues shouldn't have the trappy inherited > {{binaryValue()}} method that implicitly derefs the ord for the doc, then the > term bytes for the ord.
[jira] [Updated] (LUCENE-9796) fix SortedDocValues to no longer extend BinaryDocValues
[ https://issues.apache.org/jira/browse/LUCENE-9796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-9796: Attachment: LUCENE-9796_prototype.patch > fix SortedDocValues to no longer extend BinaryDocValues > --- > > Key: LUCENE-9796 > URL: https://issues.apache.org/jira/browse/LUCENE-9796 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Robert Muir >Priority: Major > Attachments: LUCENE-9796_prototype.patch > > > SortedDocValues give ordinals and a way to dereference an ordinal as a byte[] > But currently they *extend* BinaryDocValues, which allows directly calling > {{binaryValue()}}. > This allows them to act as a "slow" BinaryDocValues, but it is a performance > trap, especially now that terms bytes may be block-compressed (LUCENE-9663). > I think this should be detangled to prevent performance traps like > LUCENE-9795: SortedDocValues shouldn't have the trappy inherited > {{binaryValue()}} method that implicitly derefs the ord for the doc, then the > term bytes for the ord.
[GitHub] [lucene-solr] rmuir commented on pull request #2409: Revert LUCENE-9491 changes for profiling.gradle, they totally break functionality
rmuir commented on pull request #2409: URL: https://github.com/apache/lucene-solr/pull/2409#issuecomment-782892686 and pls feel free to just cancel this PR if you know how to fix the real gradle logic issue with master that causes the crazy output... I needed a branch with this thing working to track down the LUCENE-9795 perf regression; I figured tracking down the commit is better than just complaining that it's broken :)
[jira] [Commented] (SOLR-14762) Fork the git repo into two new 'lucene' and 'solr'
[ https://issues.apache.org/jira/browse/SOLR-14762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288018#comment-17288018 ] David Smiley commented on SOLR-14762: - Agreed on keeping the "solr/" folder around for _some time_ after the split for the sole sake of the ref branch. But we should draw a line in the sand at which it disappears... probably that's 9.0. Disappointingly, I haven't heard much information about the status of the ref branch. > Fork the git repo into two new 'lucene' and 'solr' > -- > > Key: SOLR-14762 > URL: https://issues.apache.org/jira/browse/SOLR-14762 > Project: Solr > Issue Type: Sub-task >Reporter: Jan Høydahl >Priority: Major > > Existing git repo (and GitHub project) will be frozen and two new repos used > # Announce on all lists a date when the lucene-solr git repo will be frozen > This date should be e.g. 14 days in the future to allow in-flight commits and > PRs to be pushed > # At the freeze date, make a last commit adding a big announcement to > README.md about the location of the new repositories, then make both asf-git > and github R/O > # Clone 'lucene-solr' into new 'lucene' and 'solr' git repos > Then continue with separate LUCENE and SOLR jiras to prepare the new repos, > builds etc
[GitHub] [lucene-solr] murblanc opened a new pull request #2410: SOLR-15157: fix wrong assumptions on stats returned by Overseer when cluster state updates are distributed
murblanc opened a new pull request #2410: URL: https://github.com/apache/lucene-solr/pull/2410
[jira] [Resolved] (SOLR-14787) Inequality support in Payload Check query parser
[ https://issues.apache.org/jira/browse/SOLR-14787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gus Heck resolved SOLR-14787. - Fix Version/s: master (9.0) Resolution: Implemented > Inequality support in Payload Check query parser > > > Key: SOLR-14787 > URL: https://issues.apache.org/jira/browse/SOLR-14787 > Project: Solr > Issue Type: New Feature >Reporter: Kevin Watters >Assignee: Gus Heck >Priority: Major > Fix For: master (9.0) > > Time Spent: 1h 20m > Remaining Estimate: 0h > > The goal of this ticket/pull request is to support a richer set of matching > and filtering based on term payloads. This patch extends the > PayloadCheckQueryParser to add a new local param for "op" > The value of OP could be one of the following > * gt - greater than > * gte - greater than or equal > * lt - less than > * lte - less than or equal > default value for "op" if not specified is to be the current behavior of > equals. > Additionally to the operation you can specify a threshold local parameter > This will provide the ability to search for the term "cat" so long as the > payload has a value of greater than 0.75. > One use case is to classify a document into various categories with an > associated confidence or probability that the classification is correct. > That can be indexed into a delimited payload field. The searches can find > and match documents that were tagged with the "cat" category with a > confidence of greater than 0.5. > Example Document > {code:java} > { > "id":"doc_1", > "classifications_payload":["cat|0.75 dog|2.0"] > } > {code} > Example Syntax > {code:java} > {!payload_check f=classifications_payload payloads='1' op='gt' > threshold='0.5'}cat {code}
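The op/threshold semantics described in the issue can be sketched as a small standalone model: parse a delimited-payload value like "cat|0.75 dog|2.0" and apply one of the four inequality operators. This is only an illustration of the comparison logic, not Solr's actual query parser implementation; all class and method names below are invented:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiPredicate;

// Toy model of the payload_check op/threshold semantics (gt, gte, lt, lte).
public class PayloadCheckSketch {
    static final Map<String, BiPredicate<Float, Float>> OPS = Map.of(
            "gt",  (payload, threshold) -> payload > threshold,
            "gte", (payload, threshold) -> payload >= threshold,
            "lt",  (payload, threshold) -> payload < threshold,
            "lte", (payload, threshold) -> payload <= threshold);

    // Parse a delimited-payload field value ("term|payload term|payload ...") into term -> payload.
    static Map<String, Float> parsePayloads(String fieldValue) {
        Map<String, Float> payloads = new LinkedHashMap<>();
        for (String token : fieldValue.split("\\s+")) {
            String[] parts = token.split("\\|");
            payloads.put(parts[0], Float.parseFloat(parts[1]));
        }
        return payloads;
    }

    // Would a document with this field value match the term under op/threshold?
    static boolean matches(String fieldValue, String term, String op, float threshold) {
        Float payload = parsePayloads(fieldValue).get(term);
        return payload != null && OPS.get(op).test(payload, threshold);
    }

    public static void main(String[] args) {
        String doc = "cat|0.75 dog|2.0";   // the example document's field value
        System.out.println(matches(doc, "cat", "gt", 0.5f));  // true: 0.75 > 0.5
        System.out.println(matches(doc, "cat", "gt", 0.75f)); // false: gt is strict
        System.out.println(matches(doc, "dog", "lte", 2.0f)); // true: lte is inclusive
    }
}
```

Note the strict-vs-inclusive distinction: with op='gt' and threshold='0.75', a payload of exactly 0.75 does not match, while 'gte' would match it.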
[jira] [Commented] (SOLR-14787) Inequality support in Payload Check query parser
[ https://issues.apache.org/jira/browse/SOLR-14787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288007#comment-17288007 ] ASF subversion and git services commented on SOLR-14787: Commit 88ff3cd58d882970b4f9b943396edd6b84b89dc2 in lucene-solr's branch refs/heads/master from Gus Heck [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=88ff3cd ] SOLR-14787 CHANGES.txt entry. > Inequality support in Payload Check query parser > > > Key: SOLR-14787 > URL: https://issues.apache.org/jira/browse/SOLR-14787 > Project: Solr > Issue Type: New Feature >Reporter: Kevin Watters >Assignee: Gus Heck >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > The goal of this ticket/pull request is to support a richer set of matching > and filtering based on term payloads. This patch extends the > PayloadCheckQueryParser to add a new local param for "op" > The value of OP could be one of the following > * gt - greater than > * gte - greater than or equal > * lt - less than > * lte - less than or equal > default value for "op" if not specified is to be the current behavior of > equals. > Additionally to the operation you can specify a threshold local parameter > This will provide the ability to search for the term "cat" so long as the > payload has a value of greater than 0.75. > One use case is to classify a document into various categories with an > associated confidence or probability that the classification is correct. > That can be indexed into a delimited payload field. The searches can find > and match documents that were tagged with the "cat" category with a > confidence of greater than 0.5. 
> Example Document > {code:java} > { > "id":"doc_1", > "classifications_payload":["cat|0.75 dog|2.0"] > } > {code} > Example Syntax > {code:java} > {!payload_check f=classifications_payload payloads='1' op='gt' > threshold='0.5'}cat {code}
[jira] [Commented] (SOLR-14787) Inequality support in Payload Check query parser
[ https://issues.apache.org/jira/browse/SOLR-14787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288002#comment-17288002 ] ASF subversion and git services commented on SOLR-14787: Commit b298d7fb160a49f552dc3987b83aa53601c7b29a in lucene-solr's branch refs/heads/master from Kevin Watters [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b298d7f ] SOLR-14787 - Adding support to use inequalities to the payload check query parser. (#1954) > Inequality support in Payload Check query parser > > > Key: SOLR-14787 > URL: https://issues.apache.org/jira/browse/SOLR-14787 > Project: Solr > Issue Type: New Feature >Reporter: Kevin Watters >Assignee: Gus Heck >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > The goal of this ticket/pull request is to support a richer set of matching > and filtering based on term payloads. This patch extends the > PayloadCheckQueryParser to add a new local param for "op" > The value of OP could be one of the following > * gt - greater than > * gte - greater than or equal > * lt - less than > * lte - less than or equal > default value for "op" if not specified is to be the current behavior of > equals. > Additionally to the operation you can specify a threshold local parameter > This will provide the ability to search for the term "cat" so long as the > payload has a value of greater than 0.75. > One use case is to classify a document into various categories with an > associated confidence or probability that the classification is correct. > That can be indexed into a delimited payload field. The searches can find > and match documents that were tagged with the "cat" category with a > confidence of greater than 0.5. 
> Example Document > {code:java} > { > "id":"doc_1", > "classifications_payload":["cat|0.75 dog|2.0"] > } > {code} > Example Syntax > {code:java} > {!payload_check f=classifications_payload payloads='1' op='gt' > threshold='0.5'}cat {code}
[GitHub] [lucene-solr] gus-asf merged pull request #1954: SOLR-14787 - Adding support to use inequalities to the payload check query parser.
gus-asf merged pull request #1954: URL: https://github.com/apache/lucene-solr/pull/1954
[GitHub] [lucene-solr] kwatters commented on pull request #1954: SOLR-14787 - Adding support to use inequalities to the payload check query parser.
kwatters commented on pull request #1954: URL: https://github.com/apache/lucene-solr/pull/1954#issuecomment-782886142 @gus-asf Nice additional tests. At least we have defined a behavior here for those test cases. I think this is good to go. Thanks!
[jira] [Commented] (LUCENE-9796) fix SortedDocValues to no longer extend BinaryDocValues
[ https://issues.apache.org/jira/browse/LUCENE-9796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287999#comment-17287999 ] Robert Muir commented on LUCENE-9796: - There are quite a few abusers in tests, grouping, etc here, so the issue isn't particularly easy. But in order to properly support compression for the SortedDocValues, I think we should fix it, so that there aren't traps in the API. Even if we have to deprecate stuff and do it iteratively. > fix SortedDocValues to no longer extend BinaryDocValues > --- > > Key: LUCENE-9796 > URL: https://issues.apache.org/jira/browse/LUCENE-9796 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Robert Muir >Priority: Major > > SortedDocValues give ordinals and a way to dereference an ordinal as a byte[] > But currently they *extend* BinaryDocValues, which allows directly calling > {{binaryValue()}}. > This allows them to act as a "slow" BinaryDocValues, but it is a performance > trap, especially now that terms bytes may be block-compressed (LUCENE-9663). > I think this should be detangled to prevent performance traps like > LUCENE-9795: SortedDocValues shouldn't have the trappy inherited > {{binaryValue()}} method that implicitly derefs the ord for the doc, then the > term bytes for the ord.
[jira] [Commented] (LUCENE-9795) investigate large checkindex/grouping regression in nightly benchmarks
[ https://issues.apache.org/jira/browse/LUCENE-9795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287997#comment-17287997 ] Robert Muir commented on LUCENE-9795: - I experimented locally with a quick hack patch for the followup issue LUCENE-9796, and can confirm that fixing the API issue will expose all the other slow stuff out there with similar problems (e.g. grouping). I'll resolve this issue as it addresses the specific CheckIndex problem; grouping and the other slowness in tests should be fixed on that followup issue, as it is more involved. > investigate large checkindex/grouping regression in nightly benchmarks > -- > > Key: LUCENE-9795 > URL: https://issues.apache.org/jira/browse/LUCENE-9795 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Robert Muir >Priority: Major > Attachments: LUCENE-9795.patch, > Screen_Shot_2021-02-21_at_09.17.53.png, Screen_Shot_2021-02-21_at_09.30.30.png > > > In the nightly benchmark, checkindex times increased more than 4x on the 2/16 > datapoint > Looking at the commits on 2/15, most obvious thing to look into is docvalues > terms dict compression: LUCENE-9663 > Will try to pinpoint it more, my concern is some perf bug such as every > single term causing decompression of the whole block repeatedly (missing > seek-within-block opto?)
[jira] [Resolved] (LUCENE-9795) investigate large checkindex/grouping regression in nightly benchmarks
[ https://issues.apache.org/jira/browse/LUCENE-9795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-9795. - Fix Version/s: master (9.0) Resolution: Fixed > investigate large checkindex/grouping regression in nightly benchmarks > -- > > Key: LUCENE-9795 > URL: https://issues.apache.org/jira/browse/LUCENE-9795 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Robert Muir >Priority: Major > Fix For: master (9.0) > > Attachments: LUCENE-9795.patch, > Screen_Shot_2021-02-21_at_09.17.53.png, Screen_Shot_2021-02-21_at_09.30.30.png > > > In the nightly benchmark, checkindex times increased more than 4x on the 2/16 > datapoint > Looking at the commits on 2/15, most obvious thing to look into is docvalues > terms dict compression: LUCENE-9663 > Will try to pinpoint it more, my concern is some perf bug such as every > single term causing decompression of the whole block repeatedly (missing > seek-within-block opto?)
[jira] [Commented] (LUCENE-9795) investigate large checkindex/grouping regression in nightly benchmarks
[ https://issues.apache.org/jira/browse/LUCENE-9795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287996#comment-17287996 ] ASF subversion and git services commented on LUCENE-9795: - Commit 107926e486f8cd6bbfc8abb055c9f58fe56f9cbb in lucene-solr's branch refs/heads/master from Robert Muir [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=107926e ] LUCENE-9795: fix CheckIndex not to validate SortedDocValues as if they were BinaryDocValues CheckIndex already validates SortedDocValues properly: reads every document's ordinal and validates derefing all the ordinals back to bytes from the terms dictionary. It should not do an additional (very slow) pass where it treats the field as if it were binary (doc -> ord -> byte[]), this is slow and doesn't validate any additional index data. Now that the term dictionary of SortedDocValues may be compressed, it is especially slow to misuse the docvalues field in this way. > investigate large checkindex/grouping regression in nightly benchmarks > -- > > Key: LUCENE-9795 > URL: https://issues.apache.org/jira/browse/LUCENE-9795 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Robert Muir >Priority: Major > Attachments: LUCENE-9795.patch, > Screen_Shot_2021-02-21_at_09.17.53.png, Screen_Shot_2021-02-21_at_09.30.30.png > > > In the nightly benchmark, checkindex times increased more than 4x on the 2/16 > datapoint > Looking at the commits on 2/15, most obvious thing to look into is docvalues > terms dict compression: LUCENE-9663 > Will try to pinpoint it more, my concern is some perf bug such as every > single term causing decompression of the whole block repeatedly (missing > seek-within-block opto?)
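The commit message above contrasts two validation strategies: a per-ordinal walk of the terms dictionary versus a redundant per-document doc -> ord -> byte[] deref. A toy model (invented names and plain arrays, not Lucene's API) makes the cost difference concrete by counting dictionary derefs, each of which may decompress a block now that the terms dictionary can be compressed:

```java
// Toy model: why per-ordinal validation is cheap and a per-document
// binaryValue()-style pass is slow for SortedDocValues.
public class OrdLookupSketch {
    final String[] termDict;   // ord -> term bytes (stands in for the terms dictionary)
    final int[] docOrds;       // doc -> ord
    int dictLookups = 0;       // counts simulated decompress/deref work

    OrdLookupSketch(String[] termDict, int[] docOrds) {
        this.termDict = termDict;
        this.docOrds = docOrds;
    }

    // Explicit deref of one ordinal, analogous to lookupOrd() in the
    // LUCENE-9796 prototype: the cost is visible at the call site.
    String lookupOrd(int ord) {
        dictLookups++;
        return termDict[ord];
    }

    // Slow pass: deref the dictionary once per DOCUMENT (doc -> ord -> bytes).
    int validatePerDocument() {
        for (int ord : docOrds) lookupOrd(ord);
        return dictLookups;
    }

    // Cheap pass: deref the dictionary once per unique ORDINAL.
    int validatePerOrdinal() {
        for (int ord = 0; ord < termDict.length; ord++) lookupOrd(ord);
        return dictLookups;
    }
}
```

With a million documents drawn from a three-term dictionary, the per-document pass pays a million derefs while the per-ordinal pass pays three — which is why dropping the redundant binary-style pass removed the 4x CheckIndex slowdown without losing any validation coverage.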
[jira] [Commented] (LUCENE-9795) investigate large checkindex/grouping regression in nightly benchmarks
[ https://issues.apache.org/jira/browse/LUCENE-9795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287995#comment-17287995 ] Michael McCandless commented on LUCENE-9795: +1 for that simple patch to fix {{CheckIndex}}! I double checked and confirmed that {{checkSortedDocValues}} itself is already (more efficiently) stepping through all ordinals, confirming it can retrieve the {{BytesRef}} for each.
[jira] [Commented] (LUCENE-9795) investigate large checkindex/grouping regression in nightly benchmarks
[ https://issues.apache.org/jira/browse/LUCENE-9795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287991#comment-17287991 ] Robert Muir commented on LUCENE-9795: - I opened LUCENE-9796 as a followup to fix the API trap that invites this stuff. But for this issue I think the attached 1-line patch suffices.
[GitHub] [lucene-solr] gus-asf commented on pull request #1954: SOLR-14787 - Adding support to use inequalities to the payload check query parser.
gus-asf commented on pull request #1954: URL: https://github.com/apache/lucene-solr/pull/1954#issuecomment-782881341 @kwatters assuming you agree with my doc tweaks I think this is good to go. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-9796) fix SortedDocValues to no longer extend BinaryDocValues
Robert Muir created LUCENE-9796: --- Summary: fix SortedDocValues to no longer extend BinaryDocValues Key: LUCENE-9796 URL: https://issues.apache.org/jira/browse/LUCENE-9796 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir SortedDocValues gives ordinals and a way to dereference an ordinal as a byte[]. But currently it *extends* BinaryDocValues, which allows directly calling {{binaryValue()}}. This lets it act as a "slow" BinaryDocValues, but it is a performance trap, especially now that term bytes may be block-compressed (LUCENE-9663). I think this should be detangled to prevent performance traps like LUCENE-9795: SortedDocValues shouldn't have the trappy inherited {{binaryValue()}} method that implicitly derefs the ord for the doc, then the term bytes for the ord.
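The trap described above can be illustrated with a toy example (hypothetical names and data, not Lucene's API): a `binaryValue()` that silently does `lookupOrd(ordValue())` hides one terms-dictionary dereference per document behind an innocent-looking call, while an ordinal-aware caller can compare ords directly and dereference only the distinct ones.

```java
import java.nio.charset.StandardCharsets;
import java.util.TreeSet;

// Toy illustration of the binaryValue() performance trap
// (hypothetical names, not Lucene's API).
public class OrdTrap {
    static final String[] TERMS = {"apple", "banana", "cherry"}; // sorted dict
    static int lookups = 0; // counts dereferences to make the hidden cost visible

    static byte[] lookupOrd(int ord) {
        lookups++; // a real impl may decompress an LZ4 block here (LUCENE-9663)
        return TERMS[ord].getBytes(StandardCharsets.UTF_8);
    }

    // The trappy inherited method: derefs ord -> bytes on every call.
    static byte[] binaryValue(int docOrd) {
        return lookupOrd(docOrd);
    }

    public static void main(String[] args) {
        int[] ords = {2, 0, 2, 1, 2}; // per-document ordinals

        // Naive caller using the "binary" view: one dereference per document.
        for (int ord : ords) binaryValue(ord);
        System.out.println("derefs via binaryValue(): " + lookups); // 5

        // Ordinal-aware caller: work with ords, deref only distinct values.
        lookups = 0;
        TreeSet<Integer> distinct = new TreeSet<>();
        for (int ord : ords) distinct.add(ord);
        for (int ord : distinct) lookupOrd(ord);
        System.out.println("derefs via ordinals: " + lookups); // 3
    }
}
```

Removing the inherited method forces callers to make that per-document dereference explicit at the call site.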
[jira] [Updated] (LUCENE-9795) investigate large checkindex/grouping regression in nightly benchmarks
[ https://issues.apache.org/jira/browse/LUCENE-9795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-9795: Attachment: LUCENE-9795.patch
[jira] [Commented] (LUCENE-9795) investigate large checkindex/grouping regression in nightly benchmarks
[ https://issues.apache.org/jira/browse/LUCENE-9795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287988#comment-17287988 ] Robert Muir commented on LUCENE-9795: - OK, I think I can explain the checkindex stuff. When profiling unit tests, I do see this stack as the top CPU user:
{noformat}
java.nio.ByteBuffer#get()
  at java.nio.DirectByteBuffer#get()
  at org.apache.lucene.store.ByteBufferGuard#getBytes()
  at org.apache.lucene.store.ByteBufferIndexInput#readBytes()
  at org.apache.lucene.store.MockIndexInputWrapper#readBytes()
  at org.apache.lucene.util.compress.LZ4#decompress()
  at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$TermsDict#decompressBlock()
  at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$TermsDict#next()
  at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$TermsDict#seekExact()
  at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$BaseSortedDocValues#lookupOrd()
  at org.apache.lucene.index.SortedDocValues#binaryValue()
  at org.apache.lucene.index.CheckIndex#checkBinaryDocValues()
{noformat}
I don't think checkindex should test retrieving every SORTED doc's bytes as if it were BINARY. Looks to me like a leftover, actually. I will upload a simple patch. The grouping stuff should maybe be a separate issue; I suspect the grouping logic may be inefficiently doing similar stuff (reading tons of terms bytes instead of using ordinals or something).
[GitHub] [lucene-solr] jpountz commented on a change in pull request #2186: LUCENE-9334 Consistency of field data structures
jpountz commented on a change in pull request #2186: URL: https://github.com/apache/lucene-solr/pull/2186#discussion_r579820029 ## File path: lucene/core/src/java/org/apache/lucene/index/FieldInfo.java ## @@ -130,127 +167,255 @@ public boolean checkConsistency() { } } -if (pointDimensionCount < 0) { +if (docValuesType == null) { + throw new IllegalStateException("DocValuesType must not be null (field: '" + name + "')"); +} +if (dvGen != -1 && docValuesType == DocValuesType.NONE) { throw new IllegalStateException( - "pointDimensionCount must be >= 0; got " + pointDimensionCount); + "field '" + + name + + "' cannot have a docvalues update generation without having docvalues"); } +if (pointDimensionCount < 0) { + throw new IllegalStateException( + "pointDimensionCount must be >= 0; got " + + pointDimensionCount + + " (field: '" + + name + + "')"); +} if (pointIndexDimensionCount < 0) { throw new IllegalStateException( - "pointIndexDimensionCount must be >= 0; got " + pointIndexDimensionCount); + "pointIndexDimensionCount must be >= 0; got " + + pointIndexDimensionCount + + " (field: '" + + name + + "')"); } - if (pointNumBytes < 0) { - throw new IllegalStateException("pointNumBytes must be >= 0; got " + pointNumBytes); + throw new IllegalStateException( + "pointNumBytes must be >= 0; got " + pointNumBytes + " (field: '" + name + "')"); } if (pointDimensionCount != 0 && pointNumBytes == 0) { throw new IllegalStateException( - "pointNumBytes must be > 0 when pointDimensionCount=" + pointDimensionCount); + "pointNumBytes must be > 0 when pointDimensionCount=" + + pointDimensionCount + + " (field: '" + + name + + "')"); } - if (pointIndexDimensionCount != 0 && pointDimensionCount == 0) { throw new IllegalStateException( - "pointIndexDimensionCount must be 0 when pointDimensionCount=0"); + "pointIndexDimensionCount must be 0 when pointDimensionCount=0" + + " (field: '" + + name + + "')"); } - if (pointNumBytes != 0 && pointDimensionCount == 0) { throw new 
IllegalStateException( - "pointDimensionCount must be > 0 when pointNumBytes=" + pointNumBytes); + "pointDimensionCount must be > 0 when pointNumBytes=" + + pointNumBytes + + " (field: '" + + name + + "')"); } -if (dvGen != -1 && docValuesType == DocValuesType.NONE) { +if (vectorSearchStrategy == null) { throw new IllegalStateException( - "field '" - + name - + "' cannot have a docvalues update generation without having docvalues"); + "Vector search strategy must not be null (field: '" + name + "')"); } - if (vectorDimension < 0) { - throw new IllegalStateException("vectorDimension must be >=0; got " + vectorDimension); + throw new IllegalStateException( + "vectorDimension must be >=0; got " + vectorDimension + " (field: '" + name + "')"); } - if (vectorDimension == 0 && vectorSearchStrategy != VectorValues.SearchStrategy.NONE) { throw new IllegalStateException( - "vector search strategy must be NONE when dimension = 0; got " + vectorSearchStrategy); + "vector search strategy must be NONE when dimension = 0; got " + + vectorSearchStrategy + + " (field: '" + + name + + "')"); } - return true; } - // should only be called by FieldInfos#addOrUpdate - void update( - boolean storeTermVector, + void verifySameSchema( + IndexOptions indexOptions, boolean omitNorms, boolean storePayloads, - IndexOptions indexOptions, - Map attributes, + boolean storeTermVector, + DocValuesType docValuesType, + long dvGen, int dimensionCount, int indexDimensionCount, - int dimensionNumBytes) { -if (indexOptions == null) { - throw new NullPointerException("IndexOptions must not be null (field: \"" + name + "\")"); -} -// System.out.println("FI.update field=" + name + " indexed=" + indexed + " omitNorms=" + -// omitNorms + " this.omitNorms=" + this.omitNorms); -if (this.indexOptions != indexOptions) { - if (this.indexOptions == IndexOptions.NONE) { -this.indexOptions = indexOptions; - } else if (indexOptions != IndexOptions.NONE) { -throw new IllegalArgumentException( -"cannot change field 
\"" -+ name -+ "\" from index options=" -+ this.indexOptions -
[GitHub] [lucene-solr] jpountz commented on a change in pull request #2392: LUCENE-9705: Create Lucene90DocValuesFormat and Lucene90NormsFormat
jpountz commented on a change in pull request #2392: URL: https://github.com/apache/lucene-solr/pull/2392#discussion_r579818440 ## File path: lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene80/Lucene80DocValuesFormat.java ## @@ -163,6 +163,11 @@ public Lucene80DocValuesFormat(Mode mode) { this.mode = Objects.requireNonNull(mode); } + /** + * Note: although this format is only used on older versions, we need to keep the write logic in + * addition to the read logic. It's possible for doc values on older segments to be written to + * through in-place doc values updates. Review comment: `in-place` might be a bit misleading since doc-value updates are not in-place but create a new generation of a segment? ```suggestion * through doc values updates. ``` ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesFormat.java ## @@ -0,0 +1,216 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.lucene.codecs.lucene90; + +import java.io.IOException; +import java.util.Objects; +import org.apache.lucene.codecs.DocValuesConsumer; +import org.apache.lucene.codecs.DocValuesFormat; +import org.apache.lucene.codecs.DocValuesProducer; +import org.apache.lucene.index.DocValuesType; +import org.apache.lucene.index.IndexWriterConfig; +import org.apache.lucene.index.SegmentReadState; +import org.apache.lucene.index.SegmentWriteState; +import org.apache.lucene.store.DataOutput; +import org.apache.lucene.util.SmallFloat; +import org.apache.lucene.util.packed.DirectWriter; + +/** + * Lucene 8.0 DocValues format. Review comment: ```suggestion * Lucene 9.0 DocValues format. ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz commented on a change in pull request #2395: LUCENE-9616: Add developer docs on how to update a format.
jpountz commented on a change in pull request #2395: URL: https://github.com/apache/lucene-solr/pull/2395#discussion_r579817121 ## File path: lucene/backward-codecs/README.md ## @@ -0,0 +1,54 @@ +# Index backwards compatibility + +This README describes the approach to maintaining compatibility with indices +from previous versions and gives guidelines for making format changes. + +## Compatibility strategy + +Codecs and file formats are versioned according to the minor version in which +they were created. For example Lucene87Codec represents the codec used for +creating Lucene 8.7 indices, and potentially later index versions too. Each +segment records the codec version that was used to write it. Review comment: I think that the following would be more accurate? ```suggestion segment records the codec name that was used to write it. ```
[jira] [Updated] (LUCENE-9795) investigate large checkindex/grouping regression in nightly benchmarks
[ https://issues.apache.org/jira/browse/LUCENE-9795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-9795: Summary: investigate large checkindex/grouping regression in nightly benchmarks (was: investigate large checkindex regression in nightly benchmarks)
[jira] [Updated] (LUCENE-9795) investigate large checkindex regression in nightly benchmarks
[ https://issues.apache.org/jira/browse/LUCENE-9795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-9795: Attachment: Screen_Shot_2021-02-21_at_09.30.30.png
[jira] [Commented] (LUCENE-9795) investigate large checkindex regression in nightly benchmarks
[ https://issues.apache.org/jira/browse/LUCENE-9795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287976#comment-17287976 ] Robert Muir commented on LUCENE-9795: - This seems to also impact some, but not all, of the grouping queries (Term1M etc.): !Screen_Shot_2021-02-21_at_09.30.30.png!
[jira] [Created] (LUCENE-9795) investigate large checkindex regression in nightly benchmarks
Robert Muir created LUCENE-9795: --- Summary: investigate large checkindex regression in nightly benchmarks Key: LUCENE-9795 URL: https://issues.apache.org/jira/browse/LUCENE-9795 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: Screen_Shot_2021-02-21_at_09.17.53.png In the nightly benchmark, checkindex times increased more than 4x at the 2/16 datapoint. Looking at the commits on 2/15, the most obvious thing to look into is docvalues terms dict compression: LUCENE-9663. Will try to pinpoint it more; my concern is some perf bug such as every single term causing decompression of the whole block repeatedly (missing seek-within-block opto?)
[GitHub] [lucene-solr] rmuir commented on pull request #2409: Revert LUCENE-9491 changes for profiling.gradle, they totally break functionality
rmuir commented on pull request #2409: URL: https://github.com/apache/lucene-solr/pull/2409#issuecomment-782863061 @dweiss I'm not sure which changes broke it here, maybe it was just a typo? I only know I was able to get it running again with a git checkout command :)
[GitHub] [lucene-solr] rmuir opened a new pull request #2409: Revert LUCENE-9491 changes for profiling.gradle, they totally break functionality
rmuir opened a new pull request #2409: URL: https://github.com/apache/lucene-solr/pull/2409 Currently in master, `tests.profile` gives totally broken output. Reverting the last changes to `profiling.gradle` gets it working again.