[GitHub] [lucene-solr] jaisonbi commented on a change in pull request #2213: LUCENE-9663: Adding compression to terms dict from SortedSet/Sorted DocValues
jaisonbi commented on a change in pull request #2213: URL: https://github.com/apache/lucene-solr/pull/2213#discussion_r564272513

## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesConsumer.java

## @@ -791,6 +806,107 @@ private void addTermsDict(SortedSetDocValues values) throws IOException {
     writeTermsIndex(values);
   }

+  private void addCompressedTermsDict(SortedSetDocValues values) throws IOException {
+    final long size = values.getValueCount();
+    meta.writeVLong(size);
+    meta.writeInt(Lucene80DocValuesFormat.TERMS_DICT_BLOCK_LZ4_CODE);
+
+    ByteBuffersDataOutput addressBuffer = new ByteBuffersDataOutput();
+    ByteBuffersIndexOutput addressOutput =
+        new ByteBuffersIndexOutput(addressBuffer, "temp", "temp");
+    meta.writeInt(DIRECT_MONOTONIC_BLOCK_SHIFT);
+    long numBlocks =
+        (size + Lucene80DocValuesFormat.TERMS_DICT_BLOCK_LZ4_MASK)
+            >>> Lucene80DocValuesFormat.TERMS_DICT_BLOCK_LZ4_SHIFT;
+    DirectMonotonicWriter writer =
+        DirectMonotonicWriter.getInstance(
+            meta, addressOutput, numBlocks, DIRECT_MONOTONIC_BLOCK_SHIFT);
+
+    LZ4.FastCompressionHashTable ht = new LZ4.FastCompressionHashTable();
+    ByteArrayDataOutput bufferedOutput = new ByteArrayDataOutput(termsDictBuffer);
+    long ord = 0;
+    long start = data.getFilePointer();
+    int maxLength = 0;
+    TermsEnum iterator = values.termsEnum();
+    int maxBlockLength = 0;
+    BytesRefBuilder previous = new BytesRefBuilder();
+    for (BytesRef term = iterator.next(); term != null; term = iterator.next()) {
+      int termLength = term.length;
+      if ((ord & Lucene80DocValuesFormat.TERMS_DICT_BLOCK_LZ4_MASK) == 0) {
+        if (bufferedOutput.getPosition() > 0) {
+          int uncompressedLength = bufferedOutput.getPosition();
+          data.writeVInt(uncompressedLength);
+          maxBlockLength = Math.max(maxBlockLength, uncompressedLength);
+          long before = data.getFilePointer();
+          // Compress block
+          LZ4.compress(termsDictBuffer, 0, uncompressedLength, data, ht);
+          int compressedLength = (int) (data.getFilePointer() - before);
+          // Corner case: compressed length might be bigger than uncompressed length.
+          maxBlockLength = Math.max(maxBlockLength, compressedLength);
+          bufferedOutput.reset(termsDictBuffer);
+        }
+
+        writer.add(data.getFilePointer() - start);
+        data.writeVInt(termLength);
+        data.writeBytes(term.bytes, term.offset, termLength);
+      } else {
+        final int prefixLength = StringHelper.bytesDifference(previous.get(), term);
+        final int suffixLength = term.length - prefixLength;
+        assert suffixLength > 0; // terms are unique
+        int reservedSize = suffixLength + 11; // 1 byte + 2 vints
+        bufferedOutput = maybeGrowBuffer(bufferedOutput, reservedSize);
+        bufferedOutput.writeByte(
+            (byte) (Math.min(prefixLength, 15) | (Math.min(15, suffixLength - 1) << 4)));
+
+        if (prefixLength >= 15) {
+          bufferedOutput.writeVInt(prefixLength - 15);
+        }
+        if (suffixLength >= 16) {
+          bufferedOutput.writeVInt(suffixLength - 16);
+        }
+        bufferedOutput.writeBytes(term.bytes, term.offset + prefixLength, suffixLength);
+      }
+      maxLength = Math.max(maxLength, termLength);
+      previous.copyBytes(term);
+      ++ord;
+    }
+
+    // Compress and write out the last block
+    if (bufferedOutput.getPosition() > 0) {
+      int uncompressedLength = bufferedOutput.getPosition();
+      data.writeVInt(uncompressedLength);
+      maxBlockLength = Math.max(maxBlockLength, uncompressedLength);
+      long before = data.getFilePointer();
+      LZ4.compress(termsDictBuffer, 0, uncompressedLength, data, ht);
+      int compressedLength = (int) (data.getFilePointer() - before);
+      maxBlockLength = Math.max(maxBlockLength, compressedLength);
+    }
+
+    writer.finish();
+    meta.writeInt(maxLength);
+    // Write one more int for storing max block length. For compressed terms dict only.
+    meta.writeInt(maxBlockLength);
+    meta.writeLong(start);
+    meta.writeLong(data.getFilePointer() - start);
+    start = data.getFilePointer();
+    addressBuffer.copyTo(data);
+    meta.writeLong(start);
+    meta.writeLong(data.getFilePointer() - start);
+
+    // Now write the reverse terms index
+    writeTermsIndex(values);
+  }
+
+  private ByteArrayDataOutput maybeGrowBuffer(ByteArrayDataOutput bufferedOutput, int termLength) {
+    int pos = bufferedOutput.getPosition(), originalLength = termsDictBuffer.length;
+    if (pos + termLength >= originalLength - 1) {
+      int newLength = (originalLength + termLength) << 1;

Review comment: Makes sense. Calling ArrayUtil.grow is enough.
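The `else` branch of the diff packs the shared-prefix length and the suffix length of each term into a single header byte, spilling the overflow into varints. The standalone sketch below mirrors that scheme outside of Lucene; the class and helper names are mine, not part of the patch.

```java
import java.io.ByteArrayOutputStream;

// Sketch of the prefix/suffix term encoding used inside each LZ4 block:
// one header byte holds min(prefix, 15) in the low nibble and
// min(suffixLength - 1, 15) in the high nibble, with overflow as varints.
public class TermHeaderDemo {

    static byte[] encode(byte[] previous, byte[] term) {
        // Length of the byte prefix shared with the previous (sorted) term.
        int prefix = 0;
        int max = Math.min(previous.length, term.length);
        while (prefix < max && previous[prefix] == term[prefix]) prefix++;
        int suffix = term.length - prefix; // > 0 because terms are unique and sorted

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(Math.min(prefix, 15) | (Math.min(15, suffix - 1) << 4));
        if (prefix >= 15) writeVInt(out, prefix - 15);
        if (suffix >= 16) writeVInt(out, suffix - 16);
        out.write(term, prefix, suffix); // only the new suffix bytes are stored
        return out.toByteArray();
    }

    // Plain 7-bit varint, matching DataOutput.writeVInt's wire format.
    static void writeVInt(ByteArrayOutputStream out, int v) {
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }

    public static void main(String[] args) {
        byte[] enc = TermHeaderDemo.encode("appliance".getBytes(), "apply".getBytes());
        // "appl" (4 bytes) is shared, suffix "y" has length 1:
        // one header byte (prefix=4, suffixLength-1=0) plus one suffix byte.
        if (enc.length != 2) throw new AssertionError(enc.length);
        if ((enc[0] & 0x0F) != 4) throw new AssertionError();
        if (((enc[0] >>> 4) & 0x0F) != 0) throw new AssertionError();
    }
}
```

Only terms whose prefix reaches 15 or whose suffix reaches 16 pay for the extra varints, which keeps the common case at one byte of overhead per term.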
[GitHub] [lucene-solr] jaisonbi commented on a change in pull request #2213: LUCENE-9663: Adding compression to terms dict from SortedSet/Sorted DocValues
jaisonbi commented on a change in pull request #2213: URL: https://github.com/apache/lucene-solr/pull/2213#discussion_r564271912

## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java

## @@ -1144,6 +1157,7 @@ public TermsEnum termsEnum() throws IOException {
   }

   private static class TermsDict extends BaseTermsEnum {
+    static final int PADDING_LENGTH = 7;

Review comment: ok..will rename and add this comment:)

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9406) Make it simpler to track IndexWriter's events
[ https://issues.apache.org/jira/browse/LUCENE-9406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271878#comment-17271878 ] Zach Chen commented on LUCENE-9406: --- Makes sense! For the initial interface proposal, I'm thinking something along the same lines as what you had in the previous PR for event metrics collection:

{code:java}
interface EventMetrics {
  Map provideMetrics();
}

interface IndexWriterEvent extends EventMetrics {
  public void beginPointInTimeMerge(MergeTrigger);
  public void completePointInTimeMerge(MergeTrigger);
  ...
}
{code}

The implementation for IndexWriterEvent can be set into IndexWriterConfig / LiveIndexWriterConfig, and used at IndexWriter's key event points just like in the previous PR. For event metrics consumption, I'm considering something similar to Dropwizard's metrics reporter:

{code:java}
interface EventMetricsReporter {
  public void report(EventMetrics); // calls EventMetrics.provideMetrics() to get data
}
{code}

such that the application can provide a custom implementation for data consumption:

{code:java}
class FileBasedEventReporter implements EventMetricsReporter {}
class NetworkBasedEventReporter implements EventMetricsReporter {}
...
{code}

What do you think?

> Make it simpler to track IndexWriter's events
> --
> Key: LUCENE-9406
> URL: https://issues.apache.org/jira/browse/LUCENE-9406
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/index
> Reporter: Michael McCandless
> Priority: Major
>
> This is the second spinoff from a [controversial PR to add a new index-time
> feature to Lucene to merge small segments during
> commit|https://github.com/apache/lucene-solr/pull/1552]. That change can
> substantially reduce the number of small index segments to search.
> In that PR, there was a new proposed interface, {{IndexWriterEvents}}, giving
> the application a chance to track when {{IndexWriter}} kicked off merges
> during commit, how many, how long it waited, how often it gave up waiting,
> etc.
> Such telemetry from production usage is really helpful when tuning settings > like which merges (e.g. a size threshold) to attempt on commit, and how long > to wait during commit, etc. > I am splitting out this issue to explore possible approaches to do this. > E.g. [~simonw] proposed using a statistics class instead, but if I understood > that correctly, I think that would put the role of aggregation inside > {{IndexWriter}}, which is not ideal. > Many interesting events, e.g. how many merges are being requested, how large > are they, how long did they take to complete or fail, etc., can be gleaned by > wrapping expert Lucene classes like {{MergePolicy}} and {{MergeScheduler}}. > But for those events that cannot (e.g. {{IndexWriter}} stopped waiting for > merges during commit), it would be very helpful to have some simple way to > track so applications can better tune. > It is also possible to subclass {{IndexWriter}} and override key methods, but > I think that is inherently risky as {{IndexWriter}}'s protected methods are > not considered to be a stable API, and the synchronization used by > {{IndexWriter}} is confusing. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
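For illustration, here is a self-contained sketch of how the pieces proposed in the comment above could fit together. The `CountingIndexWriterEvents` implementation and the console reporter are hypothetical stand-ins, not part of the proposal; the point is that aggregation lives in the application-provided implementation, not inside IndexWriter.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the proposed split between metrics collection (hooks called from
// IndexWriter's key event points) and consumption (application reporter).
public class EventMetricsSketch {

    interface EventMetrics {
        Map<String, Object> provideMetrics();
    }

    interface EventMetricsReporter {
        void report(EventMetrics metrics); // calls provideMetrics() to get data
    }

    // Hypothetical implementation that merely counts events; a real one might
    // also track wait times, give-ups, merge sizes, etc.
    static class CountingIndexWriterEvents implements EventMetrics {
        private long mergesBegun;
        private long mergesCompleted;

        void beginPointInTimeMerge() { mergesBegun++; }
        void completePointInTimeMerge() { mergesCompleted++; }

        @Override
        public Map<String, Object> provideMetrics() {
            Map<String, Object> m = new LinkedHashMap<>();
            m.put("pointInTimeMerges.begun", mergesBegun);
            m.put("pointInTimeMerges.completed", mergesCompleted);
            return m;
        }
    }

    public static void main(String[] args) {
        CountingIndexWriterEvents events = new CountingIndexWriterEvents();
        events.beginPointInTimeMerge();
        events.beginPointInTimeMerge();
        events.completePointInTimeMerge();
        // A console reporter standing in for file- or network-based ones.
        EventMetricsReporter reporter =
            metrics -> System.out.println(metrics.provideMetrics());
        reporter.report(events);
    }
}
```

A file- or network-based reporter would implement the same one-method interface, so the application decides where the telemetry goes.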
[jira] [Commented] (LUCENE-9476) Add a bulk ordinal->FacetLabel API
[ https://issues.apache.org/jira/browse/LUCENE-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271862#comment-17271862 ] Gautam Worah commented on LUCENE-9476: -- Submitted a WIP PR [here|https://github.com/apache/lucene-solr/pull/2246]

> Add a bulk ordinal->FacetLabel API
> --
> Key: LUCENE-9476
> URL: https://issues.apache.org/jira/browse/LUCENE-9476
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Affects Versions: 8.6.1
> Reporter: Gautam Worah
> Priority: Minor
> Labels: performance
>
> This issue is a spillover from the
> [PR|https://github.com/apache/lucene-solr/pull/1733/files] for LUCENE-9450
> The idea here is to share a single {{BinaryDocValues}} instance per leaf per
> query instead of creating a new one each time in the
> {{DirectoryTaxonomyReader}}.
> Suggested by [~mikemccand]
[GitHub] [lucene-solr] gautamworah96 opened a new pull request #2247: WIP: LUCENE-9476 Add basic functionality, basic tests
gautamworah96 opened a new pull request #2247: URL: https://github.com/apache/lucene-solr/pull/2247

# Description

In [LUCENE-9450](https://issues.apache.org/jira/browse/LUCENE-9450) we switched the Taxonomy index from Stored Fields to `BinaryDocValues`. In the resulting implementation of the `getPath` code, we create a new `BinaryDocValues` values instance for each ordinal. We may traverse the same nodes over and over again if the `getPath` API is called multiple times for ordinals in the same segment/with the same `readerIndex`. This PR takes advantage of that fact by sorting ordinals and then checking whether some of the ordinals are present in the same segment/have the same `readerIndex` (by trying to `advanceExact` to the correct position and not failing), thereby allowing us to reuse the previous `BinaryDocValues` object.

# Solution

Steps:
1. Sort all ordinals and remember their positions so as to store each path in the correct position.
2. Try to `advanceExact` to the correct position with the previously calculated `readerIndex`. If the operation fails, find the correct segment for the ordinal and then `advanceExact` to the desired position.
3. Store this position for future ordinals.

# Tests

Added a new test for the API that compares the individual `getPath` results from ordinals with the bulk FacetLabels returned by the `getBulkPath` API.

# Checklist

Please review the following and check all that apply:

- [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch.
(optional but recommended)
- [x] I have developed this patch against the `master` branch.
- [x] I have run `./gradlew check`.
- [x] I have added tests for my changes.
- [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
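The sort-then-scatter step described in the Solution section above can be sketched independently of the Lucene APIs. The names below are illustrative, not the PR's actual code: sorting lets a forward-only cursor (like `BinaryDocValues.advanceExact`) be reused across lookups, while the remembered positions put each result back in the caller's original order.

```java
import java.util.Arrays;
import java.util.Comparator;

// Sketch of the bulk-lookup strategy: process ordinals in increasing order so
// a forward-only iterator can be reused, then scatter results back to the
// caller's original positions.
public class BulkLookupDemo {

    // Stand-in for the taxonomy reader's ordinal -> label lookup.
    interface Lookup {
        String lookup(int ordinal);
    }

    static String[] bulkLookup(int[] ordinals, Lookup reader) {
        // 1. Remember each ordinal's original position, then sort positions by ordinal.
        Integer[] positions = new Integer[ordinals.length];
        for (int i = 0; i < positions.length; i++) positions[i] = i;
        Arrays.sort(positions, Comparator.comparingInt(p -> ordinals[p]));

        String[] results = new String[ordinals.length];
        for (Integer pos : positions) {
            // 2. A real implementation would advanceExact(...) here, falling back
            //    to re-seeking the right segment when that fails.
            results[pos] = reader.lookup(ordinals[pos]); // 3. scatter to original slot
        }
        return results;
    }

    public static void main(String[] args) {
        String[] labels = {"a", "b", "c", "d"};
        String[] out = bulkLookup(new int[]{3, 0, 2}, ord -> labels[ord]);
        if (!Arrays.equals(out, new String[]{"d", "a", "c"})) throw new AssertionError();
    }
}
```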
[jira] [Comment Edited] (LUCENE-9476) Add a bulk ordinal->FacetLabel API
[ https://issues.apache.org/jira/browse/LUCENE-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271862#comment-17271862 ] Gautam Worah edited comment on LUCENE-9476 at 1/26/21, 4:36 AM: Submitted a WIP PR [here|https://github.com/apache/lucene-solr/pull/2247]

was (Author: gworah): Submitted a WIP PR [here|https://github.com/apache/lucene-solr/pull/2246]
[GitHub] [lucene-solr] dweiss merged pull request #2235: LUCENE-9690: Hunspell: support special title-case for words with apostrophe
dweiss merged pull request #2235: URL: https://github.com/apache/lucene-solr/pull/2235
[GitHub] [lucene-solr] dweiss merged pull request #2236: LUCENE-9691: Hunspell: support trailing comments on aff option lines
dweiss merged pull request #2236: URL: https://github.com/apache/lucene-solr/pull/2236
[GitHub] [lucene-solr-operator] HoustonPutman commented on a change in pull request #193: Support custom log4j2 config from user-provided ConfigMap
HoustonPutman commented on a change in pull request #193: URL: https://github.com/apache/lucene-solr-operator/pull/193#discussion_r564044525

## File path: controllers/solrcloud_controller.go

## @@ -182,44 +182,61 @@ func (r *SolrCloudReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
     }
   }
-  // Generate ConfigMap unless the user supplied a custom ConfigMap for solr.xml ... but the provided ConfigMap
-  // might be for the Prometheus exporter, so we only care if they provide a solr.xml in the CM
-  solrXmlConfigMapName := instance.ConfigMapName()
-  solrXmlMd5 := ""
+  // Generate ConfigMap unless the user supplied a custom ConfigMap for solr.xml
+  configMapInfo := make(map[string]string)
   if instance.Spec.CustomSolrKubeOptions.ConfigMapOptions != nil && instance.Spec.CustomSolrKubeOptions.ConfigMapOptions.ProvidedConfigMap != "" {
+    providedConfigMapName := instance.Spec.CustomSolrKubeOptions.ConfigMapOptions.ProvidedConfigMap
     foundConfigMap := {}
-    nn := types.NamespacedName{Name: instance.Spec.CustomSolrKubeOptions.ConfigMapOptions.ProvidedConfigMap, Namespace: instance.Namespace}
+    nn := types.NamespacedName{Name: providedConfigMapName, Namespace: instance.Namespace}
     err = r.Get(context.TODO(), nn, foundConfigMap)
     if err != nil {
       return requeueOrNot, err // if they passed a providedConfigMap name, then it must exist
     }
-    // ConfigMap doesn't have to have a solr.xml, but if it does, then it needs to be valid!
     if foundConfigMap.Data != nil {
-      solrXml, ok := foundConfigMap.Data["solr.xml"]
-      if ok {
+      logXml, hasLogXml := foundConfigMap.Data[util.LogXmlFile]
+      solrXml, hasSolrXml := foundConfigMap.Data[util.SolrXmlFile]
+
+      // if there's a user-provided config, it must have one of the expected keys
+      if !hasLogXml && !hasSolrXml {
+        return requeueOrNot, fmt.Errorf("User provided ConfigMap %s must have one of 'solr.xml' and/or 'log4j2.xml'",
+          providedConfigMapName)
+      }
+
+      if hasSolrXml {
+        // make sure the user-provided solr.xml is valid
        if !strings.Contains(solrXml, "${hostPort:") {
          return requeueOrNot, fmt.Errorf("Custom solr.xml in ConfigMap %s must contain a placeholder for the 'hostPort' variable, such as ${hostPort:80}",
-           instance.Spec.CustomSolrKubeOptions.ConfigMapOptions.ProvidedConfigMap)
+           providedConfigMapName)
        }
        // stored in the pod spec annotations on the statefulset so that we get a restart when solr.xml changes
-       solrXmlMd5 = fmt.Sprintf("%x", md5.Sum([]byte(solrXml)))
-       solrXmlConfigMapName = foundConfigMap.Name
-      } else {
-        return requeueOrNot, fmt.Errorf("Required 'solr.xml' key not found in provided ConfigMap %s",
-          instance.Spec.CustomSolrKubeOptions.ConfigMapOptions.ProvidedConfigMap)
+       configMapInfo[util.SolrXmlMd5Annotation] = fmt.Sprintf("%x", md5.Sum([]byte(solrXml)))
+       configMapInfo[util.SolrXmlFile] = foundConfigMap.Name
+      }
+
+      if hasLogXml {
+        if !strings.Contains(logXml, "monitorInterval=") {
+          // stored in the pod spec annotations on the statefulset so that we get a restart when the log config changes
+          configMapInfo[util.LogXmlMd5Annotation] = fmt.Sprintf("%x", md5.Sum([]byte(logXml)))
+        } // else log4j will automatically refresh for us, so no restart needed
+        configMapInfo[util.LogXmlFile] = foundConfigMap.Name
+      }
+
    } else {
-      return requeueOrNot, fmt.Errorf("Provided ConfigMap %s has no data",
-        instance.Spec.CustomSolrKubeOptions.ConfigMapOptions.ProvidedConfigMap)
+      return requeueOrNot, fmt.Errorf("Provided ConfigMap %s has no data",
+        providedConfigMapName)
    }
-  } else {
+  }
+
+  if configMapInfo[util.SolrXmlFile] == "" {

Review comment: So if a user passes a custom
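The `md5.Sum`-based annotations in the diff above are what trigger a rolling restart when a mounted config changes: the hash of the file is stored on the pod spec, so a changed digest makes the StatefulSet roll. A small stand-alone sketch of the same idea, with hypothetical helper names and Java's `MessageDigest` standing in for Go's `crypto/md5`:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Change-detection via a content hash stored as an annotation:
// if the stored digest no longer matches the current config, restart.
public class ConfigHashDemo {

    // Equivalent of Go's fmt.Sprintf("%x", md5.Sum(data)): lowercase hex digest.
    static String md5Hex(String content) {
        try {
            byte[] digest =
                MessageDigest.getInstance("MD5").digest(content.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is required on every JVM
        }
    }

    // A restart is needed only when the stored annotation no longer matches.
    static boolean needsRestart(String storedAnnotation, String currentConfig) {
        return !md5Hex(currentConfig).equals(storedAnnotation);
    }

    public static void main(String[] args) {
        String v1 = "<solr>...</solr>";
        String annotation = md5Hex(v1);
        if (needsRestart(annotation, v1)) throw new AssertionError("unchanged config must not restart");
        if (!needsRestart(annotation, v1 + " ")) throw new AssertionError("changed config must restart");
    }
}
```

The `monitorInterval=` special case in the diff works because log4j2 can hot-reload its own config, so the operator deliberately skips storing the hash, and thus skips the restart, in that case.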
[GitHub] [lucene-solr-operator] HoustonPutman merged pull request #200: Apachify the solr-operator helm chart
HoustonPutman merged pull request #200: URL: https://github.com/apache/lucene-solr-operator/pull/200
[GitHub] [lucene-solr] jtibshirani commented on a change in pull request #2245: Move old field infos format to backwards-codecs.
jtibshirani commented on a change in pull request #2245: URL: https://github.com/apache/lucene-solr/pull/2245#discussion_r564138601

## File path: lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene60/Lucene60FieldInfosFormat.java

## @@ -311,6 +312,11 @@ private static IndexOptions getIndexOptions(IndexInput input, byte b) throws IOE
   }
 }

+  /**
+   * Note: although this format is only used on older versions, we need to keep the write logic

Review comment: I hope this assumption is accurate, would appreciate someone double-checking it.
[GitHub] [lucene-solr] murblanc commented on a change in pull request #2199: SOLR-15055 (Take 2) Re-implement 'withCollection'
murblanc commented on a change in pull request #2199: URL: https://github.com/apache/lucene-solr/pull/2199#discussion_r563645684 ## File path: solr/core/src/java/org/apache/solr/cluster/placement/plugins/AffinityPlacementFactory.java ## @@ -238,11 +258,93 @@ public PlacementPlan computePlacement(Cluster cluster, PlacementRequest request, // failure. Current code does fail if placement is impossible (constraint is at most one replica of a shard on any node). for (Replica.ReplicaType replicaType : Replica.ReplicaType.values()) { makePlacementDecisions(solrCollection, shardName, availabilityZones, replicaType, request.getCountReplicasToCreate(replicaType), - attrValues, replicaTypeToNodes, nodesWithReplicas, coresOnNodes, placementPlanFactory, replicaPlacements); + attrValues, replicaTypeToNodes, nodesWithReplicas, coresOnNodes, placementContext.getPlacementPlanFactory(), replicaPlacements); } } - return placementPlanFactory.createPlacementPlan(request, replicaPlacements); + return placementContext.getPlacementPlanFactory().createPlacementPlan(request, replicaPlacements); +} + +@Override +public void verifyAllowedModification(ModificationRequest modificationRequest, PlacementContext placementContext) throws PlacementModificationException, InterruptedException { + if (modificationRequest instanceof DeleteShardsRequest) { +throw new UnsupportedOperationException("not implemented yet"); + } else if (modificationRequest instanceof DeleteCollectionRequest) { +verifyDeleteCollection((DeleteCollectionRequest) modificationRequest, placementContext); + } else if (modificationRequest instanceof DeleteReplicasRequest) { +verifyDeleteReplicas((DeleteReplicasRequest) modificationRequest, placementContext); + } else { +throw new UnsupportedOperationException("unsupported request type " + modificationRequest.getClass().getName()); + } +} + +private void verifyDeleteCollection(DeleteCollectionRequest deleteCollectionRequest, PlacementContext placementContext) throws 
PlacementModificationException, InterruptedException { Review comment: Can we have cycles in the `withCollection` graph? Should we allow a way to override the vetting checks from the Collection API? ## File path: solr/core/src/java/org/apache/solr/cluster/placement/DeleteShardsRequest.java ## @@ -0,0 +1,27 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.solr.cluster.placement; + +import java.util.Set; + +/** + * Delete shards request. + */ +public interface DeleteShardsRequest extends ModificationRequest { Review comment: If we don't use this interface (i.e. the class that implements it) I suggest we do not include either in this PR. Or at least define and call the corresponding method in `AssignStrategy` from the appropriate `*Cmd` even if nothing does a real implementation and vetting based on it (but it would be ready to be consumed maybe by another plugin written by some user). ## File path: solr/core/src/java/org/apache/solr/cluster/placement/plugins/AffinityPlacementFactory.java ## @@ -238,11 +258,93 @@ public PlacementPlan computePlacement(Cluster cluster, PlacementRequest request, // failure. 
Current code does fail if placement is impossible (constraint is at most one replica of a shard on any node). for (Replica.ReplicaType replicaType : Replica.ReplicaType.values()) { makePlacementDecisions(solrCollection, shardName, availabilityZones, replicaType, request.getCountReplicasToCreate(replicaType), - attrValues, replicaTypeToNodes, nodesWithReplicas, coresOnNodes, placementPlanFactory, replicaPlacements); + attrValues, replicaTypeToNodes, nodesWithReplicas, coresOnNodes, placementContext.getPlacementPlanFactory(), replicaPlacements); } } - return placementPlanFactory.createPlacementPlan(request, replicaPlacements); + return placementContext.getPlacementPlanFactory().createPlacementPlan(request, replicaPlacements); +} + +@Override
[GitHub] [lucene-solr-operator] thelabdude merged pull request #195: Improve Prom exporter docs
thelabdude merged pull request #195: URL: https://github.com/apache/lucene-solr-operator/pull/195
[GitHub] [lucene-solr] zacharymorn commented on pull request #2205: LUCENE-9668: Deprecate MinShouldMatchSumScorer with WANDScorer
zacharymorn commented on pull request #2205: URL: https://github.com/apache/lucene-solr/pull/2205#issuecomment-766615404

> Thank you @zacharymorn !

No problem! Thanks Adrien for the review and guidance!
[GitHub] [lucene-solr] muse-dev[bot] commented on a change in pull request #2091: Jira/solr 14778
muse-dev[bot] commented on a change in pull request #2091: URL: https://github.com/apache/lucene-solr/pull/2091#discussion_r564197768 ## File path: lucene/analysis/common/src/test/org/apache/lucene/analysis/payloads/TestDelimitedPayloadTokenFilter.java ## @@ -94,7 +98,8 @@ public void testFloatEncoding() throws Exception { Review comment: *NULLPTR_DEREFERENCE:* call to `TestDelimitedPayloadTokenFilter.assertTermEquals(...)` eventually accesses memory that is the null pointer on line 84. ## File path: lucene/analysis/common/src/test/org/apache/lucene/analysis/payloads/TestDelimitedPayloadTokenFilter.java ## @@ -120,26 +125,37 @@ void assertTermEquals(String expected, TokenStream stream, byte[] expectPay) thr Review comment: *NULLPTR_DEREFERENCE:* call to `TestDelimitedPayloadTokenFilter.assertTermEquals(...)` eventually accesses memory that is the null pointer on line 106. ## File path: lucene/analysis/common/src/test/org/apache/lucene/analysis/payloads/TestDelimitedPayloadTokenFilter.java ## @@ -51,9 +53,11 @@ public void testPayloads() throws Exception { public void testNext() throws Exception { String test = "The quick|JJ red|JJ fox|NN jumped|VB over the lazy|JJ brown|JJ dogs|NN"; -DelimitedPayloadTokenFilter filter = new DelimitedPayloadTokenFilter - (whitespaceMockTokenizer(test), - DelimitedPayloadTokenFilter.DEFAULT_DELIMITER, new IdentityEncoder()); +DelimitedPayloadTokenFilter filter = +new DelimitedPayloadTokenFilter( +whitespaceMockTokenizer(test), +DelimitedPayloadTokenFilter.DEFAULT_DELIMITER, +new IdentityEncoder()); filter.reset(); assertTermEquals("The", filter, null); Review comment: *NULLPTR_DEREFERENCE:* call to `TestDelimitedPayloadTokenFilter.assertTermEquals(...)` eventually accesses memory that is the null pointer on line 62. 
## File path: lucene/analysis/common/src/test/org/apache/lucene/analysis/payloads/TestDelimitedPayloadTokenFilter.java ## @@ -51,9 +53,11 @@ public void testPayloads() throws Exception { Review comment: *NULLPTR_DEREFERENCE:* call to `TestDelimitedPayloadTokenFilter.assertTermEquals(...)` eventually accesses memory that is the null pointer on line 38. ## File path: lucene/analysis/common/src/test/org/apache/lucene/analysis/shingle/TestShingleAnalyzerWrapper.java ## @@ -0,0 +1,505 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.lucene.analysis.shingle; + +import org.apache.lucene.analysis.Analyzer; +import org.apache.lucene.analysis.BaseTokenStreamTestCase; +import org.apache.lucene.analysis.CharArraySet; +import org.apache.lucene.analysis.MockAnalyzer; +import org.apache.lucene.analysis.MockTokenizer; +import org.apache.lucene.analysis.StopFilter; +import org.apache.lucene.analysis.TokenFilter; +import org.apache.lucene.analysis.TokenStream; +import org.apache.lucene.analysis.Tokenizer; +import org.apache.lucene.analysis.tokenattributes.CharTermAttribute; +import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute; +import org.apache.lucene.document.Document; +import org.apache.lucene.document.Field; +import org.apache.lucene.document.TextField; +import org.apache.lucene.index.DirectoryReader; +import org.apache.lucene.index.IndexReader; +import org.apache.lucene.index.IndexWriter; +import org.apache.lucene.index.IndexWriterConfig; +import org.apache.lucene.index.Term; +import org.apache.lucene.search.BooleanClause; +import org.apache.lucene.search.BooleanQuery; +import org.apache.lucene.search.IndexSearcher; +import org.apache.lucene.search.PhraseQuery; +import org.apache.lucene.search.ScoreDoc; +import org.apache.lucene.search.TermQuery; +import org.apache.lucene.store.Directory; + +/** A test class for ShingleAnalyzerWrapper as regards queries and scoring. */ +public class TestShingleAnalyzerWrapper extends BaseTokenStreamTestCase { + private Analyzer analyzer; + private IndexSearcher searcher; + private IndexReader reader; + private Directory directory; + + /** + * Set up a new index in RAM with three test phrases and the supplied
[GitHub] [lucene-solr-operator] HoustonPutman commented on pull request #200: Apachify the solr-operator helm chart
HoustonPutman commented on pull request #200: URL: https://github.com/apache/lucene-solr-operator/pull/200#issuecomment-767049150 I'll work on that as well @vladiceanu , but it is a separate issue. Thanks for bringing it up! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] gus-asf closed pull request #2240: LUCENE-9575 Provide a producer for PatternTypingRule in TestRandomChains
gus-asf closed pull request #2240: URL: https://github.com/apache/lucene-solr/pull/2240
[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #2213: LUCENE-9663: Adding compression to terms dict from SortedSet/Sorted DocValues
bruno-roustant commented on a change in pull request #2213: URL: https://github.com/apache/lucene-solr/pull/2213#discussion_r563785114 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java ## @@ -1144,6 +1157,7 @@ public TermsEnum termsEnum() throws IOException { } private static class TermsDict extends BaseTermsEnum { +static final int PADDING_LENGTH = 7; Review comment: Ok, in this case can we rename it LZ4_DECOMPRESSOR_PADDING and add this comment about the decompression speed?
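As context for the rename: per the thread, Lucene's `CompressionMode` LZ4 decompressor runs faster when the destination buffer is over-allocated by 7 padding bytes, so wide copies never need per-byte bounds checks at the tail. A minimal sketch of how such a constant would be used when sizing a buffer (class and method names here are hypothetical, not the PR's actual code):

```java
public class PaddingSketch {
  // Proposed name for the constant (was PADDING_LENGTH); the value 7 comes
  // from the padding requirement mentioned for CompressionMode's LZ4
  // decompressor in the review thread.
  static final int LZ4_DECOMPRESSOR_PADDING = 7;

  // Allocate a decompression target with slack at the end so the
  // decompressor may safely read/write a few bytes past the payload.
  static byte[] newBlockBuffer(int maxUncompressedBlockLength) {
    return new byte[maxUncompressedBlockLength + LZ4_DECOMPRESSOR_PADDING];
  }
}
```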
[GitHub] [lucene-solr] NazerkeBS commented on a change in pull request #2230: SOLR-15011: /admin/logging handler is configured logs to all nodes
NazerkeBS commented on a change in pull request #2230: URL: https://github.com/apache/lucene-solr/pull/2230#discussion_r563636966 ## File path: solr/core/src/java/org/apache/solr/handler/admin/LoggingHandler.java ## @@ -151,6 +156,9 @@ public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throw rsp.add("loggers", info); } rsp.setHttpCaching(false); +if (cc != null && AdminHandlersProxy.maybeProxyToNodes(req, rsp, cc)) { Review comment: SystemInfoHandler is doing similar to this logic;
[GitHub] [lucene-solr] jpountz commented on pull request #2205: LUCENE-9668: Deprecate MinShouldMatchSumScorer with WANDScorer
jpountz commented on pull request #2205: URL: https://github.com/apache/lucene-solr/pull/2205#issuecomment-766588060 Thank you @zacharymorn !
[GitHub] [lucene-solr] gus-asf merged pull request #2241: @gus-asf LUCENE-9575 Provide a producer for PatternTypingRule in TestRandomChains
gus-asf merged pull request #2241: URL: https://github.com/apache/lucene-solr/pull/2241
[GitHub] [lucene-solr-operator] vladiceanu commented on pull request #200: Apachify the solr-operator helm chart
vladiceanu commented on pull request #200: URL: https://github.com/apache/lucene-solr-operator/pull/200#issuecomment-766739705 Not sure if it's the right place to mention, but https://artifacthub.io/packages/helm/solr-operator/solr-operator also needs to be updated to point to the new chart location
[GitHub] [lucene-solr] gus-asf commented on pull request #2240: LUCENE-9575 Provide a producer for PatternTypingRule in TestRandomChains
gus-asf commented on pull request #2240: URL: https://github.com/apache/lucene-solr/pull/2240#issuecomment-766687637 need to remake to account for quickfix.
[GitHub] [lucene-solr] jpountz merged pull request #2205: LUCENE-9668: Deprecate MinShouldMatchSumScorer with WANDScorer
jpountz merged pull request #2205: URL: https://github.com/apache/lucene-solr/pull/2205
[GitHub] [lucene-solr] dweiss commented on a change in pull request #2235: LUCENE-9690: Hunspell: support special title-case for words with apostrophe
dweiss commented on a change in pull request #2235: URL: https://github.com/apache/lucene-solr/pull/2235#discussion_r563522049 ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Stemmer.java ## @@ -138,6 +142,23 @@ WordCase caseOf(char[] word, int length) { return lowerBuffer; } + // Special prefix handling for Catalan, French, Italian: Review comment: seems like this can be made static? ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Stemmer.java ## @@ -138,6 +142,23 @@ WordCase caseOf(char[] word, int length) { return lowerBuffer; } + // Special prefix handling for Catalan, French, Italian: Review comment: I'll merge it in, you can piggyback static method on a different PR if you wish.
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #2230: SOLR-15011: /admin/logging handler is configured logs to all nodes
dsmiley commented on a change in pull request #2230: URL: https://github.com/apache/lucene-solr/pull/2230#discussion_r563971668 ## File path: solr/core/src/java/org/apache/solr/handler/admin/LoggingHandler.java ## @@ -151,6 +156,9 @@ public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throw rsp.add("loggers", info); } rsp.setHttpCaching(false); +if (cc != null && AdminHandlersProxy.maybeProxyToNodes(req, rsp, cc)) { Review comment: SIH is doing what I suggest, which is different than what the PR is doing.
[GitHub] [lucene-solr] dweiss commented on a change in pull request #2238: LUCENE-9693: Hunspell: check that all flags are > 0 and fit char range
dweiss commented on a change in pull request #2238: URL: https://github.com/apache/lucene-solr/pull/2238#discussion_r563528897 ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java ## @@ -1299,7 +1314,12 @@ void appendFlag(char flag, StringBuilder to) { if (replacement.isEmpty()) { continue; } -flags[upto++] = (char) Integer.parseInt(replacement); +int flag = Integer.parseInt(replacement); +if (flag == 0 || flag >= Character.MAX_VALUE) { // read default flags as well + throw new IllegalArgumentException( Review comment: It'd be great to be consistent with exceptions when parsing input - sometimes it's ParsingException, here it's IAE. ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java ## @@ -1310,10 +1330,8 @@ void appendFlag(char flag, StringBuilder to) { @Override void appendFlag(char flag, StringBuilder to) { Review comment: Is this intentional? Because it changes the logic of concatenation (always leaving the trailing comma). I liked the previous version better (always leaving the output neat). ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Stemmer.java ## @@ -588,7 +577,7 @@ private boolean checkCondition( } private boolean isFlagAppendedByAffix(int affixId, char flag) { -if (affixId < 0) return false; +if (affixId < 0 || flag == 0) return false; Review comment: Wouldn't it be cleaner to add a constant alias (static variable) FLAG_UNSET for 0 and replace it throughout the code where it compares to zero? You've changed it from -1 to 0 but it really doesn't make it any clearer that it's a "default" unset state. I think it would benefit from being more verbose here. 
## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java ## @@ -1299,7 +1314,12 @@ void appendFlag(char flag, StringBuilder to) { if (replacement.isEmpty()) { continue; } -flags[upto++] = (char) Integer.parseInt(replacement); +int flag = Integer.parseInt(replacement); +if (flag == 0 || flag >= Character.MAX_VALUE) { // read default flags as well + throw new IllegalArgumentException( Review comment: Eh, I was afraid of that. It'd be good to consolidate it at some point. ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java ## @@ -1310,10 +1330,8 @@ void appendFlag(char flag, StringBuilder to) { @Override void appendFlag(char flag, StringBuilder to) { Review comment: Oh, so something else (other than this method) appends to that stringbuilder? Maybe those places should be fixed instead? I don't have all of the code in front of me, so the question may be naive.
[GitHub] [lucene-solr] jtibshirani commented on pull request #2245: Move old field infos format to backwards-codecs.
jtibshirani commented on pull request #2245: URL: https://github.com/apache/lucene-solr/pull/2245#issuecomment-767205967 Thanks @iverase for pointing this out.
[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2238: LUCENE-9693: Hunspell: check that all flags are > 0 and fit char range
donnerpeter commented on a change in pull request #2238: URL: https://github.com/apache/lucene-solr/pull/2238#discussion_r563673451 ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java ## @@ -1299,7 +1314,12 @@ void appendFlag(char flag, StringBuilder to) { if (replacement.isEmpty()) { continue; } -flags[upto++] = (char) Integer.parseInt(replacement); +int flag = Integer.parseInt(replacement); +if (flag == 0 || flag >= Character.MAX_VALUE) { // read default flags as well + throw new IllegalArgumentException( Review comment: `ParseException` needs some `errorOffset` obligatorily (which is dubiously filled here with the current line number), and it's not available in this method, and not all callers have anything meaningful to pass there. For consistency, we could replace `ParseException` with something less choosy :) ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java ## @@ -1310,10 +1330,8 @@ void appendFlag(char flag, StringBuilder to) { @Override void appendFlag(char flag, StringBuilder to) { Review comment: Yes, that's `some tests failed after implementing step 1 and were fixed in step 2`. However nice it seemed, it was wrong, because other flags were appended after this one without any comma. Trailing commas are no problem, as empty flags are skipped in the previous method. ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Stemmer.java ## @@ -588,7 +577,7 @@ private boolean checkCondition( } private boolean isFlagAppendedByAffix(int affixId, char flag) { -if (affixId < 0) return false; +if (affixId < 0 || flag == 0) return false; Review comment: A good idea, thanks! ## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java ## @@ -1310,10 +1330,8 @@ void appendFlag(char flag, StringBuilder to) { @Override void appendFlag(char flag, StringBuilder to) { Review comment: Yes. 
Currently it's just one place which definitely appends no flags before this one, and may append some flags after this, so the implementation is tied to that.
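Pulling the threads of this review together, the range check from the quoted diff plus the named sentinel dweiss suggested could look like the following standalone sketch (the class name and error message are hypothetical; the validation condition is taken from the quoted `Dictionary` change):

```java
public class FlagSketch {
  // Named sentinel suggested in review instead of comparing to 0 inline:
  // "add a constant alias (static variable) FLAG_UNSET for 0".
  static final char FLAG_UNSET = (char) 0;

  // Numeric Hunspell flags must be strictly positive and fit in a char,
  // mirroring the check added in the quoted Dictionary diff.
  static char parseNumericFlag(String replacement) {
    int flag = Integer.parseInt(replacement);
    if (flag <= 0 || flag >= Character.MAX_VALUE) {
      throw new IllegalArgumentException(
          "Flag must be > 0 and fit the char range, got: " + flag);
    }
    return (char) flag;
  }
}
```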
[GitHub] [lucene-solr-operator] thelabdude commented on a change in pull request #193: Support custom log4j2 config from user-provided ConfigMap
thelabdude commented on a change in pull request #193: URL: https://github.com/apache/lucene-solr-operator/pull/193#discussion_r564072280 ## File path: controllers/util/solr_util.go ## @@ -327,13 +330,42 @@ func GenerateStatefulSet(solrCloud *solr.SolrCloud, solrCloudStatus *solr.SolrCl envVars = append(envVars, customPodOptions.EnvVariables...) } + // Did the user provide a custom log config? + if configMapInfo[LogXmlFile] != "" { + + if configMapInfo[LogXmlMd5Annotation] != "" { + if podAnnotations == nil { + podAnnotations = make(map[string]string, 1) + } + podAnnotations[LogXmlMd5Annotation] = configMapInfo[LogXmlMd5Annotation] + } + + // cannot use /var/solr as a mountPath, so mount the custom log config in a sub-dir + volName := "log4j2-xml" + mountPath := fmt.Sprintf("/var/solr/%s-log-config", solrCloud.Name) + log4jPropsEnvVarPath := fmt.Sprintf("%s/%s", mountPath, LogXmlFile) + + solrVolumes = append(solrVolumes, corev1.Volume{ Review comment: This is a good catch! K8s allows it and results in a structure like the following in the STS:

```
volumes:
  - configMap:
      defaultMode: 420
      items:
        - key: solr.xml
          path: solr.xml
      name: dev-custom-solr-xml
    name: solr-xml
  - configMap:
      defaultMode: 420
      items:
        - key: log4j2.xml
          path: log4j2.xml
      name: dev-custom-solr-xml
    name: log4j2-xml
```

But we certainly should use a single volume with multiple `items` as you suggest ;-) Will fix it up and add a test for both keys being provided.
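The single-volume form agreed on above might look roughly like this (a hedged sketch, not the final implementation; the ConfigMap name and keys are taken from the quoted StatefulSet output, while the volume name `solr-config` is hypothetical):

```yaml
# One volume sourcing both files from the same ConfigMap via multiple items.
volumes:
  - name: solr-config
    configMap:
      name: dev-custom-solr-xml
      defaultMode: 420
      items:
        - key: solr.xml
          path: solr.xml
        - key: log4j2.xml
          path: log4j2.xml
```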
[GitHub] [lucene-solr] jaisonbi commented on a change in pull request #2213: LUCENE-9663: Adding compression to terms dict from SortedSet/Sorted DocValues
jaisonbi commented on a change in pull request #2213: URL: https://github.com/apache/lucene-solr/pull/2213#discussion_r563698470 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesConsumer.java ## @@ -731,7 +731,22 @@ private void doAddSortedField(FieldInfo field, DocValuesProducer valuesProducer) meta.writeLong(data.getFilePointer() - start); // ordsLength } -addTermsDict(DocValues.singleton(valuesProducer.getSorted(field))); +int valuesCount = values.getValueCount(); +switch (mode) { Review comment: yes, should use "if" instead of "switch", thanks:) ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java ## @@ -370,6 +378,11 @@ public void close() throws IOException { long termsIndexLength; long termsIndexAddressesOffset; long termsIndexAddressesLength; + +boolean compressed; +// Reserved for support other compressors. +int compressorCode; Review comment: will remove this..just thought we could support more types of compression algorithms here... ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java ## @@ -1144,6 +1157,7 @@ public TermsEnum termsEnum() throws IOException { } private static class TermsDict extends BaseTermsEnum { +static final int PADDING_LENGTH = 7; Review comment: Just refer from CompressionMode$LZ4_DECOMPRESSOR...it said add 7 padding bytes can help decompression run faster... ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesConsumer.java ## @@ -791,6 +806,107 @@ private void addTermsDict(SortedSetDocValues values) throws IOException { writeTermsIndex(values); } + private void addCompressedTermsDict(SortedSetDocValues values) throws IOException { Review comment: I will try to optimize this method...thanks for the comment.
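For reference, the block bookkeeping in the quoted `addCompressedTermsDict` boils down to a ceiling division by the block size, done with a mask and an unsigned shift. A standalone sketch of that arithmetic (the shift value 4, i.e. 16 terms per block, is illustrative only; the real constants are `TERMS_DICT_BLOCK_LZ4_SHIFT`/`_MASK` in `Lucene80DocValuesFormat`):

```java
public class BlockMath {
  // Illustrative: 16 terms per compressed block.
  static final int SHIFT = 4;
  static final int MASK = (1 << SHIFT) - 1;

  // (size + MASK) >>> SHIFT computes ceil(size / blockSize),
  // matching the numBlocks computation in the quoted PR code.
  static long numBlocks(long termCount) {
    return (termCount + MASK) >>> SHIFT;
  }
}
```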
[GitHub] [lucene-solr] dweiss merged pull request #2237: LUCENE-9692: Hunspell: extract Stemmer.stripAffix from similar code in prefix/suffix processing
dweiss merged pull request #2237: URL: https://github.com/apache/lucene-solr/pull/2237
[GitHub] [lucene-solr] msokolov commented on a change in pull request #2239: LUCENE-9695: don't merge deleted vectors
msokolov commented on a change in pull request #2239: URL: https://github.com/apache/lucene-solr/pull/2239#discussion_r563309172 ## File path: lucene/core/src/java/org/apache/lucene/codecs/VectorWriter.java ## @@ -153,13 +153,12 @@ public int nextDoc() throws IOException { private final DocIDMerger docIdMerger; private final int[] ordBase; private final int cost; -private final int size; +private int size; private int docId; private VectorValuesSub current; -// For each doc with a vector, record its ord in the segments being merged. This enables random -// access into the -// unmerged segments using the ords from the merged segment. +/* For each doc with a vector, record its ord in the segments being merged. This enables random access into the unmerged segments using the ords from the merged segment. + */ Review comment: Hmmm I thought spotless would wrap this line, but it doesn't seem to complain about it ## File path: lucene/core/src/test/org/apache/lucene/util/hnsw/KnnGraphTester.java ## @@ -578,6 +572,8 @@ private int createIndex(Path docsPath, Path indexPath) throws IOException { IndexWriterConfig iwc = new IndexWriterConfig().setOpenMode(IndexWriterConfig.OpenMode.CREATE); // iwc.setMergePolicy(NoMergePolicy.INSTANCE); iwc.setRAMBufferSizeMB(1994d); +iwc.setMaxBufferedDocs(1); Review comment: Oh I did not mean to include this change here. Probably we will want to have some command line parameter to control this, but for now having the default be to index a large segment is probably better ## File path: lucene/core/src/java/org/apache/lucene/codecs/VectorWriter.java ## @@ -194,6 +197,7 @@ public int nextDoc() throws IOException { current = docIdMerger.next(); if (current == null) { docId = NO_MORE_DOCS; +size = ord; Review comment: Thanks, yes I did. 
## File path: lucene/core/src/test/org/apache/lucene/index/TestKnnGraph.java ## @@ -124,21 +126,84 @@ public void testMerge() throws Exception { } } - private void dumpGraph(KnnGraphValues values, int size) throws IOException { + /** + * Verify that we get the *same* graph by indexing one segment as we do by indexing two segments + * and merging. + */ + public void testMergeProducesSameGraph() throws Exception { Review comment: thanks, yes this flushed out the two problems I saw, so I'm pretty confident they are fixed now, after running this a few 100 times. I had also wanted to add a test asserting that KNN search precision remains above some threshold, but sadly with random vectors, it would always eventually fail, even though mostly it would succeed, so not a very useful unit test and I removed it. Probably we can add something to luceneutil ## File path: lucene/core/src/test/org/apache/lucene/index/TestVectorValues.java ## @@ -748,29 +751,107 @@ public void testRandom() throws Exception { assertEquals(dimension, v.length); String idString = ctx.reader().document(docId).getField("id").stringValue(); int id = Integer.parseInt(idString); -assertArrayEquals(idString, values[id], v, 0); -++valueCount; +if (ctx.reader().getLiveDocs() == null || ctx.reader().getLiveDocs().get(docId)) { + assertArrayEquals(idString, values[id], v, 0); + ++valueCount; +} else { + assertNull(values[id]); +} } } assertEquals(numValues, valueCount); -assertEquals(numValues, totalSize); +assertEquals(numValues, totalSize - numDeletes); + } +} + } + + /** + * Index random vectors, sometimes skipping documents, sometimes deleting a document, sometimes + * merging, sometimes sorting the index, and verify that the expected values can be read back + * consistently. + */ + public void testRandom2() throws Exception { Review comment: I forgot why I had done this (there is already a testRandom), so I added some comments explaining how it differs. 
[GitHub] [lucene-solr] muse-dev[bot] commented on a change in pull request #2091: Jira/solr 14778
muse-dev[bot] commented on a change in pull request #2091: URL: https://github.com/apache/lucene-solr/pull/2091#discussion_r564197768 ## File path: lucene/analysis/common/src/test/org/apache/lucene/analysis/payloads/TestDelimitedPayloadTokenFilter.java ## @@ -94,7 +98,8 @@ public void testFloatEncoding() throws Exception { Review comment: *NULLPTR_DEREFERENCE:* call to `TestDelimitedPayloadTokenFilter.assertTermEquals(...)` eventually accesses memory that is the null pointer on line 84. ## File path: lucene/analysis/common/src/test/org/apache/lucene/analysis/payloads/TestDelimitedPayloadTokenFilter.java ## @@ -120,26 +125,37 @@ void assertTermEquals(String expected, TokenStream stream, byte[] expectPay) thr Review comment: *NULLPTR_DEREFERENCE:* call to `TestDelimitedPayloadTokenFilter.assertTermEquals(...)` eventually accesses memory that is the null pointer on line 106. ## File path: lucene/analysis/common/src/test/org/apache/lucene/analysis/payloads/TestDelimitedPayloadTokenFilter.java ## @@ -51,9 +53,11 @@ public void testPayloads() throws Exception { public void testNext() throws Exception { String test = "The quick|JJ red|JJ fox|NN jumped|VB over the lazy|JJ brown|JJ dogs|NN"; -DelimitedPayloadTokenFilter filter = new DelimitedPayloadTokenFilter - (whitespaceMockTokenizer(test), - DelimitedPayloadTokenFilter.DEFAULT_DELIMITER, new IdentityEncoder()); +DelimitedPayloadTokenFilter filter = +new DelimitedPayloadTokenFilter( +whitespaceMockTokenizer(test), +DelimitedPayloadTokenFilter.DEFAULT_DELIMITER, +new IdentityEncoder()); filter.reset(); assertTermEquals("The", filter, null); Review comment: *NULLPTR_DEREFERENCE:* call to `TestDelimitedPayloadTokenFilter.assertTermEquals(...)` eventually accesses memory that is the null pointer on line 62. 
## File path: lucene/analysis/common/src/test/org/apache/lucene/analysis/payloads/TestDelimitedPayloadTokenFilter.java ## @@ -51,9 +53,11 @@ public void testPayloads() throws Exception { Review comment: *NULLPTR_DEREFERENCE:* call to `TestDelimitedPayloadTokenFilter.assertTermEquals(...)` eventually accesses memory that is the null pointer on line 38. ## File path: lucene/analysis/common/src/test/org/apache/lucene/analysis/shingle/TestShingleAnalyzerWrapper.java ## @@ -0,0 +1,505 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.lucene.analysis.shingle; + +import org.apache.lucene.analysis.Analyzer; +import org.apache.lucene.analysis.BaseTokenStreamTestCase; +import org.apache.lucene.analysis.CharArraySet; +import org.apache.lucene.analysis.MockAnalyzer; +import org.apache.lucene.analysis.MockTokenizer; +import org.apache.lucene.analysis.StopFilter; +import org.apache.lucene.analysis.TokenFilter; +import org.apache.lucene.analysis.TokenStream; +import org.apache.lucene.analysis.Tokenizer; +import org.apache.lucene.analysis.tokenattributes.CharTermAttribute; +import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute; +import org.apache.lucene.document.Document; +import org.apache.lucene.document.Field; +import org.apache.lucene.document.TextField; +import org.apache.lucene.index.DirectoryReader; +import org.apache.lucene.index.IndexReader; +import org.apache.lucene.index.IndexWriter; +import org.apache.lucene.index.IndexWriterConfig; +import org.apache.lucene.index.Term; +import org.apache.lucene.search.BooleanClause; +import org.apache.lucene.search.BooleanQuery; +import org.apache.lucene.search.IndexSearcher; +import org.apache.lucene.search.PhraseQuery; +import org.apache.lucene.search.ScoreDoc; +import org.apache.lucene.search.TermQuery; +import org.apache.lucene.store.Directory; + +/** A test class for ShingleAnalyzerWrapper as regards queries and scoring. */ +public class TestShingleAnalyzerWrapper extends BaseTokenStreamTestCase { + private Analyzer analyzer; + private IndexSearcher searcher; + private IndexReader reader; + private Directory directory; + + /** + * Set up a new index in RAM with three test phrases and the supplied
[jira] [Commented] (LUCENE-9694) New tool for creating a deterministic index
[ https://issues.apache.org/jira/browse/LUCENE-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271790#comment-17271790 ] Haoyu Zhai commented on LUCENE-9694: I've opened a PR for this: https://github.com/apache/lucene-solr/pull/2246 > New tool for creating a deterministic index > --- > > Key: LUCENE-9694 > URL: https://issues.apache.org/jira/browse/LUCENE-9694 > Project: Lucene - Core > Issue Type: New Feature > Components: general/tools >Reporter: Haoyu Zhai >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > Lucene's index is segmented, and sometimes number of segments and documents > arrangement greatly impact performance. > Given a stable index sort, our team create a tool that records document > arrangement (called index map) of an index and rearrange another index > (consists of same documents) into the same structure (segment num, and > documents included in each segment). > This tool could be also used in lucene benchmarks for a faster deterministic > index construction (if I understand correctly lucene benchmark is using a > single thread manner to achieve this). > > We've already had some discussion in email > [https://markmail.org/message/lbtdntclpnocmfuf] > And I've implemented the first method, using {{IndexWriter.addIndexes}} and a > customized {{FilteredCodecReader}} to achieve the goal. The index > construction time is about 25min and time executing this tool is about 10min. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] zhaih opened a new pull request #2246: LUCENE-9694: New tool for creating a deterministic index
zhaih opened a new pull request #2246: URL: https://github.com/apache/lucene-solr/pull/2246

# Description
Create a new tool `IndexRearranger`, which could rearrange a built index concurrently to desired segment number and document distribution
# Solution
Essentially combines `IndexWriter.addIndexes` and `FilterCodecReader` to select only certain documents into 1 segment
# Tests
Added one unit test testing rearranger.
# Checklist
Please review the following and check all that apply:
- [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [x] I have developed this patch against the `master` branch.
- [x] I have run `./gradlew check`.
- [x] I have added tests for my changes.
- [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
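The core selection idea behind the rearranger — assigning each document of the source index to exactly one target segment according to an "index map" — can be illustrated without Lucene types. In the real tool a `FilterCodecReader` exposes only the selected documents as live; the sketch below shows just the partitioning step with stdlib code (all names hypothetical, contiguous buckets assumed for simplicity):

```java
import java.util.ArrayList;
import java.util.List;

public class RearrangeSketch {
  // Split [0, maxDoc) into `segments` contiguous buckets, mimicking how an
  // index map assigns each document to exactly one target segment.
  // Each bucket is returned as a [from, to) pair.
  static List<int[]> partition(int maxDoc, int segments) {
    List<int[]> buckets = new ArrayList<>();
    int base = maxDoc / segments;
    int rem = maxDoc % segments;
    int start = 0;
    for (int i = 0; i < segments; i++) {
      int size = base + (i < rem ? 1 : 0); // spread the remainder evenly
      buckets.add(new int[] {start, start + size});
      start += size;
    }
    return buckets;
  }
}
```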
[jira] [Updated] (SOLR-14330) Return docs with null value in expand for field when collapse has nullPolicy=collapse
[ https://issues.apache.org/jira/browse/SOLR-14330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris M. Hostetter updated SOLR-14330: -- Attachment: SOLR-14330.patch Assignee: Chris M. Hostetter Status: Open (was: Open) Attaching a strawman patch with some tests. The patch introduces an {{expand.nullGroup=true|false}} option to control if/when this behavior happens, and works with both nullPolicy=collapse and nullPolicy=expand (although obviously it only creates a single group for null docs in the {{expanded}} section). It should also work fine with {{expand.field}} and {{expand.q}} type situations, but there are nocommits to actually test that. I wanted to put the patch out before getting too deep into new tests in case people have concerns about the semantics/behavior/UX and want to discuss whether it should behave or be implemented differently. > Return docs with null value in expand for field when collapse has > nullPolicy=collapse > - > > Key: SOLR-14330 > URL: https://issues.apache.org/jira/browse/SOLR-14330 > Project: Solr > Issue Type: Wish >Reporter: Munendra S N >Assignee: Chris M. Hostetter >Priority: Major > Attachments: SOLR-14330.patch > > > When documents don't contain a value for the field, then with collapse those > documents can either be ignored (default), collapsed (one document is chosen), > or expanded (all are returned). This is controlled by {{nullPolicy}}. > When {{nullPolicy}} is {{collapse}}, it would be nice to return all documents > with {{null}} value in the expand block if {{expand=true}}. > Also, when used with {{expand.field}}, even then we should return such > documents
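A request using the proposed option might look like the following (a sketch based on the option name described in the patch; the exact final syntax could differ — the collapse qparser syntax itself is standard Solr):

```
q=*:*
&fq={!collapse field=group_id nullPolicy=collapse}
&expand=true
&expand.nullGroup=true
```

With {{nullPolicy=collapse}} the null docs collapse to a single representative in the main results, and {{expand.nullGroup=true}} would add the remaining null-valued docs as one extra group in the {{expanded}} section.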
[jira] [Commented] (SOLR-6059) Basic support for Cross-Origin resource sharing (CORS) in search requests
[ https://issues.apache.org/jira/browse/SOLR-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271758#comment-17271758 ] Tomas Eduardo Fernandez Lobbe commented on SOLR-6059: - In this Jira, the idea was to allow CORS in search requests only via a {{SearchComponent}} (the main use was for an autocomplete feature), so it's unrelated to the V1/V2 admin APIs. Not sure if this is the right approach compared to the {{web.xml}} changes suggested in SOLR-12292. > Basic support for Cross-Origin resource sharing (CORS) in search requests > - > > Key: SOLR-6059 > URL: https://issues.apache.org/jira/browse/SOLR-6059 > Project: Solr > Issue Type: New Feature >Affects Versions: 4.9, 6.0 >Reporter: Tomas Eduardo Fernandez Lobbe >Priority: Major > Attachments: SOLR-6059.patch > > > Support cross-origin requests to specific search request handlers. > See http://www.w3.org/TR/cors -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15089) Allow backup/restoration to Amazon's S3 blobstore
[ https://issues.apache.org/jira/browse/SOLR-15089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271751#comment-17271751 ] Varun Thacker commented on SOLR-15089: -- This is exciting! Maybe we could start by using an S3 Mock? > Allow backup/restoration to Amazon's S3 blobstore > -- > > Key: SOLR-15089 > URL: https://issues.apache.org/jira/browse/SOLR-15089 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Jason Gerlowski >Priority: Major > > Solr's BackupRepository interface provides an abstraction around the physical > location/format that backups are stored in. This allows plugin writers to > create "repositories" for a variety of storage mediums. It'd be nice if Solr > offered more mediums out of the box though, such as some of the "blobstore" > offerings provided by various cloud providers. > This ticket proposes a "BackupRepository" implementation for Amazon's > popular 'S3' blobstore, so that Solr users can use it for backups without > needing to write their own code. > Amazon offers an S3 Java client with acceptable licensing, and the required > code is relatively simple. The biggest challenge in supporting this will > likely be procedural - integration testing requires S3 access and S3 access > costs money. We can check with INFRA to see if there is any way to get cloud > credits for an integration test to run in nightly Jenkins runs on the ASF > Jenkins server. Alternatively we can try to stub out the blobstore in some > reliable way.
[GitHub] [lucene-solr] jtibshirani commented on pull request #2245: Move old field infos format to backwards-codecs.
jtibshirani commented on pull request #2245: URL: https://github.com/apache/lucene-solr/pull/2245#issuecomment-767205967 Thanks @iverase for pointing this out.
[GitHub] [lucene-solr] jtibshirani commented on a change in pull request #2245: Move old field infos format to backwards-codecs.
jtibshirani commented on a change in pull request #2245: URL: https://github.com/apache/lucene-solr/pull/2245#discussion_r564138601 ## File path: lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene60/Lucene60FieldInfosFormat.java ## @@ -311,6 +312,11 @@ private static IndexOptions getIndexOptions(IndexInput input, byte b) throws IOE } } + /** + * Note: although this format is only used on older versions, we need to keep the write logic Review comment: I hope this is accurate, would appreciate someone double-checking it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jtibshirani opened a new pull request #2245: Move old field infos format to backwards-codecs.
jtibshirani opened a new pull request #2245: URL: https://github.com/apache/lucene-solr/pull/2245 We introduced a new `Lucene90FieldInfosFormat`, so the old `Lucene60FieldInfosFormat` should live in backwards-codecs.
[GitHub] [lucene-solr] megancarey opened a new pull request #2244: SOLR-15099 Add null checks to IndexSizeTrigger
megancarey opened a new pull request #2244: URL: https://github.com/apache/lucene-solr/pull/2244 …its are enqueued # Description Want to avoid noisy NPE on core info variables since we already log.warn on line 330. # Solution Minor fix: add null checks on the core info variables, as we've seen on ZK restarts that these are unavailable. # Tests Ran IndexSizeTrigger test locally and it succeeded. Didn't add tests for this. # Checklist Please review the following and check all that apply: - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [x] I have created a Jira issue and added the issue ID to my pull request title. - [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [ ] I have developed this patch against the `master` branch. - [ ] I have run `./gradlew check`. - [ ] I have added tests for my changes. - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-15099) Add null check on core info variables in IndexSizeTrigger
[ https://issues.apache.org/jira/browse/SOLR-15099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Megan Carey updated SOLR-15099: --- Labels: easyfix patch-available (was: easyfix) > Add null check on core info variables in IndexSizeTrigger > - > > Key: SOLR-15099 > URL: https://issues.apache.org/jira/browse/SOLR-15099 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling >Affects Versions: 8.7 >Reporter: Megan Carey >Priority: Minor > Labels: easyfix, patch-available > > A minor fix, but we've seen NPEs from IndexSizeTrigger when ZK is restarted > since it's unable to fetch the core info. All we need is a null check. > In the patch I'll also update a string value: > https://github.com/apache/lucene-solr/blob/branch_8x/solr/core/src/java/org/apache/solr/cloud/autoscaling/IndexSizeTrigger.java#L339 > And might add a log to report index size when splits are enqueued :) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
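The fix described above amounts to a defensive null guard around the core info lookup. A hypothetical Python sketch of the pattern (not the actual Java patch; the metric key `INDEX.sizeInBytes` mirrors Solr's index-size metric, but the function and dict shape here are illustrative):

```python
def index_size_bytes(core_info):
    """Return the index size for a core, or None when core info is
    unavailable (e.g. during a ZooKeeper restart) instead of raising."""
    # Guard both the container and the specific key before dereferencing,
    # so a missing core simply yields "unknown" rather than an NPE-style crash.
    if core_info is None or core_info.get("INDEX.sizeInBytes") is None:
        return None
    return int(core_info["INDEX.sizeInBytes"])

print(index_size_bytes(None))                         # None
print(index_size_bytes({"INDEX.sizeInBytes": "1024"}))  # 1024
```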
[jira] [Commented] (SOLR-15096) [REGRESSION] Collection Delete Performance significantly degraded in Java 11 v 8
[ https://issues.apache.org/jira/browse/SOLR-15096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271728#comment-17271728 ] Mike Drob commented on SOLR-15096: -- While I am very confident that I initially found this issue with a standalone zookeeper, I can no longer reproduce it with a zookeeper running in a local docker image (although I can still reproduce with our embedded zookeeper). I am very unclear on what is going on here. > [REGRESSION] Collection Delete Performance significantly degraded in Java 11 > v 8 > > > Key: SOLR-15096 > URL: https://issues.apache.org/jira/browse/SOLR-15096 > Project: Solr > Issue Type: Bug >Affects Versions: master (9.0) >Reporter: Mike Drob >Priority: Blocker > Fix For: master (9.0) > > Attachments: Screen Shot 2021-01-21 at 5.44.25 PM.png > > > While doing some other performance testing I noticed that collection deletion > in 8.8 (RC1) would take approximately 200ms, while the same operation would > take 800ms using a recent snapshot ({{a233ed2fd1b}}) from master branch. > I have not done further investigation at this time. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr-operator] thelabdude commented on a change in pull request #193: Support custom log4j2 config from user-provided ConfigMap
thelabdude commented on a change in pull request #193: URL: https://github.com/apache/lucene-solr-operator/pull/193#discussion_r564072280 ## File path: controllers/util/solr_util.go ## @@ -327,13 +330,42 @@ func GenerateStatefulSet(solrCloud *solr.SolrCloud, solrCloudStatus *solr.SolrCl envVars = append(envVars, customPodOptions.EnvVariables...) } + // Did the user provide a custom log config? + if configMapInfo[LogXmlFile] != "" { + + if configMapInfo[LogXmlMd5Annotation] != "" { + if podAnnotations == nil { + podAnnotations = make(map[string]string, 1) + } + podAnnotations[LogXmlMd5Annotation] = configMapInfo[LogXmlMd5Annotation] + } + + // cannot use /var/solr as a mountPath, so mount the custom log config in a sub-dir + volName := "log4j2-xml" + mountPath := fmt.Sprintf("/var/solr/%s-log-config", solrCloud.Name) + log4jPropsEnvVarPath := fmt.Sprintf("%s/%s", mountPath, LogXmlFile) + + solrVolumes = append(solrVolumes, corev1.Volume{ Review comment: This is a good catch! K8s allows it and results in a structure like the following in the STS: ``` volumes: - configMap: defaultMode: 420 items: - key: solr.xml path: solr.xml name: dev-custom-solr-xml name: solr-xml - configMap: defaultMode: 420 items: - key: log4j2.xml path: log4j2.xml name: dev-custom-solr-xml name: log4j2-xml ``` But we certainly should use a single volume with multiple `items` as you suggest ;-) Will fix it up and add a test for both keys being provided. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
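The single-volume alternative agreed on in the review might look like the following (a sketch; the ConfigMap and volume names mirror the snippet above and are not final):

```yaml
volumes:
  - name: solr-config
    configMap:
      name: dev-custom-solr-xml
      defaultMode: 420
      items:
        - key: solr.xml
          path: solr.xml
        - key: log4j2.xml
          path: log4j2.xml
```

One ConfigMap volume with multiple `items` projects both files from the same ConfigMap, instead of mounting the ConfigMap twice under two volume names.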
[GitHub] [lucene-solr-operator] HoustonPutman commented on a change in pull request #193: Support custom log4j2 config from user-provided ConfigMap
HoustonPutman commented on a change in pull request #193: URL: https://github.com/apache/lucene-solr-operator/pull/193#discussion_r564044525 ## File path: controllers/solrcloud_controller.go ## @@ -182,44 +182,61 @@ func (r *SolrCloudReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) { } } - // Generate ConfigMap unless the user supplied a custom ConfigMap for solr.xml ... but the provided ConfigMap - // might be for the Prometheus exporter, so we only care if they provide a solr.xml in the CM - solrXmlConfigMapName := instance.ConfigMapName() - solrXmlMd5 := "" + // Generate ConfigMap unless the user supplied a custom ConfigMap for solr.xml + configMapInfo := make(map[string]string) if instance.Spec.CustomSolrKubeOptions.ConfigMapOptions != nil && instance.Spec.CustomSolrKubeOptions.ConfigMapOptions.ProvidedConfigMap != "" { + providedConfigMapName := instance.Spec.CustomSolrKubeOptions.ConfigMapOptions.ProvidedConfigMap foundConfigMap := {} - nn := types.NamespacedName{Name: instance.Spec.CustomSolrKubeOptions.ConfigMapOptions.ProvidedConfigMap, Namespace: instance.Namespace} + nn := types.NamespacedName{Name: providedConfigMapName, Namespace: instance.Namespace} err = r.Get(context.TODO(), nn, foundConfigMap) if err != nil { return requeueOrNot, err // if they passed a providedConfigMap name, then it must exist } - // ConfigMap doesn't have to have a solr.xml, but if it does, then it needs to be valid! 
if foundConfigMap.Data != nil { - solrXml, ok := foundConfigMap.Data["solr.xml"] - if ok { + logXml, hasLogXml := foundConfigMap.Data[util.LogXmlFile] + solrXml, hasSolrXml := foundConfigMap.Data[util.SolrXmlFile] + + // if there's a user-provided config, it must have one of the expected keys + if !hasLogXml && !hasSolrXml { + return requeueOrNot, fmt.Errorf("User provided ConfigMap %s must have one of 'solr.xml' and/or 'log4j2.xml'", + providedConfigMapName) + } + + if hasSolrXml { + // make sure the user-provided solr.xml is valid if !strings.Contains(solrXml, "${hostPort:") { return requeueOrNot, fmt.Errorf("Custom solr.xml in ConfigMap %s must contain a placeholder for the 'hostPort' variable, such as ${hostPort:80}", - instance.Spec.CustomSolrKubeOptions.ConfigMapOptions.ProvidedConfigMap) + providedConfigMapName) } // stored in the pod spec annotations on the statefulset so that we get a restart when solr.xml changes - solrXmlMd5 = fmt.Sprintf("%x", md5.Sum([]byte(solrXml))) - solrXmlConfigMapName = foundConfigMap.Name - } else { - return requeueOrNot, fmt.Errorf("Required 'solr.xml' key not found in provided ConfigMap %s", - instance.Spec.CustomSolrKubeOptions.ConfigMapOptions.ProvidedConfigMap) + configMapInfo[util.SolrXmlMd5Annotation] = fmt.Sprintf("%x", md5.Sum([]byte(solrXml))) + configMapInfo[util.SolrXmlFile] = foundConfigMap.Name + } + + if hasLogXml { + if !strings.Contains(logXml, "monitorInterval=") { + // stored in the pod spec annotations on the statefulset so that we get a restart when the log config changes + configMapInfo[util.LogXmlMd5Annotation] = fmt.Sprintf("%x", md5.Sum([]byte(logXml))) + } // else log4j will automatically refresh for us, so no restart needed + configMapInfo[util.LogXmlFile] = foundConfigMap.Name } + } else { - return requeueOrNot, fmt.Errorf("Provided ConfigMap %s has no data", - instance.Spec.CustomSolrKubeOptions.ConfigMapOptions.ProvidedConfigMap) + return requeueOrNot, fmt.Errorf("Provided ConfigMap %s has no data", 
providedConfigMapName) } - } else { + } + + if configMapInfo[util.SolrXmlFile] == "" { Review comment: So if a user passes a custom
[jira] [Updated] (SOLR-15083) prometheus-exporter metric solr_metrics_jvm_os_cpu_time_seconds is misnamed
[ https://issues.apache.org/jira/browse/SOLR-15083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated SOLR-15083: Labels: monitoring newdev prometheus (was: monitoring prometheus) > prometheus-exporter metric solr_metrics_jvm_os_cpu_time_seconds is misnamed > --- > > Key: SOLR-15083 > URL: https://issues.apache.org/jira/browse/SOLR-15083 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - prometheus-exporter >Affects Versions: 8.6, master (9.0) >Reporter: Mathieu Marie >Priority: Minor > Labels: monitoring, newdev, prometheus > > *solr_metrics_jvm_os_cpu_time_seconds* metric exported by prometheus-exporter > has seconds in its name, however it appears that it is microseconds. > This name can create confusion when one wants to report it in a dashboard. > That metric is defined in > [https://github.com/apache/lucene-solr/blob/branch_8_5/solr/contrib/prometheus-exporter/conf/solr-exporter-config.xml#L247] > {code} > > .metrics["solr.jvm"] | to_entries | .[] | select(.key == > "os.processCpuTime") as $object | > ($object.value / 1000.0) as $value | > { > name : "solr_metrics_jvm_os_cpu_time_seconds", > type : "COUNTER", > help : "See following URL: > https://lucene.apache.org/solr/guide/metrics-reporting.html;, > label_names : ["item"], > label_values : ["processCpuTime"], > value: $value > } > > {code} > In the above config we see that the metric came from *os.processCpuTime*, > which itself came from JMX call > [getProcessCpuTime()|https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getProcessCpuTime()]. > That javadoc says > {code} > long getProcessCpuTime() > Returns the CPU time used by the process on which the Java virtual machine is > running in nanoseconds. The returned value is of nanoseconds precision but > not necessarily nanoseconds accuracy. 
This method returns -1 if the > platform does not support this operation. > Returns: > the CPU time used by the process in nanoseconds, or -1 if this operation is > not supported. > {code} > Nanoseconds / 1000 is microseconds. > Either the name or the computation should be updated.
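The unit mismatch is easy to see with plain arithmetic (this is just a conversion check, not the exporter's jq pipeline):

```python
# getProcessCpuTime() reports nanoseconds. The exporter divides by 1000,
# which yields microseconds, not the seconds the metric name promises;
# seconds would require dividing by 1e9.
process_cpu_time_ns = 2_500_000_000  # example: 2.5 seconds of CPU time

as_in_exporter = process_cpu_time_ns / 1000.0           # microseconds
as_seconds = process_cpu_time_ns / 1_000_000_000.0      # seconds

print(as_in_exporter)  # 2500000.0 -- off by a factor of one million
print(as_seconds)      # 2.5
```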
[jira] [Resolved] (SOLR-10203) Remove dist/test-framework from the binary download archive
[ https://issues.apache.org/jira/browse/SOLR-10203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley resolved SOLR-10203. - Fix Version/s: master (9.0) Resolution: Fixed Resolving – it's close enough, given how large it was before. We might still remove this lingering jar. I think someone wanting to learn how to write a Solr plugin might best be served by looking at existing ones, and having an increasing number of them hosted off of our repo is helpful in that regard. We needn't have this last jar as a "signal" to it being possible. > Remove dist/test-framework from the binary download archive > --- > > Key: SOLR-10203 > URL: https://issues.apache.org/jira/browse/SOLR-10203 > Project: Solr > Issue Type: Sub-task > Components: Build >Affects Versions: 7.0 >Reporter: Alexandre Rafalovitch >Assignee: Alexandre Rafalovitch >Priority: Minor > Fix For: master (9.0) > > > Libraries in the dist/test-framework are shipped with every copy of Solr > binary, yet they are not used anywhere directly. They take approximately 10 > MBytes. > Remove the directory and provide guidance in a README file on how to get them > for those people who are writing their own testing solutions against Solr. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15096) [REGRESSION] Collection Delete Performance significantly degraded in Java 11 v 8
[ https://issues.apache.org/jira/browse/SOLR-15096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271647#comment-17271647 ] Mike Drob commented on SOLR-15096: -- Some more specific logging to highlight the differences with a few observations: {noformat:title=Java 8 RELOAD} 2021-01-25 19:45:10.804 INFO (OverseerThreadFactory-18-thread-3-processing-n:10.0.0.160:8983_solr) [ ] o.a.s.c.a.c.OverseerCollectionMessageHandler Executing Collection Cmd=action=RELOAD, asyncId=null ... 2021-01-25 19:45:10.941 INFO (qtp1448061896-108) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores params={core=coll-1_shard1_replica_n1=/admin/cores=RELOAD=javabin=2} status=0 QTime=133 2021-01-25 19:45:10.944 INFO (qtp1448061896-27) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/collections params={name=coll-1=RELOAD} status=0 QTime=143 {noformat} {noformat:title=Java 11 RELOAD} 2021-01-25 19:43:47.073 INFO (OverseerThreadFactory-18-thread-3-processing-n:10.0.0.160:8983_solr) [ ] o.a.s.c.a.c.OverseerCollectionMessageHandler Executing Collection Cmd=action=RELOAD, asyncId=null ... 2021-01-25 19:43:47.221 INFO (qtp1275028674-99) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores params={core=coll-1_shard1_replica_n1=/admin/cores=RELOAD=javabin=2} status=0 QTime=144 2021-01-25 19:43:47.297 INFO (qtp1275028674-28) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/collections params={name=coll-1=RELOAD} status=0 QTime=274 {noformat} The time of the *core* reload is pretty close - 133 v 144. I'm willing to call that within margin of error based on whatever else the OS was doing at the time. The time of the *collection* reload is more suspicious. With Java 8, we log that we are executing the cmd 3ms after the timer starts, while with Java 11 the time has already been running for *50ms* by the time we get to the same point. 
Perhaps there's something different about hash map lookup or call site resolution since we're technically using a method reference here. Then, there's an additional *76ms* pause after the core reload has completed before we acknowledge the collection reload. Together, these account for almost all of the performance difference that we observe. > [REGRESSION] Collection Delete Performance significantly degraded in Java 11 > v 8 > > > Key: SOLR-15096 > URL: https://issues.apache.org/jira/browse/SOLR-15096 > Project: Solr > Issue Type: Bug >Affects Versions: master (9.0) >Reporter: Mike Drob >Priority: Blocker > Fix For: master (9.0) > > Attachments: Screen Shot 2021-01-21 at 5.44.25 PM.png > > > While doing some other performance testing I noticed that collection deletion > in 8.8 (RC1) would take approximately 200ms, while the same operation would > take 800ms using a recent snapshot ({{a233ed2fd1b}}) from master branch. > I have not done further investigation at this time. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
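The breakdown can be sanity-checked with simple arithmetic on the QTime values from the log excerpts (milliseconds; the 50 ms and 76 ms pauses are the ones called out in the analysis):

```python
# QTime values reported in the RELOAD log excerpts above (ms).
java8_core, java8_collection = 133, 143
java11_core, java11_collection = 144, 274

# Overhead around the core reload: collection QTime minus core QTime.
java8_overhead = java8_collection - java8_core      # 10 ms
java11_overhead = java11_collection - java11_core   # 130 ms

# The two observed pauses: ~50 ms before the Overseer executes the command,
# ~76 ms between core-reload completion and collection-reload acknowledgment.
print(java11_overhead)  # 130
print(50 + 76)          # 126 -- accounts for nearly all of the 130 ms
```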
[jira] [Created] (SOLR-15108) Randomly use new SolrCloud plugins in test suite
Megan Carey created SOLR-15108: -- Summary: Randomly use new SolrCloud plugins in test suite Key: SOLR-15108 URL: https://issues.apache.org/jira/browse/SOLR-15108 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: AutoScaling, SolrCloud Affects Versions: master (9.0) Reporter: Megan Carey The new pluggable Autoscaling framework is currently unused by the test suite. Ideally, our unit tests will run against this framework some percentage of the time. I'll work on configuring unit tests to switch between Legacy placement and the Affinity placement plugin, either via: # A custom solr.xml that will trade off with default randomly # Cluster properties set randomly # Using a system property that will inform the solr.xml randomly -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-15108) Randomly use new 9.0 Autoscaling plugins in test suite
[ https://issues.apache.org/jira/browse/SOLR-15108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Megan Carey updated SOLR-15108: --- Summary: Randomly use new 9.0 Autoscaling plugins in test suite (was: Randomly use new SolrCloud plugins in test suite) > Randomly use new 9.0 Autoscaling plugins in test suite > -- > > Key: SOLR-15108 > URL: https://issues.apache.org/jira/browse/SOLR-15108 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling, SolrCloud >Affects Versions: master (9.0) >Reporter: Megan Carey >Priority: Major > > The new pluggable Autoscaling framework is currently unused by the test > suite. Ideally, our unit tests will run against this framework some > percentage of the time. > I'll work on configuring unit tests to switch between Legacy placement and > the Affinity placement plugin, either via: > # A custom solr.xml that will trade off with default randomly > # Cluster properties set randomly > # Using a system property that will inform the solr.xml randomly -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr-operator] HoustonPutman merged pull request #200: Apachify the solr-operator helm chart
HoustonPutman merged pull request #200: URL: https://github.com/apache/lucene-solr-operator/pull/200
[jira] [Resolved] (SOLR-15078) ExpandComponent treats all docs with '0' in a numeric collapse field the same as if null
[ https://issues.apache.org/jira/browse/SOLR-15078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris M. Hostetter resolved SOLR-15078. --- Fix Version/s: 8.9 master (9.0) Resolution: Fixed > ExpandComponent treats all docs with '0' in a numeric collapse field the same > as if null > > > Key: SOLR-15078 > URL: https://issues.apache.org/jira/browse/SOLR-15078 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Fix For: master (9.0), 8.9 > > Attachments: SOLR-15078.patch > > > ExpandComponent has an equivalent to the collapse qparser bug tracked in > SOLR-15047... > {quote}...has some very, _very_, old code/semantics in it that date back to > when the {{FieldCache}} was incapable of differentiating between a document > that contained '0' in the field being un-inverted, and a document that didn't > have any value in that field. > This limitation does not exist in DocValues (nor has it existed for a long > time) but as the DocValues API has evolved, and as the [...] code has been > updated to take advantage of the newer APIs that make it obvious when a > document has no value in a field, the [...] code still explicitly equates "0" > in a numeric field with the "null group" > {quote} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15078) ExpandComponent treats all docs with '0' in a numeric collapse field the same as if null
[ https://issues.apache.org/jira/browse/SOLR-15078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271622#comment-17271622 ] ASF subversion and git services commented on SOLR-15078: Commit d8a754a4b48d3a0a0bf7386a711deff007d63107 in lucene-solr's branch refs/heads/branch_8x from Chris M. Hostetter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d8a754a ] SOLR-15078: Fix ExpandComponent behavior when expanding on numeric fields to differentiate '0' group from null group (cherry picked from commit 47a89aca715e18402c183ed15a6076603c63ec52) > ExpandComponent treats all docs with '0' in a numeric collapse field the same > as if null > > > Key: SOLR-15078 > URL: https://issues.apache.org/jira/browse/SOLR-15078 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Attachments: SOLR-15078.patch > > > ExpandComponent has an equivalent to the collapse qparser bug tracked in > SOLR-15047... > {quote}...has some very, _very_, old code/semantics in it that date back to > when the {{FieldCache}} was incapable of differentiating between a document > that contained '0' in the field being un-inverted, and a document that didn't > have any value in that field. > This limitation does not exist in DocValues (nor has it existed for a long > time) but as the DocValues API has evolved, and as the [...] code has been > updated to take advantage of the newer APIs that make it obvious when a > document has no value in a field, the [...] code still explicitly equates "0" > in a numeric field with the "null group" > {quote} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9570) Review code diffs after automatic formatting and correct problems before it is applied
[ https://issues.apache.org/jira/browse/LUCENE-9570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271620#comment-17271620 ] ASF subversion and git services commented on LUCENE-9570: - Commit acbea9ec2676b579beb706944fe9482d8d8f44c7 in lucene-solr's branch refs/heads/branch_8x from Mike Drob [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=acbea9e ] LUCENE-9570 Add placeholder revs file to branch_8x > Review code diffs after automatic formatting and correct problems before it > is applied > -- > > Key: LUCENE-9570 > URL: https://issues.apache.org/jira/browse/LUCENE-9570 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Blocker > Fix For: master (9.0) > > Time Spent: 0.5h > Remaining Estimate: 0h > > Review and correct all the javadocs before they're messed up by automatic > formatting. Apply project-by-project, review diff, correct. Lots of diffs but > it should be relatively quick. > *Reviewing diffs manually* > * switch to branch jira/LUCENE-9570 which the PR is based on: > {code:java} > git remote add dweiss g...@github.com:dweiss/lucene-solr.git > git fetch dweiss > git checkout jira/LUCENE-9570 > {code} > * Open gradle/validation/spotless.gradle and locate the project/ package you > wish to review. Enable it in spotless.gradle by creating a corresponding > switch case block (refer to existing examples), for example: > {code:java} > case ":lucene:highlighter": > target "src/**" > targetExclude "**/resources/**", "**/overview.html" > break > {code} > * Reformat the code: > {code:java} > gradlew tidy && git diff -w > /tmp/diff.patch && git status > {code} > * Look at what has changed (git status) and review the differences manually > (/tmp/diff.patch). If everything looks ok, commit it directly to > jira/LUCENE-9570 or make a PR against that branch. 
> {code:java} > git commit -am ":lucene:core - src/**/org/apache/lucene/document/**" > {code}
[GitHub] [lucene-solr-operator] HoustonPutman commented on pull request #200: Apachify the solr-operator helm chart
HoustonPutman commented on pull request #200: URL: https://github.com/apache/lucene-solr-operator/pull/200#issuecomment-767049150 I'll work on that as well @vladiceanu , but it is a separate issue. Thanks for bringing it up! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #2230: SOLR-15011: /admin/logging handler is configured logs to all nodes
dsmiley commented on a change in pull request #2230: URL: https://github.com/apache/lucene-solr/pull/2230#discussion_r563971668 ## File path: solr/core/src/java/org/apache/solr/handler/admin/LoggingHandler.java ## @@ -151,6 +156,9 @@ public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throw rsp.add("loggers", info); } rsp.setHttpCaching(false); +if (cc != null && AdminHandlersProxy.maybeProxyToNodes(req, rsp, cc)) { Review comment: SIH is doing what I suggest, which is different than what the PR is doing.
[jira] [Updated] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-8962: - Fix Version/s: 8.7 Description: Two improvements were added: 8.6 has merge-on-commit (by Froh et al.), 8.7 has merge-on-refresh (by Simon). See {{MergePolicy.findFullFlushMerges}} The original description follows: With near-real-time search we ask {{IndexWriter}} to write all in-memory segments to disk and open an {{IndexReader}} to search them, and this is typically a quick operation. However, when you use many threads for concurrent indexing, {{IndexWriter}} will write many small segments during {{refresh}} and this then adds search-time cost as searching must visit all of these tiny segments. The merge policy would normally quickly coalesce these small segments if given a little time ... so, could we somehow improve {{IndexWriter}}'s refresh to optionally kick off the merge policy to merge segments below some threshold before opening the near-real-time reader? It'd be a bit tricky because while we are waiting for merges, indexing may continue, and new segments may be flushed, but those new segments shouldn't be included in the point-in-time segments returned by refresh ... One could almost do this on top of Lucene today, with a custom merge policy, and some hackity logic to have the merge policy target small segments just written by refresh, but it's tricky to then open a near-real-time reader, excluding newly flushed but including newly merged segments since the refresh originally finished ... I'm not yet sure how best to solve this, so I wanted to open an issue for discussion! was: With near-real-time search we ask {{IndexWriter}} to write all in-memory segments to disk and open an {{IndexReader}} to search them, and this is typically a quick operation.
However, when you use many threads for concurrent indexing, {{IndexWriter}} will write many small segments during {{refresh}} and this then adds search-time cost as searching must visit all of these tiny segments. The merge policy would normally quickly coalesce these small segments if given a little time ... so, could we somehow improve {{IndexWriter}}'s refresh to optionally kick off the merge policy to merge segments below some threshold before opening the near-real-time reader? It'd be a bit tricky because while we are waiting for merges, indexing may continue, and new segments may be flushed, but those new segments shouldn't be included in the point-in-time segments returned by refresh ... One could almost do this on top of Lucene today, with a custom merge policy, and some hackity logic to have the merge policy target small segments just written by refresh, but it's tricky to then open a near-real-time reader, excluding newly flushed but including newly merged segments since the refresh originally finished ... I'm not yet sure how best to solve this, so I wanted to open an issue for discussion! > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Fix For: master (9.0), 8.6, 8.7 > > Attachments: LUCENE-8962_demo.png, failed-tests.patch, > failure_log.txt, test.diff > > Time Spent: 31h > Remaining Estimate: 0h
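The "merge segments below some threshold before opening the near-real-time reader" idea above can be sketched as a toy simulation in plain Java. Everything here is illustrative (`RefreshMergeSketch` and `pickSmallSegments` are assumed names, not Lucene APIs); the real hook that 8.7 added for this is {{MergePolicy.findFullFlushMerges}}.

```java
import java.util.ArrayList;
import java.util.List;

public class RefreshMergeSketch {
    /**
     * Toy stand-in for the merge-on-refresh decision: from the segments
     * flushed during a refresh, group every segment smaller than a size
     * threshold into one merge, so the NRT reader opens over fewer tiny
     * segments. Segments flushed AFTER this selection would not be part
     * of the point-in-time view, which is the tricky part the issue notes.
     */
    static List<Long> pickSmallSegments(List<Long> flushedSizes, long thresholdBytes) {
        List<Long> toMerge = new ArrayList<>();
        for (long size : flushedSizes) {
            if (size < thresholdBytes) {
                toMerge.add(size);
            }
        }
        // Only merge when it actually reduces the segment count.
        return toMerge.size() >= 2 ? toMerge : List.of();
    }

    public static void main(String[] args) {
        // Several concurrent flush threads produced mostly tiny segments.
        List<Long> flushed = List.of(10_000L, 12_000L, 9_500L, 5_000_000L, 11_000L);
        // The four ~10 KB segments are grouped; the 5 MB one is left alone.
        System.out.println(pickSmallSegments(flushed, 100_000L));
    }
}
```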
[jira] [Commented] (SOLR-15078) ExpandComponent treats all docs with '0' in a numeric collapse field the same as if null
[ https://issues.apache.org/jira/browse/SOLR-15078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271597#comment-17271597 ] ASF subversion and git services commented on SOLR-15078: Commit 47a89aca715e18402c183ed15a6076603c63ec52 in lucene-solr's branch refs/heads/master from Chris M. Hostetter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=47a89ac ] SOLR-15078: Fix ExpandComponent behavior when expanding on numeric fields to differentiate '0' group from null group
[GitHub] [lucene-solr-operator] thelabdude merged pull request #195: Improve Prom exporter docs
thelabdude merged pull request #195: URL: https://github.com/apache/lucene-solr-operator/pull/195
[jira] [Commented] (SOLR-15096) [REGRESSION] Collection Delete Performance significantly degraded in Java 11 v 8
[ https://issues.apache.org/jira/browse/SOLR-15096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271589#comment-17271589 ] Mike Drob commented on SOLR-15096: -- Interestingly, {{ConcurrentDeleteAndCreateCollectionTest.testConcurrentCreateAndDeleteOverTheSameConfig}} appears to be about 10% faster with Java 11 than with Java 8. > [REGRESSION] Collection Delete Performance significantly degraded in Java 11 > v 8 > > > Key: SOLR-15096 > URL: https://issues.apache.org/jira/browse/SOLR-15096 > Project: Solr > Issue Type: Bug >Affects Versions: master (9.0) >Reporter: Mike Drob >Priority: Blocker > Fix For: master (9.0) > > Attachments: Screen Shot 2021-01-21 at 5.44.25 PM.png > > > While doing some other performance testing I noticed that collection deletion > in 8.8 (RC1) would take approximately 200ms, while the same operation would > take 800ms using a recent snapshot ({{a233ed2fd1b}}) from master branch. > I have not done further investigation at this time.
[jira] [Resolved] (SOLR-15073) Unsafe cast in SystemInfoHandler
[ https://issues.apache.org/jira/browse/SOLR-15073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christine Poerschke resolved SOLR-15073. Resolution: Fixed Thanks [~nyivan] for reporting this issue! > Unsafe cast in SystemInfoHandler > > > Key: SOLR-15073 > URL: https://issues.apache.org/jira/browse/SOLR-15073 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.6, 8.7 >Reporter: Nikolay Ivanov >Assignee: Christine Poerschke >Priority: Major > Fix For: 8.8, master (9.0) > > Time Spent: 0.5h > Remaining Estimate: 0h > > I have observed an unsafe cast in > SystemInfoHandler::getSecurityInfo > Is this by design? Currently I have a custom AuthorizationPlugin that > directly implements AuthorizationPlugin interface. With the latest solr > version it is not permitted anymore. A workaround is to extend the > RuleBasedAuthorizationPluginBase, which is not ideal imo. Please share your > thoughts
[jira] [Resolved] (SOLR-15071) Bug on LTR when using solr 8.6.3 - index out of bounds DisiPriorityQueue.add(DisiPriorityQueue.java:102)
[ https://issues.apache.org/jira/browse/SOLR-15071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christine Poerschke resolved SOLR-15071. Resolution: Fixed Thanks everyone! > Bug on LTR when using solr 8.6.3 - index out of bounds > DisiPriorityQueue.add(DisiPriorityQueue.java:102) > > > Key: SOLR-15071 > URL: https://issues.apache.org/jira/browse/SOLR-15071 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - LTR >Affects Versions: 8.6, 8.7 >Reporter: Florin Babes >Assignee: Christine Poerschke >Priority: Major > Labels: ltr > Fix For: 8.8, master (9.0) > > Attachments: featurestore+model+sample_documents.zip > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Hello, > We are trying to update Solr from 8.3.1 to 8.6.3. On Solr 8.3.1 we are > using LTR in production using a MultipleAdditiveTrees model. On Solr 8.6.3 > we receive an error when we try to compute some SolrFeatures. We didn't > find any pattern of the queries that fail. 
> Example:
> We have the following query raw parameters:
> q=lg cx 4k oled 120 hz -> just one of many examples
> term_dq=lg cx 4k oled 120 hz
> rq={!ltr model=model reRankDocs=1000 store=feature_store efi.term=${term_dq}}
> defType=edismax,
> mm=2<75%
> The features are something like this:
> {
> "name":"similarity_query_fileld_1",
> "class":"org.apache.solr.ltr.feature.SolrFeature",
> "params":\{"q":"{!dismax qf=query_field_1 mm=1}${term}"},
> "store":"feature_store"
> },
> {
> "name":"similarity_query_field_2",
> "class":"org.apache.solr.ltr.feature.SolrFeature",
> "params":\{"q":"{!dismax qf=query_field_2 mm=5}${term}"},
> "store":"feature_store"
> }
> We are testing ~6300 production queries and for about 1% of them we receive the following error message:
> "metadata":[
> "error-class","org.apache.solr.common.SolrException",
> "root-error-class","java.lang.ArrayIndexOutOfBoundsException"],
> "msg":"java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2",
> The stacktrace is:
> org.apache.solr.common.SolrException: java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
> at org.apache.solr.search.ReRankCollector.topDocs(ReRankCollector.java:154)
> at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1599)
> at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1413)
> at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:596)
> at org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1513)
> at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:403)
> at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:360)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:214)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2627)
> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:795)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:568)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1300)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1215)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at
[jira] [Commented] (SOLR-15096) [REGRESSION] Collection Delete Performance significantly degraded in Java 11 v 8
[ https://issues.apache.org/jira/browse/SOLR-15096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271571#comment-17271571 ] Mike Drob commented on SOLR-15096: -- Minor updates: Java 15 behaves similarly to Java 11. Using standalone mode and testing with core admin APIs does not seem affected.
[jira] [Created] (SOLR-15107) 500 errors due to "ArithmeticException: / by zero" in Jetty's AbstractConnectionPool
Chris M. Hostetter created SOLR-15107: - Summary: 500 errors due to "ArithmeticException: / by zero" in Jetty's AbstractConnectionPool Key: SOLR-15107 URL: https://issues.apache.org/jira/browse/SOLR-15107 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Affects Versions: 8.8 Reporter: Chris M. Hostetter Upstream bug affects jetty 9.4.32+, fixed in 9.4.36+... * [https://github.com/eclipse/jetty.project/issues/5731] * [https://github.com/eclipse/jetty.project/issues/5819] * [https://github.com/eclipse/jetty.project/pull/5820] First affects Solr 8.8 due to Jetty upgrade in SOLR-14844 Looks like this in logs...
{noformat}
123391 ERROR (qtp1570620031-1192) [x:collection1 ] o.a.s.h.RequestHandlerBase java.lang.ArithmeticException: / by zero
at org.eclipse.jetty.util.Pool.acquire(Pool.java:278)
at org.eclipse.jetty.client.AbstractConnectionPool.activate(AbstractConnectionPool.java:284)
at org.eclipse.jetty.client.AbstractConnectionPool.acquire(AbstractConnectionPool.java:209)
at org.eclipse.jetty.client.HttpDestination.process(HttpDestination.java:331)
at org.eclipse.jetty.client.HttpDestination.send(HttpDestination.java:318)
at org.eclipse.jetty.client.HttpDestination.send(HttpDestination.java:311)
at org.eclipse.jetty.client.HttpDestination.send(HttpDestination.java:288)
at org.eclipse.jetty.client.HttpDestination.send(HttpDestination.java:265)
at org.eclipse.jetty.client.HttpClient.send(HttpClient.java:594)
at org.eclipse.jetty.client.HttpRequest.send(HttpRequest.java:772)
at org.eclipse.jetty.client.HttpRequest.send(HttpRequest.java:764)
at org.apache.solr.client.solrj.impl.Http2SolrClient.asyncRequest(Http2SolrClient.java:387)
at org.apache.solr.client.solrj.impl.LBHttp2SolrClient.doRequest(LBHttp2SolrClient.java:151)
at org.apache.solr.client.solrj.impl.LBHttp2SolrClient.asyncReq(LBHttp2SolrClient.java:127)
at org.apache.solr.handler.component.HttpShardHandler.submit(HttpShardHandler.java:160)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:454)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:214)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2610)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:794)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:567)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:518)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:432)
...{noformat}
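The failure mode in the trace above is easy to reproduce in miniature: an integer modulo whose divisor can transiently be zero. This stdlib-only sketch illustrates the bug class only; it is not Jetty's actual `Pool.acquire` code, and all names are assumed for illustration.

```java
public class PoolDivideByZeroSketch {
    // Illustrative: a pool that spreads acquisitions across entries with a
    // modulo. If the divisor can transiently be zero, acquire() throws
    // ArithmeticException instead of reporting "no entry available", and the
    // exception surfaces to the request handler as a 500.
    static int pickEntry(int cursor, int usableEntries) {
        return cursor % usableEntries; // throws when usableEntries == 0
    }

    // The guarded variant: zero usable entries means "no entry", not a crash.
    static Integer safePickEntry(int cursor, int usableEntries) {
        return usableEntries == 0 ? null : cursor % usableEntries;
    }

    public static void main(String[] args) {
        try {
            pickEntry(7, 0);
        } catch (ArithmeticException e) {
            System.out.println(e.getMessage()); // "/ by zero"
        }
        System.out.println(safePickEntry(7, 0)); // null: caller can retry or queue
    }
}
```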
[jira] [Resolved] (SOLR-15076) Inconsistent metric types in ReplicationHandler
[ https://issues.apache.org/jira/browse/SOLR-15076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki resolved SOLR-15076. - Resolution: Fixed > Inconsistent metric types in ReplicationHandler > --- > > Key: SOLR-15076 > URL: https://issues.apache.org/jira/browse/SOLR-15076 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > Fix For: 8.9 > > > As pointed out by [~dsmiley] in SOLR-14924 there are cases when > ReplicationHandler returns an unexpected type of a metric (string instead of a > number): > {quote} > There are test failures in TestReplicationHandler introduced by this change > (I think). See > https://ci-builds.apache.org/job/Lucene/job/Lucene-Solr-Tests-8.x/1255/ > java.lang.ClassCastException: java.lang.Integer cannot be cast to > java.lang.String > at __randomizedtesting.SeedInfo.seed([754427253A1E4E95:F190450AC46671D]:0) > at > org.apache.solr.handler.TestReplicationHandler.doTestDetails(TestReplicationHandler.java:361) > The test could be made to convert to a string. But it suggests an > inconsistency that ought to be fixed – apparently ReplicationHandler > sometimes returns its details using all strings and other times with the typed > variants – and that's bad. > {quote} > Reproducing seed from David: > {quote} > gradlew :solr:core:test --tests > "org.apache.solr.handler.TestReplicationHandler.doTestDetails" -Ptests.jvms=6 > -Ptests.jvmargs=-XX:TieredStopAtLevel=1 -Ptests.seed=5135EC61BF449203 > -Ptests.file.encoding=ISO-8859-1 > {quote}
[jira] [Commented] (SOLR-14924) Some ReplicationHandler metrics are reported using incorrect types
[ https://issues.apache.org/jira/browse/SOLR-14924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271394#comment-17271394 ] ASF subversion and git services commented on SOLR-14924: Commit eaae9d18822c7648d0e0cfacc4e9e79b67ffbe90 in lucene-solr's branch refs/heads/branch_8x from Andrzej Bialecki [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=eaae9d1 ] SOLR-15076: Fix wrong test assumption - type of this property has changed in SOLR-14924. > Some ReplicationHandler metrics are reported using incorrect types > -- > > Key: SOLR-14924 > URL: https://issues.apache.org/jira/browse/SOLR-14924 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 8.6.3, 8.7 >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > Fix For: 8.7 > > Attachments: SOLR-14924.patch > > > Some metrics reported from {{ReplicationHandler}} use incorrect types - they > are reported as String values instead of the numerics. > This is caused by using {{ReplicationHandler.addVal}} utility method with the > type {{Integer.class}}, which the method doesn't support and it returns the > value as a string.
[jira] [Commented] (SOLR-15076) Inconsistent metric types in ReplicationHandler
[ https://issues.apache.org/jira/browse/SOLR-15076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271393#comment-17271393 ] ASF subversion and git services commented on SOLR-15076: Commit eaae9d18822c7648d0e0cfacc4e9e79b67ffbe90 in lucene-solr's branch refs/heads/branch_8x from Andrzej Bialecki [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=eaae9d1 ] SOLR-15076: Fix wrong test assumption - type of this property has changed in SOLR-14924.
[jira] [Commented] (SOLR-15076) Inconsistent metric types in ReplicationHandler
[ https://issues.apache.org/jira/browse/SOLR-15076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271388#comment-17271388 ] ASF subversion and git services commented on SOLR-15076: Commit 166d39a12eff53d9cfdf47b101cfe98a7020dcba in lucene-solr's branch refs/heads/master from Andrzej Bialecki [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=166d39a ] SOLR-15076: Fix wrong test assumption - type of this property has changed in SOLR-14924.
[jira] [Commented] (SOLR-14924) Some ReplicationHandler metrics are reported using incorrect types
[ https://issues.apache.org/jira/browse/SOLR-14924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271389#comment-17271389 ] ASF subversion and git services commented on SOLR-14924: Commit 166d39a12eff53d9cfdf47b101cfe98a7020dcba in lucene-solr's branch refs/heads/master from Andrzej Bialecki [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=166d39a ] SOLR-15076: Fix wrong test assumption - type of this property has changed in SOLR-14924.
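The SOLR-14924 bug pattern — a type-dispatch helper that silently falls back to String for types it doesn't handle — can be sketched with a stdlib-only stand-in. The `addVal` below is a simplified illustration under assumed names, not Solr's actual `ReplicationHandler.addVal` implementation.

```java
import java.util.HashMap;
import java.util.Map;

public class AddValSketch {
    // Illustrative helper: it only knows how to emit a couple of numeric
    // types and silently stringifies everything else. Passing Integer.class
    // to such a helper yields a String metric, which breaks typed consumers
    // (the ClassCastException seen in TestReplicationHandler).
    static void addVal(Map<String, Object> out, String key, Object raw, Class<?> clzz) {
        Object val;
        if (clzz == Long.class) {
            val = Long.parseLong(raw.toString());
        } else if (clzz == Double.class) {
            val = Double.parseDouble(raw.toString());
        } else {
            val = raw.toString(); // unsupported type: reported as String
        }
        out.put(key, val);
    }

    public static void main(String[] args) {
        Map<String, Object> details = new HashMap<>();
        addVal(details, "indexVersion", 42, Integer.class); // falls through to String
        addVal(details, "generation", 7, Long.class);       // properly typed
        System.out.println(details.get("indexVersion").getClass().getSimpleName()); // String
        System.out.println(details.get("generation").getClass().getSimpleName());   // Long
    }
}
```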
[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #2213: LUCENE-9663: Adding compression to terms dict from SortedSet/Sorted DocValues
bruno-roustant commented on a change in pull request #2213: URL: https://github.com/apache/lucene-solr/pull/2213#discussion_r563785114 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java ## @@ -1144,6 +1157,7 @@ public TermsEnum termsEnum() throws IOException { } private static class TermsDict extends BaseTermsEnum { +static final int PADDING_LENGTH = 7; Review comment: Ok, in this case can we rename it LZ4_DECOMPRESSOR_PADDING and add this comment about the decompression speed? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14067) Move StatelessScriptUpdateProcessor to a contrib
[ https://issues.apache.org/jira/browse/SOLR-14067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271350#comment-17271350 ] ASF subversion and git services commented on SOLR-14067: Commit ce1bba6d66ae71d928e8d3932cfc7409ee5fdf53 in lucene-solr's branch refs/heads/master from ep...@opensourceconnections.com [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ce1bba6 ] Revert "SOLR-14067: v3 Create /contrib/scripting module with ScriptingUpdateProcessor (#2215)" This reverts commit cf5db8d6513e0f3e556ab6ee1b9ad3a6472ad2f2. > Move StatelessScriptUpdateProcessor to a contrib > > > Key: SOLR-14067 > URL: https://issues.apache.org/jira/browse/SOLR-14067 > Project: Solr > Issue Type: Improvement >Reporter: Ishan Chattopadhyaya >Assignee: David Eric Pugh >Priority: Major > Fix For: master (9.0) > > Time Spent: 7h 10m > Remaining Estimate: 0h > > Move server-side scripting out of core and into a new contrib. This is > better for security. > Former description: > > We should eliminate all scripting capabilities within Solr. Let us start with > the StatelessScriptUpdateProcessor deprecation/removal. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-15106) Thread in OverseerTaskProcessor should not "return"
Mathieu Marie created SOLR-15106: Summary: Thread in OverseerTaskProcessor should not "return" Key: SOLR-15106 URL: https://issues.apache.org/jira/browse/SOLR-15106 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: SolrCloud Affects Versions: 8.6, master (9.0) Reporter: Mathieu Marie I have encountered a scenario where ZK was not accessible for a long time (due to a _jute.maxbuffer_ issue, but not related to the rest of this issue). During that time, the ClusterStateUpdater and OC queues from the Overseer got filled with 1200+ messages. Once we restored ZK availability, the ClusterStateUpdater queue got emptied, but not the OC one. The Overseer stopped dequeuing from the OC queue. After some digging in the code, it seems that a *return* from the overseer thread starting the runners could be the issue. Code in OverseerTaskProcessor.java (https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java#L357) The lines of code that immediately follow should also be reviewed carefully, as they also return or interrupt the thread that is responsible for executing the runners. Anyhow, if anybody hits that same issue, the quick workaround is to bump the overseer instance to elect a new overseer on another node.
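The failure mode described in the issue can be sketched in isolation. This is a minimal, hypothetical stand-in for a dispatcher loop (none of these names come from the Solr code): a `return` inside the catch block would end the thread and strand everything still queued, while `continue` keeps draining.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

public class DispatchLoop {
  // Drains the queue, surviving transient per-task failures.
  static int drain(Queue<String> queue) {
    int processed = 0;
    while (!queue.isEmpty()) {
      String task = queue.poll();
      try {
        if (task.equals("bad")) throw new IllegalStateException("transient failure");
        processed++;
      } catch (IllegalStateException e) {
        // A `return` here would silently end the loop and strand the
        // remaining tasks; `continue` keeps the dispatcher alive.
        continue;
      }
    }
    return processed;
  }

  public static void main(String[] args) {
    Queue<String> q = new ArrayDeque<>(List.of("a", "bad", "b"));
    System.out.println(drain(q)); // prints 2: "bad" is skipped, not fatal
  }
}
```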
[GitHub] [lucene-solr] jaisonbi commented on a change in pull request #2213: LUCENE-9663: Adding compression to terms dict from SortedSet/Sorted DocValues
jaisonbi commented on a change in pull request #2213: URL: https://github.com/apache/lucene-solr/pull/2213#discussion_r563708394

## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesConsumer.java

@@ -791,6 +806,107 @@ private void addTermsDict(SortedSetDocValues values) throws IOException { writeTermsIndex(values); } + private void addCompressedTermsDict(SortedSetDocValues values) throws IOException {

Review comment: I will try to optimize this method... thanks for the comment.
[GitHub] [lucene-solr] jaisonbi commented on a change in pull request #2213: LUCENE-9663: Adding compression to terms dict from SortedSet/Sorted DocValues
jaisonbi commented on a change in pull request #2213: URL: https://github.com/apache/lucene-solr/pull/2213#discussion_r563702718

## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java

@@ -1144,6 +1157,7 @@ public TermsEnum termsEnum() throws IOException { } private static class TermsDict extends BaseTermsEnum { +static final int PADDING_LENGTH = 7;

Review comment: This is taken from CompressionMode$LZ4_DECOMPRESSOR... it says adding 7 padding bytes can help decompression run faster...
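For context, the trick referenced here is that a fast LZ4 decompressor copies in 8-byte strides without bounds-checking every byte, so the destination buffer needs up to 7 spare bytes at the end. Below is a minimal sketch of how a consumer of such a decompressor might size its scratch buffer; the constant name and the example length are assumptions, not the actual Lucene code.

```java
public class PaddedBuffer {
  // Mirrors the idea behind the LZ4_DECOMPRESSOR padding: 8-byte-wide
  // copies may overshoot the logical length by up to 7 bytes.
  static final int LZ4_DECOMPRESSOR_PADDING = 7;

  static byte[] newScratch(int maxUncompressedBlockLength) {
    // Spare bytes at the tail absorb the overshoot of the last 8-byte copy.
    return new byte[maxUncompressedBlockLength + LZ4_DECOMPRESSOR_PADDING];
  }

  public static void main(String[] args) {
    byte[] scratch = newScratch(42); // hypothetical max block length from metadata
    System.out.println(scratch.length); // prints 49
  }
}
```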
[GitHub] [lucene-solr] jaisonbi commented on a change in pull request #2213: LUCENE-9663: Adding compression to terms dict from SortedSet/Sorted DocValues
jaisonbi commented on a change in pull request #2213: URL: https://github.com/apache/lucene-solr/pull/2213#discussion_r563700602

## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java

@@ -370,6 +378,11 @@ public void close() throws IOException { long termsIndexLength; long termsIndexAddressesOffset; long termsIndexAddressesLength; + +boolean compressed; +// Reserved for support other compressors. +int compressorCode;

Review comment: Will remove this... just thought we could support more types of compression algorithms here...
[jira] [Commented] (LUCENE-9575) Add PatternTypingFilter
[ https://issues.apache.org/jira/browse/LUCENE-9575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271292#comment-17271292 ] ASF subversion and git services commented on LUCENE-9575: - Commit f942b2dd8a484879d806fcc4fa95c7393f348d9e in lucene-solr's branch refs/heads/master from Gus Heck [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f942b2d ] @gus-asf LUCENE-9575 Provide a producer for PatternTypingRule in TestRandomChains (#2241) LUCENE-9575 Provide a producer for PatternTypingRule in TestRandomChains to fix failure on seed 65EA739C95F40313 > Add PatternTypingFilter > --- > > Key: LUCENE-9575 > URL: https://issues.apache.org/jira/browse/LUCENE-9575 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Gus Heck >Assignee: Gus Heck >Priority: Major > Time Spent: 4h > Remaining Estimate: 0h > > One of the key asks when the Library of Congress was asking me to develop the > Advanced Query Parser was to be able to recognize arbitrary patterns that > included punctuation such as POW/MIA or 401(k) or C++ etc. Additionally they > wanted 401k and 401(k) to match documents with either style of reference, and > NOT match documents that happen to have isolated 401 or k tokens (i.e. not > documents about the http status code). And of course we wanted to give up as > little as possible of the text analysis features they were already using. > This filter, in conjunction with the filters from LUCENE-9572, LUCENE-9574 and > one Solr-specific filter in SOLR-14597 that re-analyzes tokens with an > arbitrary analyzer defined for a type in the Solr schema, combines to achieve > this. > This filter has the job of spotting the patterns, and adding the intended > synonym as a type to the token (from which minimal punctuation has been > removed). 
It also sets flags on the token which are retained through the > analysis chain, and at the very end the type is converted to a synonym and > the original token(s) for that type are dropped, avoiding the match on 401 > (for example) > The pattern matching is specified in a file that looks like: > {code} > 2 (\d+)\(?([a-z])\)? ::: legal2_$1_$2 > 2 (\d+)\(?([a-z])\)?\(?(\d+)\)? ::: legal3_$1_$2_$3 > 2 C\+\+ ::: c_plus_plus > {code} > That file would match legal reference patterns such as 401(k), 401k, > 501(c)3 and C++. The format is: > <flags> <pattern> ::: <replacement> > and groups in the pattern are substituted into the replacement, so the first > line above would create synonyms such as: > {code} > 401k --> legal2_401_k > 401(k) --> legal2_401_k > 503(c) --> legal2_503_c > {code}
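The substitution described in the issue can be reproduced with plain `java.util.regex`. This only illustrates the rule file's first line, not the filter's actual implementation:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternRuleDemo {
  // First rule from the example file: (\d+)\(?([a-z])\)? ::: legal2_$1_$2
  static final Pattern RULE = Pattern.compile("(\\d+)\\(?([a-z])\\)?");

  // Substitutes the captured groups into the replacement template,
  // or returns null when the token doesn't match the rule.
  static String synonymFor(String token) {
    Matcher m = RULE.matcher(token);
    return m.matches() ? "legal2_" + m.group(1) + "_" + m.group(2) : null;
  }

  public static void main(String[] args) {
    System.out.println(synonymFor("401k"));   // legal2_401_k
    System.out.println(synonymFor("401(k)")); // legal2_401_k
    System.out.println(synonymFor("503(c)")); // legal2_503_c
  }
}
```

Note how the optional `\(?` and `\)?` let both the punctuated and the plain form map to the same synonym, which is exactly why 401k and 401(k) end up matching each other but an isolated 401 does not.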
[GitHub] [lucene-solr] jaisonbi commented on a change in pull request #2213: LUCENE-9663: Adding compression to terms dict from SortedSet/Sorted DocValues
jaisonbi commented on a change in pull request #2213: URL: https://github.com/apache/lucene-solr/pull/2213#discussion_r563698470

## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesConsumer.java

@@ -731,7 +731,22 @@ private void doAddSortedField(FieldInfo field, DocValuesProducer valuesProducer) meta.writeLong(data.getFilePointer() - start); // ordsLength } -addTermsDict(DocValues.singleton(valuesProducer.getSorted(field))); +int valuesCount = values.getValueCount(); +switch (mode) {

Review comment: Yes, should use "if" instead of "switch", thanks :)
[GitHub] [lucene-solr] gus-asf merged pull request #2241: @gus-asf LUCENE-9575 Provide a producer for PatternTypingRule in TestRandomChains
gus-asf merged pull request #2241: URL: https://github.com/apache/lucene-solr/pull/2241
[GitHub] [lucene-solr] donnerpeter opened a new pull request #2243: LUCENE-9698: Hunspell: reuse char[] when possible when stripping affix
donnerpeter opened a new pull request #2243: URL: https://github.com/apache/lucene-solr/pull/2243

# Description

There's no need to allocate another char[] if we can analyze a sub-array of what we already have.

# Solution

In addition to `char[]` and `int length`, pass `int offset` everywhere, and adjust offset/length instead of allocating a new array, when an affix is removed and nothing is added in its place.

# Tests

No behavior change, no new tests.

# Checklist

Please review the following and check all that apply:

- [ ] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [ ] I have created a Jira issue and added the issue ID to my pull request title.
- [ ] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [ ] I have developed this patch against the `master` branch.
- [ ] I have run `./gradlew check`.
- [ ] I have added tests for my changes.
- [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
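The offset/length idea in this PR description can be illustrated generically with a hypothetical helper (this is not the Hunspell code): the caller examines a sub-range of an existing `char[]` instead of paying for an `Arrays.copyOfRange` allocation.

```java
public class SubRangeDemo {
  // Checks whether word[offset, offset + length) starts with prefix,
  // without allocating a trimmed copy of the array.
  static boolean regionStartsWith(char[] word, int offset, int length, String prefix) {
    if (length < prefix.length()) return false;
    for (int i = 0; i < prefix.length(); i++) {
      if (word[offset + i] != prefix.charAt(i)) return false;
    }
    return true;
  }

  public static void main(String[] args) {
    char[] buf = "unstripped".toCharArray();
    // Inspect the sub-array "stripped" (offset 2, length 8) in place.
    System.out.println(regionStartsWith(buf, 2, 8, "strip")); // prints true
  }
}
```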
[jira] [Created] (LUCENE-9698) Hunspell: reuse char[] when possible when stripping affix
Peter Gromov created LUCENE-9698: Summary: Hunspell: reuse char[] when possible when stripping affix Key: LUCENE-9698 URL: https://issues.apache.org/jira/browse/LUCENE-9698 Project: Lucene - Core Issue Type: Sub-task Reporter: Peter Gromov to reduce allocation rate
[GitHub] [lucene-solr] donnerpeter opened a new pull request #2242: LUCENE-9697: Hunspell Stemmer: use the same FST.BytesReader on all recursion levels
donnerpeter opened a new pull request #2242: URL: https://github.com/apache/lucene-solr/pull/2242

# Description

There's no need to allocate 3 `BytesReader`s when just one would be enough, as it's used as a scratch, without a need to preserve any state between uses.

# Solution

Allocate just one `BytesReader` per affix type.

# Tests

No behavior change, no tests.

# Checklist

Please review the following and check all that apply:

- [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [x] I have developed this patch against the `master` branch.
- [x] I have run `./gradlew check`.
- [ ] I have added tests for my changes.
- [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
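The pattern in this PR, reusing one scratch object instead of allocating one per recursion level, only works because nothing keeps state in the scratch between uses. A generic sketch with a hypothetical class (not the actual `FST.BytesReader` API):

```java
public class ScratchReuse {
  // One scratch per instance instead of one per recursive call.
  private final StringBuilder scratch = new StringBuilder();

  // Sums depth + (depth-1) + ... + 1, round-tripping each value
  // through the shared scratch to show it is safe to reuse.
  int sumDownTo1(int depth) {
    if (depth == 0) return 0;
    scratch.setLength(0); // reset the scratch instead of `new StringBuilder()`
    scratch.append(depth);
    // The scratch's contents are fully consumed before recursing,
    // so sharing it across recursion levels preserves the result.
    int here = Integer.parseInt(scratch.toString());
    return here + sumDownTo1(depth - 1);
  }

  public static void main(String[] args) {
    System.out.println(new ScratchReuse().sumDownTo1(3)); // prints 6
  }
}
```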
[jira] [Updated] (SOLR-15105) Sum aggregation not supported for externalField [Exception]
[ https://issues.apache.org/jira/browse/SOLR-15105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Khandelwal updated SOLR-15105: - Description: I upgraded solr (earlier version was 8.1.0) and got the following exception: {code:java} org.apache.solr.common.SolrException: sum aggregation not supported for popularityFile at org.apache.solr.search.facet.SumAgg.createSlotAcc(SumAgg.java:45) at org.apache.solr.search.facet.FacetFieldProcessor.createCollectAcc(FacetFieldProcessor.java:221) at org.apache.solr.search.facet.FacetFieldProcessorByArray.createCollectAcc(FacetFieldProcessorByArray.java:87){code} It happens when doing sum aggregation on a field type of solr.ExternalFileField Here's the fieldType config: {code:java} {code} was: I upgraded solr (earlier version was 8.1.0) and got the following exception: {code:java} org.apache.solr.common.SolrException: sum aggregation not supported for popularityFile at org.apache.solr.search.facet.SumAgg.createSlotAcc(SumAgg.java:45) at org.apache.solr.search.facet.FacetFieldProcessor.createCollectAcc(FacetFieldProcessor.java:221) at org.apache.solr.search.facet.FacetFieldProcessorByArray.createCollectAcc(FacetFieldProcessorByArray.java:87) It happens when doing sum aggregation on a field type of solr.ExternalFileField{code} Here's the fieldType config: {code:java} {code} > Sum aggregation not supported for externalField [Exception] > --- > > Key: SOLR-15105 > URL: https://issues.apache.org/jira/browse/SOLR-15105 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. 
Issues are Public) > Components: search >Affects Versions: 8.7 >Reporter: Hitesh Khandelwal >Priority: Major > > I upgraded solr (earlier version was 8.1.0) and got the following exception: > {code:java} > org.apache.solr.common.SolrException: sum aggregation not supported for > popularityFile > at org.apache.solr.search.facet.SumAgg.createSlotAcc(SumAgg.java:45) > at > org.apache.solr.search.facet.FacetFieldProcessor.createCollectAcc(FacetFieldProcessor.java:221) > at > org.apache.solr.search.facet.FacetFieldProcessorByArray.createCollectAcc(FacetFieldProcessorByArray.java:87){code} > It happens when doing sum aggregation on a field type of > solr.ExternalFileField > Here's the fieldType config: > {code:java} > indexed="true" class="solr.ExternalFileField"/>{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-15105) Sum aggregation not supported for externalField [Exception]
[ https://issues.apache.org/jira/browse/SOLR-15105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Khandelwal updated SOLR-15105: - Description: I upgraded solr (earlier version was 8.1.0) and got the following exception: {code:java} org.apache.solr.common.SolrException: sum aggregation not supported for popularityFile at org.apache.solr.search.facet.SumAgg.createSlotAcc(SumAgg.java:45) at org.apache.solr.search.facet.FacetFieldProcessor.createCollectAcc(FacetFieldProcessor.java:221) at org.apache.solr.search.facet.FacetFieldProcessorByArray.createCollectAcc(FacetFieldProcessorByArray.java:87) It happens when doing sum aggregation on a field type of solr.ExternalFileField{code} Here's the fieldType config: {code:java} {code} was: I upgraded solr (earlier version was 8.1.0) and got the following exception: org.apache.solr.common.SolrException: sum aggregation not supported for popularityFile at org.apache.solr.search.facet.SumAgg.createSlotAcc(SumAgg.java:45) at org.apache.solr.search.facet.FacetFieldProcessor.createCollectAcc(FacetFieldProcessor.java:221) at org.apache.solr.search.facet.FacetFieldProcessorByArray.createCollectAcc(FacetFieldProcessorByArray.java:87) It happens when doing sum aggregation on a field type of solr.ExternalFileField Here's the fieldType config: {code:java} {code} > Sum aggregation not supported for externalField [Exception] > --- > > Key: SOLR-15105 > URL: https://issues.apache.org/jira/browse/SOLR-15105 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. 
Issues are Public) > Components: search >Affects Versions: 8.7 >Reporter: Hitesh Khandelwal >Priority: Major > > I upgraded solr (earlier version was 8.1.0) and got the following exception: > {code:java} > org.apache.solr.common.SolrException: sum aggregation not supported for > popularityFile > at org.apache.solr.search.facet.SumAgg.createSlotAcc(SumAgg.java:45) > at > org.apache.solr.search.facet.FacetFieldProcessor.createCollectAcc(FacetFieldProcessor.java:221) > at > org.apache.solr.search.facet.FacetFieldProcessorByArray.createCollectAcc(FacetFieldProcessorByArray.java:87) > It happens when doing sum aggregation on a field type of > solr.ExternalFileField{code} > Here's the fieldType config: > {code:java} > indexed="true" class="solr.ExternalFileField"/>{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-15105) Sum aggregation not supported for externalField [Exception]
Hitesh Khandelwal created SOLR-15105: Summary: Sum aggregation not supported for externalField [Exception] Key: SOLR-15105 URL: https://issues.apache.org/jira/browse/SOLR-15105 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: search Affects Versions: 8.7 Reporter: Hitesh Khandelwal I upgraded solr (earlier version was 8.1.0) and got the following exception: org.apache.solr.common.SolrException: sum aggregation not supported for popularityFile at org.apache.solr.search.facet.SumAgg.createSlotAcc(SumAgg.java:45) at org.apache.solr.search.facet.FacetFieldProcessor.createCollectAcc(FacetFieldProcessor.java:221) at org.apache.solr.search.facet.FacetFieldProcessorByArray.createCollectAcc(FacetFieldProcessorByArray.java:87) It happens when doing sum aggregation on a field type of solr.ExternalFileField Here's the fieldType config: {code:java} {code}
[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2238: LUCENE-9693: Hunspell: check that all flags are > 0 and fit char range
donnerpeter commented on a change in pull request #2238: URL: https://github.com/apache/lucene-solr/pull/2238#discussion_r563684678

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java

@@ -1310,10 +1330,8 @@ void appendFlag(char flag, StringBuilder to) { @Override void appendFlag(char flag, StringBuilder to) {

Review comment: Yes. Currently it's just one place which definitely appends no flags before this one, and may append some flags after this, so the implementation is tied to that.
[jira] [Created] (LUCENE-9697) Hunspell Stemmer: use the same FST.BytesReader on all recursion levels
Peter Gromov created LUCENE-9697: Summary: Hunspell Stemmer: use the same FST.BytesReader on all recursion levels Key: LUCENE-9697 URL: https://issues.apache.org/jira/browse/LUCENE-9697 Project: Lucene - Core Issue Type: Sub-task Reporter: Peter Gromov
[GitHub] [lucene-solr] dweiss commented on a change in pull request #2238: LUCENE-9693: Hunspell: check that all flags are > 0 and fit char range
dweiss commented on a change in pull request #2238: URL: https://github.com/apache/lucene-solr/pull/2238#discussion_r563680034

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java

@@ -1310,10 +1330,8 @@ void appendFlag(char flag, StringBuilder to) { @Override void appendFlag(char flag, StringBuilder to) {

Review comment: Oh, so something else (other than this method) appends to that stringbuilder? Maybe those places should be fixed instead? I don't have all of the code in front of me, so the question may be naive.
[GitHub] [lucene-solr] dweiss commented on a change in pull request #2238: LUCENE-9693: Hunspell: check that all flags are > 0 and fit char range
dweiss commented on a change in pull request #2238: URL: https://github.com/apache/lucene-solr/pull/2238#discussion_r563678211

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java

@@ -1299,7 +1314,12 @@ void appendFlag(char flag, StringBuilder to) { if (replacement.isEmpty()) { continue; } -flags[upto++] = (char) Integer.parseInt(replacement); +int flag = Integer.parseInt(replacement); +if (flag == 0 || flag >= Character.MAX_VALUE) { // read default flags as well + throw new IllegalArgumentException(

Review comment: Eh, I was afraid of that. It'd be good to consolidate it at some point.
[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2238: LUCENE-9693: Hunspell: check that all flags are > 0 and fit char range
donnerpeter commented on a change in pull request #2238: URL: https://github.com/apache/lucene-solr/pull/2238#discussion_r563675377

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Stemmer.java

@@ -588,7 +577,7 @@ private boolean checkCondition( } private boolean isFlagAppendedByAffix(int affixId, char flag) { -if (affixId < 0) return false; +if (affixId < 0 || flag == 0) return false;

Review comment: A good idea, thanks!
[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2238: LUCENE-9693: Hunspell: check that all flags are > 0 and fit char range
donnerpeter commented on a change in pull request #2238: URL: https://github.com/apache/lucene-solr/pull/2238#discussion_r563675171

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java

@@ -1310,10 +1330,8 @@ void appendFlag(char flag, StringBuilder to) { @Override void appendFlag(char flag, StringBuilder to) {

Review comment: Yes, that's `some tests failed after implementing step 1 and were fixed in step 2`. However nice it seemed, it was wrong, because other flags were appended after this one without any comma. Trailing commas are no problem, as empty flags are skipped in the previous method.
[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2238: LUCENE-9693: Hunspell: check that all flags are > 0 and fit char range
donnerpeter commented on a change in pull request #2238: URL: https://github.com/apache/lucene-solr/pull/2238#discussion_r563673451

## File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java

@@ -1299,7 +1314,12 @@ void appendFlag(char flag, StringBuilder to) { if (replacement.isEmpty()) { continue; } -flags[upto++] = (char) Integer.parseInt(replacement); +int flag = Integer.parseInt(replacement); +if (flag == 0 || flag >= Character.MAX_VALUE) { // read default flags as well + throw new IllegalArgumentException(

Review comment: `ParseException` needs some `errorOffset` obligatorily (which is dubiously filled here with the current line number), and it's not available in this method, and not all callers have anything meaningful to pass there. For consistency, we could replace `ParseException` with something less choosy :)
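For reference on the constraint being discussed: `java.text.ParseException` has a single constructor that requires an error offset, which is why callers without a meaningful position end up passing something arbitrary.

```java
import java.text.ParseException;

public class ParseExceptionDemo {
  public static void main(String[] args) {
    // The only constructor is ParseException(String message, int errorOffset);
    // there is no offset-free variant, so callers must supply a number.
    ParseException e = new ParseException("invalid flag", 12);
    System.out.println(e.getErrorOffset()); // prints 12
    System.out.println(e.getMessage());     // prints: invalid flag
  }
}
```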
[GitHub] [lucene-solr] murblanc commented on a change in pull request #2199: SOLR-15055 (Take 2) Re-implement 'withCollection'
murblanc commented on a change in pull request #2199: URL: https://github.com/apache/lucene-solr/pull/2199#discussion_r563645684 ## File path: solr/core/src/java/org/apache/solr/cluster/placement/plugins/AffinityPlacementFactory.java ## @@ -238,11 +258,93 @@ public PlacementPlan computePlacement(Cluster cluster, PlacementRequest request, // failure. Current code does fail if placement is impossible (constraint is at most one replica of a shard on any node). for (Replica.ReplicaType replicaType : Replica.ReplicaType.values()) { makePlacementDecisions(solrCollection, shardName, availabilityZones, replicaType, request.getCountReplicasToCreate(replicaType), - attrValues, replicaTypeToNodes, nodesWithReplicas, coresOnNodes, placementPlanFactory, replicaPlacements); + attrValues, replicaTypeToNodes, nodesWithReplicas, coresOnNodes, placementContext.getPlacementPlanFactory(), replicaPlacements); } } - return placementPlanFactory.createPlacementPlan(request, replicaPlacements); + return placementContext.getPlacementPlanFactory().createPlacementPlan(request, replicaPlacements); +} + +@Override +public void verifyAllowedModification(ModificationRequest modificationRequest, PlacementContext placementContext) throws PlacementModificationException, InterruptedException { + if (modificationRequest instanceof DeleteShardsRequest) { +throw new UnsupportedOperationException("not implemented yet"); + } else if (modificationRequest instanceof DeleteCollectionRequest) { +verifyDeleteCollection((DeleteCollectionRequest) modificationRequest, placementContext); + } else if (modificationRequest instanceof DeleteReplicasRequest) { +verifyDeleteReplicas((DeleteReplicasRequest) modificationRequest, placementContext); + } else { +throw new UnsupportedOperationException("unsupported request type " + modificationRequest.getClass().getName()); + } +} + +private void verifyDeleteCollection(DeleteCollectionRequest deleteCollectionRequest, PlacementContext placementContext) throws 
PlacementModificationException, InterruptedException { Review comment: Can we have cycles in the `withCollection` graph? Should we allow a way to override the vetting checks from the Collection API? ## File path: solr/core/src/java/org/apache/solr/cluster/placement/DeleteShardsRequest.java ## @@ -0,0 +1,27 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.solr.cluster.placement; + +import java.util.Set; + +/** + * Delete shards request. + */ +public interface DeleteShardsRequest extends ModificationRequest { Review comment: If we don't use this interface (i.e. the class that implements it), I suggest we not include either in this PR. Or at least define and call the corresponding method in `AssignStrategy` from the appropriate `*Cmd`, even if nothing provides a real implementation or vetting based on it yet (it would then be ready to be consumed by another plugin written by some user).
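The dispatch in `verifyAllowedModification` can be sketched in isolation. The request interfaces below are stubbed as empty markers (the real ones live in `org.apache.solr.cluster.placement`), and the verify methods are reduced to returning their names:

```java
public class ModificationDispatch {
  // Stubs standing in for the Solr placement API request types.
  interface ModificationRequest {}
  interface DeleteShardsRequest extends ModificationRequest {}
  interface DeleteCollectionRequest extends ModificationRequest {}
  interface DeleteReplicasRequest extends ModificationRequest {}

  // Mirrors the instanceof chain in the diff: shard deletion is not
  // implemented yet, collection/replica deletion route to their verifiers,
  // and anything else is rejected.
  static String dispatch(ModificationRequest request) {
    if (request instanceof DeleteShardsRequest) {
      throw new UnsupportedOperationException("not implemented yet");
    } else if (request instanceof DeleteCollectionRequest) {
      return "verifyDeleteCollection";
    } else if (request instanceof DeleteReplicasRequest) {
      return "verifyDeleteReplicas";
    } else {
      throw new UnsupportedOperationException(
          "unsupported request type " + request.getClass().getName());
    }
  }

  public static void main(String[] args) {
    System.out.println(dispatch(new DeleteCollectionRequest() {})); // prints verifyDeleteCollection
  }
}
```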
[GitHub] [lucene-solr-operator] vladiceanu commented on pull request #200: Apachify the solr-operator helm chart
vladiceanu commented on pull request #200: URL: https://github.com/apache/lucene-solr-operator/pull/200#issuecomment-766739705 Not sure if it's the right place to mention, but https://artifacthub.io/packages/helm/solr-operator/solr-operator also needs to be updated to point to the new chart location
[GitHub] [lucene-solr] NazerkeBS commented on a change in pull request #2230: SOLR-15011: /admin/logging handler is configured logs to all nodes
NazerkeBS commented on a change in pull request #2230: URL: https://github.com/apache/lucene-solr/pull/2230#discussion_r563636966 ## File path: solr/core/src/java/org/apache/solr/handler/admin/LoggingHandler.java ## @@ -151,6 +156,9 @@ public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throw rsp.add("loggers", info); } rsp.setHttpCaching(false); +if (cc != null && AdminHandlersProxy.maybeProxyToNodes(req, rsp, cc)) { Review comment: SystemInfoHandler implements similar logic.
[jira] [Created] (LUCENE-9696) RegExp with group references
Gus Heck created LUCENE-9696: Summary: RegExp with group references Key: LUCENE-9696 URL: https://issues.apache.org/jira/browse/LUCENE-9696 Project: Lucene - Core Issue Type: Wish Reporter: Gus Heck PatternTypingFilter presently relies on java.util regexes, but LUCENE-7465 found performance benefits in using our own RegExp class instead. Unfortunately RegExp does not currently report matching subgroups, which is key to PatternTypingFilter's use (and probably useful in other endeavors as well). What's needed is reporting of sub-groups such that new RegExp("foo(.+)") -->> converted to a run automaton etc. --> match found for "foobar" --> somehow reports getGroup(1) as "bar", and getGroup() can be called on some object reasonably accessible to the code using RegExp in the first place. Clearly there's a lot to be worked out there, since the normal usage pattern converts things to a DFA / run automaton etc., and subgroups are not a natural concept for those classes. But if this could be achieved without losing the performance benefits, that would be interesting :). Opening this Wish ticket as encouraged by [~mikemccand] in LUCENE-9575. I won't be able to work on it any time soon, so I encourage anyone else interested to pick it up or to drop links or ideas in here. -- This message was sent by Atlassian Jira (v8.3.4#803005)
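The wished-for behavior, shown with `java.util.regex` (which PatternTypingFilter relies on today); Lucene's `RegExp`/run-automaton path has no equivalent of `group(1)` yet:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubgroupDemo {
  // Match the input against the regex and return capture group 1,
  // or null when the input does not match.
  static String firstGroup(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    return m.matches() ? m.group(1) : null;
  }

  public static void main(String[] args) {
    // The ticket's example: "foo(.+)" matched on "foobar" reports "bar".
    System.out.println(firstGroup("foo(.+)", "foobar")); // prints bar
  }
}
```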
[jira] [Commented] (LUCENE-9575) Add PatternTypingFilter
[ https://issues.apache.org/jira/browse/LUCENE-9575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271218#comment-17271218 ] Jim Ferenczi commented on LUCENE-9575: -- Thanks [~gus] and sorry for the race condition ;). > Add PatternTypingFilter > --- > > Key: LUCENE-9575 > URL: https://issues.apache.org/jira/browse/LUCENE-9575 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Gus Heck >Assignee: Gus Heck >Priority: Major > Time Spent: 3h 50m > Remaining Estimate: 0h > > One of the key asks when the Library of Congress was asking me to develop the > Advanced Query Parser was to be able to recognize arbitrary patterns that > included punctuation, such as POW/MIA or 401(k) or C++, etc. Additionally they > wanted 401k and 401(k) to match documents with either style reference, and > NOT match documents that happen to have isolated 401 or k tokens (i.e. not > documents about the http status code), and of course we wanted to give up as > little of the text analysis features they were already using. > This filter, in conjunction with the filters from LUCENE-9572, LUCENE-9574 and > one Solr-specific filter in SOLR-14597 that re-analyzes tokens with an > arbitrary analyzer defined for a type in the Solr schema, combines to achieve > this. > This filter has the job of spotting the patterns and adding the intended > synonym as a type to the token (from which minimal punctuation has been > removed). It also sets flags on the token which are retained through the > analysis chain, and at the very end the type is converted to a synonym and > the original token(s) for that type are dropped, avoiding the match on 401 > (for example). > The pattern matching is specified in a file that looks like: > {code} > 2 (\d+)\(?([a-z])\)? ::: legal2_$1_$2 > 2 (\d+)\(?([a-z])\)?\(?(\d+)\)? ::: legal3_$1_$2_$3 > 2 C\+\+ ::: c_plus_plus > {code} > That file would match legal reference patterns such as 401(k), 401k, > 501(c)3 and C++. The format is: > flags pattern ::: replacement > and groups in the pattern are substituted into the replacement, so the first > line above would create synonyms such as: > {code} > 401k --> legal2_401_k > 401(k) --> legal2_401_k > 503(c) --> legal2_503_c > {code}
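The synonym generation in the quoted description can be reproduced with plain `java.util.regex` replacement, substituting groups `$1`/`$2` into the template:

```java
public class PatternTypingDemo {
  // The first pattern line from the quoted config: digits, an optional "(",
  // a single letter, an optional ")". Groups 1 and 2 feed the template.
  static String synonym(String input) {
    return input.replaceAll("(\\d+)\\(?([a-z])\\)?", "legal2_$1_$2");
  }

  public static void main(String[] args) {
    // Both spellings normalize to the same synonym, as in the examples above.
    System.out.println(synonym("401k"));   // prints legal2_401_k
    System.out.println(synonym("401(k)")); // prints legal2_401_k
    System.out.println(synonym("503(c)")); // prints legal2_503_c
  }
}
```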