[GitHub] [lucene-solr] jaisonbi commented on a change in pull request #2213: LUCENE-9663: Adding compression to terms dict from SortedSet/Sorted DocValues

2021-01-25 Thread GitBox


jaisonbi commented on a change in pull request #2213:
URL: https://github.com/apache/lucene-solr/pull/2213#discussion_r564272513



##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesConsumer.java
##
@@ -791,6 +806,107 @@ private void addTermsDict(SortedSetDocValues values) throws IOException {
     writeTermsIndex(values);
   }
 
+  private void addCompressedTermsDict(SortedSetDocValues values) throws IOException {
+    final long size = values.getValueCount();
+    meta.writeVLong(size);
+    meta.writeInt(Lucene80DocValuesFormat.TERMS_DICT_BLOCK_LZ4_CODE);
+
+    ByteBuffersDataOutput addressBuffer = new ByteBuffersDataOutput();
+    ByteBuffersIndexOutput addressOutput =
+        new ByteBuffersIndexOutput(addressBuffer, "temp", "temp");
+    meta.writeInt(DIRECT_MONOTONIC_BLOCK_SHIFT);
+    long numBlocks =
+        (size + Lucene80DocValuesFormat.TERMS_DICT_BLOCK_LZ4_MASK)
+            >>> Lucene80DocValuesFormat.TERMS_DICT_BLOCK_LZ4_SHIFT;
+    DirectMonotonicWriter writer =
+        DirectMonotonicWriter.getInstance(
+            meta, addressOutput, numBlocks, DIRECT_MONOTONIC_BLOCK_SHIFT);
+
+    LZ4.FastCompressionHashTable ht = new LZ4.FastCompressionHashTable();
+    ByteArrayDataOutput bufferedOutput = new ByteArrayDataOutput(termsDictBuffer);
+    long ord = 0;
+    long start = data.getFilePointer();
+    int maxLength = 0;
+    TermsEnum iterator = values.termsEnum();
+    int maxBlockLength = 0;
+    BytesRefBuilder previous = new BytesRefBuilder();
+    for (BytesRef term = iterator.next(); term != null; term = iterator.next()) {
+      int termLength = term.length;
+      if ((ord & Lucene80DocValuesFormat.TERMS_DICT_BLOCK_LZ4_MASK) == 0) {
+        if (bufferedOutput.getPosition() > 0) {
+          int uncompressedLength = bufferedOutput.getPosition();
+          data.writeVInt(uncompressedLength);
+          maxBlockLength = Math.max(maxBlockLength, uncompressedLength);
+          long before = data.getFilePointer();
+          // Compress block
+          LZ4.compress(termsDictBuffer, 0, uncompressedLength, data, ht);
+          int compressedLength = (int) (data.getFilePointer() - before);
+          // Corner case: compressed length might be bigger than uncompressed length.
+          maxBlockLength = Math.max(maxBlockLength, compressedLength);
+          bufferedOutput.reset(termsDictBuffer);
+        }
+
+        writer.add(data.getFilePointer() - start);
+        data.writeVInt(termLength);
+        data.writeBytes(term.bytes, term.offset, termLength);
+      } else {
+        final int prefixLength = StringHelper.bytesDifference(previous.get(), term);
+        final int suffixLength = term.length - prefixLength;
+        assert suffixLength > 0; // terms are unique
+        int reservedSize = suffixLength + 11; // 1 byte + 2 vints.
+        bufferedOutput = maybeGrowBuffer(bufferedOutput, reservedSize);
+        bufferedOutput.writeByte(
+            (byte) (Math.min(prefixLength, 15) | (Math.min(15, suffixLength - 1) << 4)));
+
+        if (prefixLength >= 15) {
+          bufferedOutput.writeVInt(prefixLength - 15);
+        }
+        if (suffixLength >= 16) {
+          bufferedOutput.writeVInt(suffixLength - 16);
+        }
+        bufferedOutput.writeBytes(term.bytes, term.offset + prefixLength, suffixLength);
+      }
+      maxLength = Math.max(maxLength, termLength);
+      previous.copyBytes(term);
+      ++ord;
+    }
+
+    // Compress and write out the last block
+    if (bufferedOutput.getPosition() > 0) {
+      int uncompressedLength = bufferedOutput.getPosition();
+      data.writeVInt(uncompressedLength);
+      maxBlockLength = Math.max(maxBlockLength, uncompressedLength);
+      long before = data.getFilePointer();
+      LZ4.compress(termsDictBuffer, 0, uncompressedLength, data, ht);
+      int compressedLength = (int) (data.getFilePointer() - before);
+      maxBlockLength = Math.max(maxBlockLength, compressedLength);
+    }
+
+    writer.finish();
+    meta.writeInt(maxLength);
+    // Write one more int for storing max block length. For compressed terms dict only.
+    meta.writeInt(maxBlockLength);
+    meta.writeLong(start);
+    meta.writeLong(data.getFilePointer() - start);
+    start = data.getFilePointer();
+    addressBuffer.copyTo(data);
+    meta.writeLong(start);
+    meta.writeLong(data.getFilePointer() - start);
+
+    // Now write the reverse terms index
+    writeTermsIndex(values);
+  }
+
+  private ByteArrayDataOutput maybeGrowBuffer(ByteArrayDataOutput bufferedOutput, int termLength) {
+    int pos = bufferedOutput.getPosition(), originalLength = termsDictBuffer.length;
+    if (pos + termLength >= originalLength - 1) {
+      int newLength = (originalLength + termLength) << 1;

Review comment:
   Makes sense. Calling `ArrayUtil.grow` is enough.
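For illustration, the shared-prefix header byte in the hunk above (low nibble = min(prefixLength, 15), high nibble = min(suffixLength - 1, 15), with VInt overflow for larger values) can be sketched in isolation. `PrefixSuffixHeader` is a hypothetical standalone class, not part of the patch:

```java
// Standalone sketch of the prefix/suffix header byte used in the block above.
public class PrefixSuffixHeader {

  // Pack min(prefix, 15) into the low nibble and min(suffix - 1, 15) into the
  // high nibble; a saturated nibble signals that a VInt with the remainder
  // follows (prefixLength - 15, or suffixLength - 16).
  static int encode(int prefixLength, int suffixLength) {
    return Math.min(prefixLength, 15) | (Math.min(15, suffixLength - 1) << 4);
  }

  static int headerPrefix(int header) {
    return header & 0x0F; // 15 => reader must add the following VInt
  }

  static int headerSuffix(int header) {
    return (header >>> 4) + 1; // 16 => reader must add the following VInt
  }

  public static void main(String[] args) {
    int h = encode(3, 7);
    if (headerPrefix(h) != 3 || headerSuffix(h) != 7) throw new AssertionError();
    // Saturated case: both nibbles max out, remainders go to VInts.
    int big = encode(20, 40);
    if (headerPrefix(big) != 15 || headerSuffix(big) != 16) throw new AssertionError();
    System.out.println("ok");
  }
}
```

This is why the writer reserves `suffixLength + 11` bytes per term: one header byte plus up to two 5-byte VInts.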





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [lucene-solr] jaisonbi commented on a change in pull request #2213: LUCENE-9663: Adding compression to terms dict from SortedSet/Sorted DocValues

2021-01-25 Thread GitBox


jaisonbi commented on a change in pull request #2213:
URL: https://github.com/apache/lucene-solr/pull/2213#discussion_r564271912



##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java
##
@@ -1144,6 +1157,7 @@ public TermsEnum termsEnum() throws IOException {
   }
 
   private static class TermsDict extends BaseTermsEnum {
+    static final int PADDING_LENGTH = 7;

Review comment:
   ok..will rename and add this comment:)








[jira] [Commented] (LUCENE-9406) Make it simpler to track IndexWriter's events

2021-01-25 Thread Zach Chen (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271878#comment-17271878
 ] 

Zach Chen commented on LUCENE-9406:
---

Makes sense! For the initial interface proposal, I'm thinking something along 
the same lines as what you had in the previous PR for event metrics collection:
{code:java}
interface EventMetrics {
    Map providesMetrics();
}

interface IndexWriterEvent extends EventMetrics {
    public void beginPointInTimeMerge(MergeTrigger trigger);
    public void completePointInTimeMerge(MergeTrigger trigger);
    ...
}
{code}
The implementation for IndexWriterEvent can be set into IndexWriterConfig / 
LiveIndexWriterConfig, and used in IndexWriter’s key event points just like in 
previous PR. 

For event metrics consumption, I’m considering something similar to 
Dropwizard’s metrics reporter:
{code:java}
interface EventMetricsReporter {
   public void report(EventMetrics metrics);  // calls EventMetrics.provideMetrics() to get data
}
{code}
such that applications can provide custom implementations for data consumption: 
{code:java}
class FileBasedEventReporter implements EventMetricsReporter {}
class NetworkBasedEventReporter implements EventMetricsReporter {}
...{code}
 

What do you think?
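A minimal runnable sketch of the wiring proposed above, with an in-memory reporter standing in for the file/network variants (all names and generics here are illustrative assumptions mirroring the comment, not a committed API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the proposed interfaces; names and generics are assumptions.
interface EventMetrics {
  Map<String, Long> provideMetrics();
}

interface EventMetricsReporter {
  void report(EventMetrics metrics); // pulls data via provideMetrics()
}

// Stand-in for FileBasedEventReporter / NetworkBasedEventReporter: just
// collects what it was asked to report so the wiring can be exercised.
class CollectingEventReporter implements EventMetricsReporter {
  final Map<String, Long> seen = new LinkedHashMap<>();

  @Override public void report(EventMetrics metrics) {
    seen.putAll(metrics.provideMetrics());
  }
}

public class EventMetricsSketch {
  public static void main(String[] args) {
    CollectingEventReporter reporter = new CollectingEventReporter();
    // EventMetrics has a single abstract method, so a lambda works here.
    reporter.report(() -> Map.of("mergesStartedOnCommit", 3L));
    System.out.println(reporter.seen); // {mergesStartedOnCommit=3}
  }
}
```

The key property of this shape is that aggregation lives in the application-supplied reporter, not inside IndexWriter.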

> Make it simpler to track IndexWriter's events
> -
>
> Key: LUCENE-9406
> URL: https://issues.apache.org/jira/browse/LUCENE-9406
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Priority: Major
>
> This is the second spinoff from a [controversial PR to add a new index-time 
> feature to Lucene to merge small segments during 
> commit|https://github.com/apache/lucene-solr/pull/1552].  That change can 
> substantially reduce the number of small index segments to search.
> In that PR, there was a new proposed interface, {{IndexWriterEvents}}, giving 
> the application a chance to track when {{IndexWriter}} kicked off merges 
> during commit, how many, how long it waited, how often it gave up waiting, 
> etc.
> Such telemetry from production usage is really helpful when tuning settings 
> like which merges (e.g. a size threshold) to attempt on commit, and how long 
> to wait during commit, etc.
> I am splitting out this issue to explore possible approaches to do this.  
> E.g. [~simonw] proposed using a statistics class instead, but if I understood 
> that correctly, I think that would put the role of aggregation inside 
> {{IndexWriter}}, which is not ideal.
> Many interesting events, e.g. how many merges are being requested, how large 
> are they, how long did they take to complete or fail, etc., can be gleaned by 
> wrapping expert Lucene classes like {{MergePolicy}} and {{MergeScheduler}}.  
> But for those events that cannot (e.g. {{IndexWriter}} stopped waiting for 
> merges during commit), it would be very helpful to have some simple way to 
> track so applications can better tune.
> It is also possible to subclass {{IndexWriter}} and override key methods, but 
> I think that is inherently risky as {{IndexWriter}}'s protected methods are 
> not considered to be a stable API, and the synchronization used by 
> {{IndexWriter}} is confusing.
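The "wrap expert classes" approach in the description above amounts to plain delegation with counters. A generic sketch under stated assumptions: `MergeRunner` is a hypothetical stand-in for an expert class like `MergeScheduler`, not Lucene's real API:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for an expert class such as MergeScheduler.
interface MergeRunner {
  void runMerge(String segmentName);
}

// Glean telemetry by wrapping: count the call, then delegate unchanged.
class CountingMergeRunner implements MergeRunner {
  private final MergeRunner delegate;
  final AtomicLong mergesRequested = new AtomicLong();

  CountingMergeRunner(MergeRunner delegate) {
    this.delegate = delegate;
  }

  @Override public void runMerge(String segmentName) {
    mergesRequested.incrementAndGet(); // telemetry without touching the wrapped class
    delegate.runMerge(segmentName);
  }
}

public class WrapperSketch {
  public static void main(String[] args) {
    CountingMergeRunner runner = new CountingMergeRunner(name -> {});
    runner.runMerge("_0");
    runner.runMerge("_1");
    System.out.println(runner.mergesRequested.get()); // 2
  }
}
```

Events that never pass through a wrappable class (e.g. IndexWriter giving up waiting during commit) are exactly the ones this pattern cannot reach, which is the gap the issue is about.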



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (LUCENE-9476) Add a bulk ordinal->FacetLabel API

2021-01-25 Thread Gautam Worah (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271862#comment-17271862
 ] 

Gautam Worah commented on LUCENE-9476:
--

Submitted a WIP PR [here|https://github.com/apache/lucene-solr/pull/2246]

> Add a bulk ordinal->FacetLabel API
> --
>
> Key: LUCENE-9476
> URL: https://issues.apache.org/jira/browse/LUCENE-9476
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 8.6.1
>Reporter: Gautam Worah
>Priority: Minor
>  Labels: performance
>
> This issue is a spillover from the 
> [PR|https://github.com/apache/lucene-solr/pull/1733/files] for LUCENE 9450
> The idea here is to share a single {{BinaryDocValues}} instance per leaf per 
> query instead of creating a new one each time in the 
> {{DirectoryTaxonomyReader}}.
> Suggested by [~mikemccand]
>  
>  
>  






[GitHub] [lucene-solr] gautamworah96 opened a new pull request #2247: WIP: LUCENE-9476 Add basic functionality, basic tests

2021-01-25 Thread GitBox


gautamworah96 opened a new pull request #2247:
URL: https://github.com/apache/lucene-solr/pull/2247


   
   
   
   # Description
   
   In [LUCENE-9450](https://issues.apache.org/jira/browse/LUCENE-9450) we switched the Taxonomy index from Stored Fields to `BinaryDocValues`. In the resulting implementation of the `getPath` code, we create a new `BinaryDocValues` values instance for each ordinal.
   We may end up traversing the same nodes over and over when the `getPath` API is called multiple times for ordinals in the same segment/with the same `readerIndex`.
   
   This PR takes advantage of that fact by sorting the ordinals and then checking whether some of them live in the same segment/have the same `readerIndex` (by trying to `advanceExact` to the correct position and not failing), thereby allowing us to reuse the previous `BinaryDocValues` object.
   
   
   # Solution
   
   Steps:
   1. Sort all ordinals, remembering each ordinal's original position so that its path can be stored in the correct slot of the result.
   2. Try to `advanceExact` to the correct position with the previously calculated `readerIndex`. If the operation fails, find the correct segment for the ordinal and then `advanceExact` to the desired position.
   3. Store this position for future ordinals.
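Step 1 (sort, but remember where each result must go) can be sketched in isolation; `sortedPositions` is a hypothetical helper for illustration, not code from the PR:

```java
import java.util.Arrays;
import java.util.Comparator;

public class BulkOrdinalOrder {

  // Positions 0..n-1 sorted by ordinal value. Walking ordinals[pos[0]],
  // ordinals[pos[1]], ... visits the index in segment order, so one
  // BinaryDocValues instance can be reused until advanceExact fails, while
  // pos[i] tells us which output slot the i-th looked-up path belongs to.
  static Integer[] sortedPositions(long[] ordinals) {
    Integer[] pos = new Integer[ordinals.length];
    for (int i = 0; i < pos.length; i++) pos[i] = i;
    Arrays.sort(pos, Comparator.comparingLong(i -> ordinals[i]));
    return pos;
  }

  public static void main(String[] args) {
    long[] ordinals = {42L, 7L, 19L};
    System.out.println(Arrays.toString(sortedPositions(ordinals))); // [1, 2, 0]
  }
}
```

The bulk API can then fill `result[pos[i]]` as it scans, returning paths in the caller's original order.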
   
   # Tests
   
   Added a new test for the API that compares the individual `getPath` results for ordinals with the bulk FacetLabels returned by the `getBulkPath` API.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `master` branch.
   - [x] I have run `./gradlew check`.
   - [x] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   






[jira] [Comment Edited] (LUCENE-9476) Add a bulk ordinal->FacetLabel API

2021-01-25 Thread Gautam Worah (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271862#comment-17271862
 ] 

Gautam Worah edited comment on LUCENE-9476 at 1/26/21, 4:36 AM:


Submitted a WIP PR [here|https://github.com/apache/lucene-solr/pull/2247]


was (Author: gworah):
Submitted a WIP PR [here|https://github.com/apache/lucene-solr/pull/2246]

> Add a bulk ordinal->FacetLabel API
> --
>
> Key: LUCENE-9476
> URL: https://issues.apache.org/jira/browse/LUCENE-9476
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 8.6.1
>Reporter: Gautam Worah
>Priority: Minor
>  Labels: performance
>
> This issue is a spillover from the 
> [PR|https://github.com/apache/lucene-solr/pull/1733/files] for LUCENE 9450
> The idea here is to share a single {{BinaryDocValues}} instance per leaf per 
> query instead of creating a new one each time in the 
> {{DirectoryTaxonomyReader}}.
> Suggested by [~mikemccand]
>  
>  
>  






[GitHub] [lucene-solr] dweiss merged pull request #2235: LUCENE-9690: Hunspell: support special title-case for words with apostrophe

2021-01-25 Thread GitBox


dweiss merged pull request #2235:
URL: https://github.com/apache/lucene-solr/pull/2235


   






[GitHub] [lucene-solr] dweiss merged pull request #2236: LUCENE-9691: Hunspell: support trailing comments on aff option lines

2021-01-25 Thread GitBox


dweiss merged pull request #2236:
URL: https://github.com/apache/lucene-solr/pull/2236


   






[GitHub] [lucene-solr-operator] HoustonPutman commented on a change in pull request #193: Support custom log4j2 config from user-provided ConfigMap

2021-01-25 Thread GitBox


HoustonPutman commented on a change in pull request #193:
URL: 
https://github.com/apache/lucene-solr-operator/pull/193#discussion_r564044525



##
File path: controllers/solrcloud_controller.go
##
@@ -182,44 +182,61 @@ func (r *SolrCloudReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
 		}
 	}
 
-	// Generate ConfigMap unless the user supplied a custom ConfigMap for solr.xml ... but the provided ConfigMap
-	// might be for the Prometheus exporter, so we only care if they provide a solr.xml in the CM
-	solrXmlConfigMapName := instance.ConfigMapName()
-	solrXmlMd5 := ""
+	// Generate ConfigMap unless the user supplied a custom ConfigMap for solr.xml
+	configMapInfo := make(map[string]string)
 	if instance.Spec.CustomSolrKubeOptions.ConfigMapOptions != nil && instance.Spec.CustomSolrKubeOptions.ConfigMapOptions.ProvidedConfigMap != "" {
+		providedConfigMapName := instance.Spec.CustomSolrKubeOptions.ConfigMapOptions.ProvidedConfigMap
 		foundConfigMap := &corev1.ConfigMap{}
-		nn := types.NamespacedName{Name: instance.Spec.CustomSolrKubeOptions.ConfigMapOptions.ProvidedConfigMap, Namespace: instance.Namespace}
+		nn := types.NamespacedName{Name: providedConfigMapName, Namespace: instance.Namespace}
 		err = r.Get(context.TODO(), nn, foundConfigMap)
 		if err != nil {
 			return requeueOrNot, err // if they passed a providedConfigMap name, then it must exist
 		}
 
-		// ConfigMap doesn't have to have a solr.xml, but if it does, then it needs to be valid!
 		if foundConfigMap.Data != nil {
-			solrXml, ok := foundConfigMap.Data["solr.xml"]
-			if ok {
+			logXml, hasLogXml := foundConfigMap.Data[util.LogXmlFile]
+			solrXml, hasSolrXml := foundConfigMap.Data[util.SolrXmlFile]
+
+			// if there's a user-provided config, it must have one of the expected keys
+			if !hasLogXml && !hasSolrXml {
+				return requeueOrNot, fmt.Errorf("User provided ConfigMap %s must have one of 'solr.xml' and/or 'log4j2.xml'",
+					providedConfigMapName)
+			}
+
+			if hasSolrXml {
+				// make sure the user-provided solr.xml is valid
 				if !strings.Contains(solrXml, "${hostPort:") {
 					return requeueOrNot,
 						fmt.Errorf("Custom solr.xml in ConfigMap %s must contain a placeholder for the 'hostPort' variable, such as ${hostPort:80}",
-							instance.Spec.CustomSolrKubeOptions.ConfigMapOptions.ProvidedConfigMap)
+							providedConfigMapName)
 				}
 				// stored in the pod spec annotations on the statefulset so that we get a restart when solr.xml changes
-				solrXmlMd5 = fmt.Sprintf("%x", md5.Sum([]byte(solrXml)))
-				solrXmlConfigMapName = foundConfigMap.Name
-			} else {
-				return requeueOrNot, fmt.Errorf("Required 'solr.xml' key not found in provided ConfigMap %s",
-					instance.Spec.CustomSolrKubeOptions.ConfigMapOptions.ProvidedConfigMap)
+				configMapInfo[util.SolrXmlMd5Annotation] = fmt.Sprintf("%x", md5.Sum([]byte(solrXml)))
+				configMapInfo[util.SolrXmlFile] = foundConfigMap.Name
+			}
+
+			if hasLogXml {
+				if !strings.Contains(logXml, "monitorInterval=") {
+					// stored in the pod spec annotations on the statefulset so that we get a restart when the log config changes
+					configMapInfo[util.LogXmlMd5Annotation] = fmt.Sprintf("%x", md5.Sum([]byte(logXml)))
+				} // else log4j will automatically refresh for us, so no restart needed
+				configMapInfo[util.LogXmlFile] = foundConfigMap.Name
+			}
+
		} else {
-			return requeueOrNot, fmt.Errorf("Provided ConfigMap %s has no data",
-				instance.Spec.CustomSolrKubeOptions.ConfigMapOptions.ProvidedConfigMap)
+			return requeueOrNot, fmt.Errorf("Provided ConfigMap %s has no data", providedConfigMapName)
		}
-	} else {
+	}
+
+	if configMapInfo[util.SolrXmlFile] == "" {

Review comment:
   So if a user passes a custom 
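The restart-on-change trick in the hunk above stores `fmt.Sprintf("%x", md5.Sum([]byte(...)))` of the config file contents as a pod annotation, so the StatefulSet rolls whenever the config changes. For illustration only (the operator itself is Go, and `ConfigDigest` is a hypothetical name), the equivalent hex-MD5 digest in Java:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ConfigDigest {

  // Hex MD5 of the config text, matching Go's fmt.Sprintf("%x", md5.Sum(b)).
  static String md5Hex(String content) {
    try {
      byte[] digest =
          MessageDigest.getInstance("MD5").digest(content.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder(digest.length * 2);
      for (byte b : digest) hex.append(String.format("%02x", b));
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new AssertionError("MD5 is a mandatory JDK algorithm", e);
    }
  }

  public static void main(String[] args) {
    // Any change to the config text produces a different annotation value,
    // which changes the pod template and triggers a rolling restart.
    System.out.println(md5Hex("<solr></solr>"));
  }
}
```

Cryptographic strength is irrelevant here; the digest is only a cheap change detector.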

[GitHub] [lucene-solr-operator] HoustonPutman merged pull request #200: Apachify the solr-operator helm chart

2021-01-25 Thread GitBox


HoustonPutman merged pull request #200:
URL: https://github.com/apache/lucene-solr-operator/pull/200


   






[GitHub] [lucene-solr] jtibshirani commented on a change in pull request #2245: Move old field infos format to backwards-codecs.

2021-01-25 Thread GitBox


jtibshirani commented on a change in pull request #2245:
URL: https://github.com/apache/lucene-solr/pull/2245#discussion_r564138601



##
File path: 
lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene60/Lucene60FieldInfosFormat.java
##
@@ -311,6 +312,11 @@ private static IndexOptions getIndexOptions(IndexInput input, byte b) throws IOException {
     }
   }
 
+  /**
+   * Note: although this format is only used on older versions, we need to keep the write logic

Review comment:
   I hope this assumption is accurate, would appreciate someone 
double-checking it.

##
File path: 
lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene60/Lucene60FieldInfosFormat.java
##
@@ -311,6 +312,11 @@ private static IndexOptions getIndexOptions(IndexInput input, byte b) throws IOException {
     }
   }
 
+  /**
+   * Note: although this format is only used on older versions, we need to keep the write logic

Review comment:
   I hope this is accurate, would appreciate someone double-checking it.








[GitHub] [lucene-solr] murblanc commented on a change in pull request #2199: SOLR-15055 (Take 2) Re-implement 'withCollection'

2021-01-25 Thread GitBox


murblanc commented on a change in pull request #2199:
URL: https://github.com/apache/lucene-solr/pull/2199#discussion_r563645684



##
File path: 
solr/core/src/java/org/apache/solr/cluster/placement/plugins/AffinityPlacementFactory.java
##
@@ -238,11 +258,93 @@ public PlacementPlan computePlacement(Cluster cluster, PlacementRequest request,
         // failure. Current code does fail if placement is impossible (constraint is at most one replica of a shard on any node).
         for (Replica.ReplicaType replicaType : Replica.ReplicaType.values()) {
           makePlacementDecisions(solrCollection, shardName, availabilityZones, replicaType, request.getCountReplicasToCreate(replicaType),
-              attrValues, replicaTypeToNodes, nodesWithReplicas, coresOnNodes, placementPlanFactory, replicaPlacements);
+              attrValues, replicaTypeToNodes, nodesWithReplicas, coresOnNodes, placementContext.getPlacementPlanFactory(), replicaPlacements);
         }
       }
 
-      return placementPlanFactory.createPlacementPlan(request, replicaPlacements);
+      return placementContext.getPlacementPlanFactory().createPlacementPlan(request, replicaPlacements);
+    }
+
+    @Override
+    public void verifyAllowedModification(ModificationRequest modificationRequest, PlacementContext placementContext) throws PlacementModificationException, InterruptedException {
+      if (modificationRequest instanceof DeleteShardsRequest) {
+        throw new UnsupportedOperationException("not implemented yet");
+      } else if (modificationRequest instanceof DeleteCollectionRequest) {
+        verifyDeleteCollection((DeleteCollectionRequest) modificationRequest, placementContext);
+      } else if (modificationRequest instanceof DeleteReplicasRequest) {
+        verifyDeleteReplicas((DeleteReplicasRequest) modificationRequest, placementContext);
+      } else {
+        throw new UnsupportedOperationException("unsupported request type " + modificationRequest.getClass().getName());
+      }
+    }
+
+    private void verifyDeleteCollection(DeleteCollectionRequest deleteCollectionRequest, PlacementContext placementContext) throws PlacementModificationException, InterruptedException {

Review comment:
   Can we have cycles in the `withCollection` graph? Should we allow a way 
to override the vetting checks from the Collection API?

##
File path: 
solr/core/src/java/org/apache/solr/cluster/placement/DeleteShardsRequest.java
##
@@ -0,0 +1,27 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.cluster.placement;
+
+import java.util.Set;
+
+/**
+ * Delete shards request.
+ */
+public interface DeleteShardsRequest extends ModificationRequest {

Review comment:
   If we don't use this interface (i.e. the class that implements it) I 
suggest we do not include either in this PR. Or at least define and call the 
corresponding method in `AssignStrategy` from the appropriate `*Cmd` even if 
nothing does a real implementation and vetting based on it (but it would be 
ready to be consumed maybe by another plugin written by some user).

##
File path: 
solr/core/src/java/org/apache/solr/cluster/placement/plugins/AffinityPlacementFactory.java
##
@@ -238,11 +258,93 @@ public PlacementPlan computePlacement(Cluster cluster, PlacementRequest request,
         // failure. Current code does fail if placement is impossible (constraint is at most one replica of a shard on any node).
         for (Replica.ReplicaType replicaType : Replica.ReplicaType.values()) {
           makePlacementDecisions(solrCollection, shardName, availabilityZones, replicaType, request.getCountReplicasToCreate(replicaType),
-              attrValues, replicaTypeToNodes, nodesWithReplicas, coresOnNodes, placementPlanFactory, replicaPlacements);
+              attrValues, replicaTypeToNodes, nodesWithReplicas, coresOnNodes, placementContext.getPlacementPlanFactory(), replicaPlacements);
         }
       }
 
-      return placementPlanFactory.createPlacementPlan(request, replicaPlacements);
+      return placementContext.getPlacementPlanFactory().createPlacementPlan(request, replicaPlacements);
+    }
+
+    @Override

[GitHub] [lucene-solr-operator] thelabdude merged pull request #195: Improve Prom exporter docs

2021-01-25 Thread GitBox


thelabdude merged pull request #195:
URL: https://github.com/apache/lucene-solr-operator/pull/195


   






[GitHub] [lucene-solr] zacharymorn commented on pull request #2205: LUCENE-9668: Deprecate MinShouldMatchSumScorer with WANDScorer

2021-01-25 Thread GitBox


zacharymorn commented on pull request #2205:
URL: https://github.com/apache/lucene-solr/pull/2205#issuecomment-766615404


   > Thank you @zacharymorn !
   
   No problem! Thanks Adrien for the review and guidance!






[GitHub] [lucene-solr] muse-dev[bot] commented on a change in pull request #2091: Jira/solr 14778

2021-01-25 Thread GitBox


muse-dev[bot] commented on a change in pull request #2091:
URL: https://github.com/apache/lucene-solr/pull/2091#discussion_r564197768



##
File path: 
lucene/analysis/common/src/test/org/apache/lucene/analysis/payloads/TestDelimitedPayloadTokenFilter.java
##
@@ -94,7 +98,8 @@ public void testFloatEncoding() throws Exception {

Review comment:
   *NULLPTR_DEREFERENCE:*  call to 
`TestDelimitedPayloadTokenFilter.assertTermEquals(...)` eventually accesses 
memory that is the null pointer on line 84.

##
File path: 
lucene/analysis/common/src/test/org/apache/lucene/analysis/payloads/TestDelimitedPayloadTokenFilter.java
##
@@ -120,26 +125,37 @@ void assertTermEquals(String expected, TokenStream stream, byte[] expectPay) throws Exception {

Review comment:
   *NULLPTR_DEREFERENCE:*  call to 
`TestDelimitedPayloadTokenFilter.assertTermEquals(...)` eventually accesses 
memory that is the null pointer on line 106.

##
File path: 
lucene/analysis/common/src/test/org/apache/lucene/analysis/payloads/TestDelimitedPayloadTokenFilter.java
##
@@ -51,9 +53,11 @@ public void testPayloads() throws Exception {
   public void testNext() throws Exception {
 
     String test = "The quick|JJ red|JJ fox|NN jumped|VB over the lazy|JJ brown|JJ dogs|NN";
-    DelimitedPayloadTokenFilter filter = new DelimitedPayloadTokenFilter
-      (whitespaceMockTokenizer(test), 
-       DelimitedPayloadTokenFilter.DEFAULT_DELIMITER, new IdentityEncoder());
+    DelimitedPayloadTokenFilter filter =
+        new DelimitedPayloadTokenFilter(
+            whitespaceMockTokenizer(test),
+            DelimitedPayloadTokenFilter.DEFAULT_DELIMITER,
+            new IdentityEncoder());
     filter.reset();
     assertTermEquals("The", filter, null);

Review comment:
   *NULLPTR_DEREFERENCE:*  call to 
[GitHub] [lucene-solr-operator] HoustonPutman commented on pull request #200: Apachify the solr-operator helm chart

2021-01-25 Thread GitBox


HoustonPutman commented on pull request #200:
URL: 
https://github.com/apache/lucene-solr-operator/pull/200#issuecomment-767049150


   I'll work on that as well @vladiceanu , but it is a separate issue. Thanks 
for bringing it up!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] gus-asf closed pull request #2240: LUCENE-9575 Provide a producer for PatternTypingRule in TestRandomChains

2021-01-25 Thread GitBox


gus-asf closed pull request #2240:
URL: https://github.com/apache/lucene-solr/pull/2240


   






[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #2213: LUCENE-9663: Adding compression to terms dict from SortedSet/Sorted DocValues

2021-01-25 Thread GitBox


bruno-roustant commented on a change in pull request #2213:
URL: https://github.com/apache/lucene-solr/pull/2213#discussion_r563785114



##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java
##
@@ -1144,6 +1157,7 @@ public TermsEnum termsEnum() throws IOException {
   }
 
   private static class TermsDict extends BaseTermsEnum {
+static final int PADDING_LENGTH = 7;

Review comment:
   Ok, in this case can we rename it LZ4_DECOMPRESSOR_PADDING and add this 
comment about the decompression speed?
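For context, a minimal Java sketch of why such a padding constant exists. The arithmetic below is illustrative only and is not taken from the Lucene sources; the value 7 comes from the code under review and the constant name is the one suggested in this thread:

```java
// Illustrative sketch only (not the Lucene implementation). A fast LZ4-style
// decompressor may copy data in full 8-byte strides; the last stride can then
// touch bytes past the logical end of the output, so the destination buffer
// is over-allocated by up to 7 padding bytes.
class Lz4PaddingSketch {
    static final int LZ4_DECOMPRESSOR_PADDING = 7; // name suggested in the review

    // Highest index a full-stride copy of the last 8-byte chunk may touch.
    static int lastIndexTouched(int len) {
        int strides = (len + 7) / 8; // number of 8-byte copies, rounded up
        return strides * 8 - 1;      // may exceed len - 1 when len is not a multiple of 8
    }

    // Allocate a buffer large enough that the last full-stride copy stays in bounds.
    static byte[] allocatePadded(int len) {
        return new byte[len + LZ4_DECOMPRESSOR_PADDING];
    }

    public static void main(String[] args) {
        int len = 13;
        byte[] buf = allocatePadded(len);
        // lastIndexTouched(13) = 15, which fits inside buf.length = 20.
        System.out.println(lastIndexTouched(len) < buf.length); // prints "true"
    }
}
```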








[GitHub] [lucene-solr] NazerkeBS commented on a change in pull request #2230: SOLR-15011: /admin/logging handler is configured logs to all nodes

2021-01-25 Thread GitBox


NazerkeBS commented on a change in pull request #2230:
URL: https://github.com/apache/lucene-solr/pull/2230#discussion_r563636966



##
File path: solr/core/src/java/org/apache/solr/handler/admin/LoggingHandler.java
##
@@ -151,6 +156,9 @@ public void handleRequestBody(SolrQueryRequest req, 
SolrQueryResponse rsp) throw
   rsp.add("loggers", info);
 }
 rsp.setHttpCaching(false);
+if (cc != null && AdminHandlersProxy.maybeProxyToNodes(req, rsp, cc)) {

Review comment:
   SystemInfoHandler is doing something similar to this logic.








[GitHub] [lucene-solr] jpountz commented on pull request #2205: LUCENE-9668: Deprecate MinShouldMatchSumScorer with WANDScorer

2021-01-25 Thread GitBox


jpountz commented on pull request #2205:
URL: https://github.com/apache/lucene-solr/pull/2205#issuecomment-766588060


   Thank you @zacharymorn !






[GitHub] [lucene-solr] gus-asf merged pull request #2241: @gus-asf LUCENE-9575 Provide a producer for PatternTypingRule in TestRandomChains

2021-01-25 Thread GitBox


gus-asf merged pull request #2241:
URL: https://github.com/apache/lucene-solr/pull/2241


   






[GitHub] [lucene-solr-operator] vladiceanu commented on pull request #200: Apachify the solr-operator helm chart

2021-01-25 Thread GitBox


vladiceanu commented on pull request #200:
URL: 
https://github.com/apache/lucene-solr-operator/pull/200#issuecomment-766739705


   Not sure if it's the right place to mention, but 
https://artifacthub.io/packages/helm/solr-operator/solr-operator also needs to 
be updated to point to the new chart location 






[GitHub] [lucene-solr] gus-asf commented on pull request #2240: LUCENE-9575 Provide a producer for PatternTypingRule in TestRandomChains

2021-01-25 Thread GitBox


gus-asf commented on pull request #2240:
URL: https://github.com/apache/lucene-solr/pull/2240#issuecomment-766687637


   need to remake to account for quickfix.






[GitHub] [lucene-solr] jpountz merged pull request #2205: LUCENE-9668: Deprecate MinShouldMatchSumScorer with WANDScorer

2021-01-25 Thread GitBox


jpountz merged pull request #2205:
URL: https://github.com/apache/lucene-solr/pull/2205


   






[GitHub] [lucene-solr] dweiss commented on a change in pull request #2235: LUCENE-9690: Hunspell: support special title-case for words with apostrophe

2021-01-25 Thread GitBox


dweiss commented on a change in pull request #2235:
URL: https://github.com/apache/lucene-solr/pull/2235#discussion_r563522049



##
File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Stemmer.java
##
@@ -138,6 +142,23 @@ WordCase caseOf(char[] word, int length) {
 return lowerBuffer;
   }
 
+  // Special prefix handling for Catalan, French, Italian:

Review comment:
   seems like this can be made static?

##
File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Stemmer.java
##
@@ -138,6 +142,23 @@ WordCase caseOf(char[] word, int length) {
 return lowerBuffer;
   }
 
+  // Special prefix handling for Catalan, French, Italian:

Review comment:
   I'll merge it in, you can piggyback static method on a different PR if 
you wish.








[GitHub] [lucene-solr] dsmiley commented on a change in pull request #2230: SOLR-15011: /admin/logging handler is configured logs to all nodes

2021-01-25 Thread GitBox


dsmiley commented on a change in pull request #2230:
URL: https://github.com/apache/lucene-solr/pull/2230#discussion_r563971668



##
File path: solr/core/src/java/org/apache/solr/handler/admin/LoggingHandler.java
##
@@ -151,6 +156,9 @@ public void handleRequestBody(SolrQueryRequest req, 
SolrQueryResponse rsp) throw
   rsp.add("loggers", info);
 }
 rsp.setHttpCaching(false);
+if (cc != null && AdminHandlersProxy.maybeProxyToNodes(req, rsp, cc)) {

Review comment:
   SIH is doing what I suggest, which is different than what the PR is 
doing.








[GitHub] [lucene-solr] dweiss commented on a change in pull request #2238: LUCENE-9693: Hunspell: check that all flags are > 0 and fit char range

2021-01-25 Thread GitBox


dweiss commented on a change in pull request #2238:
URL: https://github.com/apache/lucene-solr/pull/2238#discussion_r563528897



##
File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##
@@ -1299,7 +1314,12 @@ void appendFlag(char flag, StringBuilder to) {
 if (replacement.isEmpty()) {
   continue;
 }
-flags[upto++] = (char) Integer.parseInt(replacement);
+int flag = Integer.parseInt(replacement);
+if (flag == 0 || flag >= Character.MAX_VALUE) { // read default flags 
as well
+  throw new IllegalArgumentException(

Review comment:
   It'd be great to be consistent with exceptions when parsing input - 
sometimes it's ParsingException, here it's IAE. 

##
File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##
@@ -1310,10 +1330,8 @@ void appendFlag(char flag, StringBuilder to) {
 
 @Override
 void appendFlag(char flag, StringBuilder to) {

Review comment:
   Is this intentional? Because it changes the logic of concatenation 
(always leaving the trailing comma). I liked the previous version better 
(always leaving the output neat).

##
File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Stemmer.java
##
@@ -588,7 +577,7 @@ private boolean checkCondition(
   }
 
   private boolean isFlagAppendedByAffix(int affixId, char flag) {
-if (affixId < 0) return false;
+if (affixId < 0 || flag == 0) return false;

Review comment:
   Wouldn't it be cleaner to add a constant alias (static variable) 
FLAG_UNSET for 0 and replace it throughout the code where it compares to zero? 
You've changed it from -1 to 0 but it really doesn't make it any clearer that 
it's a "default" unset state. I think it would benefit from being more verbose 
here.
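The suggestion above can be sketched as follows. The class is a simplified stand-in for the actual Hunspell code, and `FLAG_UNSET` is the constant name proposed in the review:

```java
// Simplified stand-in for the Hunspell code under review (not the real
// implementation): a named constant documents that flag value 0 means
// "no flag set", instead of comparing against a bare 0.
class FlagConstantSketch {
    static final char FLAG_UNSET = (char) 0; // proposed named alias for the magic 0

    // Mirrors the shape of isFlagAppendedByAffix: bail out early when there is
    // no affix or no flag to look up. The real affix lookup is elided here.
    static boolean isFlagAppendedByAffix(int affixId, char flag) {
        if (affixId < 0 || flag == FLAG_UNSET) {
            return false;
        }
        return true; // placeholder for the actual affix-flag lookup
    }
}
```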

##
File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##
@@ -1299,7 +1314,12 @@ void appendFlag(char flag, StringBuilder to) {
 if (replacement.isEmpty()) {
   continue;
 }
-flags[upto++] = (char) Integer.parseInt(replacement);
+int flag = Integer.parseInt(replacement);
+if (flag == 0 || flag >= Character.MAX_VALUE) { // read default flags 
as well
+  throw new IllegalArgumentException(

Review comment:
   Eh, I was afraid of that. It'd be good to consolidate it at some point.

##
File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##
@@ -1310,10 +1330,8 @@ void appendFlag(char flag, StringBuilder to) {
 
 @Override
 void appendFlag(char flag, StringBuilder to) {

Review comment:
   Oh, so something else (other than this method) appends to that 
stringbuilder? Maybe those places should be fixed instead? I don't have all of 
the code in front of me, so the question may be naive.








[GitHub] [lucene-solr] jtibshirani commented on pull request #2245: Move old field infos format to backwards-codecs.

2021-01-25 Thread GitBox


jtibshirani commented on pull request #2245:
URL: https://github.com/apache/lucene-solr/pull/2245#issuecomment-767205967


   Thanks @iverase for pointing this out.






[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2238: LUCENE-9693: Hunspell: check that all flags are > 0 and fit char range

2021-01-25 Thread GitBox


donnerpeter commented on a change in pull request #2238:
URL: https://github.com/apache/lucene-solr/pull/2238#discussion_r563673451



##
File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##
@@ -1299,7 +1314,12 @@ void appendFlag(char flag, StringBuilder to) {
 if (replacement.isEmpty()) {
   continue;
 }
-flags[upto++] = (char) Integer.parseInt(replacement);
+int flag = Integer.parseInt(replacement);
+if (flag == 0 || flag >= Character.MAX_VALUE) { // read default flags 
as well
+  throw new IllegalArgumentException(

Review comment:
   `ParseException` needs some `errorOffset` obligatorily (which is 
dubiously filled here with the current line number), and it's not available in 
this method, and not all callers have anything meaningful to pass there. For 
consistency, we could replace `ParseException` with something less choosy :)

##
File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##
@@ -1310,10 +1330,8 @@ void appendFlag(char flag, StringBuilder to) {
 
 @Override
 void appendFlag(char flag, StringBuilder to) {

Review comment:
   Yes, that's `some tests failed after implementing step 1 and were fixed 
in step 2`. However nice it seemed, it was wrong, because other flags were 
appended after this one without any comma. Trailing commas are no problem, as 
empty flags are skipped in the previous method.

##
File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Stemmer.java
##
@@ -588,7 +577,7 @@ private boolean checkCondition(
   }
 
   private boolean isFlagAppendedByAffix(int affixId, char flag) {
-if (affixId < 0) return false;
+if (affixId < 0 || flag == 0) return false;

Review comment:
   A good idea, thanks!

##
File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##
@@ -1310,10 +1330,8 @@ void appendFlag(char flag, StringBuilder to) {
 
 @Override
 void appendFlag(char flag, StringBuilder to) {

Review comment:
   Yes. Currently it's just one place which definitely appends no flags 
before this one, and may append some flags after this, so the implementation is 
tied to that.








[GitHub] [lucene-solr-operator] thelabdude commented on a change in pull request #193: Support custom log4j2 config from user-provided ConfigMap

2021-01-25 Thread GitBox


thelabdude commented on a change in pull request #193:
URL: 
https://github.com/apache/lucene-solr-operator/pull/193#discussion_r564072280



##
File path: controllers/util/solr_util.go
##
@@ -327,13 +330,42 @@ func GenerateStatefulSet(solrCloud *solr.SolrCloud, 
solrCloudStatus *solr.SolrCl
envVars = append(envVars, customPodOptions.EnvVariables...)
}
 
+   // Did the user provide a custom log config?
+   if configMapInfo[LogXmlFile] != "" {
+
+   if configMapInfo[LogXmlMd5Annotation] != "" {
+   if podAnnotations == nil {
+   podAnnotations = make(map[string]string, 1)
+   }
+   podAnnotations[LogXmlMd5Annotation] = 
configMapInfo[LogXmlMd5Annotation]
+   }
+
+   // cannot use /var/solr as a mountPath, so mount the custom log 
config in a sub-dir
+   volName := "log4j2-xml"
+   mountPath := fmt.Sprintf("/var/solr/%s-log-config", 
solrCloud.Name)
+   log4jPropsEnvVarPath := fmt.Sprintf("%s/%s", mountPath, 
LogXmlFile)
+
+   solrVolumes = append(solrVolumes, corev1.Volume{

Review comment:
   This is a good catch! K8s allows it and results in a structure like the 
following in the STS:
   ```
 volumes:
 - configMap:
 defaultMode: 420
 items:
 - key: solr.xml
   path: solr.xml
 name: dev-custom-solr-xml
   name: solr-xml
 - configMap:
 defaultMode: 420
 items:
 - key: log4j2.xml
   path: log4j2.xml
 name: dev-custom-solr-xml
   name: log4j2-xml
   ```
   But we certainly should use a single volume with multiple `items` as you 
suggest ;-) Will fix it up and add a test for both keys being provided.








[GitHub] [lucene-solr] jaisonbi commented on a change in pull request #2213: LUCENE-9663: Adding compression to terms dict from SortedSet/Sorted DocValues

2021-01-25 Thread GitBox


jaisonbi commented on a change in pull request #2213:
URL: https://github.com/apache/lucene-solr/pull/2213#discussion_r563698470



##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesConsumer.java
##
@@ -731,7 +731,22 @@ private void doAddSortedField(FieldInfo field, 
DocValuesProducer valuesProducer)
   meta.writeLong(data.getFilePointer() - start); // ordsLength
 }
 
-addTermsDict(DocValues.singleton(valuesProducer.getSorted(field)));
+int valuesCount = values.getValueCount();
+switch (mode) {

Review comment:
   yes, should use "if" instead of "switch", thanks:)

##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java
##
@@ -370,6 +378,11 @@ public void close() throws IOException {
 long termsIndexLength;
 long termsIndexAddressesOffset;
 long termsIndexAddressesLength;
+
+boolean compressed;
+// Reserved for support other compressors.
+int compressorCode;

Review comment:
   will remove this... just thought we could support more types of 
compression algorithms here...

##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java
##
@@ -1144,6 +1157,7 @@ public TermsEnum termsEnum() throws IOException {
   }
 
   private static class TermsDict extends BaseTermsEnum {
+static final int PADDING_LENGTH = 7;

Review comment:
   This just follows CompressionMode$LZ4_DECOMPRESSOR... it says that adding 7 
padding bytes can help decompression run faster...

##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesConsumer.java
##
@@ -791,6 +806,107 @@ private void addTermsDict(SortedSetDocValues values) 
throws IOException {
 writeTermsIndex(values);
   }
 
+  private void addCompressedTermsDict(SortedSetDocValues values) throws 
IOException {

Review comment:
   I will try to optimize this method...thanks for the comment.








[GitHub] [lucene-solr] dweiss merged pull request #2237: LUCENE-9692: Hunspell: extract Stemmer.stripAffix from similar code in prefix/suffix processing

2021-01-25 Thread GitBox


dweiss merged pull request #2237:
URL: https://github.com/apache/lucene-solr/pull/2237


   






[GitHub] [lucene-solr] msokolov commented on a change in pull request #2239: LUCENE-9695: don't merge deleted vectors

2021-01-25 Thread GitBox


msokolov commented on a change in pull request #2239:
URL: https://github.com/apache/lucene-solr/pull/2239#discussion_r563309172



##
File path: lucene/core/src/java/org/apache/lucene/codecs/VectorWriter.java
##
@@ -153,13 +153,12 @@ public int nextDoc() throws IOException {
 private final DocIDMerger docIdMerger;
 private final int[] ordBase;
 private final int cost;
-private final int size;
+private int size;
 
 private int docId;
 private VectorValuesSub current;
-// For each doc with a vector, record its ord in the segments being 
merged. This enables random
-// access into the
-// unmerged segments using the ords from the merged segment.
+/* For each doc with a vector, record its ord in the segments being 
merged. This enables random access into the unmerged segments using the ords 
from the merged segment.
+ */

Review comment:
   Hmmm I thought spotless would wrap this line, but it doesn't seem to 
complain about it

##
File path: lucene/core/src/test/org/apache/lucene/util/hnsw/KnnGraphTester.java
##
@@ -578,6 +572,8 @@ private int createIndex(Path docsPath, Path indexPath) 
throws IOException {
 IndexWriterConfig iwc = new 
IndexWriterConfig().setOpenMode(IndexWriterConfig.OpenMode.CREATE);
 // iwc.setMergePolicy(NoMergePolicy.INSTANCE);
 iwc.setRAMBufferSizeMB(1994d);
+iwc.setMaxBufferedDocs(1);

Review comment:
   Oh I did not mean to include this change here. Probably we will want to 
have some command line parameter to control this, but for now having the 
default be to index a large segment is probably better

##
File path: lucene/core/src/java/org/apache/lucene/codecs/VectorWriter.java
##
@@ -194,6 +197,7 @@ public int nextDoc() throws IOException {
   current = docIdMerger.next();
   if (current == null) {
 docId = NO_MORE_DOCS;
+size = ord;

Review comment:
   Thanks, yes I did.

##
File path: lucene/core/src/test/org/apache/lucene/index/TestKnnGraph.java
##
@@ -124,21 +126,84 @@ public void testMerge() throws Exception {
 }
   }
 
-  private void dumpGraph(KnnGraphValues values, int size) throws IOException {
+  /**
+   * Verify that we get the *same* graph by indexing one segment as we do by 
indexing two segments
+   * and merging.
+   */
+  public void testMergeProducesSameGraph() throws Exception {

Review comment:
   thanks, yes this flushed out the two problems I saw, so I'm pretty 
confident they are fixed now, after running this a few 100 times. I had also 
wanted to add a test asserting that KNN search precision remains above some 
threshold, but sadly with random vectors, it would always eventually fail, even 
though mostly it would succeed, so not a very useful unit test and I removed 
it. Probably we can add something to luceneutil

##
File path: lucene/core/src/test/org/apache/lucene/index/TestVectorValues.java
##
@@ -748,29 +751,107 @@ public void testRandom() throws Exception {
 assertEquals(dimension, v.length);
 String idString = 
ctx.reader().document(docId).getField("id").stringValue();
 int id = Integer.parseInt(idString);
-assertArrayEquals(idString, values[id], v, 0);
-++valueCount;
+if (ctx.reader().getLiveDocs() == null || 
ctx.reader().getLiveDocs().get(docId)) {
+  assertArrayEquals(idString, values[id], v, 0);
+  ++valueCount;
+} else {
+  assertNull(values[id]);
+}
   }
 }
 assertEquals(numValues, valueCount);
-assertEquals(numValues, totalSize);
+assertEquals(numValues, totalSize - numDeletes);
+  }
+}
+  }
+
+  /**
+   * Index random vectors, sometimes skipping documents, sometimes deleting a 
document, sometimes
+   * merging, sometimes sorting the index, and verify that the expected values 
can be read back
+   * consistently.
+   */
+  public void testRandom2() throws Exception {

Review comment:
   I forgot why I had done this (there is already a testRandom), so I added 
some comments explaining how it differs.








[GitHub] [lucene-solr] muse-dev[bot] commented on a change in pull request #2091: Jira/solr 14778

2021-01-25 Thread GitBox


muse-dev[bot] commented on a change in pull request #2091:
URL: https://github.com/apache/lucene-solr/pull/2091#discussion_r564197768



##
File path: 
lucene/analysis/common/src/test/org/apache/lucene/analysis/payloads/TestDelimitedPayloadTokenFilter.java
##
@@ -94,7 +98,8 @@ public void testFloatEncoding() throws Exception {

Review comment:
   *NULLPTR_DEREFERENCE:*  call to 
`TestDelimitedPayloadTokenFilter.assertTermEquals(...)` eventually accesses 
memory that is the null pointer on line 84.

##
File path: 
lucene/analysis/common/src/test/org/apache/lucene/analysis/payloads/TestDelimitedPayloadTokenFilter.java
##
@@ -120,26 +125,37 @@ void assertTermEquals(String expected, TokenStream 
stream, byte[] expectPay) thr

Review comment:
   *NULLPTR_DEREFERENCE:*  call to 
`TestDelimitedPayloadTokenFilter.assertTermEquals(...)` eventually accesses 
memory that is the null pointer on line 106.

##
File path: 
lucene/analysis/common/src/test/org/apache/lucene/analysis/payloads/TestDelimitedPayloadTokenFilter.java
##
@@ -51,9 +53,11 @@ public void testPayloads() throws Exception {
   public void testNext() throws Exception {
 
 String test = "The quick|JJ red|JJ fox|NN jumped|VB over the lazy|JJ 
brown|JJ dogs|NN";
-DelimitedPayloadTokenFilter filter = new DelimitedPayloadTokenFilter
-  (whitespaceMockTokenizer(test), 
-   DelimitedPayloadTokenFilter.DEFAULT_DELIMITER, new IdentityEncoder());
+DelimitedPayloadTokenFilter filter =
+new DelimitedPayloadTokenFilter(
+whitespaceMockTokenizer(test),
+DelimitedPayloadTokenFilter.DEFAULT_DELIMITER,
+new IdentityEncoder());
 filter.reset();
 assertTermEquals("The", filter, null);

Review comment:
   *NULLPTR_DEREFERENCE:*  call to 
`TestDelimitedPayloadTokenFilter.assertTermEquals(...)` eventually accesses 
memory that is the null pointer on line 62.

##
File path: 
lucene/analysis/common/src/test/org/apache/lucene/analysis/payloads/TestDelimitedPayloadTokenFilter.java
##
@@ -51,9 +53,11 @@ public void testPayloads() throws Exception {

Review comment:
   *NULLPTR_DEREFERENCE:*  call to 
`TestDelimitedPayloadTokenFilter.assertTermEquals(...)` eventually accesses 
memory that is the null pointer on line 38.

##
File path: 
lucene/analysis/common/src/test/org/apache/lucene/analysis/shingle/TestShingleAnalyzerWrapper.java
##
@@ -0,0 +1,505 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.analysis.shingle;
+
+import org.apache.lucene.analysis.Analyzer;
+import org.apache.lucene.analysis.BaseTokenStreamTestCase;
+import org.apache.lucene.analysis.CharArraySet;
+import org.apache.lucene.analysis.MockAnalyzer;
+import org.apache.lucene.analysis.MockTokenizer;
+import org.apache.lucene.analysis.StopFilter;
+import org.apache.lucene.analysis.TokenFilter;
+import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.analysis.Tokenizer;
+import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
+import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.document.Field;
+import org.apache.lucene.document.TextField;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.index.IndexReader;
+import org.apache.lucene.index.IndexWriter;
+import org.apache.lucene.index.IndexWriterConfig;
+import org.apache.lucene.index.Term;
+import org.apache.lucene.search.BooleanClause;
+import org.apache.lucene.search.BooleanQuery;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.lucene.search.PhraseQuery;
+import org.apache.lucene.search.ScoreDoc;
+import org.apache.lucene.search.TermQuery;
+import org.apache.lucene.store.Directory;
+
+/** A test class for ShingleAnalyzerWrapper as regards queries and scoring. */
+public class TestShingleAnalyzerWrapper extends BaseTokenStreamTestCase {
+  private Analyzer analyzer;
+  private IndexSearcher searcher;
+  private IndexReader reader;
+  private Directory directory;
+
+  /**
+   * Set up a new index in RAM with three test phrases and the supplied 

[jira] [Commented] (LUCENE-9694) New tool for creating a deterministic index

2021-01-25 Thread Haoyu Zhai (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271790#comment-17271790
 ] 

Haoyu Zhai commented on LUCENE-9694:


I've opened a PR for this:

https://github.com/apache/lucene-solr/pull/2246

> New tool for creating a deterministic index
> ---
>
> Key: LUCENE-9694
> URL: https://issues.apache.org/jira/browse/LUCENE-9694
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: general/tools
>Reporter: Haoyu Zhai
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Lucene's index is segmented, and sometimes the number of segments and the 
> arrangement of documents greatly impact performance.
> Given a stable index sort, our team created a tool that records the document 
> arrangement (called an index map) of an index and rearranges another index 
> (consisting of the same documents) into the same structure (segment count, and 
> the documents included in each segment).
> This tool could also be used in the Lucene benchmarks for faster deterministic 
> index construction (if I understand correctly, the Lucene benchmark currently 
> uses a single-threaded approach to achieve this).
>  
> We've already had some discussion in email
> [https://markmail.org/message/lbtdntclpnocmfuf]
> And I've implemented the first method, using {{IndexWriter.addIndexes}} and a 
> customized {{FilteredCodecReader}} to achieve the goal. The index 
> construction time is about 25min and time executing this tool is about 10min.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] zhaih opened a new pull request #2246: LUCENE-9694: New tool for creating a deterministic index

2021-01-25 Thread GitBox


zhaih opened a new pull request #2246:
URL: https://github.com/apache/lucene-solr/pull/2246


   
   
   
   # Description
   
   Create a new tool, `IndexRearranger`, which can concurrently rearrange a built 
index into a desired segment count and document distribution.
   
   # Solution
   
   Essentially combines `IndexWriter.addIndexes` and `FilterCodecReader` to 
select only certain documents into 1 segment
   
   # Tests
   
   Added one unit test testing rearranger.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `master` branch.
   - [x] I have run `./gradlew check`.
   - [x] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14330) Return docs with null value in expand for field when collapse has nullPolicy=collapse

2021-01-25 Thread Chris M. Hostetter (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated SOLR-14330:
--
Attachment: SOLR-14330.patch
  Assignee: Chris M. Hostetter
Status: Open  (was: Open)

Attaching a strawman patch with some tests.

The patch introduces an {{expand.nullGroup=true|false}} option to control 
if/when this behavior happens, and works with both nullPolicy=collapse and 
nullPolicy=expand (although obviously it only creates a single group for null 
docs in the {{expanded}} section).  It should also work fine with 
{{expand.field}} and {{expand.q}} type situations, but there are nocommits to 
actually test that.

I wanted to put the patch out before getting too deep into new tests in case 
people have concerns about the semantics/behavior/UX and want to discuss whether 
it should behave or be implemented differently.

> Return docs with null value in expand for field when collapse has 
> nullPolicy=collapse
> -
>
> Key: SOLR-14330
> URL: https://issues.apache.org/jira/browse/SOLR-14330
> Project: Solr
>  Issue Type: Wish
>Reporter: Munendra S N
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14330.patch
>
>
> When documents don't contain a value for the field, then with collapse those 
> documents can either be ignored (default), collapsed (one document is 
> chosen), or expanded (all are returned). This is controlled by {{nullPolicy}}.
> When {{nullPolicy}} is {{collapse}}, it would be nice to return all documents 
> with a {{null}} value in the expand block if {{expand=true}}.
> Also, when used with {{expand.field}}, even then we should return such 
> documents.






[jira] [Commented] (SOLR-6059) Basic support for Cross-Origin resource sharing (CORS) in search requests

2021-01-25 Thread Tomas Eduardo Fernandez Lobbe (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271758#comment-17271758
 ] 

Tomas Eduardo Fernandez Lobbe commented on SOLR-6059:
-

In this Jira, the idea was to allow CORS in search requests only via a 
{{SearchComponent}} (the main use was for an autocomplete feature), so it's 
unrelated to the V1/V2 admin APIs. Not sure if this is the right approach 
compared to the {{web.xml}} changes suggested in  SOLR-12292.

> Basic support for Cross-Origin resource sharing (CORS) in search requests
> -
>
> Key: SOLR-6059
> URL: https://issues.apache.org/jira/browse/SOLR-6059
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 4.9, 6.0
>Reporter: Tomas Eduardo Fernandez Lobbe
>Priority: Major
> Attachments: SOLR-6059.patch
>
>
> Support cross-origin requests to specific search request handlers. 
> See http://www.w3.org/TR/cors






[jira] [Commented] (SOLR-15089) Allow backup/restoration to Amazon's S3 blobstore

2021-01-25 Thread Varun Thacker (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271751#comment-17271751
 ] 

Varun Thacker commented on SOLR-15089:
--

This is exciting! Maybe we could start by using an S3 Mock?

> Allow backup/restoration to Amazon's S3 blobstore 
> --
>
> Key: SOLR-15089
> URL: https://issues.apache.org/jira/browse/SOLR-15089
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Jason Gerlowski
>Priority: Major
>
> Solr's BackupRepository interface provides an abstraction around the physical 
> location/format that backups are stored in.  This allows plugin writers to 
> create "repositories" for a variety of storage mediums.  It'd be nice if Solr 
> offered more mediums out of the box though, such as some of the "blobstore" 
> offerings provided by various cloud providers.
> This ticket proposes adding a "BackupRepository" implementation for Amazon's 
> popular 'S3' blobstore, so that Solr users can use it for backups without 
> needing to write their own code.
> Amazon offers an S3 Java client with acceptable licensing, and the required 
> code is relatively simple.  The biggest challenge in supporting this will 
> likely be procedural - integration testing requires S3 access and S3 access 
> costs money.  We can check with INFRA to see if there is any way to get cloud 
> credits for an integration test to run in nightly Jenkins runs on the ASF 
> Jenkins server.  Alternatively we can try to stub out the blobstore in some 
> reliable way.






[GitHub] [lucene-solr] jtibshirani commented on pull request #2245: Move old field infos format to backwards-codecs.

2021-01-25 Thread GitBox


jtibshirani commented on pull request #2245:
URL: https://github.com/apache/lucene-solr/pull/2245#issuecomment-767205967


   Thanks @iverase for pointing this out.






[GitHub] [lucene-solr] jtibshirani commented on a change in pull request #2245: Move old field infos format to backwards-codecs.

2021-01-25 Thread GitBox


jtibshirani commented on a change in pull request #2245:
URL: https://github.com/apache/lucene-solr/pull/2245#discussion_r564138601



##
File path: 
lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene60/Lucene60FieldInfosFormat.java
##
@@ -311,6 +312,11 @@ private static IndexOptions getIndexOptions(IndexInput 
input, byte b) throws IOE
 }
   }
 
+  /**
+   * Note: although this format is only used on older versions, we need to 
keep the write logic

Review comment:
   I hope this is accurate, would appreciate someone double-checking it.








[GitHub] [lucene-solr] jtibshirani commented on a change in pull request #2245: Move old field infos format to backwards-codecs.

2021-01-25 Thread GitBox


jtibshirani commented on a change in pull request #2245:
URL: https://github.com/apache/lucene-solr/pull/2245#discussion_r564138601



##
File path: 
lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene60/Lucene60FieldInfosFormat.java
##
@@ -311,6 +312,11 @@ private static IndexOptions getIndexOptions(IndexInput 
input, byte b) throws IOE
 }
   }
 
+  /**
+   * Note: although this format is only used on older versions, we need to 
keep the write logic

Review comment:
   I hope this assumption is accurate, would appreciate someone 
double-checking it.








[GitHub] [lucene-solr] jtibshirani opened a new pull request #2245: Move old field infos format to backwards-codecs.

2021-01-25 Thread GitBox


jtibshirani opened a new pull request #2245:
URL: https://github.com/apache/lucene-solr/pull/2245


   We introduced a new `Lucene90FieldInfosFormat`, so the old
   `Lucene60FieldInfosFormat` should live in backwards-codecs.






[GitHub] [lucene-solr] megancarey opened a new pull request #2244: SOLR-15099 Add null checks to IndexSizeTrigger

2021-01-25 Thread GitBox


megancarey opened a new pull request #2244:
URL: https://github.com/apache/lucene-solr/pull/2244


   …its are enqueued
   
   
   
   
   # Description
   
   We want to avoid a noisy NPE on the core info variables, since we already 
log.warn on line 330.
   
   # Solution
   
   Minor fix: add null checks on the core info variables, as we've seen that 
these become unavailable during ZK restarts. 
   
   # Tests
   
   Ran IndexSizeTrigger test locally and it succeeded. Didn't add tests for 
this.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [ ] I have developed this patch against the `master` branch.
   - [ ] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   






[jira] [Updated] (SOLR-15099) Add null check on core info variables in IndexSizeTrigger

2021-01-25 Thread Megan Carey (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Megan Carey updated SOLR-15099:
---
Labels: easyfix patch-available  (was: easyfix)

> Add null check on core info variables in IndexSizeTrigger
> -
>
> Key: SOLR-15099
> URL: https://issues.apache.org/jira/browse/SOLR-15099
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: 8.7
>Reporter: Megan Carey
>Priority: Minor
>  Labels: easyfix, patch-available
>
> A minor fix, but we've seen NPEs from IndexSizeTrigger when ZK is restarted 
> since it's unable to fetch the core info. All we need is a null check. 
> In the patch I'll also update a string value: 
> https://github.com/apache/lucene-solr/blob/branch_8x/solr/core/src/java/org/apache/solr/cloud/autoscaling/IndexSizeTrigger.java#L339
> And might add a log to report index size when splits are enqueued :)






[jira] [Commented] (SOLR-15096) [REGRESSION] Collection Delete Performance significantly degraded in Java 11 v 8

2021-01-25 Thread Mike Drob (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271728#comment-17271728
 ] 

Mike Drob commented on SOLR-15096:
--

While I am very confident that I initially found this issue with a standalone 
zookeeper, I can no longer reproduce it with a zookeeper running in a local 
docker image (although I can still reproduce with our embedded zookeeper).

I am very unclear on what is going on here.

> [REGRESSION] Collection Delete Performance significantly degraded in Java 11 
> v 8
> 
>
> Key: SOLR-15096
> URL: https://issues.apache.org/jira/browse/SOLR-15096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: master (9.0)
>Reporter: Mike Drob
>Priority: Blocker
> Fix For: master (9.0)
>
> Attachments: Screen Shot 2021-01-21 at 5.44.25 PM.png
>
>
> While doing some other performance testing I noticed that collection deletion 
> in 8.8 (RC1) would take approximately 200ms, while the same operation would 
> take 800ms using a recent snapshot ({{a233ed2fd1b}}) from master branch.
> I have not done further investigation at this time.






[GitHub] [lucene-solr-operator] thelabdude commented on a change in pull request #193: Support custom log4j2 config from user-provided ConfigMap

2021-01-25 Thread GitBox


thelabdude commented on a change in pull request #193:
URL: 
https://github.com/apache/lucene-solr-operator/pull/193#discussion_r564072280



##
File path: controllers/util/solr_util.go
##
@@ -327,13 +330,42 @@ func GenerateStatefulSet(solrCloud *solr.SolrCloud, 
solrCloudStatus *solr.SolrCl
envVars = append(envVars, customPodOptions.EnvVariables...)
}
 
+   // Did the user provide a custom log config?
+   if configMapInfo[LogXmlFile] != "" {
+
+   if configMapInfo[LogXmlMd5Annotation] != "" {
+   if podAnnotations == nil {
+   podAnnotations = make(map[string]string, 1)
+   }
+   podAnnotations[LogXmlMd5Annotation] = 
configMapInfo[LogXmlMd5Annotation]
+   }
+
+   // cannot use /var/solr as a mountPath, so mount the custom log 
config in a sub-dir
+   volName := "log4j2-xml"
+   mountPath := fmt.Sprintf("/var/solr/%s-log-config", 
solrCloud.Name)
+   log4jPropsEnvVarPath := fmt.Sprintf("%s/%s", mountPath, 
LogXmlFile)
+
+   solrVolumes = append(solrVolumes, corev1.Volume{

Review comment:
   This is a good catch! K8s allows it and results in a structure like the 
following in the STS:
   ```
 volumes:
 - configMap:
 defaultMode: 420
 items:
 - key: solr.xml
   path: solr.xml
 name: dev-custom-solr-xml
   name: solr-xml
 - configMap:
 defaultMode: 420
 items:
 - key: log4j2.xml
   path: log4j2.xml
 name: dev-custom-solr-xml
   name: log4j2-xml
   ```
   But we certainly should use a single volume with multiple `items` as you 
suggest ;-) Will fix it up and add a test for both keys being provided.
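
For reference, the single-volume form the reviewer suggests would look roughly 
like the following in the StatefulSet. This is a sketch only; the ConfigMap and 
volume names are taken from the YAML pasted above and the `items` projection is 
standard Kubernetes ConfigMap-volume syntax:

```yaml
volumes:
  - name: solr-config            # one volume carrying both user-provided files
    configMap:
      name: dev-custom-solr-xml  # the single user-provided ConfigMap
      defaultMode: 420
      items:
        - key: solr.xml
          path: solr.xml
        - key: log4j2.xml
          path: log4j2.xml
```

Each key is projected as a file under the volume's mount path, so one ConfigMap 
and one volume can serve both configs.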








[GitHub] [lucene-solr-operator] HoustonPutman commented on a change in pull request #193: Support custom log4j2 config from user-provided ConfigMap

2021-01-25 Thread GitBox


HoustonPutman commented on a change in pull request #193:
URL: 
https://github.com/apache/lucene-solr-operator/pull/193#discussion_r564044525



##
File path: controllers/solrcloud_controller.go
##
@@ -182,44 +182,61 @@ func (r *SolrCloudReconciler) Reconcile(req ctrl.Request) 
(ctrl.Result, error) {
}
}
 
-   // Generate ConfigMap unless the user supplied a custom ConfigMap for 
solr.xml ... but the provided ConfigMap
-   // might be for the Prometheus exporter, so we only care if they 
provide a solr.xml in the CM
-   solrXmlConfigMapName := instance.ConfigMapName()
-   solrXmlMd5 := ""
+   // Generate ConfigMap unless the user supplied a custom ConfigMap for 
solr.xml
+   configMapInfo := make(map[string]string)
if instance.Spec.CustomSolrKubeOptions.ConfigMapOptions != nil && 
instance.Spec.CustomSolrKubeOptions.ConfigMapOptions.ProvidedConfigMap != "" {
+   providedConfigMapName := 
instance.Spec.CustomSolrKubeOptions.ConfigMapOptions.ProvidedConfigMap
foundConfigMap := {}
-   nn := types.NamespacedName{Name: 
instance.Spec.CustomSolrKubeOptions.ConfigMapOptions.ProvidedConfigMap, 
Namespace: instance.Namespace}
+   nn := types.NamespacedName{Name: providedConfigMapName, 
Namespace: instance.Namespace}
err = r.Get(context.TODO(), nn, foundConfigMap)
if err != nil {
return requeueOrNot, err // if they passed a 
providedConfigMap name, then it must exist
}
 
-   // ConfigMap doesn't have to have a solr.xml, but if it does, 
then it needs to be valid!
if foundConfigMap.Data != nil {
-   solrXml, ok := foundConfigMap.Data["solr.xml"]
-   if ok {
+   logXml, hasLogXml := 
foundConfigMap.Data[util.LogXmlFile]
+   solrXml, hasSolrXml := 
foundConfigMap.Data[util.SolrXmlFile]
+
+   // if there's a user-provided config, it must have one 
of the expected keys
+   if !hasLogXml && !hasSolrXml {
+   return requeueOrNot, fmt.Errorf("User provided 
ConfigMap %s must have one of 'solr.xml' and/or 'log4j2.xml'",
+   providedConfigMapName)
+   }
+
+   if hasSolrXml {
+   // make sure the user-provided solr.xml is valid
if !strings.Contains(solrXml, "${hostPort:") {
return requeueOrNot,
fmt.Errorf("Custom solr.xml in 
ConfigMap %s must contain a placeholder for the 'hostPort' variable, such as 
${hostPort:80}",
-   
instance.Spec.CustomSolrKubeOptions.ConfigMapOptions.ProvidedConfigMap)
+   providedConfigMapName)
}
// stored in the pod spec annotations on the 
statefulset so that we get a restart when solr.xml changes
-   solrXmlMd5 = fmt.Sprintf("%x", 
md5.Sum([]byte(solrXml)))
-   solrXmlConfigMapName = foundConfigMap.Name
-   } else {
-   return requeueOrNot, fmt.Errorf("Required 
'solr.xml' key not found in provided ConfigMap %s",
-   
instance.Spec.CustomSolrKubeOptions.ConfigMapOptions.ProvidedConfigMap)
+   configMapInfo[util.SolrXmlMd5Annotation] = 
fmt.Sprintf("%x", md5.Sum([]byte(solrXml)))
+   configMapInfo[util.SolrXmlFile] = 
foundConfigMap.Name
+   }
+
+   if hasLogXml {
+   if !strings.Contains(logXml, 
"monitorInterval=") {
+   // stored in the pod spec annotations 
on the statefulset so that we get a restart when the log config changes
+   configMapInfo[util.LogXmlMd5Annotation] 
= fmt.Sprintf("%x", md5.Sum([]byte(logXml)))
+   } // else log4j will automatically refresh for 
us, so no restart needed
+   configMapInfo[util.LogXmlFile] = 
foundConfigMap.Name
}
+
} else {
-   return requeueOrNot, fmt.Errorf("Provided ConfigMap %s 
has no data",
-   
instance.Spec.CustomSolrKubeOptions.ConfigMapOptions.ProvidedConfigMap)
+   return requeueOrNot, fmt.Errorf("Provided ConfigMap %s 
has no data", providedConfigMapName)
}
-   } else {
+   }
+
+   if configMapInfo[util.SolrXmlFile] == "" {

Review comment:
   So if a user passes a custom 

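The restart-on-change mechanism discussed in this thread stores an MD5 hex 
digest of the user-provided config content as a pod annotation, so that a 
changed solr.xml or log4j2.xml rolls the StatefulSet. A minimal Java analogue 
of the Go expression `fmt.Sprintf("%x", md5.Sum(...))`, purely for 
illustration; the XML strings in `main` are hypothetical:

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ConfigHash {
    // Java analogue (for illustration only) of the Go annotation value:
    // fmt.Sprintf("%x", md5.Sum([]byte(content)))
    static String md5Hex(String content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(content.getBytes());
            // %032x zero-pads so digests with leading zeros still span 32 hex chars
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is required by the JDK spec
        }
    }

    public static void main(String[] args) {
        // Two different log4j2.xml contents produce different annotation values,
        // which is what triggers the rolling restart.
        System.out.println(md5Hex("<Configuration monitorInterval=\"30\"/>"));
        System.out.println(md5Hex("<Configuration/>"));
    }
}
```

Note the carve-out in the PR above: when the log config contains 
`monitorInterval=`, no annotation is written, because log4j2 reloads the file 
itself and no restart is needed.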
[jira] [Updated] (SOLR-15083) prometheus-exporter metric solr_metrics_jvm_os_cpu_time_seconds is misnamed

2021-01-25 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-15083:

Labels: monitoring newdev prometheus  (was: monitoring prometheus)

> prometheus-exporter metric solr_metrics_jvm_os_cpu_time_seconds is misnamed
> ---
>
> Key: SOLR-15083
> URL: https://issues.apache.org/jira/browse/SOLR-15083
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - prometheus-exporter
>Affects Versions: 8.6, master (9.0)
>Reporter: Mathieu Marie
>Priority: Minor
>  Labels: monitoring, newdev, prometheus
>
> The *solr_metrics_jvm_os_cpu_time_seconds* metric exported by prometheus-exporter 
> has seconds in its name; however, the value it reports appears to be microseconds.
> This name can create confusion when one wants to report it in a dashboard.
>  That metric is defined in 
> [https://github.com/apache/lucene-solr/blob/branch_8_5/solr/contrib/prometheus-exporter/conf/solr-exporter-config.xml#L247]
>  {code}
>   
> .metrics["solr.jvm"] | to_entries | .[] | select(.key == 
> "os.processCpuTime") as $object |
> ($object.value / 1000.0) as $value |
> {
>   name : "solr_metrics_jvm_os_cpu_time_seconds",
>   type : "COUNTER",
>   help : "See following URL: 
> https://lucene.apache.org/solr/guide/metrics-reporting.html",
>   label_names  : ["item"],
>   label_values : ["processCpuTime"],
>   value: $value
> }
>   
> {code}
> In the above config we see that the metric came from  *os.processCpuTime*, 
> which itself came from JMX call 
> [getProcessCpuTime()|https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getProcessCpuTime()].
> That javadoc says
> {code}
> long getProcessCpuTime()
> Returns the CPU time used by the process on which the Java virtual machine is 
> running in nanoseconds. The returned value is of nanoseconds precision but 
> not necessarily nanoseconds accuracy. This method returns -1 if the 
> platform does not support this operation.
> Returns:
> the CPU time used by the process in nanoseconds, or -1 if this operation is 
> not supported.
> {code}
> Nanoseconds / 1000 is microseconds.
> Either the name or the computation should be updated.
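
To make the mismatch concrete, a small arithmetic sketch with a hypothetical 
`getProcessCpuTime()` reading: dividing the nanosecond value by 1000 (as the 
exporter config does) yields microseconds, while the `_seconds` suffix implies 
dividing by 10^9:

```java
public class CpuTimeScaling {
    public static void main(String[] args) {
        // Hypothetical return value of OperatingSystemMXBean.getProcessCpuTime(),
        // which reports nanoseconds: here, 1.5 seconds of CPU time.
        long processCpuTimeNanos = 1_500_000_000L;

        // What the exporter config currently computes ($object.value / 1000.0):
        double micros = processCpuTimeNanos / 1000.0;           // microseconds

        // What the metric name "..._seconds" promises:
        double seconds = processCpuTimeNanos / 1_000_000_000.0; // seconds

        System.out.println(micros);  // 1500000.0
        System.out.println(seconds); // 1.5
    }
}
```

So either the divisor should become 1_000_000_000.0 or the metric should be 
renamed to reflect microseconds, as the issue suggests.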






[jira] [Resolved] (SOLR-10203) Remove dist/test-framework from the binary download archive

2021-01-25 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-10203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved SOLR-10203.
-
Fix Version/s: master (9.0)
   Resolution: Fixed

Resolving – it's close enough, given how large it was before.  We might still 
remove this lingering jar.  

I think someone wanting to learn how to write a Solr plugin might best be 
served by looking at existing ones, and having an increasing number of them 
hosted off of our repo is helpful in that regard.  We needn't have this last 
jar as a "signal" to it being possible.

> Remove dist/test-framework from the binary download archive
> ---
>
> Key: SOLR-10203
> URL: https://issues.apache.org/jira/browse/SOLR-10203
> Project: Solr
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 7.0
>Reporter: Alexandre Rafalovitch
>Assignee: Alexandre Rafalovitch
>Priority: Minor
> Fix For: master (9.0)
>
>
> Libraries in the dist/test-framework are shipped with every copy of Solr 
> binary, yet they are not used anywhere directly. They take approximately 10 
> MBytes. 
> Remove the directory and provide guidance in a README file on how to get them 
> for those people who are writing their own testing solutions against Solr.






[jira] [Commented] (SOLR-15096) [REGRESSION] Collection Delete Performance significantly degraded in Java 11 v 8

2021-01-25 Thread Mike Drob (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271647#comment-17271647
 ] 

Mike Drob commented on SOLR-15096:
--

Some more specific logging to highlight the differences with a few observations:

{noformat:title=Java 8 RELOAD}
2021-01-25 19:45:10.804 INFO  
(OverseerThreadFactory-18-thread-3-processing-n:10.0.0.160:8983_solr) [   ] 
o.a.s.c.a.c.OverseerCollectionMessageHandler Executing Collection 
Cmd=action=RELOAD, asyncId=null
...
2021-01-25 19:45:10.941 INFO  (qtp1448061896-108) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/cores 
params={core=coll-1_shard1_replica_n1=/admin/cores=RELOAD=javabin=2}
 status=0 QTime=133
2021-01-25 19:45:10.944 INFO  (qtp1448061896-27) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/collections params={name=coll-1=RELOAD} 
status=0 QTime=143
{noformat}

{noformat:title=Java 11 RELOAD}
2021-01-25 19:43:47.073 INFO  
(OverseerThreadFactory-18-thread-3-processing-n:10.0.0.160:8983_solr) [   ] 
o.a.s.c.a.c.OverseerCollectionMessageHandler Executing Collection 
Cmd=action=RELOAD, asyncId=null
...
2021-01-25 19:43:47.221 INFO  (qtp1275028674-99) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/cores 
params={core=coll-1_shard1_replica_n1=/admin/cores=RELOAD=javabin=2}
 status=0 QTime=144
2021-01-25 19:43:47.297 INFO  (qtp1275028674-28) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/collections params={name=coll-1=RELOAD} 
status=0 QTime=274
{noformat}

The time of the *core* reload is pretty close - 133 v 144. I'm willing to call 
that within margin of error based on whatever else the OS was doing at the time.

The time of the *collection* reload is more suspicious. With Java 8, we log 
that we are executing the cmd 3ms after the timer starts, while with Java 11 
the time has already been running for *50ms* by the time we get to the same 
point. Perhaps there's something different about hash map lookup or call site 
resolution since we're technically using a method reference here. Then, there's 
an additional *76ms* pause after the core reload has completed before we 
acknowledge the collection reload. Together, these account for almost all of 
the performance difference that we observe.

> [REGRESSION] Collection Delete Performance significantly degraded in Java 11 
> v 8
> 
>
> Key: SOLR-15096
> URL: https://issues.apache.org/jira/browse/SOLR-15096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: master (9.0)
>Reporter: Mike Drob
>Priority: Blocker
> Fix For: master (9.0)
>
> Attachments: Screen Shot 2021-01-21 at 5.44.25 PM.png
>
>
> While doing some other performance testing I noticed that collection deletion 
> in 8.8 (RC1) would take approximately 200ms, while the same operation would 
> take 800ms using a recent snapshot ({{a233ed2fd1b}}) from master branch.
> I have not done further investigation at this time.






[jira] [Created] (SOLR-15108) Randomly use new SolrCloud plugins in test suite

2021-01-25 Thread Megan Carey (Jira)
Megan Carey created SOLR-15108:
--

 Summary: Randomly use new SolrCloud plugins in test suite
 Key: SOLR-15108
 URL: https://issues.apache.org/jira/browse/SOLR-15108
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: AutoScaling, SolrCloud
Affects Versions: master (9.0)
Reporter: Megan Carey


The new pluggable Autoscaling framework is currently unused by the test suite. 
Ideally, our unit tests will run against this framework some percentage of the 
time. 

I'll work on configuring unit tests to switch between Legacy placement and the 
Affinity placement plugin, either via:
# A custom solr.xml that will trade off with default randomly
# Cluster properties set randomly
# Using a system property that will inform the solr.xml randomly






[jira] [Updated] (SOLR-15108) Randomly use new 9.0 Autoscaling plugins in test suite

2021-01-25 Thread Megan Carey (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Megan Carey updated SOLR-15108:
---
Summary: Randomly use new 9.0 Autoscaling plugins in test suite  (was: 
Randomly use new SolrCloud plugins in test suite)

> Randomly use new 9.0 Autoscaling plugins in test suite
> --
>
> Key: SOLR-15108
> URL: https://issues.apache.org/jira/browse/SOLR-15108
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling, SolrCloud
>Affects Versions: master (9.0)
>Reporter: Megan Carey
>Priority: Major
>
> The new pluggable Autoscaling framework is currently unused by the test 
> suite. Ideally, our unit tests will run against this framework some 
> percentage of the time. 
> I'll work on configuring unit tests to switch between Legacy placement and 
> the Affinity placement plugin, either via:
> # A custom solr.xml that will trade off with default randomly
> # Cluster properties set randomly
> # Using a system property that will inform the solr.xml randomly






[GitHub] [lucene-solr-operator] HoustonPutman merged pull request #200: Apachify the solr-operator helm chart

2021-01-25 Thread GitBox


HoustonPutman merged pull request #200:
URL: https://github.com/apache/lucene-solr-operator/pull/200


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org






[jira] [Resolved] (SOLR-15078) ExpandComponent treats all docs with '0' in a numeric collapse field the same as if null

2021-01-25 Thread Chris M. Hostetter (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter resolved SOLR-15078.
---
Fix Version/s: 8.9
   master (9.0)
   Resolution: Fixed

> ExpandComponent treats all docs with '0' in a numeric collapse field the same 
> as if null
> 
>
> Key: SOLR-15078
> URL: https://issues.apache.org/jira/browse/SOLR-15078
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Fix For: master (9.0), 8.9
>
> Attachments: SOLR-15078.patch
>
>
> ExpandComponent has an equivalent to the collapse qparser bug tracked in 
> SOLR-15047...
> {quote}...has some very, _very_, old code/semantics in it that date back to 
> when the {{FieldCache}} was incapable of differentiating between a document 
> that contained '0' in the field being un-inverted, and a document that didn't 
> have any value in that field.
> This limitation does not exist in DocValues (nor has it existed for a long 
> time) but as the DocValues API has evolved, and as the [...] code has been 
> updated to take advantage of the newer APIs that make it obvious when a 
> document has no value in a field, the [...] code still explicitly equates "0" 
> in a numeric field with the "null group"
> {quote}
>  
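The distinction at the heart of this bug can be shown in miniature. This standalone sketch (not Solr/Lucene code) contrasts a FieldCache-style view, where a stored 0 and a missing value are indistinguishable, with a DocValues-style view that keeps the "null group" separate:

```java
// Minimal standalone illustration (not actual Solr/Lucene code) of why a
// presence check matters: with FieldCache-style un-inversion a missing value
// and a stored 0 both surface as 0, while a DocValues-style lookup can
// report "no value" and keep the null group distinct from the 0 group.
public class ZeroVsNull {
  // values[i] holds doc i's field value; hasValue[i] says whether it has one.
  static final long[] values = {0L, 0L, 5L};
  static final boolean[] hasValue = {true, false, true};

  // FieldCache-era view: no way to tell doc 0 (real 0) from doc 1 (missing).
  static long fieldCacheValue(int doc) {
    return values[doc];
  }

  // DocValues-era view: returns null for documents with no value.
  static Long docValuesValue(int doc) {
    return hasValue[doc] ? values[doc] : null;
  }

  public static void main(String[] args) {
    System.out.println(fieldCacheValue(0) == fieldCacheValue(1)); // conflated
    System.out.println(docValuesValue(1)); // null: distinct from a real 0
  }
}
```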






[jira] [Commented] (SOLR-15078) ExpandComponent treats all docs with '0' in a numeric collapse field the same as if null

2021-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271622#comment-17271622
 ] 

ASF subversion and git services commented on SOLR-15078:


Commit d8a754a4b48d3a0a0bf7386a711deff007d63107 in lucene-solr's branch 
refs/heads/branch_8x from Chris M. Hostetter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d8a754a ]

SOLR-15078: Fix ExpandComponent behavior when expanding on numeric fields to 
differentiate '0' group from null group

(cherry picked from commit 47a89aca715e18402c183ed15a6076603c63ec52)


> ExpandComponent treats all docs with '0' in a numeric collapse field the same 
> as if null
> 
>
> Key: SOLR-15078
> URL: https://issues.apache.org/jira/browse/SOLR-15078
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-15078.patch
>
>
> ExpandComponent has an equivalent to the collapse qparser bug tracked in 
> SOLR-15047...
> {quote}...has some very, _very_, old code/semantics in it that date back to 
> when the {{FieldCache}} was incapable of differentiating between a document 
> that contained '0' in the field being un-inverted, and a document that didn't 
> have any value in that field.
> This limitation does not exist in DocValues (nor has it existed for a long 
> time) but as the DocValues API has evolved, and as the [...] code has been 
> updated to take advantage of the newer APIs that make it obvious when a 
> document has no value in a field, the [...] code still explicitly equates "0" 
> in a numeric field with the "null group"
> {quote}
>  






[jira] [Commented] (LUCENE-9570) Review code diffs after automatic formatting and correct problems before it is applied

2021-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271620#comment-17271620
 ] 

ASF subversion and git services commented on LUCENE-9570:
-

Commit acbea9ec2676b579beb706944fe9482d8d8f44c7 in lucene-solr's branch 
refs/heads/branch_8x from Mike Drob
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=acbea9e ]

LUCENE-9570 Add placeholder revs file to branch_8x


> Review code diffs after automatic formatting and correct problems before it 
> is applied
> --
>
> Key: LUCENE-9570
> URL: https://issues.apache.org/jira/browse/LUCENE-9570
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Blocker
> Fix For: master (9.0)
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Review and correct all the javadocs before they're messed up by automatic 
> formatting. Apply project-by-project, review diff, correct. Lots of diffs but 
> it should be relatively quick.
> *Reviewing diffs manually*
>  * switch to branch jira/LUCENE-9570 which the PR is based on:
> {code:java}
> git remote add dweiss g...@github.com:dweiss/lucene-solr.git
> git fetch dweiss
> git checkout jira/LUCENE-9570
> {code}
>  * Open gradle/validation/spotless.gradle and locate the project/ package you 
> wish to review. Enable it in spotless.gradle by creating a corresponding 
> switch case block (refer to existing examples), for example:
> {code:java}
>   case ":lucene:highlighter":
> target "src/**"
> targetExclude "**/resources/**", "**/overview.html"
> break
> {code}
>  * Reformat the code:
> {code:java}
> gradlew tidy && git diff -w > /tmp/diff.patch && git status
> {code}
>  * Look at what has changed (git status) and review the differences manually 
> (/tmp/diff.patch). If everything looks ok, commit it directly to 
> jira/LUCENE-9570 or make a PR against that branch.
> {code:java}
> git commit -am ":lucene:core - src/**/org/apache/lucene/document/**"
> {code}






[GitHub] [lucene-solr-operator] HoustonPutman commented on pull request #200: Apachify the solr-operator helm chart

2021-01-25 Thread GitBox


HoustonPutman commented on pull request #200:
URL: 
https://github.com/apache/lucene-solr-operator/pull/200#issuecomment-767049150


   I'll work on that as well @vladiceanu, but it is a separate issue. Thanks 
for bringing it up!






[GitHub] [lucene-solr] dsmiley commented on a change in pull request #2230: SOLR-15011: /admin/logging handler is configured logs to all nodes

2021-01-25 Thread GitBox


dsmiley commented on a change in pull request #2230:
URL: https://github.com/apache/lucene-solr/pull/2230#discussion_r563971668



##
File path: solr/core/src/java/org/apache/solr/handler/admin/LoggingHandler.java
##
@@ -151,6 +156,9 @@ public void handleRequestBody(SolrQueryRequest req, 
SolrQueryResponse rsp) throw
   rsp.add("loggers", info);
 }
 rsp.setHttpCaching(false);
+if (cc != null && AdminHandlersProxy.maybeProxyToNodes(req, rsp, cc)) {

Review comment:
   SIH is doing what I suggest, which is different from what the PR is 
doing.








[jira] [Updated] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?

2021-01-25 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-8962:
-
Fix Version/s: 8.7
  Description: 
Two improvements were added: 8.6 has merge-on-commit (by Froh et al.), 8.7 has 
merge-on-refresh (by Simon).  See {{MergePolicy.findFullFlushMerges}}

The original description follows:

With near-real-time search we ask {{IndexWriter}} to write all in-memory 
segments to disk and open an {{IndexReader}} to search them, and this is 
typically a quick operation.

However, when you use many threads for concurrent indexing, {{IndexWriter}} 
will accumulate many small segments during {{refresh}}, and this then adds 
search-time cost, as searching must visit all of these tiny segments.

The merge policy would normally quickly coalesce these small segments if given 
a little time ... so, could we somehow improve {{IndexWriter}}'s refresh to 
optionally kick off the merge policy to merge segments below some threshold before 
opening the near-real-time reader?  It'd be a bit tricky because while we are 
waiting for merges, indexing may continue, and new segments may be flushed, but 
those new segments shouldn't be included in the point-in-time segments returned 
by refresh ...

One could almost do this on top of Lucene today, with a custom merge policy, 
and some hackity logic to have the merge policy target small segments just 
written by refresh, but it's tricky to then open a near-real-time reader, 
excluding newly flushed but including newly merged segments since the refresh 
originally finished ...

I'm not yet sure how best to solve this, so I wanted to open an issue for 
discussion!
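As a toy illustration of the proposal (this is not Lucene's actual MergePolicy API), coalescing flushed segments below a size threshold before opening the reader might look like:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the idea in this issue: before opening a near-real-time
// reader, merge all flushed segments smaller than a threshold into one, so
// the reader visits fewer tiny segments. Sizes are in documents; names and
// shapes are illustrative, not Lucene's.
public class MergeOnRefreshSketch {
  static List<Integer> mergeSmall(List<Integer> segments, int threshold) {
    List<Integer> result = new ArrayList<>();
    int merged = 0;
    for (int size : segments) {
      if (size < threshold) merged += size; // coalesce tiny segments
      else result.add(size);                // keep big segments as-is
    }
    if (merged > 0) result.add(merged);
    return result;
  }

  public static void main(String[] args) {
    // Four segments become two: the three tiny ones collapse into one.
    System.out.println(mergeSmall(List.of(1, 2, 100, 3), 10));
  }
}
```

The hard part the issue describes is not this arithmetic but the point-in-time bookkeeping: segments flushed while the merge runs must be excluded from the refreshed reader.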

  was:
With near-real-time search we ask {{IndexWriter}} to write all in-memory 
segments to disk and open an {{IndexReader}} to search them, and this is 
typically a quick operation.

However, when you use many threads for concurrent indexing, {{IndexWriter}} 
will accumulate many small segments during {{refresh}}, and this then adds 
search-time cost, as searching must visit all of these tiny segments.

The merge policy would normally quickly coalesce these small segments if given 
a little time ... so, could we somehow improve {{IndexWriter}}'s refresh to 
optionally kick off the merge policy to merge segments below some threshold before 
opening the near-real-time reader?  It'd be a bit tricky because while we are 
waiting for merges, indexing may continue, and new segments may be flushed, but 
those new segments shouldn't be included in the point-in-time segments returned 
by refresh ...

One could almost do this on top of Lucene today, with a custom merge policy, 
and some hackity logic to have the merge policy target small segments just 
written by refresh, but it's tricky to then open a near-real-time reader, 
excluding newly flushed but including newly merged segments since the refresh 
originally finished ...

I'm not yet sure how best to solve this, so I wanted to open an issue for 
discussion!


> Can we merge small segments during refresh, for faster searching?
> -
>
> Key: LUCENE-8962
> URL: https://issues.apache.org/jira/browse/LUCENE-8962
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Priority: Major
> Fix For: master (9.0), 8.6, 8.7
>
> Attachments: LUCENE-8962_demo.png, failed-tests.patch, 
> failure_log.txt, test.diff
>
>  Time Spent: 31h
>  Remaining Estimate: 0h
>
> Two improvements were added: 8.6 has merge-on-commit (by Froh et al.), 8.7 
> has merge-on-refresh (by Simon).  See {{MergePolicy.findFullFlushMerges}}
> The original description follows:
> 
> With near-real-time search we ask {{IndexWriter}} to write all in-memory 
> segments to disk and open an {{IndexReader}} to search them, and this is 
> typically a quick operation.
> However, when you use many threads for concurrent indexing, {{IndexWriter}} 
> will accumulate many small segments during {{refresh}}, and this then 
> adds search-time cost, as searching must visit all of these tiny segments.
> The merge policy would normally quickly coalesce these small segments if 
> given a little time ... so, could we somehow improve {{IndexWriter}}'s 
> refresh to optionally kick off the merge policy to merge segments below some 
> threshold before opening the near-real-time reader?  It'd be a bit tricky 
> because while we are waiting for merges, indexing may continue, and new 
> segments may be flushed, but those new segments shouldn't be included in the 
> point-in-time segments returned by refresh ...
> One could almost do this on top of Lucene today, with a custom merge policy, 
> and some hackity logic to have the merge policy target small segments just 
> written 

[jira] [Commented] (SOLR-15078) ExpandComponent treats all docs with '0' in a numeric collapse field the same as if null

2021-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271597#comment-17271597
 ] 

ASF subversion and git services commented on SOLR-15078:


Commit 47a89aca715e18402c183ed15a6076603c63ec52 in lucene-solr's branch 
refs/heads/master from Chris M. Hostetter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=47a89ac ]

SOLR-15078: Fix ExpandComponent behavior when expanding on numeric fields to 
differentiate '0' group from null group


> ExpandComponent treats all docs with '0' in a numeric collapse field the same 
> as if null
> 
>
> Key: SOLR-15078
> URL: https://issues.apache.org/jira/browse/SOLR-15078
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-15078.patch
>
>
> ExpandComponent has an equivalent to the collapse qparser bug tracked in 
> SOLR-15047...
> {quote}...has some very, _very_, old code/semantics in it that date back to 
> when the {{FieldCache}} was incapable of differentiating between a document 
> that contained '0' in the field being un-inverted, and a document that didn't 
> have any value in that field.
> This limitation does not exist in DocValues (nor has it existed for a long 
> time) but as the DocValues API has evolved, and as the [...] code has been 
> updated to take advantage of the newer APIs that make it obvious when a 
> document has no value in a field, the [...] code still explicitly equates "0" 
> in a numeric field with the "null group"
> {quote}
>  






[GitHub] [lucene-solr-operator] thelabdude merged pull request #195: Improve Prom exporter docs

2021-01-25 Thread GitBox


thelabdude merged pull request #195:
URL: https://github.com/apache/lucene-solr-operator/pull/195


   






[jira] [Commented] (SOLR-15096) [REGRESSION] Collection Delete Performance significantly degraded in Java 11 v 8

2021-01-25 Thread Mike Drob (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271589#comment-17271589
 ] 

Mike Drob commented on SOLR-15096:
--

Interestingly, 
{{ConcurrentDeleteAndCreateCollectionTest.testConcurrentCreateAndDeleteOverTheSameConfig}}
 appears to be about 10% faster with Java 11 than with Java 8.

> [REGRESSION] Collection Delete Performance significantly degraded in Java 11 
> v 8
> 
>
> Key: SOLR-15096
> URL: https://issues.apache.org/jira/browse/SOLR-15096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: master (9.0)
>Reporter: Mike Drob
>Priority: Blocker
> Fix For: master (9.0)
>
> Attachments: Screen Shot 2021-01-21 at 5.44.25 PM.png
>
>
> While doing some other performance testing I noticed that collection deletion 
> in 8.8 (RC1) would take approximately 200ms, while the same operation would 
> take 800ms using a recent snapshot ({{a233ed2fd1b}}) from master branch.
> I have not done further investigation at this time.






[jira] [Resolved] (SOLR-15073) Unsafe cast in SystemInfoHandler

2021-01-25 Thread Christine Poerschke (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke resolved SOLR-15073.

Resolution: Fixed

Thanks [~nyivan] for reporting this issue!

> Unsafe cast in SystemInfoHandler
> 
>
> Key: SOLR-15073
> URL: https://issues.apache.org/jira/browse/SOLR-15073
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.6, 8.7
>Reporter: Nikolay Ivanov
>Assignee: Christine Poerschke
>Priority: Major
> Fix For: 8.8, master (9.0)
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I have observed an unsafe cast in 
> SystemInfoHandler::getSecurityInfo
> Is this by design? Currently I have a custom AuthorizationPlugin that 
> directly implements the AuthorizationPlugin interface. With the latest Solr 
> version it is not permitted anymore. A workaround is to extend 
> RuleBasedAuthorizationPluginBase, which is not ideal imo. Please share your 
> thoughts






[jira] [Resolved] (SOLR-15071) Bug on LTR when using solr 8.6.3 - index out of bounds DisiPriorityQueue.add(DisiPriorityQueue.java:102)

2021-01-25 Thread Christine Poerschke (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke resolved SOLR-15071.

Resolution: Fixed

Thanks everyone!

> Bug on LTR when using solr 8.6.3 - index out of bounds 
> DisiPriorityQueue.add(DisiPriorityQueue.java:102)
> 
>
> Key: SOLR-15071
> URL: https://issues.apache.org/jira/browse/SOLR-15071
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - LTR
>Affects Versions: 8.6, 8.7
>Reporter: Florin Babes
>Assignee: Christine Poerschke
>Priority: Major
>  Labels: ltr
> Fix For: 8.8, master (9.0)
>
> Attachments: featurestore+model+sample_documents.zip
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Hello,
> We are trying to update Solr from 8.3.1 to 8.6.3. On Solr 8.3.1 we are
> using LTR in production using a MultipleAdditiveTrees model. On Solr 8.6.3
> we receive an error when we try to compute some SolrFeatures. We didn't
> find any pattern of the queries that fail.
> Example:
> We have the following query raw parameters:
> q=lg cx 4k oled 120 hz -> just one of many examples
> term_dq=lg cx 4k oled 120 hz
> rq={!ltr model=model reRankDocs=1000 store=feature_store
> efi.term=${term_dq}}
> defType=edismax,
> mm=2<75%
> The features are something like this:
> {
>  "name":"similarity_query_fileld_1",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":\{"q":"{!dismax qf=query_field_1 mm=1}${term}"},
>  "store":"feature_store"
> },
> {
>  "name":"similarity_query_field_2",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":\{"q":"{!dismax qf=query_field_2 mm=5}${term}"},
>  "store":"feature_store"
> }
> We are testing ~6300 production queries and for about 1% of them we receive
> that following error message:
> "metadata":[
>  "error-class","org.apache.solr.common.SolrException",
>  "root-error-class","java.lang.ArrayIndexOutOfBoundsException"],
>  "msg":"java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds
> for length 2",
> The stack trace is:
> org.apache.solr.common.SolrException: java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
> at org.apache.solr.search.ReRankCollector.topDocs(ReRankCollector.java:154)
> at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1599)
> at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1413)
> at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:596)
> at org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1513)
> at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:403)
> at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:360)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:214)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2627)
> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:795)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:568)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1300)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1215)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at 

[jira] [Commented] (SOLR-15096) [REGRESSION] Collection Delete Performance significantly degraded in Java 11 v 8

2021-01-25 Thread Mike Drob (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271571#comment-17271571
 ] 

Mike Drob commented on SOLR-15096:
--

Minor updates:

Java 15 behaves similarly to Java 11.

Using standalone mode and testing with core admin APIs does not seem affected.

> [REGRESSION] Collection Delete Performance significantly degraded in Java 11 
> v 8
> 
>
> Key: SOLR-15096
> URL: https://issues.apache.org/jira/browse/SOLR-15096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: master (9.0)
>Reporter: Mike Drob
>Priority: Blocker
> Fix For: master (9.0)
>
> Attachments: Screen Shot 2021-01-21 at 5.44.25 PM.png
>
>
> While doing some other performance testing I noticed that collection deletion 
> in 8.8 (RC1) would take approximately 200ms, while the same operation would 
> take 800ms using a recent snapshot ({{a233ed2fd1b}}) from master branch.
> I have not done further investigation at this time.






[jira] [Created] (SOLR-15107) 500 errors due to "ArithmeticException: / by zero" in Jetty's AbstractConnectionPool

2021-01-25 Thread Chris M. Hostetter (Jira)
Chris M. Hostetter created SOLR-15107:
-

 Summary: 500 errors due to "ArithmeticException: / by zero" in 
Jetty's AbstractConnectionPool
 Key: SOLR-15107
 URL: https://issues.apache.org/jira/browse/SOLR-15107
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: 8.8
Reporter: Chris M. Hostetter


Upstream bug affects jetty 9.4.32+, fixed in 9.4.36+...
 * [https://github.com/eclipse/jetty.project/issues/5731]
 * [https://github.com/eclipse/jetty.project/issues/5819]
 * [https://github.com/eclipse/jetty.project/pull/5820]

First affects Solr 8.8 due to Jetty upgrade in SOLR-14844

Looks like this in logs...
{noformat}
 123391 ERROR (qtp1570620031-1192) [x:collection1 ] o.a.s.h.RequestHandlerBase java.lang.ArithmeticException: / by zero
at org.eclipse.jetty.util.Pool.acquire(Pool.java:278)
at org.eclipse.jetty.client.AbstractConnectionPool.activate(AbstractConnectionPool.java:284)
at org.eclipse.jetty.client.AbstractConnectionPool.acquire(AbstractConnectionPool.java:209)
at org.eclipse.jetty.client.HttpDestination.process(HttpDestination.java:331)
at org.eclipse.jetty.client.HttpDestination.send(HttpDestination.java:318)
at org.eclipse.jetty.client.HttpDestination.send(HttpDestination.java:311)
at org.eclipse.jetty.client.HttpDestination.send(HttpDestination.java:288)
at org.eclipse.jetty.client.HttpDestination.send(HttpDestination.java:265)
at org.eclipse.jetty.client.HttpClient.send(HttpClient.java:594)
at org.eclipse.jetty.client.HttpRequest.send(HttpRequest.java:772)
at org.eclipse.jetty.client.HttpRequest.send(HttpRequest.java:764)
at org.apache.solr.client.solrj.impl.Http2SolrClient.asyncRequest(Http2SolrClient.java:387)
at org.apache.solr.client.solrj.impl.LBHttp2SolrClient.doRequest(LBHttp2SolrClient.java:151)
at org.apache.solr.client.solrj.impl.LBHttp2SolrClient.asyncReq(LBHttp2SolrClient.java:127)
at org.apache.solr.handler.component.HttpShardHandler.submit(HttpShardHandler.java:160)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:454)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:214)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2610)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:794)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:567)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:518)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:432)

...{noformat}
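The failure mode reduces to a modulo over a momentarily empty pool. This standalone sketch uses illustrative names (not Jetty's actual Pool code) to show the division by zero and a guarded variant of the kind the upstream fix introduces:

```java
// Standalone sketch of the reported failure mode; method names are
// illustrative, not Jetty's. Taking an acquire cursor modulo the pool size
// throws ArithmeticException ("/ by zero") when the pool is briefly empty.
public class PoolAcquireSketch {
  static int unsafeIndex(int cursor, int size) {
    return cursor % size; // throws "/ by zero" when size == 0
  }

  // Defensive variant: report "no entry available" instead of dividing.
  static int safeIndex(int cursor, int size) {
    return size == 0 ? -1 : cursor % size;
  }

  public static void main(String[] args) {
    System.out.println(safeIndex(7, 0)); // -1 instead of an exception
    System.out.println(safeIndex(7, 4)); // 3
  }
}
```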
 






[jira] [Resolved] (SOLR-15076) Inconsistent metric types in ReplicationHandler

2021-01-25 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-15076.
-
Resolution: Fixed

> Inconsistent metric types in ReplicationHandler
> ---
>
> Key: SOLR-15076
> URL: https://issues.apache.org/jira/browse/SOLR-15076
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.9
>
>
> As pointed out by [~dsmiley] in SOLR-14924 there are cases when 
> ReplicationHandler returns an unexpected type of metric (string instead of a 
> number):
> {quote}
> There are test failures in TestReplicationHandler introduced by this change 
> (I think). See 
> https://ci-builds.apache.org/job/Lucene/job/Lucene-Solr-Tests-8.x/1255/
> java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.lang.String
>  at __randomizedtesting.SeedInfo.seed([754427253A1E4E95:F190450AC46671D]:0)
>  at 
> org.apache.solr.handler.TestReplicationHandler.doTestDetails(TestReplicationHandler.java:361)
> The test could be made to convert to a string. But it suggests an 
> inconsistency that ought to be fixed – apparently ReplicationHandler 
> sometimes returns its details using all strings and other times with the typed 
> variants – and that's bad.
> {quote}
> Reproducing seed from David:
> {quote}
> gradlew :solr:core:test --tests 
> "org.apache.solr.handler.TestReplicationHandler.doTestDetails" -Ptests.jvms=6 
> -Ptests.jvmargs=-XX:TieredStopAtLevel=1 -Ptests.seed=5135EC61BF449203 
> -Ptests.file.encoding=ISO-8859-1
> {quote}






[jira] [Commented] (SOLR-14924) Some ReplicationHandler metrics are reported using incorrect types

2021-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271394#comment-17271394
 ] 

ASF subversion and git services commented on SOLR-14924:


Commit eaae9d18822c7648d0e0cfacc4e9e79b67ffbe90 in lucene-solr's branch 
refs/heads/branch_8x from Andrzej Bialecki
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=eaae9d1 ]

SOLR-15076: Fix wrong test assumption - type of this property has changed
in SOLR-14924.


> Some ReplicationHandler metrics are reported using incorrect types
> --
>
> Key: SOLR-14924
> URL: https://issues.apache.org/jira/browse/SOLR-14924
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 8.6.3, 8.7
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.7
>
> Attachments: SOLR-14924.patch
>
>
> Some metrics reported from {{ReplicationHandler}} use incorrect types - they 
> are reported as String values instead of as numbers.
> This is caused by calling the {{ReplicationHandler.addVal}} utility method 
> with the type {{Integer.class}}, which the method doesn't support, so it 
> returns the value as a string.






[jira] [Commented] (SOLR-15076) Inconsistent metric types in ReplicationHandler

2021-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271393#comment-17271393
 ] 

ASF subversion and git services commented on SOLR-15076:


Commit eaae9d18822c7648d0e0cfacc4e9e79b67ffbe90 in lucene-solr's branch 
refs/heads/branch_8x from Andrzej Bialecki
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=eaae9d1 ]

SOLR-15076: Fix wrong test assumption - type of this property has changed
in SOLR-14924.


> Inconsistent metric types in ReplicationHandler
> ---
>
> Key: SOLR-15076
> URL: https://issues.apache.org/jira/browse/SOLR-15076
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.9
>
>
> As pointed out by [~dsmiley] in SOLR-14924, there are cases where 
> ReplicationHandler returns an unexpected type for a metric (a string instead 
> of a number):
> {quote}
> There are test failures in TestReplicationHandler introduced by this change 
> (I think). See 
> https://ci-builds.apache.org/job/Lucene/job/Lucene-Solr-Tests-8.x/1255/
> java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.lang.String
>  at __randomizedtesting.SeedInfo.seed([754427253A1E4E95:F190450AC46671D]:0)
>  at 
> org.apache.solr.handler.TestReplicationHandler.doTestDetails(TestReplicationHandler.java:361)
> The test could be made to convert to a string. But it suggests an 
> inconsistency that ought to be fixed – apparently ReplicationHandler 
> sometimes returns its details using all strings and other times with the typed 
> variants – and that's bad.
> {quote}
> Reproducing seed from David:
> {quote}
> gradlew :solr:core:test --tests 
> "org.apache.solr.handler.TestReplicationHandler.doTestDetails" -Ptests.jvms=6 
> -Ptests.jvmargs=-XX:TieredStopAtLevel=1 -Ptests.seed=5135EC61BF449203 
> -Ptests.file.encoding=ISO-8859-1
> {quote}






[jira] [Commented] (SOLR-15076) Inconsistent metric types in ReplicationHandler

2021-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271388#comment-17271388
 ] 

ASF subversion and git services commented on SOLR-15076:


Commit 166d39a12eff53d9cfdf47b101cfe98a7020dcba in lucene-solr's branch 
refs/heads/master from Andrzej Bialecki
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=166d39a ]

SOLR-15076: Fix wrong test assumption - type of this property has changed
in SOLR-14924.


> Inconsistent metric types in ReplicationHandler
> ---
>
> Key: SOLR-15076
> URL: https://issues.apache.org/jira/browse/SOLR-15076
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.9
>
>
> As pointed out by [~dsmiley] in SOLR-14924, there are cases where 
> ReplicationHandler returns an unexpected type for a metric (a string instead 
> of a number):
> {quote}
> There are test failures in TestReplicationHandler introduced by this change 
> (I think). See 
> https://ci-builds.apache.org/job/Lucene/job/Lucene-Solr-Tests-8.x/1255/
> java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.lang.String
>  at __randomizedtesting.SeedInfo.seed([754427253A1E4E95:F190450AC46671D]:0)
>  at 
> org.apache.solr.handler.TestReplicationHandler.doTestDetails(TestReplicationHandler.java:361)
> The test could be made to convert to a string. But it suggests an 
> inconsistency that ought to be fixed – apparently ReplicationHandler 
> sometimes returns its details using all strings and other times with the typed 
> variants – and that's bad.
> {quote}
> Reproducing seed from David:
> {quote}
> gradlew :solr:core:test --tests 
> "org.apache.solr.handler.TestReplicationHandler.doTestDetails" -Ptests.jvms=6 
> -Ptests.jvmargs=-XX:TieredStopAtLevel=1 -Ptests.seed=5135EC61BF449203 
> -Ptests.file.encoding=ISO-8859-1
> {quote}






[jira] [Commented] (SOLR-14924) Some ReplicationHandler metrics are reported using incorrect types

2021-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271389#comment-17271389
 ] 

ASF subversion and git services commented on SOLR-14924:


Commit 166d39a12eff53d9cfdf47b101cfe98a7020dcba in lucene-solr's branch 
refs/heads/master from Andrzej Bialecki
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=166d39a ]

SOLR-15076: Fix wrong test assumption - type of this property has changed
in SOLR-14924.


> Some ReplicationHandler metrics are reported using incorrect types
> --
>
> Key: SOLR-14924
> URL: https://issues.apache.org/jira/browse/SOLR-14924
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 8.6.3, 8.7
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.7
>
> Attachments: SOLR-14924.patch
>
>
> Some metrics reported from {{ReplicationHandler}} use incorrect types - they 
> are reported as String values instead of as numbers.
> This is caused by calling the {{ReplicationHandler.addVal}} utility method 
> with the type {{Integer.class}}, which the method doesn't support, so it 
> returns the value as a string.






[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #2213: LUCENE-9663: Adding compression to terms dict from SortedSet/Sorted DocValues

2021-01-25 Thread GitBox


bruno-roustant commented on a change in pull request #2213:
URL: https://github.com/apache/lucene-solr/pull/2213#discussion_r563785114



##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java
##
@@ -1144,6 +1157,7 @@ public TermsEnum termsEnum() throws IOException {
   }
 
   private static class TermsDict extends BaseTermsEnum {
+static final int PADDING_LENGTH = 7;

Review comment:
   Ok, in this case can we rename it LZ4_DECOMPRESSOR_PADDING and add this 
comment about the decompression speed?
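For illustration, the padding idea under discussion might look like the following buffer-sizing sketch; the constant name follows the rename suggested above, and the sizing helper itself is assumed, not taken from the actual producer code:

```java
// Hedged sketch: a decompression buffer is sized with a few trailing padding
// bytes so a fast decompressor can copy in fixed-width chunks past the
// logical end of the data without a bounds check on the last bytes.
public class PaddedBuffer {
  // 7 padding bytes allow 8-byte-wide copies to overshoot the logical end.
  static final int LZ4_DECOMPRESSOR_PADDING = 7;

  // Reuse the current buffer when it is already big enough; otherwise grow it
  // to the uncompressed length plus padding.
  static byte[] growForDecompression(byte[] current, int uncompressedLength) {
    int needed = uncompressedLength + LZ4_DECOMPRESSOR_PADDING;
    if (current == null || current.length < needed) {
      return new byte[needed];
    }
    return current;
  }

  public static void main(String[] args) {
    byte[] buf = growForDecompression(null, 100);
    System.out.println(buf.length); // 107
  }
}
```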





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org






[jira] [Commented] (SOLR-14067) Move StatelessScriptUpdateProcessor to a contrib

2021-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271350#comment-17271350
 ] 

ASF subversion and git services commented on SOLR-14067:


Commit ce1bba6d66ae71d928e8d3932cfc7409ee5fdf53 in lucene-solr's branch 
refs/heads/master from ep...@opensourceconnections.com
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ce1bba6 ]

Revert "SOLR-14067: v3 Create /contrib/scripting module with 
ScriptingUpdateProcessor (#2215)"

This reverts commit cf5db8d6513e0f3e556ab6ee1b9ad3a6472ad2f2.


> Move StatelessScriptUpdateProcessor to a contrib
> 
>
> Key: SOLR-14067
> URL: https://issues.apache.org/jira/browse/SOLR-14067
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Assignee: David Eric Pugh
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Move server-side scripting out of core and into a new contrib.  This is 
> better for security.
> Former description:
> 
> We should eliminate all scripting capabilities within Solr. Let us start with 
> the StatelessScriptUpdateProcessor deprecation/removal.






[jira] [Created] (SOLR-15106) Thread in OverseerTaskProcessor should not "return"

2021-01-25 Thread Mathieu Marie (Jira)
Mathieu Marie created SOLR-15106:


 Summary: Thread in OverseerTaskProcessor should not "return"
 Key: SOLR-15106
 URL: https://issues.apache.org/jira/browse/SOLR-15106
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: SolrCloud
Affects Versions: 8.6, master (9.0)
Reporter: Mathieu Marie


I have encountered a scenario where ZK was not accessible for a long time (due 
to a _jute.maxbuffer_ issue, but not related to the rest of this issue).
During that time, the ClusterStateUpdater and OC queues from the Overseer got 
filled with 1200+ messages.

Once we restored ZK availability, the ClusterStateUpdater queue got emptied, 
but not the OC one.

The Overseer stopped dequeuing from the OC queue.

After some digging in the code it seems that a *return* from the overseer 
thread starting the runners could be the issue.

Code in OverseerTaskProcessor.java 
(https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java#L357)
The lines of code that immediately follow should also be reviewed carefully, as 
they also return from or interrupt the thread that is responsible for executing 
the runners.

Anyhow, if anybody hits that same issue, the quick workaround is to bump the 
overseer instance to elect a new overseer on another node.
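The failure mode can be sketched in miniature; this is an illustrative loop, not the actual Overseer code:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Illustrative sketch of the bug described above: a `return` inside a
// long-running dequeue loop silently kills the worker thread and leaves the
// rest of the queue stuck, whereas `continue` skips only the failing task.
public class DequeueLoop {

  static int drain(Queue<String> queue) {
    int processed = 0;
    while (!queue.isEmpty()) {
      String task = queue.poll();
      try {
        if (task.startsWith("bad")) {
          throw new RuntimeException("transient failure");
        }
        processed++;
      } catch (RuntimeException e) {
        // A `return` here would stop the whole loop; `continue` recovers
        // and keeps processing the remaining tasks.
        continue;
      }
    }
    return processed;
  }

  public static void main(String[] args) {
    Queue<String> q = new ArrayDeque<>();
    q.add("ok1");
    q.add("bad");
    q.add("ok2");
    System.out.println(drain(q)); // 2
  }
}
```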










[GitHub] [lucene-solr] jaisonbi commented on a change in pull request #2213: LUCENE-9663: Adding compression to terms dict from SortedSet/Sorted DocValues

2021-01-25 Thread GitBox


jaisonbi commented on a change in pull request #2213:
URL: https://github.com/apache/lucene-solr/pull/2213#discussion_r563708394



##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesConsumer.java
##
@@ -791,6 +806,107 @@ private void addTermsDict(SortedSetDocValues values) 
throws IOException {
 writeTermsIndex(values);
   }
 
+  private void addCompressedTermsDict(SortedSetDocValues values) throws 
IOException {

Review comment:
   I will try to optimize this method...thanks for the comment.
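For reference, the block-boundary arithmetic in that method reduces to a rounded-up shift; the sketch below assumes a shift of 6 (64 terms per block), which may differ from the final constant in the patch:

```java
// Sketch of the numBlocks computation from addCompressedTermsDict: terms are
// grouped into fixed-size blocks, and the block count is a rounded-up
// division implemented with mask-and-shift.
public class BlockMath {
  static final int SHIFT = 6;               // assumed: 64 terms per block
  static final int MASK = (1 << SHIFT) - 1; // 63

  // Equivalent to ceil(termCount / 2^SHIFT) for non-negative counts.
  static long numBlocks(long termCount) {
    return (termCount + MASK) >>> SHIFT;
  }

  public static void main(String[] args) {
    System.out.println(numBlocks(1));  // 1
    System.out.println(numBlocks(64)); // 1
    System.out.println(numBlocks(65)); // 2
  }
}
```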











[GitHub] [lucene-solr] jaisonbi commented on a change in pull request #2213: LUCENE-9663: Adding compression to terms dict from SortedSet/Sorted DocValues

2021-01-25 Thread GitBox


jaisonbi commented on a change in pull request #2213:
URL: https://github.com/apache/lucene-solr/pull/2213#discussion_r563702718



##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java
##
@@ -1144,6 +1157,7 @@ public TermsEnum termsEnum() throws IOException {
   }
 
   private static class TermsDict extends BaseTermsEnum {
+static final int PADDING_LENGTH = 7;

Review comment:
Just referred to CompressionMode$LZ4_DECOMPRESSOR...it says adding 7 padding 
bytes can help decompression run faster...











[GitHub] [lucene-solr] jaisonbi commented on a change in pull request #2213: LUCENE-9663: Adding compression to terms dict from SortedSet/Sorted DocValues

2021-01-25 Thread GitBox


jaisonbi commented on a change in pull request #2213:
URL: https://github.com/apache/lucene-solr/pull/2213#discussion_r563700602



##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java
##
@@ -370,6 +378,11 @@ public void close() throws IOException {
 long termsIndexLength;
 long termsIndexAddressesOffset;
 long termsIndexAddressesLength;
+
+boolean compressed;
+// Reserved for support other compressors.
+int compressorCode;

Review comment:
will remove this...just thought we could support more types of compression 
algorithms here...











[jira] [Commented] (LUCENE-9575) Add PatternTypingFilter

2021-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271292#comment-17271292
 ] 

ASF subversion and git services commented on LUCENE-9575:
-

Commit f942b2dd8a484879d806fcc4fa95c7393f348d9e in lucene-solr's branch 
refs/heads/master from Gus Heck
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f942b2d ]

@gus-asf LUCENE-9575 Provide a producer for PatternTypingRule in 
TestRandomChains (#2241)

LUCENE-9575 Provide a producer for PatternTypingRule in TestRandomChains to fix 
failure on seed 65EA739C95F40313


> Add PatternTypingFilter
> ---
>
> Key: LUCENE-9575
> URL: https://issues.apache.org/jira/browse/LUCENE-9575
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> One of the key asks when the Library of Congress asked me to develop the 
> Advanced Query Parser was to be able to recognize arbitrary patterns that 
> include punctuation, such as POW/MIA or 401(k) or C++. Additionally, they 
> wanted 401k and 401(k) to match documents with either style of reference, and 
> NOT match documents that happen to have isolated 401 or k tokens (i.e. not 
> documents about the HTTP status code). And of course we wanted to give up as 
> little as possible of the text analysis features they were already using.
> This filter, in conjunction with the filters from LUCENE-9572, LUCENE-9574 and 
> one Solr-specific filter in SOLR-14597 that re-analyzes tokens with an 
> arbitrary analyzer defined for a type in the Solr schema, combines to achieve 
> this. 
> This filter has the job of spotting the patterns and adding the intended 
> synonym as a type to the token (from which minimal punctuation has been 
> removed). It also sets flags on the token which are retained through the 
> analysis chain, and at the very end the type is converted to a synonym and 
> the original token(s) for that type are dropped, avoiding the match on 401 
> (for example).
> The pattern matching is specified in a file that looks like: 
> {code}
> 2 (\d+)\(?([a-z])\)? ::: legal2_$1_$2
> 2 (\d+)\(?([a-z])\)?\(?(\d+)\)? ::: legal3_$1_$2_$3
> 2 C\+\+ ::: c_plus_plus
> {code}
> That file would match legal reference patterns such as 401(k), 401k, 
> 501(c)3 and C++. The format is:
> <flags> <pattern> ::: <replacement>
> and groups in the pattern are substituted into the replacement so the first 
> line above would create synonyms such as:
> {code}
> 401k   --> legal2_401_k
> 401(k) --> legal2_401_k
> 503(c) --> legal2_503_c
> {code}






[jira] [Commented] (LUCENE-9575) Add PatternTypingFilter

2021-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271293#comment-17271293
 ] 

ASF subversion and git services commented on LUCENE-9575:
-

Commit f942b2dd8a484879d806fcc4fa95c7393f348d9e in lucene-solr's branch 
refs/heads/master from Gus Heck
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f942b2d ]

@gus-asf LUCENE-9575 Provide a producer for PatternTypingRule in 
TestRandomChains (#2241)

LUCENE-9575 Provide a producer for PatternTypingRule in TestRandomChains to fix 
failure on seed 65EA739C95F40313


> Add PatternTypingFilter
> ---
>
> Key: LUCENE-9575
> URL: https://issues.apache.org/jira/browse/LUCENE-9575
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> One of the key asks when the Library of Congress asked me to develop the 
> Advanced Query Parser was to be able to recognize arbitrary patterns that 
> include punctuation, such as POW/MIA or 401(k) or C++. Additionally, they 
> wanted 401k and 401(k) to match documents with either style of reference, and 
> NOT match documents that happen to have isolated 401 or k tokens (i.e. not 
> documents about the HTTP status code). And of course we wanted to give up as 
> little as possible of the text analysis features they were already using.
> This filter, in conjunction with the filters from LUCENE-9572, LUCENE-9574 and 
> one Solr-specific filter in SOLR-14597 that re-analyzes tokens with an 
> arbitrary analyzer defined for a type in the Solr schema, combines to achieve 
> this. 
> This filter has the job of spotting the patterns and adding the intended 
> synonym as a type to the token (from which minimal punctuation has been 
> removed). It also sets flags on the token which are retained through the 
> analysis chain, and at the very end the type is converted to a synonym and 
> the original token(s) for that type are dropped, avoiding the match on 401 
> (for example).
> The pattern matching is specified in a file that looks like: 
> {code}
> 2 (\d+)\(?([a-z])\)? ::: legal2_$1_$2
> 2 (\d+)\(?([a-z])\)?\(?(\d+)\)? ::: legal3_$1_$2_$3
> 2 C\+\+ ::: c_plus_plus
> {code}
> That file would match legal reference patterns such as 401(k), 401k, 
> 501(c)3 and C++. The format is:
> <flags> <pattern> ::: <replacement>
> and groups in the pattern are substituted into the replacement so the first 
> line above would create synonyms such as:
> {code}
> 401k   --> legal2_401_k
> 401(k) --> legal2_401_k
> 503(c) --> legal2_503_c
> {code}






[GitHub] [lucene-solr] jaisonbi commented on a change in pull request #2213: LUCENE-9663: Adding compression to terms dict from SortedSet/Sorted DocValues

2021-01-25 Thread GitBox


jaisonbi commented on a change in pull request #2213:
URL: https://github.com/apache/lucene-solr/pull/2213#discussion_r563698470



##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesConsumer.java
##
@@ -731,7 +731,22 @@ private void doAddSortedField(FieldInfo field, 
DocValuesProducer valuesProducer)
   meta.writeLong(data.getFilePointer() - start); // ordsLength
 }
 
-addTermsDict(DocValues.singleton(valuesProducer.getSorted(field)));
+int valuesCount = values.getValueCount();
+switch (mode) {

Review comment:
   yes, should use "if" instead of "switch", thanks:)











[GitHub] [lucene-solr] gus-asf merged pull request #2241: @gus-asf LUCENE-9575 Provide a producer for PatternTypingRule in TestRandomChains

2021-01-25 Thread GitBox


gus-asf merged pull request #2241:
URL: https://github.com/apache/lucene-solr/pull/2241


   









[GitHub] [lucene-solr] donnerpeter opened a new pull request #2243: LUCENE-9698: Hunspell: reuse char[] when possible when stripping affix

2021-01-25 Thread GitBox


donnerpeter opened a new pull request #2243:
URL: https://github.com/apache/lucene-solr/pull/2243


   
   
   
   # Description
   
There's no need to allocate another char[] if we can analyze a sub-array of 
what we already have.
   
   # Solution
   
   In addition to `char[]` and `int length`, pass `int offset` everywhere, and 
adjust offset/length instead of allocating a new array, when an affix is 
removed and nothing is added in its place.
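The offset/length idea can be sketched as follows; the method names here are hypothetical stand-ins, not the actual Stemmer API:

```java
// Sketch of affix stripping without copying: the remaining characters are
// described as (array, offset, length) over the original word, so removing a
// prefix shifts the offset and removing a suffix shortens the length.
public class SubArraySketch {

  // Strip a suffix of `stripLen` chars: same array, shorter length.
  static int lengthAfterSuffixStrip(char[] word, int offset, int length, int stripLen) {
    return length - stripLen;
  }

  // Strip a prefix of `stripLen` chars: same array, shifted offset.
  static int offsetAfterPrefixStrip(char[] word, int offset, int length, int stripLen) {
    return offset + stripLen;
  }

  public static void main(String[] args) {
    char[] word = "unhappily".toCharArray();
    int off = offsetAfterPrefixStrip(word, 0, word.length, 2);          // drop "un"
    int len = lengthAfterSuffixStrip(word, off, word.length - 2, 2);    // drop "ly"
    System.out.println(new String(word, off, len)); // happi
  }
}
```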
   
   # Tests
   
   No behavior change, no new tests
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [ ] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [ ] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [ ] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [ ] I have developed this patch against the `master` branch.
   - [ ] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   









[jira] [Created] (LUCENE-9698) Hunspell: reuse char[] when possible when stripping affix

2021-01-25 Thread Peter Gromov (Jira)
Peter Gromov created LUCENE-9698:


 Summary: Hunspell: reuse char[] when possible when stripping affix
 Key: LUCENE-9698
 URL: https://issues.apache.org/jira/browse/LUCENE-9698
 Project: Lucene - Core
  Issue Type: Sub-task
Reporter: Peter Gromov


to reduce allocation rate






[GitHub] [lucene-solr] donnerpeter opened a new pull request #2242: LUCENE-9697: Hunspell Stemmer: use the same FST.BytesReader on all recursion levels

2021-01-25 Thread GitBox


donnerpeter opened a new pull request #2242:
URL: https://github.com/apache/lucene-solr/pull/2242


   
   
   
   # Description
   
   There's no need to allocate 3 `BytesReader`s when just one would be enough, 
as it's used as a scratch, without a need to preserve any state between uses.
   
   # Solution
   
   Allocate just one `BytesReader` per affix type
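A minimal sketch of the scratch-reuse idea (class names are stand-ins, not the actual FST API):

```java
// A stateless scratch object used at every recursion level can be allocated
// once and shared, because its state is fully set before each use; the
// allocation counter makes the saving visible.
public class ScratchReuse {
  static final class Reader { int position; } // stands in for FST.BytesReader

  int allocations = 0;
  private final Reader scratch = newReader(); // allocated once, reused everywhere

  private Reader newReader() {
    allocations++;
    return new Reader();
  }

  // Each recursion level borrows the same scratch reader instead of
  // allocating a fresh one per level.
  int recurse(int depth) {
    Reader in = scratch;
    in.position = depth; // state is reset before every use
    return depth == 0 ? allocations : recurse(depth - 1);
  }

  public static void main(String[] args) {
    System.out.println(new ScratchReuse().recurse(5)); // 1: one reader for all levels
  }
}
```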
   
   # Tests
   
   No behavior change, no tests
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `master` branch.
   - [x] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   









[jira] [Updated] (SOLR-15105) Sum aggregation not supported for externalField [Exception]

2021-01-25 Thread Hitesh Khandelwal (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Khandelwal updated SOLR-15105:
-
Description: 
I upgraded solr (earlier version was 8.1.0) and got the following exception:
{code:java}
org.apache.solr.common.SolrException: sum aggregation not supported for 
popularityFile
 at org.apache.solr.search.facet.SumAgg.createSlotAcc(SumAgg.java:45)
 at 
org.apache.solr.search.facet.FacetFieldProcessor.createCollectAcc(FacetFieldProcessor.java:221)
 at 
org.apache.solr.search.facet.FacetFieldProcessorByArray.createCollectAcc(FacetFieldProcessorByArray.java:87){code}
It happens when doing sum aggregation on a field type of solr.ExternalFileField

Here's the fieldType config:
{code:java}
{code}

  was:
I upgraded solr (earlier version was 8.1.0) and got the following exception:
{code:java}
org.apache.solr.common.SolrException: sum aggregation not supported for 
popularityFile
 at org.apache.solr.search.facet.SumAgg.createSlotAcc(SumAgg.java:45)
 at 
org.apache.solr.search.facet.FacetFieldProcessor.createCollectAcc(FacetFieldProcessor.java:221)
 at 
org.apache.solr.search.facet.FacetFieldProcessorByArray.createCollectAcc(FacetFieldProcessorByArray.java:87)
 It happens when doing sum aggregation on a field type of 
solr.ExternalFileField{code}
Here's the fieldType config:
{code:java}
{code}


> Sum aggregation not supported for externalField [Exception]
> ---
>
> Key: SOLR-15105
> URL: https://issues.apache.org/jira/browse/SOLR-15105
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: search
>Affects Versions: 8.7
>Reporter: Hitesh Khandelwal
>Priority: Major
>
> I upgraded solr (earlier version was 8.1.0) and got the following exception:
> {code:java}
> org.apache.solr.common.SolrException: sum aggregation not supported for 
> popularityFile
>  at org.apache.solr.search.facet.SumAgg.createSlotAcc(SumAgg.java:45)
>  at 
> org.apache.solr.search.facet.FacetFieldProcessor.createCollectAcc(FacetFieldProcessor.java:221)
>  at 
> org.apache.solr.search.facet.FacetFieldProcessorByArray.createCollectAcc(FacetFieldProcessorByArray.java:87){code}
> It happens when doing sum aggregation on a field type of 
> solr.ExternalFileField
> Here's the fieldType config:
> {code:java}
>  indexed="true" class="solr.ExternalFileField"/>{code}






[jira] [Updated] (SOLR-15105) Sum aggregation not supported for externalField [Exception]

2021-01-25 Thread Hitesh Khandelwal (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Khandelwal updated SOLR-15105:
-
Description: 
I upgraded solr (earlier version was 8.1.0) and got the following exception:
{code:java}
org.apache.solr.common.SolrException: sum aggregation not supported for 
popularityFile
 at org.apache.solr.search.facet.SumAgg.createSlotAcc(SumAgg.java:45)
 at 
org.apache.solr.search.facet.FacetFieldProcessor.createCollectAcc(FacetFieldProcessor.java:221)
 at 
org.apache.solr.search.facet.FacetFieldProcessorByArray.createCollectAcc(FacetFieldProcessorByArray.java:87)
 It happens when doing sum aggregation on a field type of 
solr.ExternalFileField{code}
Here's the fieldType config:
{code:java}
{code}

  was:
I upgraded solr (earlier version was 8.1.0) and got the following exception:
org.apache.solr.common.SolrException: sum aggregation not supported for 
popularityFile
at org.apache.solr.search.facet.SumAgg.createSlotAcc(SumAgg.java:45)
at 
org.apache.solr.search.facet.FacetFieldProcessor.createCollectAcc(FacetFieldProcessor.java:221)
at 
org.apache.solr.search.facet.FacetFieldProcessorByArray.createCollectAcc(FacetFieldProcessorByArray.java:87)
It happens when doing sum aggregation on a field type of solr.ExternalFileField

Here's the fieldType config:
{code:java}
{code}


> Sum aggregation not supported for externalField [Exception]
> ---
>
> Key: SOLR-15105
> URL: https://issues.apache.org/jira/browse/SOLR-15105
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: search
>Affects Versions: 8.7
>Reporter: Hitesh Khandelwal
>Priority: Major
>
> I upgraded solr (earlier version was 8.1.0) and got the following exception:
> {code:java}
> org.apache.solr.common.SolrException: sum aggregation not supported for 
> popularityFile
>  at org.apache.solr.search.facet.SumAgg.createSlotAcc(SumAgg.java:45)
>  at 
> org.apache.solr.search.facet.FacetFieldProcessor.createCollectAcc(FacetFieldProcessor.java:221)
>  at 
> org.apache.solr.search.facet.FacetFieldProcessorByArray.createCollectAcc(FacetFieldProcessorByArray.java:87)
>  It happens when doing sum aggregation on a field type of 
> solr.ExternalFileField{code}
> Here's the fieldType config:
> {code:java}
>  indexed="true" class="solr.ExternalFileField"/>{code}






[jira] [Created] (SOLR-15105) Sum aggregation not supported for externalField [Exception]

2021-01-25 Thread Hitesh Khandelwal (Jira)
Hitesh Khandelwal created SOLR-15105:


 Summary: Sum aggregation not supported for externalField 
[Exception]
 Key: SOLR-15105
 URL: https://issues.apache.org/jira/browse/SOLR-15105
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: search
Affects Versions: 8.7
Reporter: Hitesh Khandelwal


I upgraded solr (earlier version was 8.1.0) and got the following exception:
org.apache.solr.common.SolrException: sum aggregation not supported for 
popularityFile
at org.apache.solr.search.facet.SumAgg.createSlotAcc(SumAgg.java:45)
at 
org.apache.solr.search.facet.FacetFieldProcessor.createCollectAcc(FacetFieldProcessor.java:221)
at 
org.apache.solr.search.facet.FacetFieldProcessorByArray.createCollectAcc(FacetFieldProcessorByArray.java:87)
It happens when doing sum aggregation on a field type of solr.ExternalFileField

Here's the fieldType config:
{code:java}
{code}







[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2238: LUCENE-9693: Hunspell: check that all flags are > 0 and fit char range

2021-01-25 Thread GitBox


donnerpeter commented on a change in pull request #2238:
URL: https://github.com/apache/lucene-solr/pull/2238#discussion_r563684678



##
File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##
@@ -1310,10 +1330,8 @@ void appendFlag(char flag, StringBuilder to) {
 
 @Override
 void appendFlag(char flag, StringBuilder to) {

Review comment:
   Yes. Currently it's just one place which definitely appends no flags 
before this one, and may append some flags after this, so the implementation is 
tied to that.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org






[jira] [Created] (LUCENE-9697) Hunspell Stemmer: use the same FST.BytesReader on all recursion levels

2021-01-25 Thread Peter Gromov (Jira)
Peter Gromov created LUCENE-9697:


 Summary: Hunspell Stemmer: use the same FST.BytesReader on all 
recursion levels
 Key: LUCENE-9697
 URL: https://issues.apache.org/jira/browse/LUCENE-9697
 Project: Lucene - Core
  Issue Type: Sub-task
Reporter: Peter Gromov










[GitHub] [lucene-solr] dweiss commented on a change in pull request #2238: LUCENE-9693: Hunspell: check that all flags are > 0 and fit char range

2021-01-25 Thread GitBox


dweiss commented on a change in pull request #2238:
URL: https://github.com/apache/lucene-solr/pull/2238#discussion_r563680034



##
File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##
@@ -1310,10 +1330,8 @@ void appendFlag(char flag, StringBuilder to) {
 
 @Override
 void appendFlag(char flag, StringBuilder to) {

Review comment:
   Oh, so something else (other than this method) appends to that 
stringbuilder? Maybe those places should be fixed instead? I don't have all of 
the code in front of me, so the question may be naive.











[GitHub] [lucene-solr] dweiss commented on a change in pull request #2238: LUCENE-9693: Hunspell: check that all flags are > 0 and fit char range

2021-01-25 Thread GitBox


dweiss commented on a change in pull request #2238:
URL: https://github.com/apache/lucene-solr/pull/2238#discussion_r563678211



##
File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##
@@ -1299,7 +1314,12 @@ void appendFlag(char flag, StringBuilder to) {
 if (replacement.isEmpty()) {
   continue;
 }
-flags[upto++] = (char) Integer.parseInt(replacement);
+int flag = Integer.parseInt(replacement);
+if (flag == 0 || flag >= Character.MAX_VALUE) { // read default flags 
as well
+  throw new IllegalArgumentException(

Review comment:
   Eh, I was afraid of that. It'd be good to consolidate it at some point.











[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2238: LUCENE-9693: Hunspell: check that all flags are > 0 and fit char range

2021-01-25 Thread GitBox


donnerpeter commented on a change in pull request #2238:
URL: https://github.com/apache/lucene-solr/pull/2238#discussion_r563675377



##
File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Stemmer.java
##
@@ -588,7 +577,7 @@ private boolean checkCondition(
   }
 
   private boolean isFlagAppendedByAffix(int affixId, char flag) {
-if (affixId < 0) return false;
+if (affixId < 0 || flag == 0) return false;

Review comment:
   A good idea, thanks!
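   A standalone sketch of why the added `flag == 0` guard matters (a hypothetical demo, not the actual Stemmer code: in this sketch zero is the "no flag" sentinel, so it must never count as an appended flag):

```java
public class FlagGuardDemo {
    // Hypothetical stand-in for an affix's appended flags; 0 here is the
    // "no flag" sentinel and must never be treated as a real flag.
    static boolean isFlagAppended(char[] appendFlags, char flag) {
        if (flag == 0) return false; // the guard from the diff above
        for (char c : appendFlags) {
            if (c == flag) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        char[] flags = {'A', 0}; // a padded slot holding the sentinel
        System.out.println(isFlagAppended(flags, 'A'));      // true
        System.out.println(isFlagAppended(flags, (char) 0)); // false, thanks to the guard
    }
}
```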











[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2238: LUCENE-9693: Hunspell: check that all flags are > 0 and fit char range

2021-01-25 Thread GitBox


donnerpeter commented on a change in pull request #2238:
URL: https://github.com/apache/lucene-solr/pull/2238#discussion_r563675171



##
File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##
@@ -1310,10 +1330,8 @@ void appendFlag(char flag, StringBuilder to) {
 
 @Override
 void appendFlag(char flag, StringBuilder to) {

Review comment:
   Yes, that's `some tests failed after implementing step 1 and were fixed 
in step 2`. However nice it seemed, it was wrong, because other flags were 
appended after this one without any comma. Trailing commas are no problem, as 
empty flags are skipped in the previous method.











[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2238: LUCENE-9693: Hunspell: check that all flags are > 0 and fit char range

2021-01-25 Thread GitBox


donnerpeter commented on a change in pull request #2238:
URL: https://github.com/apache/lucene-solr/pull/2238#discussion_r563673451



##
File path: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##
@@ -1299,7 +1314,12 @@ void appendFlag(char flag, StringBuilder to) {
 if (replacement.isEmpty()) {
   continue;
 }
-flags[upto++] = (char) Integer.parseInt(replacement);
+int flag = Integer.parseInt(replacement);
+if (flag == 0 || flag >= Character.MAX_VALUE) { // read default flags 
as well
+  throw new IllegalArgumentException(

Review comment:
   `ParseException` needs some `errorOffset` obligatorily (which is 
dubiously filled here with the current line number), and it's not available in 
this method, and not all callers have anything meaningful to pass there. For 
consistency, we could replace `ParseException` with something less choosy :)
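   The range check under discussion can be exercised standalone (a hedged sketch; the class and message are hypothetical, only the `flag == 0 || flag >= Character.MAX_VALUE` condition comes from the diff):

```java
public class NumericFlagCheckDemo {
    // Numeric flags must be > 0 and strictly below Character.MAX_VALUE
    // so they can be stored losslessly in a char (condition from the diff).
    static char parseNumericFlag(String replacement) {
        int flag = Integer.parseInt(replacement);
        if (flag == 0 || flag >= Character.MAX_VALUE) {
            throw new IllegalArgumentException("Flag out of char range: " + flag);
        }
        return (char) flag;
    }

    public static void main(String[] args) {
        System.out.println((int) parseNumericFlag("65")); // 65
        // Without the check, an out-of-range value silently wraps on the cast:
        System.out.println((int) (char) 65536); // 0
    }
}
```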











[GitHub] [lucene-solr] murblanc commented on a change in pull request #2199: SOLR-15055 (Take 2) Re-implement 'withCollection'

2021-01-25 Thread GitBox


murblanc commented on a change in pull request #2199:
URL: https://github.com/apache/lucene-solr/pull/2199#discussion_r563645684



##
File path: 
solr/core/src/java/org/apache/solr/cluster/placement/plugins/AffinityPlacementFactory.java
##
@@ -238,11 +258,93 @@ public PlacementPlan computePlacement(Cluster cluster, 
PlacementRequest request,
 // failure. Current code does fail if placement is impossible 
(constraint is at most one replica of a shard on any node).
 for (Replica.ReplicaType replicaType : Replica.ReplicaType.values()) {
   makePlacementDecisions(solrCollection, shardName, availabilityZones, 
replicaType, request.getCountReplicasToCreate(replicaType),
-  attrValues, replicaTypeToNodes, nodesWithReplicas, coresOnNodes, 
placementPlanFactory, replicaPlacements);
+  attrValues, replicaTypeToNodes, nodesWithReplicas, coresOnNodes, 
placementContext.getPlacementPlanFactory(), replicaPlacements);
 }
   }
 
-  return placementPlanFactory.createPlacementPlan(request, 
replicaPlacements);
+  return 
placementContext.getPlacementPlanFactory().createPlacementPlan(request, 
replicaPlacements);
+}
+
+@Override
+public void verifyAllowedModification(ModificationRequest 
modificationRequest, PlacementContext placementContext) throws 
PlacementModificationException, InterruptedException {
+  if (modificationRequest instanceof DeleteShardsRequest) {
+throw new UnsupportedOperationException("not implemented yet");
+  } else if (modificationRequest instanceof DeleteCollectionRequest) {
+verifyDeleteCollection((DeleteCollectionRequest) modificationRequest, 
placementContext);
+  } else if (modificationRequest instanceof DeleteReplicasRequest) {
+verifyDeleteReplicas((DeleteReplicasRequest) modificationRequest, 
placementContext);
+  } else {
+throw new UnsupportedOperationException("unsupported request type " + 
modificationRequest.getClass().getName());
+  }
+}
+
+private void verifyDeleteCollection(DeleteCollectionRequest 
deleteCollectionRequest, PlacementContext placementContext) throws 
PlacementModificationException, InterruptedException {

Review comment:
   Can we have cycles in the `withCollection` graph? Should we allow a way 
to override the vetting checks from the Collection API?
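   On the cycle question: since `withCollection` links each collection to at most one other, a cycle check is cheap. A hedged sketch (hypothetical helper, not part of this PR):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class WithCollectionCycleCheck {
    // Returns true if following withCollection links from any collection
    // revisits one, i.e. the (single-successor) graph contains a cycle.
    static boolean hasCycle(Map<String, String> withCollection) {
        for (String start : withCollection.keySet()) {
            Set<String> seen = new HashSet<>();
            String current = start;
            while (current != null && seen.add(current)) {
                current = withCollection.get(current);
            }
            if (current != null) return true; // revisited a collection
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> ok = new HashMap<>();
        ok.put("orders", "products");
        Map<String, String> cyclic = new HashMap<>(ok);
        cyclic.put("products", "orders");
        System.out.println(hasCycle(ok));     // false
        System.out.println(hasCycle(cyclic)); // true
    }
}
```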

##
File path: 
solr/core/src/java/org/apache/solr/cluster/placement/DeleteShardsRequest.java
##
@@ -0,0 +1,27 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.cluster.placement;
+
+import java.util.Set;
+
+/**
+ * Delete shards request.
+ */
+public interface DeleteShardsRequest extends ModificationRequest {

Review comment:
   If we don't use this interface (i.e. the class that implements it) I 
suggest we do not include either in this PR. Or at least define and call the 
corresponding method in `AssignStrategy` from the appropriate `*Cmd` even if 
nothing does a real implementation and vetting based on it (but it would be 
ready to be consumed maybe by another plugin written by some user).

##
File path: 
solr/core/src/java/org/apache/solr/cluster/placement/plugins/AffinityPlacementFactory.java
##
@@ -238,11 +258,93 @@ public PlacementPlan computePlacement(Cluster cluster, 
PlacementRequest request,
 // failure. Current code does fail if placement is impossible 
(constraint is at most one replica of a shard on any node).
 for (Replica.ReplicaType replicaType : Replica.ReplicaType.values()) {
   makePlacementDecisions(solrCollection, shardName, availabilityZones, 
replicaType, request.getCountReplicasToCreate(replicaType),
-  attrValues, replicaTypeToNodes, nodesWithReplicas, coresOnNodes, 
placementPlanFactory, replicaPlacements);
+  attrValues, replicaTypeToNodes, nodesWithReplicas, coresOnNodes, 
placementContext.getPlacementPlanFactory(), replicaPlacements);
 }
   }
 
-  return placementPlanFactory.createPlacementPlan(request, 
replicaPlacements);
+  return 
placementContext.getPlacementPlanFactory().createPlacementPlan(request, 
replicaPlacements);
+}
+
+@Override

[GitHub] [lucene-solr-operator] vladiceanu commented on pull request #200: Apachify the solr-operator helm chart

2021-01-25 Thread GitBox


vladiceanu commented on pull request #200:
URL: 
https://github.com/apache/lucene-solr-operator/pull/200#issuecomment-766739705


   Not sure if it's the right place to mention, but 
https://artifacthub.io/packages/helm/solr-operator/solr-operator also needs to 
be updated to point to the new chart location 









[GitHub] [lucene-solr] NazerkeBS commented on a change in pull request #2230: SOLR-15011: /admin/logging handler is configured logs to all nodes

2021-01-25 Thread GitBox


NazerkeBS commented on a change in pull request #2230:
URL: https://github.com/apache/lucene-solr/pull/2230#discussion_r563636966



##
File path: solr/core/src/java/org/apache/solr/handler/admin/LoggingHandler.java
##
@@ -151,6 +156,9 @@ public void handleRequestBody(SolrQueryRequest req, 
SolrQueryResponse rsp) throw
   rsp.add("loggers", info);
 }
 rsp.setHttpCaching(false);
+if (cc != null && AdminHandlersProxy.maybeProxyToNodes(req, rsp, cc)) {

Review comment:
   SystemInfoHandler is doing similar to this logic; 











[jira] [Created] (LUCENE-9696) RegExp with group references

2021-01-25 Thread Gus Heck (Jira)
Gus Heck created LUCENE-9696:


 Summary: RegExp with group references
 Key: LUCENE-9696
 URL: https://issues.apache.org/jira/browse/LUCENE-9696
 Project: Lucene - Core
  Issue Type: Wish
Reporter: Gus Heck


PatternTypingFilter presently relies on java util regexes, but LUCENE-7465 
found performance benefits using our own RegExp class instead. Unfortunately 
RegExp does not currently report matching subgroups which is key to 
PatternTypingFilter's use (and probably useful in other endeavors as well).  
What's needed is reporting of sub-groups such that 

new RegExp("foo(.+)") --> converted to a run automaton etc. --> match found for 
"foobar" --> somehow reports getGroup(1) as "bar"

And getGroup() can be called on some object reasonably accessible to the code 
using RegExp in the first place.

Clearly there's a lot to be worked out there since the normal usage pattern 
converts things to a DFA / run Automaton etc, and subgroups are not a natural 
concept for those classes. But if this could be achieved without losing the 
performance benefits, that would be interesting :).

Opening this Wish ticket as encouraged by [~mikemccand] in LUCENE-9575.  I 
won't be able to work on it any time soon to encourage anyone else interested 
to pick it up or to drop links or ideas in here. 
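For reference, java.util.regex (the mechanism PatternTypingFilter currently relies on) reports subgroups as below; the wish is equivalent reporting from Lucene's RegExp / run automaton path (the demo class name is hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubgroupDemo {
    public static void main(String[] args) {
        // What the ticket asks RegExp to report: group 1 of "foo(.+)" on "foobar".
        Matcher m = Pattern.compile("foo(.+)").matcher("foobar");
        if (m.matches()) {
            System.out.println(m.group(1)); // bar
        }
    }
}
```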







[jira] [Commented] (LUCENE-9575) Add PatternTypingFilter

2021-01-25 Thread Jim Ferenczi (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271218#comment-17271218
 ] 

Jim Ferenczi commented on LUCENE-9575:
--

Thanks [~gus] and sorry for the race condition ;).

> Add PatternTypingFilter
> ---
>
> Key: LUCENE-9575
> URL: https://issues.apache.org/jira/browse/LUCENE-9575
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> One of the key asks when the Library of Congress was asking me to develop the 
> Advanced Query Parser was to be able to recognize arbitrary patterns that 
> included punctuation such as POW/MIA or 401(k) or C++ etc. Additionally they 
> wanted 401k and 401(k) to match documents with either style reference, and 
> NOT match documents that happen to have isolated 401 or k tokens (i.e. not 
> documents about the http status code) And of course we wanted to give up as 
> little of the text analysis features they were already using.
> This filter in conjunction with the filters from LUCENE-9572, LUCENE-9574 and 
> one solr specific filter in SOLR-14597 that re-analyzes tokens with an 
> arbitrary analyzer defined for a type in the solr schema, combine to achieve 
> this. 
> This filter has the job of spotting the patterns, and adding the intended 
> synonym as a type to the token (from which minimal punctuation has been 
> removed). It also sets flags on the token which are retained through the 
> analysis chain, and at the very end the type is converted to a synonym and 
> the original token(s) for that type are dropped avoiding the match on 401 
> (for example) 
> The pattern matching is specified in a file that looks like: 
> {code}
> 2 (\d+)\(?([a-z])\)? ::: legal2_$1_$2
> 2 (\d+)\(?([a-z])\)?\(?(\d+)\)? ::: legal3_$1_$2_$3
> 2 C\+\+ ::: c_plus_plus
> {code}
> That file would match legal reference patterns such as 401(k), 401k, 
> 501(c)3 and C++. The format is:
> <flags> <pattern> ::: <replacement>
> and groups in the pattern are substituted into the replacement so the first 
> line above would create synonyms such as:
> {code}
> 401k   --> legal2_401_k
> 401(k) --> legal2_401_k
> 503(c) --> legal2_503_c
> {code}
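A hedged sketch of the group-substitution idea behind those rules, using plain java.util.regex rather than the filter's actual implementation (class name hypothetical; the pattern and template are the first rule from the file above):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternTypingSketch {
    public static void main(String[] args) {
        // First rule from the pattern file: pattern plus replacement template.
        Pattern p = Pattern.compile("(\\d+)\\(?([a-z])\\)?");
        String template = "legal2_$1_$2";
        for (String token : new String[] {"401k", "401(k)", "503(c)"}) {
            Matcher m = p.matcher(token);
            if (m.matches()) {
                // Matched groups are substituted into the template.
                System.out.println(token + " --> " + m.replaceAll(template));
            }
        }
        // Prints:
        // 401k --> legal2_401_k
        // 401(k) --> legal2_401_k
        // 503(c) --> legal2_503_c
    }
}
```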






