Dynamic schema failure for child docs not using "_childDocuments_" key

2020-05-05 Thread mmb1234
I am running into an exception where creating child docs fails unless the
field already exists in the schema (stacktrace is at the bottom of this
post). My Solr is v8.5.1, running in standalone/non-cloud mode.

$> curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/mycore/update' --data-binary '[{
  "id": "3dae27db6ee43e878b9d0e8e",
  "phone": "+1 (123) 456-7890",
  "myChildDocuments": [{
    "id": "3baf27db6ee43387849d0e8e",
    "enabled": false
  }]
}]'

{
  "responseHeader":{
    "status":400,
    "QTime":285},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"ERROR: [doc=3baf27db6ee43387849d0e8e] unknown field 'enabled'",
    "code":400}}


However, when using the "_childDocuments_" key, it succeeds and the child doc
fields get created in the managed-schema:

$> curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/mycore/update' --data-binary '[{
  "id": "6dae27db6ee43e878b9d0e8e",
  "phone": "+1 (123) 456-7890",
  "_childDocuments_": [{
    "id": "6baf27db6ee43387849d0e8e",
    "enabled": false
  }]
}]'

{
  "responseHeader":{
    "status":0,
    "QTime":285}}


== stacktrace ==
2020-05-06 01:01:26.762 ERROR (qtp1569435561-19) [   x:standalone] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: ERROR: [doc=3baf27db6ee43387849d0e8e] unknown field 'enabled'
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:226)
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:100)
    at org.apache.solr.update.AddUpdateCommand.lambda$null$0(AddUpdateCommand.java:224)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
    at java.base/java.util.ArrayList$ArrayListSpliterator.tryAdvance(ArrayList.java:1631)
    at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:294)
    at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
    at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:161)
    at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
    at java.base/java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:282)
    at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:451)
    at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1284)
    at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1277)
    at org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:975)
    at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:345)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:292)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:239)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:76)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.NestedUpdateProcessorFactory$NestedUpdateProcessor.processAdd(NestedUpdateProcessorFactory.java:79)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:259)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doVersionAdd(DistributedUpdateProcessor.java:489)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.lambda$versionAdd$0(DistributedUpdateProcessor.java:339)
    at org.apache.solr.update.VersionBucket.runWithLock(VersionBucket.java:50)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:339)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:225)
    at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.processAdd(AddSchemaFieldsUpdateProcessorFactory.java:477)


Re: facets & docValues

2020-05-05 Thread Revas
Hi Joel, no, we have not; we have a softCommit requirement of 2 secs.

On Tue, May 5, 2020 at 3:31 PM Joel Bernstein  wrote:

> Have you configured static warming queries for the facets? This will warm
> the cache structures for the facet fields. You just want to make sure your
> commits are spaced far enough apart that the warming completes before a new
> searcher starts warming.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Mon, May 4, 2020 at 10:27 AM Revas  wrote:
>
> > Hi Erick, thanks for the explanation and advice. With facet queries, do
> > docValues help at all?
> >
> > 1) indexed=true, docValues=true =>  all facets
> >
> > 2)
> >
> >    - indexed=true, docValues=true => only for subfacets
> >    - indexed=true, docValues=false => facet query
> >    - docValues=true, indexed=false => term facets
> >
> >
> >
> > In case of 1 above => indexing slowed considerably; overall facet
> > performance improved many fold.
> > In case of 2 => overall performance showed only a slight
> > improvement.
> >
> > Does that mean turning on docValues even for facet queries helps improve the
> > performance, i.e. fetching from docValues for a facet query is faster than
> > fetching from stored fields?
> >
> > Thanks
> >
> >
> > On Thu, Apr 16, 2020 at 1:50 PM Erick Erickson 
> > wrote:
> >
> > > DocValues should help when faceting over fields, i.e. facet.field=blah.
> > >
> > > I would expect docValues to help with sub facets, but don’t know
> > > the code well enough to say definitively one way or the other.
> > >
> > > The empirical approach would be to set “uninvertible=false” (Solr 7.6) and
> > > turn docValues off. What that means is that if any operation tries to
> > > uninvert
> > > the index on the Java heap, you’ll get an exception like:
> > > "can not sort on a field w/o docValues unless it is indexed=true
> > > uninvertible=true and the type supports Uninversion:”
> > >
> > > See SOLR-12962
> > >
> > > Speed is only one issue. The entire point of docValues is to not
> > “uninvert”
> > > the field on the heap. This used to lead to very significant memory
> > > pressure. So when turning docValues off, you run the risk of
> > > reverting back to the old behavior and having unexpected memory
> > > consumption, not to mention slowdowns when the uninversion
> > > takes place.
> > >
> > > Also, unless your documents are very large, this is a tiny corpus. It
> can
> > > be
> > > quite hard to get realistic numbers, the signal gets lost in the noise.
> > >
> > > You should only shard when your individual query times exceed your
> > > requirement. Say you have a 95%tile requirement of 1 second response
> > time.
> > >
> > > Let’s further say that you can meet that requirement with 50
> > > queries/second,
> > > but when you get to 75 queries/second your response time exceeds your
> > > requirements. Do NOT shard at this point. Add another replica instead.
> > > Sharding adds inevitable overhead and should only be considered when
> > > you can’t get adequate response time even under fairly light query
> loads
> > > as a general rule.
> > >
> > > Best,
> > > Erick
> > >
> > > > On Apr 16, 2020, at 12:08 PM, Revas  wrote:
> > > >
> > > > Hi Erick, you are correct, we have only about 1.8M documents so far, and
> > > > turning on indexing for the facet fields helped improve the timings of
> > > > the facet query (which has sub facets and facet queries) a lot. So does
> > > > docValues help at all for sub facets and facet query? Our tests
> > > > revealed further query time improvement when we turned off the docValues.
> > > > Is that the right approach?
> > > >
> > > > Currently we have only 1 shard and  we are thinking of scaling by
> > > > increasing the number of shards when we see a deterioration on query
> > > time.
> > > > Any suggestions?
> > > >
> > > > Thanks.
> > > >
> > > >
> > > > On Wed, Apr 15, 2020 at 8:21 AM Erick Erickson <
> > erickerick...@gmail.com>
> > > > wrote:
> > > >
> > > >> In a word, “yes”. I also suspect your corpus isn’t very big.
> > > >>
> > > >> I think the key is the facet queries. Now, I’m talking from
> > > >> theory rather than diving into the code, but querying on
> > > >> a docValues=true, indexed=false field is really doing a
> > > >> search. And searching on a field like that is effectively
> > > >> analogous to a table scan. Even if somehow an internal
> > > >> structure would be constructed to deal with it, it would
> > > >> probably be on the heap, where you don’t want it.
> > > >>
> > > >> So the test would be to take the queries out and measure
> > > >> performance, but I think that’s the root issue here.
> > > >>
> > > >> Best,
> > > >> Erick
> > > >>
> > > >>> On Apr 14, 2020, at 11:51 PM, Revas  wrote:
> > > >>>
> > > >>> We have faceting fields that have been defined as indexed=false,
> > > >>> stored=false and docValues=true
> > > >>>
> > > >>> However we use a lot of subfacets  using  json facets and facet
> > ranges
> > > >>> 

Re: facets & docValues

2020-05-05 Thread Joel Bernstein
Have you configured static warming queries for the facets? This will warm
the cache structures for the facet fields. You just want to make sure your
commits are spaced far enough apart that the warming completes before a new
searcher starts warming.
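
A minimal sketch of such a static warming query, placed inside the <query>
section of solrconfig.xml; the facet field name "category" is only a
placeholder for one of your facet fields, not something from this thread:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- warm the facet structures before the new searcher is exposed -->
    <lst>
      <str name="q">*:*</str>
      <str name="rows">0</str>
      <str name="facet">true</str>
      <str name="facet.field">category</str>
    </lst>
  </arr>
</listener>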


Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, May 4, 2020 at 10:27 AM Revas  wrote:

> Hi Erick, thanks for the explanation and advice. With facet queries, do
> docValues help at all?
>
> 1) indexed=true, docValues=true =>  all facets
>
> 2)
>
>    - indexed=true, docValues=true => only for subfacets
>    - indexed=true, docValues=false => facet query
>    - docValues=true, indexed=false => term facets
>
>
>
> In case of 1 above => indexing slowed considerably; overall facet
> performance improved many fold.
> In case of 2 => overall performance showed only a slight
> improvement.
>
> Does that mean turning on docValues even for facet queries helps improve the
> performance, i.e. fetching from docValues for a facet query is faster than
> fetching from stored fields?
>
> Thanks
>
>
> On Thu, Apr 16, 2020 at 1:50 PM Erick Erickson 
> wrote:
>
> > DocValues should help when faceting over fields, i.e. facet.field=blah.
> >
> > I would expect docValues to help with sub facets, but don’t know
> > the code well enough to say definitively one way or the other.
> >
> > The empirical approach would be to set “uninvertible=false” (Solr 7.6) and
> > turn docValues off. What that means is that if any operation tries to
> > uninvert
> > the index on the Java heap, you’ll get an exception like:
> > "can not sort on a field w/o docValues unless it is indexed=true
> > uninvertible=true and the type supports Uninversion:”
> >
> > See SOLR-12962
> >
> > Speed is only one issue. The entire point of docValues is to not
> “uninvert”
> > the field on the heap. This used to lead to very significant memory
> > pressure. So when turning docValues off, you run the risk of
> > reverting back to the old behavior and having unexpected memory
> > consumption, not to mention slowdowns when the uninversion
> > takes place.
> >
> > Also, unless your documents are very large, this is a tiny corpus. It can
> > be
> > quite hard to get realistic numbers, the signal gets lost in the noise.
> >
> > You should only shard when your individual query times exceed your
> > requirement. Say you have a 95%tile requirement of 1 second response
> time.
> >
> > Let’s further say that you can meet that requirement with 50
> > queries/second,
> > but when you get to 75 queries/second your response time exceeds your
> > requirements. Do NOT shard at this point. Add another replica instead.
> > Sharding adds inevitable overhead and should only be considered when
> > you can’t get adequate response time even under fairly light query loads
> > as a general rule.
> >
> > Best,
> > Erick
> >
> > > On Apr 16, 2020, at 12:08 PM, Revas  wrote:
> > >
> > > Hi Erick, you are correct, we have only about 1.8M documents so far, and
> > > turning on indexing for the facet fields helped improve the timings of
> > > the facet query (which has sub facets and facet queries) a lot. So does
> > > docValues help at all for sub facets and facet query? Our tests
> > > revealed further query time improvement when we turned off the docValues.
> > > Is that the right approach?
> > >
> > > Currently we have only 1 shard and  we are thinking of scaling by
> > > increasing the number of shards when we see a deterioration on query
> > time.
> > > Any suggestions?
> > >
> > > Thanks.
> > >
> > >
> > > On Wed, Apr 15, 2020 at 8:21 AM Erick Erickson <
> erickerick...@gmail.com>
> > > wrote:
> > >
> > >> In a word, “yes”. I also suspect your corpus isn’t very big.
> > >>
> > >> I think the key is the facet queries. Now, I’m talking from
> > >> theory rather than diving into the code, but querying on
> > >> a docValues=true, indexed=false field is really doing a
> > >> search. And searching on a field like that is effectively
> > >> analogous to a table scan. Even if somehow an internal
> > >> structure would be constructed to deal with it, it would
> > >> probably be on the heap, where you don’t want it.
> > >>
> > >> So the test would be to take the queries out and measure
> > >> performance, but I think that’s the root issue here.
> > >>
> > >> Best,
> > >> Erick
> > >>
> > >>> On Apr 14, 2020, at 11:51 PM, Revas  wrote:
> > >>>
> > >>> We have faceting fields that have been defined as indexed=false,
> > >>> stored=false and docValues=true
> > >>>
> > >>> However we use a lot of subfacets using JSON facets, and facet ranges
> > >>> using facet.queries. We see that after every soft-commit our performance
> > >>> worsens, and it performs ideally between commits.
> > >>>
> > >>> How is it that docValues fields are affected by soft-commit, and do we need
> > >>> to enable indexing if we use subfacets and facet query to improve
> > >>> performance?
> > >>>
> > >>> Thanks
> > >>
> > >>
> >
> >
>


Re: Data Import Handler - Concurrent Entity Importing

2020-05-05 Thread Mikhail Khludnev
Hello, James.

DataImportHandler has a lock preventing concurrent execution. If you need
to run several imports in parallel on the same core, you need to duplicate
the "/dataimport" handler definition in solrconfig.xml; then you can run them
in parallel. Regarding the schema question, I prefer the latter, but your
mileage may vary.
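
For illustration only, a sketch of what duplicated handler definitions could
look like in solrconfig.xml; the handler names and DIH config file names here
are made up for the example:

<!-- two independent DIH instances on the same core; each instance has its own lock -->
<requestHandler name="/dataimport-a" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-config-a.xml</str>
  </lst>
</requestHandler>
<requestHandler name="/dataimport-b" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-config-b.xml</str>
  </lst>
</requestHandler>

Each import can then be started separately (e.g. /dataimport-a?command=full-import
and /dataimport-b?command=full-import), and they will run concurrently.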

--
Mikhail.

On Tue, May 5, 2020 at 6:39 PM James Greene 
wrote:

> Hello, I'm new to the group here so please excuse me if I do not have the
> etiquette down yet.
>
> Is it possible to have multiple entities (customer configurable, up to 40
> atm) in a DIH configuration to be imported at once? Right now I have
> multiple root entities in my configuration, but they get indexed
> sequentially, which means the entities that are last are always delayed
> hitting the index.
>
> I'm trying to migrate an existing setup (solr 6.6) that utilizes a
> different collection for each "entity type" into a single collection (solr
> 8.4) to get around some of the hurdles faced when needing to have searches
> that require multiple block joins, which currently does not work going
> cross-core.
>
> I'm also wondering if it is better to fully qualify a field name or use two
> different fields for performing the "same" search, i.e.:
>
> {
>   type_A_status: Active
>   type_A_value: Test
> }
> vs
> {
>   type: A
>   status: Active
>   value: Test
> }
>


-- 
Sincerely yours
Mikhail Khludnev


Data Import Handler - Concurrent Entity Importing

2020-05-05 Thread James Greene
Hello, I'm new to the group here so please excuse me if I do not have the
etiquette down yet.

Is it possible to have multiple entities (customer configurable, up to 40
atm) in a DIH configuration to be imported at once? Right now I have
multiple root entities in my configuration, but they get indexed
sequentially, which means the entities that are last are always delayed
hitting the index.

I'm trying to migrate an existing setup (solr 6.6) that utilizes a
different collection for each "entity type" into a single collection (solr
8.4) to get around some of the hurdles faced when needing to have searches
that require multiple block joins, which currently does not work going
cross-core.

I'm also wondering if it is better to fully qualify a field name or use two
different fields for performing the "same" search, i.e.:

{
  type_A_status: Active
  type_A_value: Test
}
vs
{
  type: A
  status: Active
  value: Test
}


solr core metrics & prometheus exporter - indexreader is closed

2020-05-05 Thread Richard Goodman
Hi there,

I've been playing with the Prometheus exporter for Solr, and have created
my config and deployed it. So far, all groups have been running fine (node,
jetty, jvm); however, I'm repeatedly getting an issue with the core group:
WARN  - 2020-05-05 12:01:24.812; org.apache.solr.prometheus.scraper.Async; Error occurred during metrics collection
java.util.concurrent.ExecutionException: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://127.0.0.1:8083/solr: Server Error

request: http://127.0.0.1:8083/solr/admin/metrics?group=core&wt=json&version=2.2
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) ~[?:1.8.0_141]
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) ~[?:1.8.0_141]
    at org.apache.solr.prometheus.scraper.Async.lambda$null$1(Async.java:45) ~[solr-prometheus-exporter-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:03]
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) ~[?:1.8.0_141]
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175) ~[?:1.8.0_141]
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374) ~[?:1.8.0_141]
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) ~[?:1.8.0_141]
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) ~[?:1.8.0_141]
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151) ~[?:1.8.0_141]
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174) ~[?:1.8.0_141]
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_141]
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) ~[?:1.8.0_141]
    at org.apache.solr.prometheus.scraper.Async.lambda$waitForAllSuccessfulResponses$3(Async.java:43) ~[solr-prometheus-exporter-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:03]
    at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870) ~[?:1.8.0_141]
    at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852) ~[?:1.8.0_141]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_141]
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1595) ~[?:1.8.0_141]
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209) ~[solr-solrj-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:11]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_141]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_141]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://127.0.0.1:8083/solr: Server Error

request: http://127.0.0.1:8083/solr/admin/metrics?group=core&wt=json&version=2.2
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:643) ~[solr-solrj-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:11]
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255) ~[solr-solrj-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:11]
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244) ~[solr-solrj-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:11]
    at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1260) ~[solr-solrj-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:11]
    at org.apache.solr.prometheus.scraper.SolrScraper.request(SolrScraper.java:102) ~[solr-prometheus-exporter-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:03]
    at org.apache.solr.prometheus.scraper.SolrCloudScraper.lambda$metricsForAllHosts$6(SolrCloudScraper.java:121) ~[solr-prometheus-exporter-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:03]
    at org.apache.solr.prometheus.scraper.SolrScraper.lambda$null$0(SolrScraper.java:81) ~[solr-prometheus-exporter-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:03]
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) ~[?:1.8.0_141]

Re: gzip compression solr 8.4.1

2020-05-05 Thread Johannes Siegert
Hi,

We did further tests to see where exactly the problem is. These are our
findings:

The Content-Length is calculated correctly; a quick test with curl showed
this.
The problem is that the stream with the gzip data is not fully consumed and
afterwards not closed.

Using the debugger with a breakpoint at
org/apache/solr/common/util/Utils.java:575 shows that it won't enter the
function readFully(entity.getContent()), most likely due to how the gzip
stream content is wrapped and extracted beforehand.

At org/apache/solr/common/util/Utils.java:582,
consumeQuietly(entity) should close the stream, but it does not because of a
silently swallowed exception.

This seems to be the same issue as described in
https://issues.apache.org/jira/browse/SOLR-14457

We saw that the problem also happened with correct GZIP responses from
Jetty, not only with non-GZIP responses as described in the Jira issue.

Best,

Johannes

On Thu, Apr 23, 2020 at 09:55, Johannes Siegert <
johannes.sieg...@offerista.com> wrote:

> Hi,
>
> we want to use gzip-compression between our application and the solr
> server.
>
> We use a standalone solr server version 8.4.1 and the prepackaged jetty as
> application server.
>
> We have enabled the jetty gzip module by adding these two files:
>
> {path_to_solr}/server/modules/gzip.mod (see below the question)
> {path_to_solr}/server/etc/jetty-gzip.xml (see below the question)
>
> Within the application we use a HttpSolrServer that is configured with
> allowCompression=true.
>
> After we had released our application, we saw the number of
> connections in the TCP state CLOSE_WAIT rising until the application
> was not able to open new connections.
>
>
> After a long debugging session we think the problem is that the header
> "Content-Length" that is returned by the jetty is sometimes wrong when
> gzip-compression is enabled.
>
> The solrj client uses a ContentLengthInputStream that uses the header
> "Content-Length" to detect if all data was received. But the InputStream
> cannot be fully consumed because the value of the header "Content-Length"
> is higher than the actual content length.
>
> Usually the method PoolingHttpClientConnectionManager.releaseConnection is
> called after the InputStream has been fully consumed. This frees the connection
> to be reused or to be closed by the application.
>
> Due to the incorrect header "Content-Length" the
> PoolingHttpClientConnectionManager.releaseConnection method is never called
> and the connection stays active. After the connection-timeout of the jetty
> is reached, it closes the connection from the server-side and the TCP-state
> switches into CLOSE_WAIT. The client never closes the connection and so the
> number of connections in use rises up.
>
>
> Currently we try to configure the jetty gzip module to return no
> "Content-Length" if gzip-compression was used. We hope that in this case
> another InputStream implementation is used that uses the NULL-terminator to
> see when the InputStream was fully consumed.
>
> Do you have any experiences with this problem or any suggestions for us?
>
> Thanks,
>
> Johannes
>
>
> gzip.mod
>
> -
>
> DO NOT EDIT - See:
> https://www.eclipse.org/jetty/documentation/current/startup-modules.html
>
> [description]
> Enable GzipHandler for dynamic gzip compression
> for the entire server.
>
> [tags]
> handler
>
> [depend]
> server
>
> [xml]
> etc/jetty-gzip.xml
>
> [ini-template]
> ## Minimum content length after which gzip is enabled
> jetty.gzip.minGzipSize=2048
>
> ## Check whether a file with *.gz extension exists
> jetty.gzip.checkGzExists=false
>
> ## Gzip compression level (-1 for default)
> jetty.gzip.compressionLevel=-1
>
> ## User agents for which gzip is disabled
> jetty.gzip.excludedUserAgent=.*MSIE.6\.0.*
>
> -
>
> jetty-gzip.xml
>
> -
>
> 
>  http://www.eclipse.org/jetty/configure_9_3.dtd;>
>
> 
> 
> 
> 
> 
> 
>
> 
> 
> 
>  class="org.eclipse.jetty.server.handler.gzip.GzipHandler">
> 
>  deprecated="gzip.minGzipSize" default="2048" />
> 
> 
>  deprecated="gzip.checkGzExists" default="false" />
> 
> 
>  deprecated="gzip.compressionLevel" default="-1" />
> 
> 
>  default="0" />
> 
> 
>  default="-1" />
> 
> 
>  />
> 
>
> 
> 
> 
>  deprecated="gzip.excludedUserAgent" default=".*MSIE.6\.0.*" />
> 
> 
> 
>
> 
>  default="GET,POST" />
>