ChildDocTransformer and export handler

2020-06-17 Thread Ludger Steens
Dear Community,





we are using the /export handler with Solr 7.7 to fetch a large number of
documents from Solr.

Recently we have extended our schema with Child Documents and now we are
wondering if/how it is possible to export parent documents together with
their corresponding Child Documents.

When using the /select handler this can be done with the
ChildDocTransformer (
https://lucene.apache.org/solr/guide/7_7/transforming-result-documents.html#child-childdoctransformerfactory
).

However, when using the export handler we get an error from Solr.



Our request:

{
  "query" : "*:*",
  "sort" : "id asc",
  "fields" : "id,[child parentFilter='-child_type:* *:*']"
}



The response from Solr:

{
  "responseHeader":{"status":400},
  "response":{
    "numFound":0,
    "docs":[{"EXCEPTION":"org.apache.solr.common.SolrException: undefined field: \"[child parentFilter=\"-child_type:* *:*\"]\""}]}
}



Is it possible to get parent documents together with their corresponding
child documents?

If it is possible: What is the correct query?

If it is not possible: Can Streaming Expressions be used together with
child documents? As far as I understand they internally use the export
handler.
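
The thread itself does not answer this question. The error above suggests that /export treats every entry in "fields" as a plain field name. One possible fallback, offered here only as an untested sketch, is to page through /select with cursorMark, where the [child] transformer is known to work. The function names (cursor_params, iter_parents) and the injected fetch callable are hypothetical; the query, sort, and field list are copied from the request above.

```python
from typing import Callable, Dict, Iterator

# Field list copied from the request in this message.
CHILD_FL = "id,[child parentFilter='-child_type:* *:*']"


def cursor_params(cursor_mark: str, rows: int = 1000) -> Dict[str, str]:
    """Build the /select parameters for one page of a cursorMark walk."""
    return {
        "q": "*:*",
        "sort": "id asc",  # cursor paging requires a sort that includes the uniqueKey
        "fl": CHILD_FL,
        "rows": str(rows),
        "cursorMark": cursor_mark,
    }


def iter_parents(
    fetch: Callable[[Dict[str, str]], dict], rows: int = 1000
) -> Iterator[dict]:
    """Yield parent documents page by page.

    `fetch` is a hypothetical callable that sends the parameters to /select
    and returns the decoded JSON response body.
    """
    cursor = "*"  # "*" starts a new cursor
    while True:
        body = fetch(cursor_params(cursor, rows))
        yield from body["response"]["docs"]
        next_cursor = body["nextCursorMark"]
        if next_cursor == cursor:
            # Solr returns the same cursor again once the result set is exhausted.
            return
        cursor = next_cursor
```

In practice fetch would wrap an HTTP GET against the collection's /select endpoint. Whether this is fast enough for a large export compared with /export would have to be measured.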



Thanks in advance for your help



Ludger


--

*"Best ICT Employer 2020": 1st place for QAware*
awarded by Great Place to Work
<https://www.qaware.de/news/great-place-to-work-deutschlands-beste-arbeitgeber-2020/>
--

Ludger Steens
Software Architect

QAware GmbH
Aschauer Straße 32
81549 München, Germany
Mobile +49 175 7973969
ludger.ste...@qaware.de
www.qaware.de
--

Managing Directors: Christian Kamm, Johannes Weigend, Dr. Josef Adersberger
Register court: München
Commercial register number: HRB 163761


Re: Atomic updates with nested documents

2020-06-10 Thread Ludger Steens
Hi Adi,

thank you for your reply, although I have to admit that this is not the
response I was hoping for.

Upgrading to Solr 8 is currently not possible for us because we found
multiple issues when doing so (see
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/202005.mbox/%3Ce7dc73d4be2ac35404db0f6cfb75f905%40mail.gmail.com%3E).
We have now implemented a workaround and send the whole document with
ChildDocs to Solr instead of doing an atomic update. This works as expected
but is significantly slower.
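
A minimal sketch of this kind of workaround, assuming the stored parent (including its _childDocuments_) has already been loaded from Solr. The helper names full_replacement and update_payload are hypothetical, and the field names follow the example documents used in this thread.

```python
import copy
import json
from typing import Any, Dict, List


def full_replacement(stored_parent: Dict[str, Any], changes: Dict[str, Any]) -> Dict[str, Any]:
    """Return a complete parent document with `changes` applied.

    The children are carried over unchanged, so the whole block can be
    re-sent to Solr instead of sending an atomic update.
    """
    if "id" in changes or "_childDocuments_" in changes:
        raise ValueError("this sketch only changes plain parent fields")
    doc = copy.deepcopy(stored_parent)  # leave the caller's copy untouched
    doc.update(changes)
    return doc


def update_payload(docs: List[Dict[str, Any]]) -> str:
    """Serialize documents as the JSON array sent to the update endpoint."""
    return json.dumps(docs)
```

For the example document, full_replacement(stored, {"testInt": "2"}) yields the parent with testInt set to "2" and both children still attached.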

Regards
Ludger

-----Original Message-----
From: Kaminski, Adi
Sent: Sunday, June 7, 2020 08:45
To: solr-user@lucene.apache.org
Subject: RE: Atomic updates with nested documents

Hi Ludger,
We had the same issue with Solr 7.6. After discussing it with the community,
we found out that a partial update of a parent document that does not harm
the parent-child association works only on Solr 8.1 or higher, and it also
requires some prerequisites.

See the item below and its last comments for details:
https://issues.apache.org/jira/browse/SOLR-12638?focusedCommentId=16894628=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16894628

Eventually we moved to Solr 8.3, and it is working there as expected with
the above-mentioned changes.

Regards,
Adi





Atomic updates with nested documents

2020-06-05 Thread Ludger Steens
Dear Community,



I am using Solr 7.7 and I am wondering how it is possible to do a partial
update on nested documents / child documents.

Suppose I have committed the following documents to the index:

[
  {
    "id": "1",
    "testString": "1",
    "testInt": "1",
    "_childDocuments_": [
      {
        "id": "1.1",
        "child_type": "child_a",
        "testString": "1.1",
        "testInt": "1"
      },
      {
        "id": "1.2",
        "child_type": "child_a",
        "testString": "1.1",
        "testInt": "1"
      }
    ]
  }
]

The uniqueKey is id; all fields are indexed.



Now I want to update testInt to 2 on the parent document without losing the
parent-child relation (ChildDocTransformerFactory should still produce
correct results).

I tried the following variants; neither was successful:



*Variant 1:*

Sending the following update document to the update endpoint:

[
  {
    "id": "1",
    "testInt": {
      "set": "2"
    }
  }
]

The parent document is updated, but the ChildDocTransformerFactory does not
return any child documents.



*Variant 2:*

Sending the following update document to the update endpoint:

[
  {
    "id": "1",
    "testInt": {
      "set": "2"
    },
    "_childDocuments_": [
      {
        "id": {
          "set": "1.1"
        }
      },
      {
        "id": {
          "set": "1.2"
        }
      }
    ]
  }
]

Same result: the parent is updated, but the ChildDocTransformerFactory does
not return any child documents.





Is there any other way of doing a partial update without losing the
parent-child relation?

Resending the complete document with all attributes and children would work
but is inefficient for us (we would have to load all documents from Solr
before resending them).



Thanks in advance for your help



Ludger




Problems when Upgrading from Solr 7.7.1 to 8.5.0

2020-05-11 Thread Ludger Steens
Hi all,

we recently upgraded our SolrCloud cluster from version 7.7.1 to version
8.5.0 and ran into multiple problems.
In the end we had to revert the upgrade and went back to Solr 7.7.1.

In our company we have been using Solr since version 4, and so far upgrading
Solr to a newer version was possible without any problems.
We are curious whether others are experiencing the same kind of problems and
whether these are known issues. Or maybe we did something wrong or missed
something when upgrading?


1. Network issues when indexing documents
===

Our collection contains roughly 150 million documents. When we re-created
the collection and re-indexed all documents, we regularly experienced
network problems that caused our loader application to fail.
The Solr log always contains an IOException:

ERROR (updateExecutor-5-thread-1338-processing-x:PSMG_CI_2020_04_15_10_07_04_shard6_replica_n22 r:core_node25 null n:solr2:8983_solr c:PSMG_CI_2020_04_15_10_07_04 s:shard6) [c:PSMG_CI_2020_04_15_10_07_04 s:shard6 r:core_node25 x:PSMG_CI_2020_04_15_10_07_04_shard6_replica_n22] o.a.s.u.ErrorReportingConcurrentUpdateSolrClient Error when calling SolrCmdDistributor$Req: cmd=add{,id=(null)}; node=StdNode: http://solr1:8983/solr/PSMG_CI_2020_04_15_10_07_04_shard6_replica_n20/ to http://solr1:8983/solr/PSMG_CI_2020_04_15_10_07_04_shard6_replica_n20/ => java.io.IOException: java.io.IOException: cancel_stream_error
 at org.eclipse.jetty.client.util.DeferredContentProvider.flush(DeferredContentProvider.java:197)
java.io.IOException: java.io.IOException: cancel_stream_error
 at org.eclipse.jetty.client.util.DeferredContentProvider.flush(DeferredContentProvider.java:197) ~[jetty-client-9.4.24.v20191120.jar:9.4.24.v20191120]
 at org.eclipse.jetty.client.util.OutputStreamContentProvider$DeferredOutputStream.flush(OutputStreamContentProvider.java:151) ~[jetty-client-9.4.24.v20191120.jar:9.4.24.v20191120]
 at org.eclipse.jetty.client.util.OutputStreamContentProvider$DeferredOutputStream.write(OutputStreamContentProvider.java:145) ~[jetty-client-9.4.24.v20191120.jar:9.4.24.v20191120]
 at org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:216) ~[solr-solrj-8.5.0.jar:8.5.0 7ac489bf7b97b61749b19fa2ee0dc46e74b8dc42 - romseygeek - 2020-03-13 09:38:26]
 at org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:209) ~[solr-solrj-8.5.0.jar:8.5.0 7ac489bf7b97b61749b19fa2ee0dc46e74b8dc42 - romseygeek - 2020-03-13 09:38:26]
 at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:172) ~[solr-solrj-8.5.0.jar:8.5.0 7ac489bf7b97b61749b19fa2ee0dc46e74b8dc42 - romseygeek - 2020-03-13 09:38:26]

After the exception, the collection was usually in a degraded state for
some time while shards tried to recover and sync with the leader.

In the Solr changelog we saw that one major change from 7.x to 8.x was
that Solr now uses HTTP/2 instead of HTTP/1.1. So we tried to disable
HTTP/2 by setting the system property solr.http1=true.
That made the indexing process a LOT more stable, but we still saw
IOExceptions from time to time. Disabling HTTP/2 did not completely fix
the problem.

ERROR (updateExecutor-5-thread-9310-processing-x:PSMG_BOM_2020_04_28_05_00_11_shard7_replica_n24 r:core_node27 null n:solr3:8983_solr c:PSMG_BOM_2020_04_28_05_00_11 s:shard7) [c:PSMG_BOM_2020_04_28_05_00_11 s:shard7 r:core_node27 x:PSMG_BOM_2020_04_28_05_00_11_shard7_replica_n24] o.a.s.u.ErrorReportingConcurrentUpdateSolrClient Error when calling SolrCmdDistributor$Req: cmd=add{,id=5141653a-e33a-4b60-856d-7aa2ce73dee7}; node=ForwardNode: http://solr2:8983/solr/PSMG_BOM_2020_04_28_05_00_11_shard6_replica_n22/ to http://solr2:8983/solr/PSMG_BOM_2020_04_28_05_00_11_shard6_replica_n22/ => java.io.IOException: java.io.EOFException: HttpConnectionOverHTTP@9dc7ad1::SocketChannelEndPoint@2d20213b{solr2/10.0.0.216:8983<->/10.0.0.193:38728,ISHUT,fill=-,flush=-,to=5/60}{io=0/0,kio=0,kro=1}->HttpConnectionOverHTTP@9dc7ad1(l:/10.0.0.193:38728 <-> r:solr2/10.0.0.216:8983,closed=false)=>HttpChannelOverHTTP@47a242c3(exchange=HttpExchange@6ffd260f req=PENDING/null@null res=PENDING/null@null)[send=HttpSenderOverHTTP@17e056f9(req=CONTENT,snd=IDLE,failure=null)[HttpGenerator@3b6594c7{s=COMMITTED}],recv=HttpReceiverOverHTTP@6e847d32(rsp=IDLE,failure=null)[HttpParser{s=CLOSED,0 of -1}]]
 at org.eclipse.jetty.client.util.DeferredContentProvider.flush(DeferredContentProvider.java:197)
java.io.IOException: java.io.EOFException:
HttpConnectionOverHTTP@9dc7ad1::SocketChannelEndPoint@2d20213b{solr2/10.0.
0.216:8983<->/10.0.0.193:38728,ISHUT,fill=-,flush=-,to=5/60}{io=0/0,ki
o=0,kro=1}->HttpConnectionOverHTTP@9dc7ad1(l:/10.0.0.193:38728 <->
r:solr2/10.0.0.216:8983,closed=false)=>HttpChannelOverHTTP@47a242c3(exchan
ge=HttpExchange@6ffd260f req=PENDING/null@null
res=PENDING/null@null)[send=HttpSenderOverHTTP@17e056f9(req=CONTENT,snd=ID