Re: Solr 8.6.1: Can't round-trip nested document from SolrJ
Hi Alex,

Fixing the documentation for nested docs is currently in progress. More context is available in this JIRA: https://issues.apache.org/jira/browse/SOLR-14383

I took a look at your code (https://github.com/arafalov/SolrJTest/blob/master/src/com/solrstart/solrj/Main.java).

The child doc transformer needs to be specified as part of the fl parameter, like fl=*,[child], so that the descendants are returned for each matching doc. Since the query q=* matches all the documents, all of them are returned. If only the parent doc should be returned with its descendants, use either a block join query or a query clause that matches only the parent doc.

Another thing I noticed in the code is that the child docs are indexed as anonymous docs (the old syntax) instead of being indexed in the new syntax. With this, the nested block will be indexed, but since the schema has _nested_path_ defined, the [child] doc transformer won't return any docs. Anonymous child docs need parentFilter, but specifying parentFilter when _nested_path_ is defined leads to an error. It is due to this check: https://github.com/apache/lucene-solr/blob/1c8f4c988a07b08f83d85e27e59b43eed5e2ca2a/solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformerFactory.java#L104

Instead of indexing the docs this way,

    SolrInputDocument parent1 = new SolrInputDocument();
    parent1.addField("id", "p1");
    parent1.addField("name", "parent1");
    parent1.addField("class", "foo.bar.parent1");

    SolrInputDocument child1 = new SolrInputDocument();

    parent1.addChildDocument(child1);
    child1.addField("id", "c1");
    child1.addField("name", "child1");
    child1.addField("class", "foo.bar.child1");

modify it to index the children under a named field:

    SolrInputDocument parent1 = new SolrInputDocument();
    parent1.addField("id", "p1");
    parent1.addField("name", "parent1");
    parent1.addField("class", "foo.bar.parent1");

    SolrInputDocument child1 = new SolrInputDocument();

    parent1.addField("sometag", Arrays.asList(child1));
    child1.addField("id", "c1");
    child1.addField("name", "child1");
    child1.addField("class", "foo.bar.child1");

I think once the documentation fixes get merged to master, indexing and searching with nested documents will become much clearer.

Regards,
Munendra S N

On Sun, Aug 23, 2020 at 5:18 AM Alexandre Rafalovitch wrote:
> Hello,
>
> I am trying to get up to date with both SolrJ and the Nested Document
> implementation, and I am not sure where I am failing with a basic test
> (https://github.com/arafalov/SolrJTest/blob/master/src/com/solrstart/solrj/Main.java).
>
> I am using Solr 8.6.1 with a core created with bin/solr create -c solrj
> (schemaless is still on).
>
> I then index a nested parent/child/grandchild document and query it back.
> Looking at debug, it seems to go out fine as a nested doc but come back
> as 3 individual ones.
>
> Output is:
>
> SolrInputDocument(fields: [id=p1, name=parent1, class=foo.bar.parent1],
> children: [SolrInputDocument(fields: [id=c1, name=child1,
> class=foo.bar.child1], children: [SolrInputDocument(fields: [id=gc1,
> name=grandChild1, class=foo.bar.grandchild1])])])
>
> {responseHeader={status=0,QTime=1,params={q=*,wt=javabin,version=2}},
> response={numFound=3,numFoundExact=true,start=0,docs=[
> SolrDocument{id=gc1, name=[grandChild1], class=[foo.bar.grandchild1], _version_=1675769219435724800},
> SolrDocument{id=c1, name=[child1], class=[foo.bar.child1], _version_=1675769219435724800},
> SolrDocument{id=p1, name=[parent1], class=[foo.bar.parent1], _version_=1675769219435724800}]}}
> Found 3 documents
>
> Field: 'id' => 'gc1'
> Field: 'name' => '[grandChild1]'
> Field: 'class' => '[foo.bar.grandchild1]'
> Field: '_version_' => '1675769219435724800'
> Children: false
>
> Field: 'id' => 'c1'
> Field: 'name' => '[child1]'
> Field: 'class' => '[foo.bar.child1]'
> Field: '_version_' => '1675769219435724800'
> Children: false
>
> Field: 'id' => 'p1'
> Field: 'name' => '[parent1]'
> Field: 'class' => '[foo.bar.parent1]'
> Field: '_version_' => '1675769219435724800'
> Children: false
>
> Looking in the Admin UI:
> * _root_ element is there and has 3 instances of 'p1' value
> * _nest_path_ (of type _nest_path_ !?!) is also there but is not populated
> * _nest_parent_ is not there
>
> I am not quite sure what that means and what other schema modification
> (to the _default_) I need to do to get it to work.
>
> I also tried to reproduce the examples in the documentation (e.g.
> https://lucene.apache.org/solr/guide/8_6/indexing-nested-documents.html
> and
> https://lucene.apache.org/solr/guide/8_6/searching-nested-documents.html#searching-nested-documents),
> but both seem to want some undiscussed schema (e.g. with ID field
> instead of id) and fail to execute against the default schema.
>
> I am kind of stuck. Does anybody have a working SolrJ/Nested example or
> ideas of what I missed?
>
> Regards,
>    Alex.
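[For reference, the block-join-plus-transformer request Munendra describes can be sketched as request parameters like the following. This is a sketch assuming the default 8.x nested-docs setup, where parent docs are the ones with no _nest_path_ value; name:child1 is just an illustrative clause against Alex's test data:]

```text
q={!parent which="*:* -_nest_path_:*"}name:child1
fl=*,[child]
```

[The {!parent} query returns only the matching parent docs, and the [child] transformer in fl attaches each one's descendants to it in the response.]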
Solr 8.6.1: Can't round-trip nested document from SolrJ
Hello,

I am trying to get up to date with both SolrJ and the Nested Document implementation, and I am not sure where I am failing with a basic test (https://github.com/arafalov/SolrJTest/blob/master/src/com/solrstart/solrj/Main.java).

I am using Solr 8.6.1 with a core created with bin/solr create -c solrj (schemaless is still on).

I then index a nested parent/child/grandchild document and query it back. Looking at debug, it seems to go out fine as a nested doc but come back as 3 individual ones.

Output is:

SolrInputDocument(fields: [id=p1, name=parent1, class=foo.bar.parent1], children: [SolrInputDocument(fields: [id=c1, name=child1, class=foo.bar.child1], children: [SolrInputDocument(fields: [id=gc1, name=grandChild1, class=foo.bar.grandchild1])])])

{responseHeader={status=0,QTime=1,params={q=*,wt=javabin,version=2}},response={numFound=3,numFoundExact=true,start=0,docs=[SolrDocument{id=gc1, name=[grandChild1], class=[foo.bar.grandchild1], _version_=1675769219435724800}, SolrDocument{id=c1, name=[child1], class=[foo.bar.child1], _version_=1675769219435724800}, SolrDocument{id=p1, name=[parent1], class=[foo.bar.parent1], _version_=1675769219435724800}]}}
Found 3 documents

Field: 'id' => 'gc1'
Field: 'name' => '[grandChild1]'
Field: 'class' => '[foo.bar.grandchild1]'
Field: '_version_' => '1675769219435724800'
Children: false

Field: 'id' => 'c1'
Field: 'name' => '[child1]'
Field: 'class' => '[foo.bar.child1]'
Field: '_version_' => '1675769219435724800'
Children: false

Field: 'id' => 'p1'
Field: 'name' => '[parent1]'
Field: 'class' => '[foo.bar.parent1]'
Field: '_version_' => '1675769219435724800'
Children: false

Looking in the Admin UI:
* _root_ element is there and has 3 instances of 'p1' value
* _nest_path_ (of type _nest_path_ !?!) is also there but is not populated
* _nest_parent_ is not there

I am not quite sure what that means and what other schema modification (to the _default_) I need to do to get it to work.

I also tried to reproduce the examples in the documentation (e.g. https://lucene.apache.org/solr/guide/8_6/indexing-nested-documents.html and https://lucene.apache.org/solr/guide/8_6/searching-nested-documents.html#searching-nested-documents), but both seem to want some undiscussed schema (e.g. with ID field instead of id) and fail to execute against the default schema.

I am kind of stuck. Does anybody have a working SolrJ/Nested example or ideas of what I missed?

Regards,
   Alex.
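[A note on the Admin UI observations above: in the 8.x _default configset, nested-document support is declared roughly as below. This is a sketch from memory, so check the core's managed-schema for the exact attributes. _nest_path_ is both a field and a field type of class solr.NestPathField, which explains the "of type _nest_path_" display, and _nest_parent_ is simply not defined by default:]

```xml
<!-- sketch of the relevant _default schema entries; exact attributes may differ -->
<fieldType name="_nest_path_" class="solr.NestPathField" />
<field name="_nest_path_" type="_nest_path_" />

<!-- hypothetical addition, NOT in the default schema: only needed if you
     want each child doc to record its parent's id -->
<field name="_nest_parent_" type="string" indexed="true" stored="true" />
```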
Re: All cores gone along with all solr configuration upon reboot
Autopurge shouldn’t matter; that just cleans up old snapshots. That is, it should be configured, but having it enabled or not should have no bearing on your data disappearing.

Also, are you absolutely certain that you are using your external ZK? Check the port on the admin screen; 9983 is the default for embedded ZK.

All that said, nothing in Solr just deletes all this. The fact that you only saw this on reboot is highly suspicious; I suspect some external-to-Solr process, anything from a startup script to restoring a disk image, is removing that data.

Best,
Erick

> On Aug 22, 2020, at 9:24 AM, yaswanth kumar wrote:
>
> Thanks Erick for looking into this.
>
> But as I said before, I confirmed that the paths in zookeeper were changed
> to a local path rather than the /tmp that comes default with the package.
> Does the zoo.cfg need to have autopurge settings? I don’t have them in my
> config.
>
> Also, I did make sure that zoo.cfg inside solr and my external zoo are
> pointing to the same and have the same configs, if it matters.
>
> Sent from my iPhone
>
>> On Aug 22, 2020, at 9:07 AM, Erick Erickson wrote:
>>
>> Sounds like you didn’t change the Zookeeper data dir. Zookeeper defaults
>> to putting its data in /tmp/zookeeper; see the zookeeper config file.
>> And, of course, when you reboot it goes away.
>>
>> I’ve always disliked this, but the Zookeeper folks did it that way. So if
>> you just copy zoo_sample.cfg to zoo.cfg that’s what you get; it is not
>> under Solr’s control.
>>
>> As for how to recover, assuming you put your configsets in some kind of
>> version control as we recommend:
>>
>> 0> set up Zookeeper to keep its data somewhere permanent. You may want
>> to archive snapshots upon occasion as well.
>>
>> 1> save away the data directory for _one_ replica from each shard of
>> every collection somewhere. You should have a bunch of directories like
>> SOLR_HOME/…./collection1_shard1_replica_n1/data.
>>
>> 2> recreate all your collections as leader-only new collections with the
>> exact same number of shards, i.e. shards with only a single replica.
>>
>> 3> shut down all your Solr instances
>>
>> 4> copy back the data directories you saved in <1>. You _MUST_ copy to
>> corresponding shards. The important bit is that a data directory from
>> collection1_shard1 goes back to collection1_shard1. If you copy it back
>> to collection1_shard2, Bad Things Happen. Actually, I’d delete the
>> target data directories first and then copy.
>>
>> 5> restart your Solr instances and verify they look OK.
>>
>> 6> use the collections API ADDREPLICA to build out your collections.
>>
>> Best,
>> Erick
>>
>>> On Aug 22, 2020, at 12:10 AM, yaswanth kumar wrote:
>>>
>>> Can someone help me on the below issue?
>>>
>>> I have configured solr 8.2 with one zookeeper 3.4 and 3 solr nodes.
>>>
>>> All the configs were pushed initially, and I also indexed all the data
>>> into multiple collections with 3 replicas on each collection.
>>>
>>> Now, as part of server maintenance, these solr nodes were restarted,
>>> and once they came back solr became empty: all the collections were
>>> lost, and all collection-specific instance directories in the path
>>> /solr/server/solr were deleted. The data folders are intact, nothing
>>> lost, but I am not really sure how to recover from this situation.
>>>
>>> I did make sure that the zoo.cfg was properly configured (permanent
>>> paths for zoo data and logs instead of /tmp), as I am using an external
>>> zoo instead of the one that comes with solr.
>>>
>>> The Solr data path is NAS storage which is common to all three solr
>>> nodes.
>>>
>>> Another data point is that I enabled solr basic authentication as well,
>>> if that makes any difference. Even clusterstate, schemas, and
>>> security.json were all lost. I am really looking for help in
>>> understanding how to prevent this happening again.
>>>
>>> Sent from my iPhone
>>
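[For completeness, the zoo.cfg settings discussed above look like the following. The values are illustrative; the essential part is a permanent dataDir outside /tmp, while the autopurge settings only control retention of old snapshots and transaction logs and have no bearing on losing live data:]

```text
# zoo.cfg (illustrative values)
dataDir=/var/lib/zookeeper

# optional housekeeping; safe to add, unrelated to this data loss
autopurge.snapRetainCount=3
autopurge.purgeInterval=24
```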
Re: All cores gone along with all solr configuration upon reboot
Thanks Erick for looking into this.

But as I said before, I confirmed that the paths in zookeeper were changed to a local path rather than the /tmp that comes default with the package. Does the zoo.cfg need to have autopurge settings? I don’t have them in my config.

Also, I did make sure that zoo.cfg inside solr and my external zoo are pointing to the same and have the same configs, if it matters.

Sent from my iPhone

> On Aug 22, 2020, at 9:07 AM, Erick Erickson wrote:
>
> Sounds like you didn’t change the Zookeeper data dir. Zookeeper defaults
> to putting its data in /tmp/zookeeper; see the zookeeper config file.
> And, of course, when you reboot it goes away.
>
> I’ve always disliked this, but the Zookeeper folks did it that way. So if
> you just copy zoo_sample.cfg to zoo.cfg that’s what you get; it is not
> under Solr’s control.
>
> As for how to recover, assuming you put your configsets in some kind of
> version control as we recommend:
>
> 0> set up Zookeeper to keep its data somewhere permanent. You may want to
> archive snapshots upon occasion as well.
>
> 1> save away the data directory for _one_ replica from each shard of
> every collection somewhere. You should have a bunch of directories like
> SOLR_HOME/…./collection1_shard1_replica_n1/data.
>
> 2> recreate all your collections as leader-only new collections with the
> exact same number of shards, i.e. shards with only a single replica.
>
> 3> shut down all your Solr instances
>
> 4> copy back the data directories you saved in <1>. You _MUST_ copy to
> corresponding shards. The important bit is that a data directory from
> collection1_shard1 goes back to collection1_shard1. If you copy it back
> to collection1_shard2, Bad Things Happen. Actually, I’d delete the target
> data directories first and then copy.
>
> 5> restart your Solr instances and verify they look OK.
>
> 6> use the collections API ADDREPLICA to build out your collections.
>
> Best,
> Erick
>
>> On Aug 22, 2020, at 12:10 AM, yaswanth kumar wrote:
>>
>> Can someone help me on the below issue?
>>
>> I have configured solr 8.2 with one zookeeper 3.4 and 3 solr nodes.
>>
>> All the configs were pushed initially, and I also indexed all the data
>> into multiple collections with 3 replicas on each collection.
>>
>> Now, as part of server maintenance, these solr nodes were restarted, and
>> once they came back solr became empty: all the collections were lost,
>> and all collection-specific instance directories in the path
>> /solr/server/solr were deleted. The data folders are intact, nothing
>> lost, but I am not really sure how to recover from this situation.
>>
>> I did make sure that the zoo.cfg was properly configured (permanent
>> paths for zoo data and logs instead of /tmp), as I am using an external
>> zoo instead of the one that comes with solr.
>>
>> The Solr data path is NAS storage which is common to all three solr
>> nodes.
>>
>> Another data point is that I enabled solr basic authentication as well,
>> if that makes any difference. Even clusterstate, schemas, and
>> security.json were all lost. I am really looking for help in
>> understanding how to prevent this happening again.
>>
>> Sent from my iPhone
>
Re: All cores gone along with all solr configuration upon reboot
Sounds like you didn’t change the Zookeeper data dir. Zookeeper defaults to putting its data in /tmp/zookeeper; see the zookeeper config file. And, of course, when you reboot it goes away.

I’ve always disliked this, but the Zookeeper folks did it that way. So if you just copy zoo_sample.cfg to zoo.cfg that’s what you get; it is not under Solr’s control.

As for how to recover, assuming you put your configsets in some kind of version control as we recommend:

0> set up Zookeeper to keep its data somewhere permanent. You may want to archive snapshots upon occasion as well.

1> save away the data directory for _one_ replica from each shard of every collection somewhere. You should have a bunch of directories like SOLR_HOME/…./collection1_shard1_replica_n1/data.

2> recreate all your collections as leader-only new collections with the exact same number of shards, i.e. shards with only a single replica.

3> shut down all your Solr instances

4> copy back the data directories you saved in <1>. You _MUST_ copy to corresponding shards. The important bit is that a data directory from collection1_shard1 goes back to collection1_shard1. If you copy it back to collection1_shard2, Bad Things Happen. Actually, I’d delete the target data directories first and then copy.

5> restart your Solr instances and verify they look OK.

6> use the collections API ADDREPLICA to build out your collections.

Best,
Erick

> On Aug 22, 2020, at 12:10 AM, yaswanth kumar wrote:
>
> Can someone help me on the below issue?
>
> I have configured solr 8.2 with one zookeeper 3.4 and 3 solr nodes.
>
> All the configs were pushed initially, and I also indexed all the data
> into multiple collections with 3 replicas on each collection.
>
> Now, as part of server maintenance, these solr nodes were restarted, and
> once they came back solr became empty: all the collections were lost, and
> all collection-specific instance directories in the path /solr/server/solr
> were deleted. The data folders are intact, nothing lost, but I am not
> really sure how to recover from this situation.
>
> I did make sure that the zoo.cfg was properly configured (permanent paths
> for zoo data and logs instead of /tmp), as I am using an external zoo
> instead of the one that comes with solr.
>
> The Solr data path is NAS storage which is common to all three solr nodes.
>
> Another data point is that I enabled solr basic authentication as well, if
> that makes any difference. Even clusterstate, schemas, and security.json
> were all lost. I am really looking for help in understanding how to
> prevent this happening again.
>
> Sent from my iPhone
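[Erick's step 4 can be sketched in shell. This demo uses throwaway directories created under mktemp so it runs as-is; in a real recovery, SAVED and SOLR_HOME (hypothetical names here) would point at your saved copy and your actual Solr home, and the shard in the source and target paths must match exactly:]

```shell
set -eu

# Stand-in paths for the demo; replace with real ones during recovery.
WORK=$(mktemp -d)
SAVED="$WORK/saved/collection1_shard1_replica_n1/data"
SOLR_HOME="$WORK/solr"
mkdir -p "$SAVED" "$SOLR_HOME/collection1_shard1_replica_n1/data"
touch "$SAVED/segments_1"   # pretend this is the saved index

# Step 4: delete the target data dir first, then copy the saved one back
# to the *same* shard (collection1_shard1 -> collection1_shard1).
TARGET="$SOLR_HOME/collection1_shard1_replica_n1/data"
rm -rf "$TARGET"
cp -r "$SAVED" "$TARGET"
ls "$TARGET"   # prints: segments_1
```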