Re: Solr 8.6.1: Can't round-trip nested document from SolrJ

2020-08-22 Thread Munendra S N
Hi Alex,

Fixing the documentation for nested docs is currently in progress. More
context is available in this JIRA:
https://issues.apache.org/jira/browse/SOLR-14383.

I had a look at your test code:
https://github.com/arafalov/SolrJTest/blob/master/src/com/solrstart/solrj/Main.java

The child doc transformer needs to be specified as part of the fl parameter,
like fl=*,[child], so that the descendants are returned for each matching
doc. Since the query q=* matches all the documents, all of them are returned.
If only the parent doc needs to be returned with its descendants, then we
should use either a block join query or a query clause that matches only the
parent doc.
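As a rough sketch (assuming the default 8.x nested schema, where only non-root documents get a _nest_path_ value; the class and method names below are invented for illustration), a SolrJ query that returns only the parent together with its descendants might look like:

```java
import org.apache.solr.client.solrj.SolrQuery;

public class ParentQuerySketch {

    // Build a query that matches only root documents and asks the
    // [child] transformer to attach each root's descendants.
    // Assumes the default 8.x nested schema, where only non-root
    // docs carry a _nest_path_ value.
    public static SolrQuery parentWithDescendants() {
        // "*:* -_nest_path_:*" keeps docs that have no nest path, i.e. roots.
        SolrQuery q = new SolrQuery("*:* -_nest_path_:*");
        q.setFields("*", "[child]"); // i.e. fl=*,[child]
        return q;
    }

    public static void main(String[] args) {
        SolrQuery q = parentWithDescendants();
        System.out.println("q="  + q.getQuery());
        System.out.println("fl=" + q.getFields());
    }
}
```

A block join such as q={!parent which="*:* -_nest_path_:*"}name:child1 would be the other option mentioned above, if you want to select parents by matching their children.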

Another thing I noticed in the code is that the child docs are indexed as
anonymous docs (the old syntax) instead of being indexed in the new
syntax. With this, the nested block will be indexed, but since the schema
has _nested_path_ defined, the [child] doc transformer won't return any docs.
Anonymous child docs need a parentFilter, but specifying parentFilter when
_nested_path_ is defined leads to an error. It is due to this check:
https://github.com/apache/lucene-solr/blob/1c8f4c988a07b08f83d85e27e59b43eed5e2ca2a/solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformerFactory.java#L104

Instead of indexing the docs this way,

> SolrInputDocument parent1 = new SolrInputDocument();
> parent1.addField("id", "p1");
> parent1.addField("name", "parent1");
> parent1.addField("class", "foo.bar.parent1");
>
> SolrInputDocument child1 = new SolrInputDocument();
>
> parent1.addChildDocument(child1);
> child1.addField("id", "c1");
> child1.addField("name", "child1");
> child1.addField("class", "foo.bar.child1");
>
>
modify it to index like this:

> SolrInputDocument parent1 = new SolrInputDocument();
> parent1.addField("id", "p1");
> parent1.addField("name", "parent1");
> parent1.addField("class", "foo.bar.parent1");
>
> SolrInputDocument child1 = new SolrInputDocument();
>
> parent1.addField("sometag", Arrays.asList(child1));
> child1.addField("id", "c1");
> child1.addField("name", "child1");
> child1.addField("class", "foo.bar.child1");
>
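To mirror the three-level parent/child/grandchild document from the original test, the same new syntax simply nests one level deeper. This is only a sketch: the field name "children" is as arbitrary as "sometag" above, and the class and method names are invented for illustration.

```java
import java.util.Arrays;
import org.apache.solr.common.SolrInputDocument;

public class NestedIndexSketch {

    // Build p1 -> c1 -> gc1 using labelled child-document fields
    // (the "new" nested-document syntax) instead of addChildDocument.
    public static SolrInputDocument buildParent() {
        SolrInputDocument grandChild1 = new SolrInputDocument();
        grandChild1.addField("id", "gc1");
        grandChild1.addField("name", "grandChild1");
        grandChild1.addField("class", "foo.bar.grandchild1");

        SolrInputDocument child1 = new SolrInputDocument();
        child1.addField("id", "c1");
        child1.addField("name", "child1");
        child1.addField("class", "foo.bar.child1");
        // Grandchild nests under the child the same way the child
        // nests under the parent.
        child1.addField("children", Arrays.asList(grandChild1));

        SolrInputDocument parent1 = new SolrInputDocument();
        parent1.addField("id", "p1");
        parent1.addField("name", "parent1");
        parent1.addField("class", "foo.bar.parent1");
        parent1.addField("children", Arrays.asList(child1));
        return parent1;
    }

    public static void main(String[] args) {
        SolrInputDocument p = buildParent();
        // With this syntax the nested docs are ordinary field values;
        // there are no anonymous child documents on the parent.
        System.out.println(p.getChildDocuments());
        System.out.println(p.getField("children").getValueCount());
    }
}
```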
I think once the documentation fixes get merged to master, indexing and
searching with nested documents will become much clearer.

Regards,
Munendra S N





Solr 8.6.1: Can't round-trip nested document from SolrJ

2020-08-22 Thread Alexandre Rafalovitch
Hello,

I am trying to get up to date with both SolrJ and the Nested Document
implementation, and I am not sure where I am failing with a basic test
(https://github.com/arafalov/SolrJTest/blob/master/src/com/solrstart/solrj/Main.java).

I am using Solr 8.6.1 with a core created with bin/solr create -c
solrj (schemaless is still on).

I then index a nested parent/child/grandchild document and then
query it back. Looking at the debug output, it seems to go out fine as a
nested doc but comes back as 3 individual ones.

Output is:
SolrInputDocument(fields: [id=p1, name=parent1,
class=foo.bar.parent1], children: [SolrInputDocument(fields: [id=c1,
name=child1, class=foo.bar.child1], children:
[SolrInputDocument(fields: [id=gc1, name=grandChild1,
class=foo.bar.grandchild1])])])
{responseHeader={status=0,QTime=1,params={q=*,wt=javabin,version=2}},response={numFound=3,numFoundExact=true,start=0,docs=[SolrDocument{id=gc1,
name=[grandChild1], class=[foo.bar.grandchild1],
_version_=1675769219435724800}, SolrDocument{id=c1, name=[child1],
class=[foo.bar.child1], _version_=1675769219435724800},
SolrDocument{id=p1, name=[parent1], class=[foo.bar.parent1],
_version_=1675769219435724800}]}}
Found 3 documents

Field: 'id' => 'gc1'
Field: 'name' => '[grandChild1]'
Field: 'class' => '[foo.bar.grandchild1]'
Field: '_version_' => '1675769219435724800'
Children: false

Field: 'id' => 'c1'
Field: 'name' => '[child1]'
Field: 'class' => '[foo.bar.child1]'
Field: '_version_' => '1675769219435724800'
Children: false

Field: 'id' => 'p1'
Field: 'name' => '[parent1]'
Field: 'class' => '[foo.bar.parent1]'
Field: '_version_' => '1675769219435724800'
Children: false

Looking in Admin UI:
* _root_ element is there and has 3 instances of 'p1' value
* _nest_path_ (of type _nest_path_ !?!) is also there but is not populated
* _nest_parent_ is not there

I am not quite sure what that means and what other schema modifications
(to the _default_) I need to make to get it to work.

I also tried to reproduce the examples in the documentation (e.g.
https://lucene.apache.org/solr/guide/8_6/indexing-nested-documents.html
and
https://lucene.apache.org/solr/guide/8_6/searching-nested-documents.html#searching-nested-documents),
but both seem to want some undiscussed schema (e.g. with an ID field
instead of id) and fail to execute against the default schema.

I am kind of stuck. Does anybody have a working SolrJ/nested example, or
ideas about what I missed?

Regards,
   Alex.


Re: All cores gone along with all solr configuration upon reboot

2020-08-22 Thread Erick Erickson
Autopurge shouldn’t matter; that just cleans up old snapshots. That is, it
should be configured, but having it enabled or not has no bearing on your
data disappearing.
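For reference, a zoo.cfg sketch with a persistent data directory and autopurge enabled; all paths and values below are examples, not a prescription for your actual setup:

```properties
# zoo.cfg sketch -- example values only
tickTime=2000
clientPort=2181
# A persistent location; the zoo_sample.cfg default of /tmp/zookeeper
# is wiped on reboot on many systems.
dataDir=/var/lib/zookeeper
# Optional snapshot cleanup (unrelated to the data loss):
# keep 3 snapshots, purge every 24 hours.
autopurge.snapRetainCount=3
autopurge.purgeInterval=24
```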

Also, are you absolutely certain that you are using your external ZK? Check
the port on the admin screen; 9983 is the default for embedded ZK.

All that said, nothing in Solr just deletes all of this. The fact that you
only saw this on reboot is highly suspicious. I suspect some process external
to Solr, anything from a startup script to restoring a disk image, is
removing that data.

Best,
Erick

> On Aug 22, 2020, at 9:24 AM, yaswanth kumar  wrote:
> 
> Thanks Erick for looking into this.
> 
> But as I said before, I confirmed that the paths in zookeeper were changed
> to a local path rather than the /tmp default that comes with the package.
> Does zoo.cfg need to have autopurge settings? I don’t have them in my config.
> 
> Also, I made sure that the zoo.cfg inside Solr and the one for my external
> zoo point to the same place and have the same configs, if that matters.
> 
> Sent from my iPhone



Re: All cores gone along with all solr configuration upon reboot

2020-08-22 Thread yaswanth kumar
Thanks Erick for looking into this.

But as I said before, I confirmed that the paths in zookeeper were changed to
a local path rather than the /tmp default that comes with the package. Does
zoo.cfg need to have autopurge settings? I don’t have them in my config.

Also, I made sure that the zoo.cfg inside Solr and the one for my external
zoo point to the same place and have the same configs, if that matters.

Sent from my iPhone



Re: All cores gone along with all solr configuration upon reboot

2020-08-22 Thread Erick Erickson
Sounds like you didn’t change the Zookeeper data dir. Zookeeper defaults to
putting its data in /tmp/zookeeper; see the zookeeper config file. And, of
course, when you reboot it goes away.

I’ve always disliked this, but the Zookeeper folks did it that way. So if you 
just copy zoo_sample.cfg to zoo.cfg that’s what you get, not under Solr’s 
control.

As for how to recover, assuming you put your configsets in some kind of version 
control as we recommend:

0> set up Zookeeper to keep its data somewhere permanent. You may want to 
archive snapshots upon occasion as well.

1> save away the data directory for _one_ replica from each shard of every 
collection somewhere. You should have a bunch of directories like 
SOLR_HOME/…./collection1_shard1_replica_n1/data.

2> recreate all your collections as new, leader-only collections with the 
exact same number of shards, i.e. each shard with only a single replica.

3> shut down all your Solr instances

4> copy the data directories you saved in <1>. You _MUST_ copy to corresponding 
shards. The important bit is that a data directory from collection1_shard1 goes 
back to collection1_shard1. If you copy it back to collection1_shard2, Bad 
Things Happen. Actually, I’d delete the target data directories first and then 
copy.

5> restart your Solr instances and verify they look OK.

6> use the collections API ADDREPLICA to build out your collections.
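Step 4> might be sketched as a small shell helper; the paths in the example call are placeholders, not your actual layout:

```shell
#!/bin/sh
# Restore a saved data directory into the matching shard's replica.
# Usage: restore_shard_data <saved-data-dir> <target-data-dir>
restore_shard_data() {
    saved="$1"
    target="$2"
    # Delete the freshly created (empty) data dir first, then copy the
    # saved one back in its place. Shard names MUST correspond:
    # collection1_shard1 data goes back to collection1_shard1.
    rm -rf "$target"
    cp -r "$saved" "$target"
}

# Example with placeholder paths:
# restore_shard_data /backup/collection1_shard1_replica_n1/data \
#                    /var/solr/data/collection1_shard1_replica_n1/data
```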

Best,
Erick

> On Aug 22, 2020, at 12:10 AM, yaswanth kumar  wrote:
> 
> Can someone help me with the below issue?
> 
> I have configured Solr 8.2 with one Zookeeper 3.4 and 3 Solr nodes.
> 
> All the configs were pushed initially, and I also indexed all the data into 
> multiple collections with 3 replicas on each collection.
> 
> Now, as part of server maintenance, these Solr nodes were restarted, and once 
> they came back Solr became empty: all the collections were lost. All 
> collection-specific instance directories in the path /solr/server/solr were 
> deleted, but the data folders are intact; nothing there was lost. I'm not 
> really sure how to recover from this situation.
> 
> I did make sure that zoo.cfg was properly configured (permanent paths for 
> zoo data and logs instead of /tmp), as I am using an external zoo instead of 
> the one that comes with Solr.
> 
> The Solr data path is NAS storage, which is common to all three Solr nodes.
> 
> Another data point: I enabled Solr basic authentication as well, if that 
> makes any difference. Even the clusterstate, schemas, and security JSON were 
> all lost. I'm really looking for help in understanding how to prevent this 
> from happening again.
> 
> Sent from my iPhone