Kevin Bachmann created SOLR-13167:
-------------------------------------

             Summary: Duplicate Child Documents and undeterministic search
                 Key: SOLR-13167
                 URL: https://issues.apache.org/jira/browse/SOLR-13167
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: search, SolrCloud
    Affects Versions: 7.5
         Environment: SOLR 7.5 running on AWS EC2 Instances with an AMI OS 
split to two shards running on two different EC2 instances with the built in 
Zookeeper of SOLR
            Reporter: Kevin Bachmann
         Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, 
screenshot-4.png

I have a product search hosted on a SolrCloud cluster with 2 shards on two EC2 
instances, with the following setup: 

A product can have an unlimited number of children, which are small objects 
with shop information. These child documents define the shops where the 
product is available. My requirement is to update / sync the whole documents 
(parent and children) at least once a day. The availability information is 
stored in the child documents in a quantity field.
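
For illustration, the document shape described above can be sketched as a 
nested JSON update payload. This is a minimal sketch, not the actual schema: 
the helper name build_product, the id scheme for children, and the nodeType 
value "product" are assumptions; only shopId, quantity and nodeType:shop come 
from this report. `_childDocuments_` is the key Solr 7.x uses for inline 
children in JSON updates.

```python
import json


def build_product(product_id, shops):
    """Return one parent document with its shop children nested inline.

    `shops` is a list of (shop_id, quantity) pairs. The child id scheme and
    the nodeType value "product" are hypothetical.
    """
    return {
        "id": product_id,
        "nodeType": "product",  # assumption: marker field for parents
        "_childDocuments_": [
            {
                # Child ids must be unique across the whole collection.
                "id": f"{product_id}_{shop_id}",
                "nodeType": "shop",
                "shopId": shop_id,
                "quantity": quantity,
            }
            for shop_id, quantity in shops
        ],
    }


doc = build_product("p1", [("s1", 3), ("s2", 0)])
print(json.dumps([doc], indent=2))
```

Each sync would send one such block per product, with exactly one level of 
nesting (parent > child).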

Problem:
 # After every sync the number of child documents (shops) increases, and they 
nest deeper with every sync as the quantity changes. The child documents are 
apparently not updated by id but newly created with the same id (duplicates, 
comparable to SOLR-5211, SOLR-6096, SOLR-12638). 
 # Whenever I sync the products and their children with one level of depth 
(parent > child), I get parent > child > child > child > ... depending on how 
many children there are (see screenshot-4.png). These children also can't be 
retrieved with nodeType:shop.
 # Whenever I try to request the products (parents) by a child attribute 
(shopId), the search is nondeterministic and does not return the correct 
products. Many products contain children that have never been assigned to 
them. Some products are flooded with a huge number of children (>1000) 
although only about 10 are assigned to them. As you can see in screenshot-1 
to screenshot-3, three queries that are exactly the same return different 
products. screenshot-1, with 26241 results, shows the correct count and 
correct data, but the other two are completely wrong. 
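
To make the two operations in the list above concrete, here is a hedged sketch 
of (a) a parent lookup by child attribute using Solr's block join parent query 
parser, and (b) a resync that deletes the old block by `_root_` before 
re-adding it. The delete-then-add step is a commonly suggested workaround for 
duplicate children, not a confirmed fix for this issue. Assumptions: parents 
are marked with nodeType:product, the schema defines the `_root_` field, and 
the helper names are hypothetical.

```python
import json

PARENT_FILTER = "nodeType:product"  # assumption: matches all parents, no children


def parents_by_shop(shop_id):
    """Build select params returning parents that have a child with this shopId."""
    return {"q": '{!parent which="%s"}shopId:%s' % (PARENT_FILTER, shop_id)}


def resync_commands(product):
    """Build a JSON update body: delete the old block by _root_, then re-add it.

    Assumes the schema defines _root_, which Solr fills with the parent id
    on every document of a block.
    """
    return json.dumps({
        "delete": {"query": '_root_:"%s"' % product["id"]},
        "add": {"doc": product},
    })


product = {
    "id": "p1",
    "nodeType": "product",
    "_childDocuments_": [
        {"id": "p1_s1", "nodeType": "shop", "shopId": "s1", "quantity": 3},
    ],
}

params = parents_by_shop("s1")
body = resync_commands(product)
print(params["q"])
print(body)
```

The `which` filter has to match every parent document and nothing else. If it 
also matches children, or misses some parents, block join results can be wrong.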

I would really appreciate any workaround or help with these issues. This is a 
huge problem and my business depends on it (!) :(

 




