"I am concerned that the same search gives different results after each search. The top document seems to cycle between 3 different documents"
If you run the search with debugQuery=true, are the scores for the top three documents the same or not? You can easily have three documents with the same score, so when you have a result set that is ranked 1-1-1-2-3-4..., you can expect the three tied documents to rotate in an arbitrary order. Perhaps add a secondary sort element, such as id, to your ranking.

On Thu, Sep 7, 2017 at 10:54 AM, Webster Homer <webster.ho...@sial.com> wrote:
> I am not concerned about deleted documents. I am concerned that the same
> search gives different results after each search. The top document seems to
> cycle between 3 different documents.
>
> I have an enhanced collections info API call that calls the core admin API
> to get the index information for the replica.
> When I said the numDocs were the same, I meant exactly that. maxDocs and
> deletedDocs are not the same for the replicas, but numDocs is.
>
> Or are you saying that the search is looking at deleted documents? Wouldn't
> that be a very significant bug?
>
> The four replicas:
> shard1
> core_node1
> "numDocs": 383817,
> "maxDocs": 611592,
> "deletedDocs": 227775,
> "size": "2.49 GB",
> "lastModified": "2017-09-07T08:18:03.639Z",
> "current": true,
> "version": 35644,
> "segmentCount": 28
>
> core_node3
> "numDocs": 383817,
> "maxDocs": 571737,
> "deletedDocs": 187920,
> "size": "2.85 GB",
> "lastModified": "2017-09-07T08:18:03.634Z",
> "current": false,
> "version": 35562,
> "segmentCount": 36
>
> shard2
> core_node2
> "numDocs": 385326,
> "maxDocs": 529214,
> "deletedDocs": 143888,
> "size": "2.13 GB",
> "lastModified": "2017-09-07T08:18:03.632Z",
> "current": true,
> "version": 34783,
> "segmentCount": 24
>
> core_node4
> "numDocs": 385326,
> "maxDocs": 488201,
> "deletedDocs": 102875,
> "size": "1.96 GB",
> "lastModified": "2017-09-07T08:18:03.633Z",
> "current": true,
> "version": 34932,
> "segmentCount": 21
>
>
> On Thu, Sep 7, 2017 at 7:58 AM, Yonik Seeley <ysee...@gmail.com> wrote:
>
> > On Thu, Sep 7, 2017 at 12:47 AM, Erick Erickson
> > <erickerick...@gmail.com> wrote:
> > > bq: and deleted documents are irrelevant to term statistics...
> > >
> > > Did you mean "relevant"? Or do I have to adjust my thinking _again_?
> >
> > One can make it work either way ;-)
> > Whether a document is marked as deleted or not has no effect on term
> > statistics (i.e. irrelevant)
> > OR documents marked for deletion still count in term statistics (i.e.
> > relevant)
> >
> > I guess I used the former because we don't go out of our way to still
> > include deleted documents... it's just a side effect of the index
> > structure that we don't (and can't easily) update statistics when a
> > document is marked as deleted.
> >
> > -Yonik
> >
> >
> > > Erick
> > >
> > > On Wed, Sep 6, 2017 at 7:48 PM, Yonik Seeley <ysee...@gmail.com> wrote:
> > >> Different replicas of the same shard can have different numbers of
> > >> deleted documents (really just marked as deleted), and deleted
> > >> documents are irrelevant to term statistics (like the number of
> > >> documents a term appears in). Documents marked for deletion stop
> > >> contributing to corpus statistics when they are actually removed (via
> > >> expunge deletes, merges, optimizes).
> > >> -Yonik
> > >>
> > >>
> > >> On Wed, Sep 6, 2017 at 5:51 PM, Webster Homer <webster.ho...@sial.com> wrote:
> > >>> I am using Solr 6.2.0 configured as a SolrCloud with 2 shards and 4
> > >>> replicas (total of 4 nodes).
> > >>>
> > >>> If I run the query multiple times, I see three different top-scoring
> > >>> results.
> > >>> No data load is running; all data has been committed.
> > >>>
> > >>> I get these three different hits with their scores:
> > >>> copperiinitratehemipentahydrate2325919004194        430.61722
> > >>> copperiinitrateoncelite1234598765                   432.44238
> > >>> copperiinitratehydrate18756anhydrousbasis13778319   428.24185
> > >>>
> > >>> How is it that the same search against the same data can give different
> > >>> responses?
> > >>> I looked at the specific cores and they look OK; the numDocs for the
> > >>> replicas in a shard match.
> > >>>
> > >>> This is the query:
> > >>> http://ae1c-ecomdev-msc01.sial.com:8983/solr/sial-catalog-product/select?defType=edismax&fl=searchmv_en_keywords,%20searchmv_keywords,searchmv_pno,%20searchmv_en_s_pri_name,%20search_en_p_pri_name,%20search_pno%20[explain%20style=nl]&group.field=id_s&group.limit=30&group=true&group.sort=sort_ds%20asc&indent=on&mm=2%3C-25%25&q.op=OR&q=copper%20nitrate&qf=search_pid^500%20search_concat_pno^400%20searchmv_concat_sku^400%20searchmv_pno^300%20search_concat_pno_genr^100%20searchmv_pno_genr%20searchmv_p_skus_genr%20searchmv_user_term^200%20search_lform^190%20searchmv_en_acronym^180%20search_en_root_name^170%20searchmv_en_s_pri_name^160%20search_en_p_pri_name^150%20searchmv_en_synonyms^145%20searchmv_en_keywords^140%20search_en_sortkey^120%20searchmv_p_skus^100%20searchmv_chem_comp^90%20searchmv_en_name_suf%20searchmv_cas_number^80%20searchmv_component_cas^70%20search_beilstein^50%20search_color_idx^40%20search_ecnumber^30%20search_egecnumber^30%20search_femanumber^20%20searchmv_isbn^10%20search_mdl_number%20searchmv_en_page_title%20searchmv_en_descriptions%20searchmv_en_attributes%20searchmv_rtecs%20searchmv_lookahead_terms%20searchmv_xref_comparable_pno%20searchmv_xref_comparable_sku%20searchmv_xref_equivalent_pno%20searchmv_xref_exact_pno%20searchmv_xref_exact_sku%20searchmv_component_molform&rows=30&sort=score%20desc,sort_en_name%20asc,sort_ds%20asc,search_pid%20asc&wt=json
> > >>>
> > >>> --
> > >>>
> > >>> This message and any attachment are confidential and may be privileged or
> > >>> otherwise protected from disclosure.
> > >>> If you are not the intended recipient,
> > >>> you must not copy this message or attachment or disclose the contents to
> > >>> any other person. If you have received this transmission in error, please
> > >>> notify the sender immediately and delete the message and any attachment
> > >>> from your system. Merck KGaA, Darmstadt, Germany and any of its
> > >>> subsidiaries do not accept liability for any omissions or errors in this
> > >>> message which may arise as a result of E-Mail-transmission or for damages
> > >>> resulting from any unauthorized changes of the content of this message and
> > >>> any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
> > >>> subsidiaries do not guarantee that this message is free of viruses and does
> > >>> not accept liability for any damages caused by any virus transmitted
> > >>> therewith.
> > >>>
> > >>> Click http://www.emdgroup.com/disclaimer to access the German, French,
> > >>> Spanish and Portuguese versions of this disclaimer.
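The tie-rotation effect described in the reply at the top of this thread can be sketched in a few lines of Python. This is an illustration only, not Solr code: the document ids and scores are hypothetical, and shuffling the arrival order stands in for shards or replicas answering in a different order on each request. Sorting on score alone leaves tied documents in whatever order they arrived, while adding the id as a secondary key (in Solr terms, something like `sort=score desc,id asc`, assuming `id` is the unique key field) makes the order deterministic.

```python
import random

# Hypothetical hits as (doc_id, score); the first three are tied on score.
HITS = [
    ("doc-a", 2.5),
    ("doc-b", 2.5),
    ("doc-c", 2.5),
    ("doc-d", 1.7),
    ("doc-e", 0.9),
]


def rank_by_score(hits):
    # Python's sort is stable, so tied documents keep their arrival order.
    return sorted(hits, key=lambda h: -h[1])


def rank_by_score_then_id(hits):
    # Ascending id as a secondary key breaks ties deterministically.
    return sorted(hits, key=lambda h: (-h[1], h[0]))


def simulate(rank, runs=20, seed=0):
    """Return the set of doc ids that end up ranked first across runs."""
    rng = random.Random(seed)
    top_docs = set()
    for _ in range(runs):
        arrival = HITS[:]
        rng.shuffle(arrival)  # mimic a different arrival order per request
        top_docs.add(rank(arrival)[0][0])
    return top_docs


if __name__ == "__main__":
    print("score only:    ", sorted(simulate(rank_by_score)))
    print("score, then id:", sorted(simulate(rank_by_score_then_id)))
```

In the thread itself, the three reported scores differ (430.61722, 432.44238, 428.24185), and the query already sorts on sort_en_name, sort_ds and search_pid after score. A pure tie therefore would not explain that case on its own. The sketch only shows the general effect the reply describes.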
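Yonik's point, that documents only marked as deleted still count in term statistics, can be made concrete with a small sketch. It assumes a BM25-style idf of the form used by Lucene's BM25Similarity, idf = ln(1 + (N - df + 0.5) / (df + 0.5)), which we believe is the default similarity for Solr 6.x schemas. N is approximated here by the maxDocs values reported above for the two shard1 replicas. The document frequencies are hypothetical and were chosen only to show the effect. Both replicas report the same numDocs, yet they compute different idf values for the same term.

```python
import math

# Stats reported in the thread for the two shard1 replicas.
# max_docs still includes documents that are only marked as deleted.
REPLICAS = {
    "core_node1": {"num_docs": 383817, "max_docs": 611592},
    "core_node3": {"num_docs": 383817, "max_docs": 571737},
}

# Hypothetical document frequencies for one query term on each replica
# (NOT from the thread). They differ because each replica has merged
# away a different set of deleted documents.
HYPOTHETICAL_DF = {
    "core_node1": 1200,
    "core_node3": 1100,
}


def bm25_idf(doc_count, doc_freq):
    """BM25-style idf: ln(1 + (N - df + 0.5) / (df + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def idf_per_replica():
    # Use max_docs as N, since deleted-but-unmerged docs still count.
    return {
        name: bm25_idf(stats["max_docs"], HYPOTHETICAL_DF[name])
        for name, stats in REPLICAS.items()
    }


if __name__ == "__main__":
    for name, idf in idf_per_replica().items():
        live = REPLICAS[name]["num_docs"]
        print(f"{name}: numDocs={live}, idf={idf:.5f}")
```

In SolrCloud, each request is answered by one replica per shard. The same query can therefore be scored with either replica's statistics, and hits with close scores can swap places between requests. As Yonik notes, the divergence disappears once the deleted documents are actually removed by merges, expunge deletes or an optimize.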