Re: Solr still gives old data while faceting from the deleted documents
expungeDeletes won't do the trick for you; it only purges deleted documents from segments with > 10% deleted docs, so segments below that threshold will still contain deleted documents.

I'd push back on "the requirement is to show facets with 0 count as disabled." Why? What use-case is satisfied here? Effectively this is saying "For my query, show me possible values that have no hits for that query."

Optimize is a very costly operation, and to really get this behavior you'd need to run it _every_ time the index changes. You really can't afford to run it for every update, so there'll be a period of time when you will still get these facets. Eventually you won't be displaying zero-count facets anyway, assuming that you have room for, say, only 10 facets and sort by frequency.

If your index changes only periodically (say once a day) that may be fine. But any more often than that and you won't be able to satisfy the requirement anyway. My point is that requirements like this are often created without understanding the consequences and cause a lot of effort to be expended without a good purpose.

See: https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/

Best,
Erick

On Thu, Apr 12, 2018 at 10:32 PM, girish.vignesh wrote:
> mincount will fix this issue for sure. I have tried that, but the requirement
> is to show facets with 0 count as disabled.
>
> I think I am left with only 2 options: either go with expungeDeletes via the
> update URL or use optimize in a scheduler.
>
> Regards,
> Vignesh
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
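For reference, expungeDeletes is a parameter on the commit sent to the update handler, not a solrconfig.xml setting. A minimal sketch of such a request (the host, port, and core name here are placeholders):

```
curl 'http://localhost:8983/solr/mycore/update?commit=true&expungeDeletes=true'
```

Even when this request succeeds, only segments above the deleted-docs threshold are rewritten, which is why deleted documents can survive it.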
Re: Solr still gives old data while faceting from the deleted documents
mincount will fix this issue for sure. I have tried that, but the requirement is to show facets with 0 count as disabled.

I think I am left with only 2 options: either go with expungeDeletes via the update URL or use optimize in a scheduler.

Regards,
Vignesh
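If optimize from a scheduler is the route taken, note that it can be triggered through the same update endpoint; a sketch, with host and core name as placeholders (maxSegments is optional and forces a merge down to that many segments):

```
curl 'http://localhost:8983/solr/mycore/update?optimize=true&maxSegments=1'
```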
Re: Solr still gives old data while faceting from the deleted documents
On 4/12/2018 5:53 AM, girish.vignesh wrote:
> Solr gives old data while faceting from old deleted or updated documents.
> For example, we are doing faceting on name. name changes frequently for our
> application. When we index the document after changing the name, we get
> both the old name and the new name in the search results.
>
> After digging more on this I got to know that Solr indexes are composed of
> segments (write once) and each segment contains a set of documents.
> Whenever a hard commit happens these segments are closed, and even if a
> document is deleted after that, the segment will still contain it (marked
> as deleted). These documents are not cleared immediately. They are not
> displayed in the search results, but somehow faceting is still able to
> access that data.

If all documents with that term are deleted, then this will be fixed by adding a facet.mincount=1 parameter to your facet URL. If you are using the JSON Facet API, then there is a mincount parameter that you can place into your JSON request. I've never actually used the JSON Facet API, but there is documentation:

https://lucene.apache.org/solr/guide/7_2/json-facet-api.html#TermsFacet

The mincount parameter might make it unnecessary to optimize. But if you are updating a LOT of your documents on a regular basis, you might find that optimizing gives you better performance, so optimizing once a day during a time when traffic is low might be useful.

Thanks,
Shawn
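A sketch of both variants, assuming a facet field named `name` (the collection name and request shape around it are placeholders):

```
# Classic facet parameters on a select request:
/solr/mycore/select?q=*:*&facet=true&facet.field=name&facet.mincount=1

# Roughly equivalent JSON Facet API request body:
{
  "query": "*:*",
  "facet": {
    "names": { "type": "terms", "field": "name", "mincount": 1 }
  }
}
```

Either form drops facet buckets whose count is zero, which hides terms that now exist only in deleted documents.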
Solr still gives old data while faceting from the deleted documents
Solr gives old data while faceting from old deleted or updated documents. For example, we are doing faceting on name. name changes frequently for our application. When we index the document after changing the name, we get both the old name and the new name in the search results.

After digging more on this I got to know that Solr indexes are composed of segments (write once) and each segment contains a set of documents. Whenever a hard commit happens these segments are closed, and even if a document is deleted after that, the segment will still contain it (marked as deleted). These documents are not cleared immediately. They are not displayed in the search results, but somehow faceting is still able to access that data.

Optimizing fixed this issue, but we cannot perform this each time a customer changes data on production. I tried the options below and they did not work for me.

1) *expungeDeletes*. Added this line below in solrconfig.xml

3 false 1

// This is not working. I don't think I can use expungeDeletes like this in solrConfig.xml. When I send the commit parameters in the update URL it is working.

2) Using *TieredMergePolicyFactory* might not help me, as the threshold might not always be reached and users will see old data during this time.

3) One more way of doing it is calling the *optimize*() method which is exposed in solrj once daily. But I am not sure what impact this will have on performance.

4) Tried to manipulate filterCache, documentCache and queryResultCache in solrConfig.xml. This did not solve my issue either. I do not think any cache is causing this issue.

Number of documents we index per server will be a maximum of 2M-3M.

Please suggest if there is any solution to this apart from expungeDeletes/optimize. Let me know if more data is needed.
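On option 2, for reference, a TieredMergePolicyFactory is configured inside the <indexConfig> block of solrconfig.xml; a minimal sketch with illustrative values (these are the defaults, not a tuning recommendation):

```
<indexConfig>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
  </mergePolicyFactory>
</indexConfig>
```

As noted above, merge-policy tuning only reclaims deleted documents when a merge actually fires, so it cannot guarantee zero-count facets disappear immediately.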