<again, with hopefully fewer typos>
Thanks for all this valuable input (from the main contributors 😉)!
The first thing I did was get rid of "expungeDeletes". My "single-deletion" unit test then failed until I added the optimize param:
> updateRequest.setParam( "optimize", "true" );
Does this make sense, or should I file a JIRA issue for it? How expensive is this "optimization"?

BTW: we are on Solr 6.6.0

-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Saturday, 27 January 2018 00:49
To: solr-user@lucene.apache.org
Subject: Re: AW: SolrClient#updateByQuery?

On 1/26/2018 9:55 AM, Clemens Wyss DEV wrote:
> Why do I want to do all this (dumb things)? The context is as follows:
> when a document is deleted in an index/core, the deletion is not immediately
> reflected in the search results. Deletions are not really NRT (or has this
> changed?). Until now we "solved" this by brute force, forcing a commit (with
> "expunge deletes"), until we noticed that this results in quite a "heavy
> load", to say the least.
> Now I have the idea of adding a "deleted" flag to all documents, which is
> filtered on in all queries.
> When it comes to deletions, I would update the document's deleted flag and
> then effectively delete it. For a single deletion this is OK, but what if I
> need to re-index?

The deleteByQuery functionality is known to have some issues getting along with other things happening at the same time.
For best performance and compatibility with concurrent operations, I would strongly recommend that you change all deleteByQuery calls into two steps: do a standard query with fl=id (or whatever your uniqueKey field is), gather up the ID values (possibly with start/rows pagination or cursorMark), and then proceed to do one or more deleteById calls with those ID values. Both the query and the ID-based delete can coexist with other concurrent operations very well.

I would expect that doing atomic updates to a deleted field in your documents is going to be slower than the query/deleteById approach. I cannot be sure this is the case, but I think it would be. It should be a lot more friendly to NRT operation than deleteByQuery.

As Walter said, expungeDeletes will result in Solr doing a lot more work than it should, slowing things down even more. It also won't affect search results at all. Once the commit finishes and opens a new searcher, Solr will not include deleted documents in search results. The expungeDeletes parameter can make commits take a VERY long time.

I have no idea whether the issues surrounding deleteByQuery can be fixed or not.

Thanks,
Shawn
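
For illustration, the two-step approach described above (query for the IDs, then delete by ID) might be sketched as follows. This is only a rough sketch and does not use SolrJ directly: `IdIndex`, `InMemoryIndex`, `deleteMatching`, and the prefix-based "query" are hypothetical stand-ins so that the example runs on its own. In a real setup, `queryIds` would presumably wrap a SolrJ query with fl=id and start/rows, and `deleteByIds` would wrap a deleteById call. All IDs are gathered before anything is deleted, so that deletes cannot shift the start/rows offsets of later pages.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class QueryThenDeleteById {

    /** Hypothetical stand-in for the two index operations used here. */
    interface IdIndex {
        // Returns up to `rows` matching IDs starting at offset `start`
        // (assumed to behave like a query with fl=id and start/rows).
        List<String> queryIds(String query, int start, int rows);

        // Deletes the given IDs (assumed to behave like a deleteById call).
        void deleteByIds(List<String> ids);
    }

    /** Gathers all matching IDs page by page, then deletes them in batches. */
    static int deleteMatching(IdIndex index, String query, int pageSize, int batchSize) {
        if (pageSize <= 0 || batchSize <= 0) {
            throw new IllegalArgumentException("pageSize and batchSize must be positive");
        }
        // Step 1: collect every matching ID before deleting anything.
        List<String> ids = new ArrayList<>();
        for (int start = 0; ; start += pageSize) {
            List<String> page = index.queryIds(query, start, pageSize);
            ids.addAll(page);
            if (page.size() < pageSize) {
                break; // a short (or empty) page means there are no more results
            }
        }
        // Step 2: issue one delete-by-ID call per batch.
        for (int from = 0; from < ids.size(); from += batchSize) {
            int to = Math.min(from + batchSize, ids.size());
            index.deleteByIds(new ArrayList<>(ids.subList(from, to)));
        }
        return ids.size();
    }

    /** Toy in-memory index; a "query" here is just an ID prefix. */
    static class InMemoryIndex implements IdIndex {
        final Set<String> ids = new LinkedHashSet<>();
        int deleteCalls = 0;

        @Override
        public List<String> queryIds(String prefix, int start, int rows) {
            List<String> matches = new ArrayList<>();
            for (String id : ids) {
                if (id.startsWith(prefix)) {
                    matches.add(id);
                }
            }
            int from = Math.min(start, matches.size());
            int to = Math.min(start + rows, matches.size());
            return new ArrayList<>(matches.subList(from, to));
        }

        @Override
        public void deleteByIds(List<String> toDelete) {
            deleteCalls++;
            ids.removeAll(toDelete);
        }
    }

    public static void main(String[] args) {
        InMemoryIndex index = new InMemoryIndex();
        for (int i = 1; i <= 5; i++) {
            index.ids.add("doc-" + i);
        }
        index.ids.add("keep-1");
        int deleted = deleteMatching(index, "doc-", 2, 2);
        System.out.println("deleted=" + deleted + " calls=" + index.deleteCalls
                + " remaining=" + index.ids);
    }
}
```

With a real Solr index, the deletions would still only become invisible to searches after a commit opens a new searcher, as noted above; the sketch leaves committing out entirely.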