<again, with hopefully fewer typos>
Thanks for all these (main contributor's 😉) valuable inputs!

The first thing I did was get rid of "expungeDeletes". My "single-deletion" 
unit test failed until I added the optimize param:
> updateRequest.setParam( "optimize", "true" );
Does this make sense, or should I file a JIRA issue for it? 
How expensive is this "optimization"?
BTW: we are on Solr 6.6.0

-----Original Message-----
From: Clemens Wyss DEV [mailto:clemens...@mysign.ch] 
Sent: Saturday, 27 January 2018 08:50
To: 'solr-user@lucene.apache.org' <solr-user@lucene.apache.org>
Subject: AW: AW: SolrClient#updateByQuery?

Thanks for all these (main contributor's 😉) valuable inputs!

The first thing I did was get rid of "expungeDeletes". My 
"single-deletion" unit test failed until I added the optimize param:
> updateRequest.setParam( "optimize", "true" );
Does this make sense, or should I file a JIRA issue for it? 
How expensive is this "optimization"?


-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Saturday, 27 January 2018 00:49
To: solr-user@lucene.apache.org
Subject: Re: AW: SolrClient#updateByQuery?

On 1/26/2018 9:55 AM, Clemens Wyss DEV wrote:
> Why do I want to do all this (dumb things)? The context is as follows:
> when a document is deleted in an index/core, the deletion is not immediately 
> reflected in the search results. Deletions are not really NRT (or has this 
> changed?). Until now we "solved" this by brute force, forcing a commit (with 
> "expunge deletes"), until we noticed that this results in quite a "heavy 
> load", to say the least.
> Now I have the idea to add a "deleted" flag to all documents that is 
> filtered on in all queries.
> When it comes to deletions, I would update the document's deleted flag and 
> then effectively delete it. For a single deletion this is OK, but what if I 
> need to re-index?

The deleteByQuery functionality is known to have some issues getting along with 
other things happening at the same time.

For best performance and compatibility with concurrent operations, I would 
strongly recommend that you change all deleteByQuery calls into two steps:  Do 
a standard query with fl=id (or whatever your uniqueKey field is), gather up 
the ID values (possibly with start/rows pagination or cursorMark), and then 
proceed to do one or more deleteById calls with those ID values.  Both the 
query and the ID-based delete can coexist with other concurrent operations very 
well.
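
A rough sketch of that two-step flow, in case it helps. The types IdPage, 
IdSource and IdDeleter are hypothetical stand-ins I made up so the example 
compiles without SolrJ on the classpath; they are not SolrJ classes. In real 
code, IdSource would presumably wrap a query with fl=id, a sort on the 
uniqueKey field and a cursorMark, and IdDeleter would wrap 
SolrClient.deleteById(List<String>). The page and batch sizes are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the two-step delete: page through matching IDs, then delete by ID.
// IdPage, IdSource and IdDeleter are hypothetical stand-ins, not SolrJ types.
class TwoStepDelete {

    /** One page of uniqueKey values plus the cursor to use for the next page. */
    static final class IdPage {
        final List<String> ids;
        final String nextCursor;

        IdPage(List<String> ids, String nextCursor) {
            this.ids = ids;
            this.nextCursor = nextCursor;
        }
    }

    /** Step 1. Assumption: a real implementation runs a Solr query with fl=id and cursorMark. */
    interface IdSource {
        IdPage fetch(String cursor, int rows);
    }

    /** Step 2. Assumption: a real implementation calls SolrClient.deleteById(ids). */
    interface IdDeleter {
        void deleteByIds(List<String> ids);
    }

    /** Gathers all matching IDs first, then deletes them in batches. Returns the ID count. */
    static int deleteMatching(IdSource source, IdDeleter deleter, int rows, int batchSize) {
        List<String> ids = new ArrayList<>();
        String cursor = "*"; // Solr's starting cursorMark value
        while (true) {
            IdPage page = source.fetch(cursor, rows);
            ids.addAll(page.ids);
            // A cursor walk is finished when the same cursorMark comes back again.
            if (page.nextCursor.equals(cursor)) {
                break;
            }
            cursor = page.nextCursor;
        }
        // Send the deletes in bounded batches rather than one huge request.
        for (int from = 0; from < ids.size(); from += batchSize) {
            int to = Math.min(from + batchSize, ids.size());
            deleter.deleteByIds(new ArrayList<>(ids.subList(from, to)));
        }
        return ids.size();
    }
}
```

The sketch collects every ID before sending any delete, so the query phase 
and the delete phase stay separate. A real SolrJ wrapper would also have to 
handle SolrServerException and IOException, which the stand-in interfaces 
leave out, and the deletes still only become visible after a commit opens a 
new searcher.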

I would expect that doing atomic updates to a deleted field in your documents 
is going to be slower than the query/deleteById approach.  I cannot be sure 
this is the case, but I think it would be.  It should be a lot more friendly to 
NRT operation than deleteByQuery.

As Walter said, expungeDeletes will result in Solr doing a lot more work than 
it should, slowing things down even more.  It also won't affect search results 
at all.  Once the commit finishes and opens a new searcher, Solr will not 
include deleted documents in search results. The expungeDeletes parameter can 
make commits take a VERY long time.

I have no idea whether the issues surrounding deleteByQuery can be fixed or not.

Thanks,
Shawn
