Re: 'missing content stream' issuing expungeDeletes=true

2015-09-02 Thread Derek Poh
There are around 6+ million documents in the collection. Each document (or product record) is unique in the collection. When we found out the document has a docfreq of 2, we did a query on the document's product id and indeed 2 documents were returned. We suspect 1 of them is deleted but not

Re: 'missing content stream' issuing expungeDeletes=true

2015-09-02 Thread Erick Erickson
bq: When we found out the document has a docfreq of 2, we did a query on the document's product id and indeed 2 documents were returned. We suspect 1 of them is deleted but not removed from the index. This is totally inconsistent with how Solr works _if_ these documents had the same value for

Re: 'missing content stream' issuing expungeDeletes=true

2015-09-01 Thread Upayavira
I wonder if this resolves it [1]. It has been applied to trunk, but not to the 5.x release branch. If you needed it in 5.x, I wonder if there's a way that particular choice could be made configurable. Upayavira [1] https://issues.apache.org/jira/browse/LUCENE-6711

Re: 'missing content stream' issuing expungeDeletes=true

2015-09-01 Thread Derek Poh
Erick, yes, we see documents changing their position in the list due to deleted docs. In our search result, we apply a higher boost (bq) to a group of matched documents to have them displayed in the top tier of the result. At times 1 or 2 of these documents are not returned in the top tier, they

Re: 'missing content stream' issuing expungeDeletes=true

2015-09-01 Thread Erick Erickson
How many documents are there in total in your corpus? And how many do you intend to have? My point is that if you are testing this with a small corpus, the results are very likely different than when you test on a reasonably sized corpus. So if you expect your "real" index will contain many more docs than what

Re: 'missing content stream' issuing expungeDeletes=true

2015-09-01 Thread Erick Erickson
Derek: Why do you care? What evidence do you have that this matters _practically_? If you've looked at scoring with a small number of documents, you'll see significant differences due to deleted documents. In most cases, as you get a larger number of documents, the ranking of documents in an index
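Erick's point about small corpora can be made concrete with a quick calculation. This is a sketch assuming Lucene's classic TF-IDF idf, idf = 1 + ln(maxDoc / (docFreq + 1)), which is the default scoring in Solr 5.x; in that version both maxDoc and docFreq still count deleted documents until a merge removes them. The corpus sizes below are hypothetical, and the helper names `idf` and `idf_shift` are made up for illustration.

```shell
#!/bin/sh
# Classic Lucene TF-IDF idf (default scoring in Solr 5.x):
#   idf = 1 + ln(maxDoc / (docFreq + 1))
# Both maxDoc and docFreq still include deleted documents that have not
# yet been merged away.
idf() {
  awk -v n="$1" -v df="$2" 'BEGIN { printf "%.6f\n", 1 + log(n / (df + 1)) }'
}

# Change in idf caused by one deleted-but-not-yet-merged copy of a matching
# document: that copy adds 1 to both maxDoc and docFreq.
idf_shift() {
  live_docs=$1
  live_df=$2
  without_del=$(idf "$live_docs" "$live_df")
  with_del=$(idf $((live_docs + 1)) $((live_df + 1)))
  awk -v a="$without_del" -v b="$with_del" \
    'BEGIN { d = a - b; if (d < 0) d = -d; printf "%.6f\n", d }'
}

# Hypothetical sizes: a 100-doc test core where the term matches 2 live docs,
# vs. a ~6M-doc index (as in this thread) where it matches 20,000 live docs.
echo "small corpus idf shift: $(idf_shift 100 2)"
echo "large corpus idf shift: $(idf_shift 6000000 20000)"
```

The small-corpus shift comes out around 0.28, while the large-corpus shift is on the order of 0.00005: a single stray deleted copy can noticeably move scores in a tiny test core but barely registers in a multi-million-document index.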

Re: 'missing content stream' issuing expungeDeletes=true

2015-08-31 Thread Upayavira
If you really must expunge deletes, use optimize. That will merge all index segments into one, and in the process will remove any deleted documents. Why do you need to expunge deleted documents anyway? It is generally done in the background for you, so you shouldn't need to worry about it.
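Upayavira's optimize suggestion corresponds to a single update request. A minimal sketch, assuming the core URL from Derek's original post; `SOLR_CORE_URL`, `SEND` and `optimize_url` are hypothetical names for this sketch, not Solr settings.

```shell
#!/bin/sh
# Sketch: assumes the core URL used elsewhere in this thread
# (override via SOLR_CORE_URL).
SOLR_CORE_URL="${SOLR_CORE_URL:-http://127.0.0.1:8983/solr/supplier}"

# optimize=true forces a merge down to one segment by default, dropping
# every deleted document along the way. Because it typically rewrites the
# whole index, it is expensive on a large collection.
optimize_url() {
  printf '%s/update?optimize=true\n' "$1"
}

# Print the URL by default; only send the request when SEND=1
# (a hypothetical switch for this sketch, not a Solr parameter).
if [ "${SEND:-0}" = "1" ]; then
  curl "$(optimize_url "$SOLR_CORE_URL")"
else
  optimize_url "$SOLR_CORE_URL"
fi
```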

Re: 'missing content stream' issuing expungeDeletes=true

2015-08-31 Thread Derek Poh
Hi Upayavira, in fact we are currently using optimize but were advised to use expungeDeletes as it is less resource intensive. So expungeDeletes will only remove deleted documents; it will not merge all index segments into one? If we don't use optimize, the deleted documents in the index

'missing content stream' issuing expungeDeletes=true

2015-08-30 Thread Derek Poh
Hi, I tried doing an expungeDeletes=true with the following but got the message 'missing content stream'. What am I missing? Do I need to provide additional parameters? curl 'http://127.0.0.1:8983/solr/supplier/update/json?expungeDeletes=true' Thanks, Derek
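A likely fix for the error above: in Solr, expungeDeletes is an option of the commit command, so it needs to travel with commit=true. A request with no body and no commit/optimize parameter gives the update handler nothing to process, and it answers 'missing content stream'. A minimal sketch, keeping Derek's core URL; `SOLR_CORE_URL`, `SEND` and `expunge_deletes_url` are hypothetical names for this sketch.

```shell
#!/bin/sh
# Sketch: assumes the core URL from the post above (override via SOLR_CORE_URL).
SOLR_CORE_URL="${SOLR_CORE_URL:-http://127.0.0.1:8983/solr/supplier}"

# expungeDeletes only takes effect together with commit=true. With no request
# body and no commit/optimize parameter, the update handler has nothing to do
# and responds "missing content stream".
expunge_deletes_url() {
  printf '%s/update/json?commit=true&expungeDeletes=true\n' "$1"
}

# Print the URL by default; only send the request when SEND=1
# (a hypothetical switch for this sketch, not a Solr parameter).
if [ "${SEND:-0}" = "1" ]; then
  curl "$(expunge_deletes_url "$SOLR_CORE_URL")"
else
  expunge_deletes_url "$SOLR_CORE_URL"
fi
```

Unlike optimize, expungeDeletes asks Lucene's merge policy to merge only segments whose share of deleted documents exceeds a threshold (10% by default with TieredMergePolicy) instead of merging the whole index into one segment. That is why it is usually cheaper, although it can still rewrite large segments.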