Could this be related to thread issues in Solr?

On Fri, Feb 17, 2023 at 4:44 PM Mark Hieber <[email protected]> wrote:

> I agree. However, what we are seeing is that a query *will return the
> document* when we check for eventual consistency, but when we check some
> time later (say 5 minutes), then we do not get the document returned. So it
> got the document correctly, and returned the results, but then later it did
> not return the result for the same query.
>
> On Fri, Feb 17, 2023 at 4:32 PM Walter Underwood <[email protected]>
> wrote:
>
>> Query the database to see whether the document is in the database.
>>
>> The problem happens when you don’t follow the design pattern “single
>> source of truth”. Solr has a delayed version of the true state, so it will
>> sometimes give wrong answers.
>>
>> Single source of truth
>> <https://en.wikipedia.org/wiki/Single_source_of_truth>
>>
>> wunder
>> Walter Underwood
>> [email protected]
>> http://observer.wunderwood.org/  (my blog)
>>
>> On Feb 17, 2023, at 11:52 AM, Mark Hieber <[email protected]> wrote:
>>
>> We have a cluster of hosts running Solr 8.4. Each host has an application
>> which listens to an external source for updated documents. When it gets a
>> document we care about, it indexes that document into the correct Solr
>> core (we are not running cloud).
>>
>> In our API service, when we get a request to put this type of document, we
>> first query Solr to see if the document exists. If it does not, we then
>> create a new document in our database and the document is sent to the
>> application to be indexed into Solr. If the document we are trying to
>> *put* (in the API Service) exists in Solr, then we throw an exception back
>> to the user if they have not specified the existing version (not the
>> _version_ field from Solr, rather an increasing counter).
>>
>> As part of our write logic, after we put the document into the database,
>> we query the Solr stack until we get the response containing the newly
>> written document. So we know the document was written at this point.
>>
>> Some time later (maybe 5-10 minutes), we get another put request for the
>> same document id. We query Solr, and in some cases, we get no documents
>> returned, even though just before we actually found the document. The
>> document has not been deleted in the interim.
>>
>> We use the same query for both checking for existence at the beginning of
>> the logic, and for checking for eventual consistency after writing to the
>> database.
>>
>> I could add a retry to the first part of the logic (retry if we don't find
>> a document), but the question is why we don't find it on the first query
>> of the second put.
>>
>> If I query for the document (using the same query), I find the document on
>> each host.
>>
>> Why are we not seeing documents which are actually there?
>>
>>
>>
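For reference, the polling step described in the original message (query Solr after the database write until the new document shows up) and the retry being considered for the initial existence check could both be sketched roughly as below. This is only an illustration, not the poster's code: `wait_until_visible` and the `doc_exists` callable are hypothetical names, and the actual Solr query is left to the caller.

```python
import time
from typing import Callable


def wait_until_visible(doc_exists: Callable[[], bool],
                       attempts: int = 5,
                       delay_seconds: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll doc_exists() until it returns True or the attempts run out.

    doc_exists is a hypothetical callable that runs the existence query
    (for example, a Solr select on the document id) and returns True
    when the document is found.
    """
    for attempt in range(attempts):
        if doc_exists():
            return True
        # Do not sleep after the final failed attempt.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

A successful poll only shows that whichever host answered had the document at that moment, so a retry like this can mask the symptom without explaining it.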
