Re: Index documents in async way

2020-10-15 Thread David Smiley
> > What I mean here is that right now, when we send a batch of documents to Solr, we still process them as separate, unrelated documents, indexing them one by one. If indexing the fifth document causes an error, that won't affect the 4 documents already indexed. Using this model we can index the batch in
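
The current behaviour described in the quoted text can be illustrated with a small sketch. This is not Solr code: `index_batch` and `index_one` are hypothetical names, and stopping at the first failing document is an assumption, since the snippet only says that documents indexed before the error are unaffected.

```python
from typing import Callable, Dict, List, Optional, Tuple

Doc = Dict[str, object]


def index_batch(
    docs: List[Doc], index_one: Callable[[Doc], None]
) -> Tuple[List[int], Optional[Tuple[int, str]]]:
    """Index documents one by one (hypothetical helper, not a Solr API).

    Documents indexed before a failure stay indexed. Stopping at the first
    error is an assumption made for this sketch.
    """
    indexed: List[int] = []
    for position, doc in enumerate(docs):
        try:
            index_one(doc)
        except ValueError as exc:
            # Report the failing position; earlier documents are not rolled back.
            return indexed, (position, str(exc))
        indexed.append(position)
    return indexed, None
```

Here the caller learns which position failed and can see that the earlier documents were kept, which is the property the message refers to.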

Re: Index documents in async way

2020-10-15 Thread Đạt Cao Mạnh
> The biggest problem I have with this is that the client doesn't know about indexing problems without awkward callbacks later to see if something went wrong. Even simple stuff like a schema problem (e.g. undefined field). It's a useful *option*, anyway. > Currently we now guarantee that if

Re: Index documents in async way

2020-10-13 Thread Gus Heck
This is interesting, though it opens a few cans of worms IMHO. 1. Currently we guarantee that if Solr sends you an OK response the document WILL eventually become searchable without further action. Maintaining that guarantee becomes impossible if we haven't verified that the

Re: Index documents in async way

2020-10-09 Thread David Smiley
On Thu, Oct 8, 2020 at 10:21 AM Cao Mạnh Đạt wrote: > Hi guys, > > First of all, it seems that I have used the term async a lot recently :D. Recently I have been thinking a lot about changing the current indexing model of Solr from the sync way it works currently (user submits an update request, waiting

Re: Index documents in async way

2020-10-09 Thread Ilan Ginzburg
I like the idea. Two (main) points are not clear to me: - Order of updates: If the current leader fails (its tlog becoming inaccessible) and another leader is elected and indexes some more, what happens when the first leader comes back? What does it do with its tlog and how to know which part

Re: Index documents in async way

2020-10-09 Thread Cao Mạnh Đạt
Thank you Tomas >Atomic updates, can those be supported? I guess yes if we can guarantee that messages are read once and only once. It won't be straightforward since we have multiple consumers on the tlog queue. But it is possible with appropriate locking >I'm guessing we'd need to read messages

Re: Index documents in async way

2020-10-08 Thread Tomás Fernández Löbbe
Interesting idea Đạt. The first questions/comments that come to my mind would be: * Atomic updates, can those be supported? I guess yes if we can guarantee that messages are read once and only once. * I'm guessing we'd need to read messages in an ordered way, so it'd be a single Kafka partition

Re: Index documents in async way

2020-10-08 Thread Đạt Cao Mạnh
> Can there be a situation where the index writer fails after the document was added to tlog and a success is sent to the user? I think we want to avoid such a situation, isn't it? > I suppose failures would be returned to the client on the async response? To make things more clear, the response

Re: Index documents in async way

2020-10-08 Thread Joel Bernstein
I think this model has a lot of potential. I'd like to add another wrinkle to this. Which is to store the information about each batch as a record in the index. Each batch record would contain a fingerprint for the batch. This solves lots of problems, and allows us to confirm the integrity of the
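
A minimal sketch of what a per-batch fingerprint record might look like. The thread does not specify a hash function, a record layout, or field names, so everything here (`batch_fingerprint`, `make_batch_record`, `verify_batch`, SHA-256 over canonical JSON) is an assumption for illustration only.

```python
import hashlib
import json
from typing import Dict, List

Doc = Dict[str, object]


def batch_fingerprint(docs: List[Doc]) -> str:
    """Hash the documents in order, using canonical JSON (assumed encoding)."""
    digest = hashlib.sha256()
    for doc in docs:
        # sort_keys makes the encoding independent of field order in a doc.
        encoded = json.dumps(doc, sort_keys=True, separators=(",", ":"))
        digest.update(encoded.encode("utf-8"))
        digest.update(b"\n")  # delimiter so document boundaries affect the hash
    return digest.hexdigest()


def make_batch_record(batch_id: str, docs: List[Doc]) -> Dict[str, object]:
    """Build a hypothetical record to store in the index alongside the batch."""
    return {
        "id": f"batch-{batch_id}",
        "doc_count": len(docs),
        "fingerprint": batch_fingerprint(docs),
    }


def verify_batch(record: Dict[str, object], docs: List[Doc]) -> bool:
    """Check that the given documents match the stored batch record."""
    return (
        record["doc_count"] == len(docs)
        and record["fingerprint"] == batch_fingerprint(docs)
    )
```

With a record like this, a replica or client could recompute the fingerprint over the documents it holds for a batch and compare it with the stored one, which is one way to read the integrity check mentioned above.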

Re: Index documents in async way

2020-10-08 Thread Erick Erickson
I suppose failures would be returned to the client on the async response? How would one keep the tlog from growing forever if the actual indexing took a long time? I'm guessing that this would be optional. On Thu, Oct 8, 2020, 11:14 Ishan Chattopadhyaya wrote: > Can there be a situation

Re: Index documents in async way

2020-10-08 Thread Ishan Chattopadhyaya
Can there be a situation where the index writer fails after the document was added to tlog and a success is sent to the user? I think we want to avoid such a situation, isn't it? On Thu, 8 Oct, 2020, 8:25 pm Cao Mạnh Đạt, wrote: > > Can you explain a little more on how this would impact

Re: Index documents in async way

2020-10-08 Thread Cao Mạnh Đạt
> Can you explain a little more on how this would impact durability of updates? Since we persist updates into tlog, I do not think this will be an issue > What does a failure look like, and how does that information get propagated back to the client app? I was not able to do much research but

Re: Index documents in async way

2020-10-08 Thread Mike Drob
Interesting idea! Can you explain a little more on how this would impact durability of updates? What does a failure look like, and how does that information get propagated back to the client app? Mike On Thu, Oct 8, 2020 at 9:21 AM Cao Mạnh Đạt wrote: > Hi guys, > > First of all it seems that