On Thu, Nov 19, 2015 at 4:39 AM, Gerald Richter - ECOS Technology <[email protected]> wrote:
> Hi,
>
> It's a local IndexSearcher.
>
> I have done a lot of tests and it's really happening.
>
> Let me give you a little more details, maybe this helps:
>
> - I call a function that creates a new IndexSearcher and call $hits =
> $searcher -> hits.
> - I iterate over the first few entries and returns the entries and the $hits
> - The documents that were found are deleted from a database, which in turn
> deletes the documents from the Lucy index.
> - Now I iterate over the next few entries and delete them and so on
>
> I have made small test where per iteration only two entries are fetch. The
> result looks like this:
>
> id => "8b8bce64e69b52ed244671009c11ee0e",
> id => "8b8bce64e69b52ed244671009c4857e7",
> id => "4a3dcd6c2e9e3074d2d52b8e72584b68",
> id => "8b8bce64e69b52ed244671009c730dc9",
> id => "4a3dcd6c2e9e3074d2d52b8e72584d19",
> id => "8b8bce64e69b52ed244671009c7e3974",
> id => "4a3dcd6c2e9e3074d2d52b8e72585475",
> id => "8b8bce64e69b52ed244671009c7e4788",
> id => "4a3dcd6c2e9e3074d2d52b8e72585dc2",
> id => "8b8bce64e69b52ed244671009c7e2fa6",
>
> id is some value I store in the document. The result should only contain ids
> starting with 8.
>
> So you see the first two are correct, after deletion of this two (always in a
> different process), the next time, the first one I get is wrong the second
> one is correct...
>
> If I do not delete anything I only get the right entries (just commented out
> one line the rest is still the same).
>
> Any clue?

When documents in an old segment are marked as deleted, that information is
written to a bitmap deletions file, which is stored in a new segment. Old
readers are not supposed to know about new segments. So for something to go
wrong, either 1) information in an old segment would have to be corrupted,
2) a reader would have to somehow find out about information in a new
segment, or 3) something else unrelated.

Indexers write index data (including new deletions data referencing
documents in old segments) to temp files in a new segment, which are then
consolidated into a single per-segment "compound file" named "cf.dat". When
a reader opens, it mmaps cf.dat for each segment in the snapshot. Once the
reader successfully opens all the files it needs, it never goes looking for
new files.

It's hard to imagine a mechanism that would either cause an existing
"cf.dat" file to be modified, or persuade a reader to go look at a new
"cf.dat" file.

So unless my reasoning is wrong, the cause is #3 -- something else
unrelated. I really have no idea what that could be, though since you've
previously asked some questions about Coro/AnyEvent and other concurrency
stuff, the most likely prospect would seem to be something unique to your
setup.

The next step is probably to take the behavior you've been able to
reproduce and isolate it in a test case that others can run and analyze.

Marvin Humphrey
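
A standalone test case along those lines might look roughly like the sketch
below. It is only a starting point under stated assumptions, not a
reproduction of the real setup: the field names ("id", "owner"), the ids,
the batch size of two, and the delete_in_child() helper are all made up for
illustration, and the deletion is done in a forked child to mimic the
"different process" described above. If the snapshot reasoning holds, no
unexpected ids should be reported.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use POSIX ();
use Lucy::Plan::Schema;
use Lucy::Plan::StringType;
use Lucy::Index::Indexer;
use Lucy::Search::IndexSearcher;
use Lucy::Search::TermQuery;

# Throwaway index location (hypothetical; not the real index).
my $path = tempdir( CLEANUP => 1 );

# Two exact-match string fields; the names are assumptions for this sketch.
my $schema = Lucy::Plan::Schema->new;
my $type   = Lucy::Plan::StringType->new;
$schema->spec_field( name => 'id',    type => $type );
$schema->spec_field( name => 'owner', type => $type );

# Build an index with wanted ("8-...") and unwanted ("4-...") docs interleaved.
my $indexer = Lucy::Index::Indexer->new(
    index  => $path,
    schema => $schema,
    create => 1,
);
for my $n ( 1 .. 20 ) {
    $indexer->add_doc( { id => "8-$n", owner => 'wanted' } );
    $indexer->add_doc( { id => "4-$n", owner => 'other' } );
}
$indexer->commit;
undef $indexer;

# Delete the given ids from the index in a separate process.
sub delete_in_child {
    my ( $index_path, @ids ) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {
        my $child_indexer = Lucy::Index::Indexer->new( index => $index_path );
        $child_indexer->delete_by_term( field => 'id', term => $_ ) for @ids;
        $child_indexer->commit;
        POSIX::_exit(0);    # leave without running the parent's cleanup code
    }
    waitpid( $pid, 0 );
    die "child exited with status $?" if $?;
}

# One searcher and one Hits object, opened before any deletion happens.
my $searcher = Lucy::Search::IndexSearcher->new( index => $path );
my $hits     = $searcher->hits(
    query      => Lucy::Search::TermQuery->new( field => 'owner', term => 'wanted' ),
    num_wanted => 20,
);

my ( @seen, @unexpected );
while (1) {
    # Fetch the next two hits from the same Hits object.
    my @batch;
    while ( @batch < 2 ) {
        my $hit = $hits->next or last;
        push @batch, $hit->{id};
    }
    last unless @batch;
    push @seen, @batch;
    push @unexpected, grep { !/^8-/ } @batch;

    # Delete what was just fetched while the old reader stays open.
    delete_in_child( $path, @batch );
}

print @unexpected
    ? "unexpected ids: @unexpected\n"
    : "all " . scalar(@seen) . " ids as expected\n";
```

If this stays clean, the differences between it and the failing application
(event loop, how the searcher is shared, how the deleting process is
started) are the places to look next.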
