Great work, thank you for tracking this down Nicolas!
Best,
-Nikolaus
--
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F
»Time flies like an arrow, fruit flies like a Banana.«
On Thu, 17 Sep 2020, at 17:26, Nicolas Deschildre wrote:
> Hello,
>
> Ok I found it!! (Or at least, one of them).
> And a sure way to reproduce the crash.
> All this time I was looking at a async/await bug... but it was finally
> another kind of concurrency issue :-)
>
> The bug :
> If you move a file from a folder to another, and the inode cache happens to
> be in a very specific condition (file to be copied uncached, destination
> folder in cache about to be evicted on next inode cache miss, origin folder
> in cache about to be evicted next after).
> --> fs.py:654 Reduce a lot the occurence of this bug by ensuring the origin
> and destination folder are in inode cache
> --> fs.py:658 self._lookup() load the inode of the file to be copied,
> evicting the destination folder inode.
> --> fs.py:719 self.inodes[id_p_old] Load the origin folder which was still in
> cache
> --> fs.py:720 self.inodes[id_p_new] Load the destination folder, evicts the
> origin folder inode.
> --> fs.py:721/723 Changes are made on the origin folder inode, which is no
> longer handled by the inode cache. We have the bug.
>
> Fix here : https://github.com/s3ql/s3ql/pull/199
>
> Thanks for the pointers!
>
> Nicolas Deschildre
>
> On Friday, September 11, 2020 at 8:50:24 AM UTC+2 Nicolas Deschildre wrote:
>> Hello,
>>
>> Thanks, Ok, I now understand the code better.
>> InodeCache holds 100 Inodes as cache in a ring. When a new inode not in
>> cache is requested, an inode in the cache is flushed, and the new inode is
>> stored in the cache instead.
>> The bug : Race condition : 2 inodes are requested from InodeCache at the
>> same time. Thread 1 requesting inode 1 flush and remove inode 2 from cache.
>> Thread 2 got inode 2 before it was removed from cache by Thread 1, but makes
>> changes after it was removed and flushed by Thread 1. Thead 2 ends, there
>> are no longer references to inode 2, python garbage collect it, and this
>> trigger the bug.
>>
>> I see 2 possibles solutions :
>> 1/ https://github.com/s3ql/s3ql/pull/196 : The _Inode class keeps a
>> reference to the InodeCache. On __del__, if we encouter the above bug, we
>> flush ourselves. The problem (and I'm not familiar enough with python) : I
>> guess garbage collection could happen at shutdown, when the InodeCache SQL
>> connection is no longer valid. Do you see a way to make this approach work?
>> 2/ (Not coded yet) : The InodeCache is a ring, first in first out. What if
>> we store access time on InodeCache.__getitem__, and the inode to be removed
>> is the most old accessed one? This solution should reduce a lot (but not
>> eliminate) the race condition. What do you think?
>>
>> Finally : I tried to reproduce locally, with a unencrypted, uncompressed
>> local backend, with mass parralel attribute changes, but I was not able to
>> reproduce the bug.
>>
>> Thanks,
>> Nicolas Deschildre
>>
>> On Wednesday, September 9, 2020 at 5:50:19 PM UTC+2 [email protected] wrote:
>>> Hello Nicolas,
>>>
>>>
>>>>
>>>>> S3QL somehow manages to delete/garbage collects an _Inode object that is
>>>>> dirty (i.e. has an attribute modification that is not persisted in the
>>>>> S3QL database)
>>>>
>>>> So, if I understand correctly, since it is a pending modification on a now
>>>> deleted inode, this is not really a problem, right? Said otherwise, the
>>>> filesystem is not corrupted? [...]
>>>
>>> No, the inode/file does not have to be deleted. There is a pending metadata
>>> modification (access time, modification time, file size, file mode) that
>>> should have been persisted (written in the Sqlite-Database) but it did not
>>> made it in the database.
>>> File data is not corrupted, but some metadata of your files might not be in
>>> the correct state.
>>>
>>>
>
> --
> You received this message because you are subscribed to the Google Groups
> "s3ql" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/s3ql/e1882d14-f582-42ce-8468-1d07d0eada3cn%40googlegroups.com
>
> <https://groups.google.com/d/msgid/s3ql/e1882d14-f582-42ce-8468-1d07d0eada3cn%40googlegroups.com?utm_medium=email&utm_source=footer>.
--
You received this message because you are subscribed to the Google Groups
"s3ql" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/s3ql/c482cb9e-a295-45d6-b4c8-8137029da6f3%40www.fastmail.com.