Hello,

A quick follow-up on this: since I put the patch live, I haven't gotten a 
single new BUG ALERT. It looks like that was the only one I was getting.

Cheers,
Nicolas

On Thursday, September 17, 2020 at 9:07:32 PM UTC+2 [email protected] wrote:

> Great work, thank you for tracking this down Nicolas!
>
> Best,
> -Nikolaus
>
> --
> GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F
>
>              »Time flies like an arrow, fruit flies like a Banana.«
>
>
>
> On Thu, 17 Sep 2020, at 17:26, Nicolas Deschildre wrote:
>
> Hello,
>
> OK, I found it!! (Or at least, one of them.)
> And a sure way to reproduce the crash.
> All this time I was looking for an async/await bug... but it turned out 
> to be another kind of concurrency issue :-)
>
> The bug:
> If you move a file from one folder to another, and the inode cache 
> happens to be in a very specific state (the file to be moved not in 
> cache, the destination folder in cache but about to be evicted on the 
> next inode cache miss, and the origin folder in cache but about to be 
> evicted right after):
> --> fs.py:654 greatly reduces the occurrence of this bug by ensuring the 
> origin and destination folders are in the inode cache.
> --> fs.py:658 self._lookup() loads the inode of the file to be moved, 
> evicting the destination folder inode.
> --> fs.py:719 self.inodes[id_p_old] loads the origin folder, which was 
> still in cache.
> --> fs.py:720 self.inodes[id_p_new] loads the destination folder, 
> evicting the origin folder inode.
> --> fs.py:721/723 Changes are made on the origin folder inode, which is 
> no longer managed by the inode cache. We have the bug.
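To make the sequence above concrete, here is a toy model of the eviction order (all names and the capacity of 2 are made up for illustration; the real InodeCache holds 100 inodes in a ring, which a plain FIFO only approximates):

```python
from collections import OrderedDict

class TinyFifoCache:
    """Toy FIFO cache standing in for s3ql's InodeCache (hypothetical
    model: capacity shrunk to 2; the real cache holds 100 inodes)."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.entries = OrderedDict()

    def __getitem__(self, key):
        if key not in self.entries:
            # Cache miss: evict the oldest entry, like the ring advancing.
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)
            self.entries[key] = {'name': key, 'dirty': False}
        return self.entries[key]

cache = TinyFifoCache()
cache['dst_dir']            # destination folder cached (next eviction victim)
cache['src_dir']            # origin folder cached (evicted right after)

cache['moved_file']         # fs.py:658 analogue: miss evicts 'dst_dir'
origin = cache['src_dir']   # fs.py:719 analogue: still a cache hit
cache['dst_dir']            # fs.py:720 analogue: miss evicts 'src_dir'
origin['dirty'] = True      # fs.py:721/723 analogue: dirties a stale object

# 'origin' is modified but no longer in the cache, so nothing will ever
# flush it: the metadata change is silently lost.
assert 'src_dir' not in cache.entries and origin['dirty']
```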
>
> Fix here: https://github.com/s3ql/s3ql/pull/199
>
> Thanks for the pointers!
>
> Nicolas Deschildre
>
> On Friday, September 11, 2020 at 8:50:24 AM UTC+2 Nicolas Deschildre wrote:
>
> Hello,
>
> Thanks — OK, I now understand the code better.
> InodeCache holds 100 inodes in a ring. When an inode that is not in the 
> cache is requested, an inode in the cache is flushed and evicted, and the 
> new inode is stored in its place.
> The bug is a race condition: two inodes are requested from the InodeCache 
> at the same time. Thread 1, requesting inode 1, flushes and removes 
> inode 2 from the cache. Thread 2 got a reference to inode 2 before it was 
> removed from the cache by Thread 1, but makes changes to it after it was 
> removed and flushed. Thread 2 ends, there are no longer any references to 
> inode 2, Python garbage-collects it, and this triggers the bug.
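A minimal sketch of why the problem only surfaces at garbage collection (the `Inode` class and the warning message are made up; in s3ql it is the real _Inode and the BUG ALERT log line):

```python
import warnings

class Inode:
    """Toy stand-in for s3ql's _Inode (hypothetical): it only remembers
    whether its attribute changes were persisted before destruction."""

    def __init__(self, inode_id):
        self.inode_id = inode_id
        self.dirty = False   # set on attribute changes, cleared on flush

    def __del__(self):
        # A dirty inode dying means a metadata update was lost; a warning
        # stands in here for the real BUG ALERT log message.
        if self.dirty:
            warnings.warn(f'inode {self.inode_id} died with unsaved changes')

# Thread 2's view: it modifies inode 2 *after* Thread 1 already flushed
# and evicted it, so nobody will ever flush this change.
inode2 = Inode(2)
inode2.dirty = True

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    del inode2   # last reference gone: CPython collects it immediately
assert any('unsaved changes' in str(w.message) for w in caught)
```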
>
> I see two possible solutions:
> 1/ https://github.com/s3ql/s3ql/pull/196 : The _Inode class keeps a 
> reference to the InodeCache. On __del__, if we encounter the above bug, 
> we flush ourselves. The problem (and I'm not familiar enough with 
> Python): I guess garbage collection could happen at shutdown, when the 
> InodeCache SQL connection is no longer valid. Do you see a way to make 
> this approach work?
> 2/ (Not coded yet): The InodeCache is a ring, first in first out. What if 
> we stored the access time on InodeCache.__getitem__, and the inode to be 
> evicted were the least recently accessed one? This solution should 
> greatly reduce (but not eliminate) the race condition. What do you think?
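Solution 2 could look roughly like this (a sketch with made-up names and a capacity of 2; an ordered structure stands in for explicit access timestamps, since moving an entry on access gives the same least-recently-accessed eviction order):

```python
from collections import OrderedDict

class LruInodeCache:
    """Sketch of solution 2 (hypothetical names): evict the least
    recently *accessed* entry instead of advancing a FIFO ring."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.entries = OrderedDict()

    def __getitem__(self, key):
        if key in self.entries:
            # Record the access by moving the entry to the "young" end.
            self.entries.move_to_end(key)
        else:
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # least recently accessed
            self.entries[key] = {'name': key, 'dirty': False}
        return self.entries[key]

cache = LruInodeCache()
cache['src_dir']
cache['dst_dir']
origin = cache['src_dir']   # the access refreshes src_dir's position
cache['moved_file']         # evicts 'dst_dir'; plain FIFO would have
                            # evicted 'src_dir', the entry still in use
assert 'src_dir' in cache.entries
```

As noted above, this only narrows the window: an inode handed out just before another thread's cache miss can still be evicted underneath its user.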
>
> Finally: I tried to reproduce this locally, with an unencrypted, 
> uncompressed local backend and massively parallel attribute changes, but 
> I was not able to reproduce the bug. 
>
> Thanks,
> Nicolas Deschildre
>
> On Wednesday, September 9, 2020 at 5:50:19 PM UTC+2 [email protected] 
> wrote:
>
> Hello Nicolas,
>  
>
>
> S3QL somehow manages to delete/garbage-collect an _Inode object that is 
> dirty (i.e., has an attribute modification that is not persisted in the 
> S3QL database) 
>
>
> So, if I understand correctly, since it is a pending modification on a 
> now deleted inode, this is not really a problem, right? Put otherwise, 
> the filesystem is not corrupted? [...]
>
>
> No, the inode/file does not have to be deleted. There is a pending 
> metadata modification (access time, modification time, file size, file 
> mode) that should have been persisted (written to the SQLite database) 
> but did not make it into the database.
> File data is not corrupted, but some metadata of your files might not be 
> in the correct state.
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"s3ql" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/s3ql/f901dd31-44f2-441c-8f93-2b5b9fbf0ecan%40googlegroups.com.
