I had to try a dozen backed up versions of a file one after another to find a 
lost edit and I was also greatly annoyed by the delay (~6 minutes with a fast 
USB3 disk). So I noticed this report and also noted that  duplicity indeed does 
many tenths of thousands to half a million sys-calls to stat or open 
/etc/localtime, /etc/group and /etc/passwd. That seems a bit unreasonable to me 
but I'm NOT AN EXPERT AT ALL. The only thing I can tell for sure is that 6 
minutes per restore is some times toooooo long*. In my case it was painful.
 
To test I started the restore of one small file from the GUI and then run 
strace as follows on various python processes that I was noticing with `ps 
ax|grep python`. Before the final duplicity python process started I recorded 
about half a million stat("/etc/localtime") syscalls (you may be interested in 
[1] regarding this behaviour). I then recorded ~100,000 file related syscalls 
to open /etc/passwd and /etc/group. Here are my commands:

# strace  -e trace=file -p 3827  2> /tmp/foo
...
# wc -l /tmp/foo                                                                
    
100892 /tmp/lala4         
# sort /tmp/foo|sed -e 's/", .*/"/'|uniq -c|sort -n|tail  -2
  49912 open("/etc/group"
  49912 open("/etc/passwd"

What seems interesting are the duplicity processes I show starting. I
noticed the same commands running more than once and this doesn't seem
normal at all. Especially the fact that duplicity list-current-files
runs twice with exactly the same options and takes ~2 minutes each time
seems troubling**.

  1  6855 0:24 python2 /usr/bin/duplicity collection-status --gio 
file:///bak/ndbak2 --no-encryption --verbosity=9 --gpg-options=--no-use-agent 
--archive-dir=/home/nd/.cache/deja-dup --tempdir=/tmp --log-fd=19               
                                       
  2  7686 0:23 python2 /usr/bin/duplicity collection-status --gio 
file:///bak/ndbak2 --no-encryption --verbosity=9 --gpg-options=--no-use-agent 
--archive-dir=/home/nd/.cache/deja-dup --tempdir=/tmp --log-fd=19               
                                       
  3  8110 2:26 python2 /usr/bin/duplicity list-current-files --gio 
file:///bak/ndbak2 --verbosity=9 --gpg-options=--no-use-agent 
--archive-dir=/home/nd/.cache/deja-dup --tempdir=/tmp --log-fd=19               
                                                      
  4 10886 0:21 python2 /usr/bin/duplicity restore --gio 
--file-to-restore=home/nd/test --time=2018-06-06T09:23:45Z --force 
file:///bak/ndbak2 /home/nd/test --verbosity=9 --gpg-options=--no-use-agent 
--archive-dir=/home/nd/.cache/deja-dup --tempdir=/tmp --log-fd=1
    9                                                                           
                                                                                
                                                                                
                       
  5 11241 0:21 python2 /usr/bin/duplicity collection-status --gio 
file:///bak/ndbak2 --verbosity=9 --gpg-options=--no-use-agent 
--archive-dir=/home/nd/.cache/deja-dup --tempdir=/tmp --log-fd=21               
                                                       
  6 11535 2:03 python2 /usr/bin/duplicity list-current-files --gio 
file:///bak/ndbak2 --verbosity=9 --gpg-options=--no-use-agent 
--archive-dir=/home/nd/.cache/deja-dup --tempdir=/tmp --log-fd=20               
                                                      
  7 13659 0:29 python2 /usr/bin/duplicity restore --gio 
--file-to-restore=home/nd/test --time=2018-06-06T09:23:45Z --force 
file:///bak/ndbak2 /home/nd/test --verbosity=9 --gpg-options=--no-use-agent 
--archive-dir=/home/nd/.cache/deja-dup --tempdir=/tmp --log-fd=2
    0         

Here's the output of strace  -c -f -p 11535 (duplicity list-current-
files) where you can see that 34.68% of syscalls time is spent on stat
(I _guess_ on stat('/etc/timezone'). Roughly the same times I show on
the other invocation of duplicity list-current-files.

strace: Process 11535 attached with 3 threads
strace: [ Process PID=11535 runs in x32 mode. ]
strace: [ Process PID=11535 runs in 64 bit mode. ]
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 48.67    0.243360           0    632709           write
 34.68    0.173416           1    316355           stat
 15.26    0.076308           1    128272           read
  0.78    0.003885        3885         1           brk
  0.58    0.002923           3       952           munmap
  0.01    0.000046          23         2           unlink
  0.01    0.000039          39         1           mkdir
  0.00    0.000023          23         1           rmdir
  0.00    0.000003           3         1           fstat
  0.00    0.000003           3         1           rt_sigaction
  0.00    0.000000           0         2           close
  0.00    0.000000           0         2           lseek
------ ----------- ----------- --------- --------- ----------------
100.00    0.500006               1078299           total


______________

[1] https://blog.packagecloud.io/eng/2017/02/21/set-environment-
variable-save-thousands-of-system-calls/

_______________
* But don't get me wrong -- I'm still really thankful for all the free fish :-)
** just guessing but this might be a glitch in the way the GUI calls duplicity

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1090725

Title:
  slow scanning because too many times opens /etc/groups and /etc/passwd

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/duplicity/+bug/1090725/+subscriptions

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

Reply via email to