Hello,
We are running Robinhood v3.0 against a Lustre 2.7 filesystem, using 
the LHSM policy to archive it.

I am currently testing directory restores with the rbh-undelete 
command, and I am hitting a segmentation fault when restoring a 
directory that has been deleted.

What I find notable is that the command reliably restores two files 
from the directory and then segfaults while restoring the third, every 
time. If you run it again, it restores another two files and segfaults 
on the third again.

I've installed the debuginfo packages, and further down is a stack trace 
from gdb after a crash. You can see that I am trying to restore a 
directory that contained 5 files, all in state 'synchro', and that the 
segfault happens after the first two files are successfully restored.

Here is the rebind_cmd we are using:

lhsm_config {
  # used for "undelete": command to change the fid of an entry in archive
  rebind_cmd = "/usr/sbin/lhsmtool_posix --hsm_root=/mnt/qstar/rds-d1/lhsm --archive {archive_id} --rebind {oldfid} {newfid} {fsroot}";
  # for UUID-based mapping
  uuid {
    xattr = "trusted.lhsm_uuid";
  }
}
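
To make the placeholders concrete, here is a minimal Python sketch of how such a template could expand into an argument list. It is only an illustration: I am assuming a plain textual substitution of the four placeholders, the helper name expand_rebind_cmd is hypothetical, and the new FID in the usage example is made up. How Robinhood actually performs the substitution (for instance, whether FIDs are passed with brackets) is not something I have verified.

```python
import shlex

# Template copied from the lhsm_config block above.
REBIND_CMD = (
    "/usr/sbin/lhsmtool_posix --hsm_root=/mnt/qstar/rds-d1/lhsm "
    "--archive {archive_id} --rebind {oldfid} {newfid} {fsroot}"
)


def expand_rebind_cmd(template, archive_id, oldfid, newfid, fsroot):
    """Hypothetical helper: fill the placeholders and split into argv.

    Assumes a simple textual substitution; this is not Robinhood's code.
    """
    command = template.format(
        archive_id=archive_id, oldfid=oldfid, newfid=newfid, fsroot=fsroot
    )
    return shlex.split(command)


if __name__ == "__main__":
    argv = expand_rebind_cmd(
        REBIND_CMD,
        archive_id=1,                      # archive=1 appears in the backtrace
        oldfid="0x200000ddb:0xe45b:0x0",   # old FID taken from the listing
        newfid="0x200000ddb:0xffff:0x0",   # made-up FID, for illustration only
        fsroot="/rds-d1",
    )
    print(argv)
```

Printing the expanded argument list like this is mainly useful for comparing against what actually gets executed, should the rebind step turn out to be involved.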

[root@rbh-rds-data robinhood-src]# gdb 
rpms/BUILD/robinhood-3.0/src/robinhood/rbh-undelete 
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-94.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from 
/root/robinhood-src/rpms/BUILD/robinhood-3.0/src/robinhood/rbh-undelete...done.
(gdb) run -L 
/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/
Starting program: 
/root/robinhood-src/rpms/BUILD/robinhood-3.0/src/robinhood/rbh-undelete -L 
/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Using config file '/etc/robinhood.d/rds_d1.conf'.
2017/01/05 21:36:35 [15513/1] CheckFS | '/rds-d1' matches mount point 
'/rds-d1', type=lustre, fs=10.143.240.30@tcp1:10.143.240.29@tcp1:/rds-d1
[New Thread 0x7ffff3ae1700 (LWP 15517)]
[Thread 0x7ffff3ae1700 (LWP 15517) exited]
             rm_time,                        id,     type,       user,      group,       size,             last_mod,     lhsm.status,                                     path
 2016/12/26 21:07:32,  [0x200000ddb:0xe45b:0x0],     file,      wjt27,      wjt27,   13.00 KB,  2016/10/28 08:51:31,         synchro, /rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_rf_al7230b.c
 2016/12/26 21:07:32,  [0x200000ddb:0xe45c:0x0],     file,      wjt27,      wjt27,    8.62 KB,  2016/10/28 08:51:31,         synchro, /rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_rf_rf2959.c
 2016/12/26 21:07:32,  [0x200000ddb:0xe45d:0x0],     file,      wjt27,      wjt27,   15.34 KB,  2016/10/28 08:51:31,         synchro, /rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_rf_uw2453.c
 2016/12/26 21:07:32,  [0x200000ddb:0xe45e:0x0],     file,      wjt27,      wjt27,   50.44 KB,  2016/10/28 08:51:31,         synchro, /rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_usb.c
 2016/12/26 21:07:32,  [0x200000ddb:0xe45f:0x0],     file,      wjt27,      wjt27,    7.30 KB,  2016/10/28 08:51:31,         synchro, /rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_usb.h
[Inferior 1 (process 15513) exited normally]
Missing separate debuginfos, use: debuginfo-install 
libuuid-2.23.2-33.el7.x86_64 mariadb-libs-5.5.52-1.el7.x86_64 
pcre-8.32-15.el7_2.1.x86_64
(gdb) run -R /rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/
Starting program: 
/root/robinhood-src/rpms/BUILD/robinhood-3.0/src/robinhood/rbh-undelete -R 
/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Using config file '/etc/robinhood.d/rds_d1.conf'.
2017/01/05 21:36:48 [15519/1] CheckFS | '/rds-d1' matches mount point 
'/rds-d1', type=lustre, fs=10.143.240.30@tcp1:10.143.240.29@tcp1:/rds-d1
[New Thread 0x7ffff3ae1700 (LWP 15520)]
[Thread 0x7ffff3ae1700 (LWP 15520) exited]
Restoring 
'/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_rf_al7230b.c'...
       restore OK (file)
        Entry successfully updated in the dabatase
Restoring 
'/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_rf_rf2959.c'...
        restore OK (file)
        Entry successfully updated in the dabatase

Program received signal SIGSEGV, Segmentation fault.
malloc_consolidate (av=av@entry=0x7ffff61f3760 <main_arena>) at malloc.c:4146
4146              size = p->size & ~(PREV_INUSE|NON_MAIN_ARENA);
Missing separate debuginfos, use: debuginfo-install 
sssd-client-1.14.0-43.el7_3.4.x86_64
(gdb) bt
#0  malloc_consolidate (av=av@entry=0x7ffff61f3760 <main_arena>) at 
malloc.c:4146
#1  0x00007ffff5eb6385 in _int_malloc (av=av@entry=0x7ffff61f3760 <main_arena>, 
bytes=bytes@entry=4096) at malloc.c:3436
#2  0x00007ffff5eb8fbc in __GI___libc_malloc (bytes=4096) at malloc.c:2893
#3  0x00007ffff5e7b60c in __realpath (name=0x7fffffffbd24 
"/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_rf_uw2453.c",
 resolved=0x0) at canonicalize.c:78
#4  0x00007ffff6b6f98a in llapi_search_fsname 
(pathname=pathname@entry=0x7fffffffbd24 
"/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_rf_uw2453.c",
 fsname=fsname@entry=0x7fffffff6b70 "")
    at liblustreapi.c:1173
#5  0x00007ffff6b6fb0e in llapi_file_open_param (name=name@entry=0x7fffffffbd24 
"/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_rf_uw2453.c",
 flags=flags@entry=65, mode=436, 
    param=param@entry=0x7fffffff6cd0) at liblustreapi.c:685
#6  0x00007ffff6b6ff75 in llapi_file_open_pool (name=name@entry=0x7fffffffbd24 
"/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_rf_uw2453.c",
 flags=flags@entry=65, mode=<optimized out>, 
    stripe_size=stripe_size@entry=0, stripe_offset=stripe_offset@entry=-1, 
stripe_count=stripe_count@entry=0, 
stripe_pattern=stripe_pattern@entry=-2147483647, pool_name=pool_name@entry=0x0) 
at liblustreapi.c:849
#7  0x00007ffff6b749f5 in llapi_hsm_import (dst=dst@entry=0x7fffffffbd24 
"/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_rf_uw2453.c",
 archive=archive@entry=1, st=st@entry=0x7fffffff6e00, 
    stripe_size=stripe_size@entry=0, stripe_offset=stripe_offset@entry=-1, 
stripe_count=stripe_count@entry=0, stripe_pattern=<optimized out>, 
stripe_pattern@entry=0, pool_name=pool_name@entry=0x0, 
    newfid=newfid@entry=0x7fffffff6fd0) at liblustreapi_hsm.c:1333
#8  0x00007ffff42f7209 in lhsm_undelete (smi=0x6a0fb0, p_old_id=0x7fffffff9720, 
p_attrs_old_in=0x7fffffffbc00, p_new_id=0x7fffffff6fd0, 
p_attrs_new=0x7fffffff6fe0, already_recovered=<optimized out>) at lhsm.c:915
#9  0x000000000040c289 in undelete_helper (id=id@entry=0x7fffffff9720, 
attrs=attrs@entry=0x7fffffffbc00) at rbh_undelete.c:329
#10 0x000000000040bd37 in undelete () at rbh_undelete.c:440
#11 main (argc=<optimized out>, argv=<optimized out>) at rbh_undelete.c:712
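
For anyone reading the backtrace, the numeric arguments in frames #5 and #6 (flags=65, mode=436, stripe_pattern=-2147483647) can be decoded with a few lines of Python. This is only a sketch: the constant values below are what I believe the Linux x86_64 and Lustre headers define, so please treat them as assumptions and check them against your own headers. The helper names are mine.

```python
# Assumed values from Linux x86_64 <fcntl.h> and Lustre's lustre_user.h.
O_ACCMODE = 0o3
O_WRONLY = 0o1
O_CREAT = 0o100
LOV_PATTERN_RAID0 = 0x001
LOV_PATTERN_F_RELEASED = 0x80000000


def to_u32(value):
    """Reinterpret a signed 32-bit integer (as printed by gdb) as unsigned."""
    return value & 0xFFFFFFFF


def decode_open_flags(flags):
    """Name the open(2) flags set in `flags` (only the two listed above)."""
    names = []
    if flags & O_ACCMODE == O_WRONLY:
        names.append("O_WRONLY")
    if flags & O_CREAT:
        names.append("O_CREAT")
    return names


def decode_stripe_pattern(pattern):
    """Name the layout pattern bits set in `pattern` (only the two above)."""
    value = to_u32(pattern)
    names = []
    if value & LOV_PATTERN_RAID0:
        names.append("LOV_PATTERN_RAID0")
    if value & LOV_PATTERN_F_RELEASED:
        names.append("LOV_PATTERN_F_RELEASED")
    return names


if __name__ == "__main__":
    print(decode_open_flags(65))                 # flags from frames #5 and #6
    print(oct(436))                              # mode from frame #5
    print(hex(to_u32(-2147483647)))              # stripe_pattern from frame #6
    print(decode_stripe_pattern(-2147483647))
```

If those constants are right, the call that crashes is creating the file write-only with mode 0664 and a released RAID0 layout, so the arguments themselves look plausible for an HSM import. Since the fault is inside malloc_consolidate rather than in the Lustre code, my guess is that the heap was already damaged earlier in the run, but I have not confirmed that.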



I can then run it again on the same directory and it will again restore 
another two files before segfaulting again.

(gdb) run -L 
/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: 
/root/robinhood-src/rpms/BUILD/robinhood-3.0/src/robinhood/rbh-undelete -L 
/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Using config file '/etc/robinhood.d/rds_d1.conf'.
2017/01/05 21:37:00 [15521/1] CheckFS | '/rds-d1' matches mount point 
'/rds-d1', type=lustre, fs=10.143.240.30@tcp1:10.143.240.29@tcp1:/rds-d1
[New Thread 0x7ffff3ae1700 (LWP 15522)]
[Thread 0x7ffff3ae1700 (LWP 15522) exited]
             rm_time,                        id,     type,       user,      group,       size,             last_mod,     lhsm.status,                                     path
 2016/12/26 21:07:32,  [0x200000ddb:0xe45d:0x0],     file,      wjt27,      wjt27,   15.34 KB,  2016/10/28 08:51:31,         synchro, /rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_rf_uw2453.c
 2016/12/26 21:07:32,  [0x200000ddb:0xe45e:0x0],     file,      wjt27,      wjt27,   50.44 KB,  2016/10/28 08:51:31,         synchro, /rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_usb.c
 2016/12/26 21:07:32,  [0x200000ddb:0xe45f:0x0],     file,      wjt27,      wjt27,    7.30 KB,  2016/10/28 08:51:31,         synchro, /rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_usb.h
[Inferior 1 (process 15521) exited normally]
(gdb) run -R /rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/
Starting program: 
/root/robinhood-src/rpms/BUILD/robinhood-3.0/src/robinhood/rbh-undelete -R 
/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Using config file '/etc/robinhood.d/rds_d1.conf'.
2017/01/05 21:37:33 [15627/1] CheckFS | '/rds-d1' matches mount point 
'/rds-d1', type=lustre, fs=10.143.240.30@tcp1:10.143.240.29@tcp1:/rds-d1
[New Thread 0x7ffff3ae1700 (LWP 15628)]
[Thread 0x7ffff3ae1700 (LWP 15628) exited]
Restoring 
'/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_rf_uw2453.c'...
        restore OK (file)
        Entry successfully updated in the dabatase
Restoring 
'/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_usb.c'...
      restore OK (file)
        Entry successfully updated in the dabatase

Program received signal SIGSEGV, Segmentation fault.
malloc_consolidate (av=av@entry=0x7ffff61f3760 <main_arena>) at malloc.c:4146
4146              size = p->size & ~(PREV_INUSE|NON_MAIN_ARENA);


And then finally running again, it restores one file and then exits 
cleanly.

(gdb) run -L 
/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: 
/root/robinhood-src/rpms/BUILD/robinhood-3.0/src/robinhood/rbh-undelete -L 
/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Using config file '/etc/robinhood.d/rds_d1.conf'.
2017/01/05 21:37:38 [15629/1] CheckFS | '/rds-d1' matches mount point 
'/rds-d1', type=lustre, fs=10.143.240.30@tcp1:10.143.240.29@tcp1:/rds-d1
[New Thread 0x7ffff3ae1700 (LWP 15630)]
[Thread 0x7ffff3ae1700 (LWP 15630) exited]
             rm_time,                        id,     type,       user,      group,       size,             last_mod,     lhsm.status,                                     path
 2016/12/26 21:07:32,  [0x200000ddb:0xe45f:0x0],     file,      wjt27,      wjt27,    7.30 KB,  2016/10/28 08:51:31,         synchro, /rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_usb.h
[Inferior 1 (process 15629) exited normally]
(gdb) run -R 
/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/
Starting program: 
/root/robinhood-src/rpms/BUILD/robinhood-3.0/src/robinhood/rbh-undelete -R 
/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Using config file '/etc/robinhood.d/rds_d1.conf'.
2017/01/05 21:37:48 [15633/1] CheckFS | '/rds-d1' matches mount point 
'/rds-d1', type=lustre, fs=10.143.240.30@tcp1:10.143.240.29@tcp1:/rds-d1
[New Thread 0x7ffff3ae1700 (LWP 15634)]
[Thread 0x7ffff3ae1700 (LWP 15634) exited]
Restoring 
'/rds-d1/user/wjt27/HSM-testing/dir1/linux-4.8.5/drivers/net/wireless/zydas/zd1211rw/zd_usb.h'...
      restore OK (file)
        Entry successfully updated in the dabatase

undelete summary:
                1 files
                0 old version
                0 empty files
                0 non-files
                0 no backup
                0 errors
                0 DB errors
[Inferior 1 (process 15633) exited normally]

I was wondering if anyone else using HSM has seen or can reproduce this 
crash? I'm afraid my C experience is very rusty but I am trying to 
understand the code to see if I can spot where it is failing - any 
pointers here would be most welcome!

Kind regards,


-- 
Matt Rásó-Barnett
Research Computing Platforms
University Information Services
High Performance Computing Service
University of Cambridge
Email: mjr...@cam.ac.uk
