Hi Fabio,

I suggest you get and build the latest version from the GitHub or 
SourceForge repository.
A recent fix there sounds like your issue:
     625df4e Fix possible crash in db_exec_sql

# git clone https://github.com/cea-hpc/robinhood.git
# cd robinhood
# sh autogen.sh
# ./configure
# make rpm


The error in the logs looks like an invalid/inconsistent value for 
"nlink", but it is not clear that it is related to the segfault.

  Error 7 executing query 'UPDATE ENTRIES set nlink=nlink-1 where 
id='0x2363d5eca:0xf403:0x0'': BIGINT UNSIGNED value is out of range in 
'(`rbhxxx`.`ENTRIES`.`nlink` - 1)'
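
MySQL raises this error (1690 in your log) when a subtraction on an 
unsigned column would go below zero, so the stored nlink for this entry 
is probably already 0. A minimal sketch to check the stored value, 
assuming the table and column names from the error message. The helper 
name nlink_query is hypothetical, and rbh_db is a placeholder for your 
database name:

```shell
# Hypothetical helper: print a query showing the stored nlink for one FID.
# ENTRIES, id and nlink are taken from the error message above.
nlink_query() {
    printf "SELECT id, nlink FROM ENTRIES WHERE id='%s';\n" "$1"
}

# Review the output, then pipe it to the client, for example:
#   nlink_query '0x2363d5eca:0xf403:0x0' | mysql rbh_db
nlink_query '0x2363d5eca:0xf403:0x0'
```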

I suggest you:
1) Check whether this entry still exists:
     # lfs fid2path /your/fs/mnt 0x2363d5eca:0xf403:0x0
2) If it does not, delete the entry from the robinhood DB:
     # mysql rbh_db
     mysql> delete from ENTRIES where id='0x2363d5eca:0xf403:0x0';
     mysql> delete from NAMES where id='0x2363d5eca:0xf403:0x0';
     mysql> delete from ANNEX_INFO where id='0x2363d5eca:0xf403:0x0';
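
If several entries show the same error, the two steps can be sketched as 
a small shell helper. This is only an illustration: the function names 
cleanup_sql and cleanup_if_gone are hypothetical, and it treats any 
failure of lfs fid2path as "entry gone", which may be wrong if the 
filesystem is unmounted or unreachable. It only prints the statements, 
so you can review them before running them:

```shell
# Print the DELETE statements for one FID (same three tables as above).
cleanup_sql() {
    fid=$1
    for table in ENTRIES NAMES ANNEX_INFO; do
        printf "DELETE FROM %s WHERE id='%s';\n" "$table" "$fid"
    done
}

# Print the statements only if lfs fid2path cannot resolve the FID.
# Arguments: Lustre mount point, FID.
cleanup_if_gone() {
    mnt=$1
    fid=$2
    if ! command -v lfs >/dev/null 2>&1; then
        echo "lfs not found, cannot check $fid" >&2
        return 2
    fi
    if lfs fid2path "$mnt" "$fid" >/dev/null 2>&1; then
        echo "$fid still exists, nothing to delete" >&2
        return 1
    fi
    cleanup_sql "$fid"
}

# Example (review the output before piping it to the client):
#   cleanup_if_gone /your/fs/mnt '0x2363d5eca:0xf403:0x0' | mysql rbh_db
```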

Regards,
Thomas


On 10/23/14 15:40, Verzelloni Fabio wrote:
> Dear All,
>    I’m experiencing an issue while running the cleaning policy. At 
> almost the same point on every run, robinhood segfaults:
>
> /usr/local/bin/start_cleaning_policy.sh: line 13: 23336 Segmentation fault    
>   /usr/sbin/robinhood -P -O -f 
> /etc/robinhood.d/tmpfs/purge_conf/rbh_purge_daily.conf
>
> And in the /var/log/messages I see this:
>
> Oct 23 15:22:07 rbh01 kernel: robinhood[23348]: segfault at 10 ip 
> 0000003806222655 sp 00007fbd57ff5450 error 4 in 
> libmysqlclient.so.18.0.0[3806200000+258000]
>
> The system has not been updated since the beginning and these are the 
> specifics:
>
> Linux rbh01 2.6.32-431.5.1.el6.x86_64 #1 SMP Tue Feb 11 13:30:01 CST 2014 
> x86_64 x86_64 x86_64 GNU/Linux
>
> robinhood-adm-2.5.2-1.noarch.x86_64
> robinhood-tmpfs-2.5.2-1.lustre2.5.el6.x86_64
> robinhood-webgui-2.5.2-1.noarch.x86_64
> lustre-client-2.5.1-2.6.32_431.5.1.el6.x86_64.x86_64
> lustre-client-modules-2.5.1-2.6.32_431.5.1.el6.x86_64.x86_64
> robinhood-tmpfs-2.5.2-1.lustre2.5.el6.x86_64
> libmysqlclient16-5.1.69-1.w6.x86_64
> mysql55w-server-5.5.38-1.w6.x86_64
> mysql55w-libs-5.5.38-1.w6.x86_64
> php-mysql-5.3.3-22.el6.x86_64
> mysql55w-devel-5.5.38-1.w6.x86_64
> mysql55w-5.5.38-1.w6.x86_64
> nagios-plugins-mysql-1.4.16-10.el6.x86_64
>
> I have plenty of these messages in my robinhood.log:
>
> 2014/10/23 15:32:36 rbh01[31788/14] EntryProc | Error 7 performing database 
> operation: request error.
> 2014/10/23 15:32:37 rbh01[31788/8] ListMgr | Unhandled error 1690: default 
> conversion to DB_REQUEST_FAILED
> 2014/10/23 15:32:37 rbh01[31788/8] ListMgr | Error 7 executing query 'UPDATE 
> ENTRIES set nlink=nlink-1 where id='0x2363d5eca:0xf403:0x0'': BIGINT UNSIGNED 
> value is out of range in '(`rbhxxx`.`ENTRIES`.`nlink` - 1)'
> 2014/10/23 15:32:37 rbh01[31788/8] EntryProc | Error 7 performing database 
> operation: request error.
>
> The activity log of the purge run always stops around this point:
>
> 2014/10/23 15:21:11 rbh01[23336/4] STATS | ============ Purge stats 
> ============
> 2014/10/23 15:21:11 rbh01[23336/4] STATS | idle purge threads       = 7
> 2014/10/23 15:21:11 rbh01[23336/4] STATS | purge operations pending = 0
> 2014/10/23 15:21:11 rbh01[23336/4] STATS | purge status:
> 2014/10/23 15:21:11 rbh01[23336/4] STATS |     successfully purged            
> = 0
> 2014/10/23 15:21:11 rbh01[23336/4] STATS |     moved or deleted since last 
> update = 219
> 2014/10/23 15:21:11 rbh01[23336/4] STATS |     whitelisted/ignored            
> = 200360
> 2014/10/23 15:21:11 rbh01[23336/4] STATS |     purge error                    
> = 181
> 2014/10/23 15:21:11 rbh01[23336/4] STATS | total purged volume = 0 (0)
> 2014/10/23 15:21:11 rbh01[23336/4] STATS | last file submitted  0 s ago
> 2014/10/23 15:21:11 rbh01[23336/4] STATS | last file handled    0 s ago
> 2014/10/23 15:21:11 rbh01[23336/4] STATS | last file purged     0 s ago
>
> Unfortunately the core dump file gets deleted:
>
> Oct 23 14:44:42 rbh01 abrtd: 'post-create' on 
> '/var/spool/abrt/ccpp-2014-10-23-14:44:42-18925' exited with 1
> Oct 23 14:44:42 rbh01 abrtd: Corrupted or bad directory 
> '/var/spool/abrt/ccpp-2014-10-23-14:44:42-18925', deleting
>
> Any suggestion?
>
> Thanks
> Fabio
>
>
>
> --
> - Fabio Verzelloni - CSCS - Swiss National Supercomputing Centre
> via Trevano 131 - 6900 Lugano, Switzerland
> Tel: +41 (0)91 610 82 04
>
> ------------------------------------------------------------------------------
> _______________________________________________
> robinhood-support mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/robinhood-support

