Hi,

Looking into this a bit further, since a site I'm working on is seeing a
similar problem: in my case the file for the FID in question no longer
exists, so I think we might have a race here, where nlink is decremented
to zero without all threads being aware of it, and/or the record remains
locked in a transaction when it should have been removed. The simple fix,
it seems to me, is to just ignore the error, as the FID in question
appears to be removed successfully from the DB.
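
As a rough illustration of that "ignore the error" idea (a sketch only, not
robinhood's actual code): the Python below uses SQLite, with a CHECK
constraint standing in for the MySQL unsigned range check behind error 1690.
The names make_db and drop_link are made up for this example, and the table
is cut down to the two columns that matter here.

```python
import sqlite3


def make_db():
    """Create an in-memory stand-in for the ENTRIES table (assumption: only id and nlink)."""
    conn = sqlite3.connect(":memory:")
    # The CHECK constraint mimics BIGINT UNSIGNED rejecting nlink - 1 when nlink is 0.
    conn.execute(
        "CREATE TABLE ENTRIES ("
        " id TEXT PRIMARY KEY,"
        " nlink INTEGER NOT NULL CHECK (nlink >= 0))"
    )
    return conn


def drop_link(conn, fid):
    """Decrement nlink for fid.

    Returns True if a row was decremented, False if there was nothing to
    decrement (row missing, or nlink already 0). Any other failure is re-raised.
    """
    try:
        with conn:  # commits on success, rolls back on exception
            cur = conn.execute(
                "UPDATE ENTRIES SET nlink = nlink - 1 WHERE id = ?", (fid,)
            )
    except sqlite3.IntegrityError:
        row = conn.execute(
            "SELECT nlink FROM ENTRIES WHERE id = ?", (fid,)
        ).fetchone()
        if row is None or row[0] == 0:
            # Already at zero: treat the out-of-range failure as benign.
            return False
        raise
    return cur.rowcount == 1
```

With an entry starting at nlink = 2, the first two calls to drop_link return
True and a third returns False instead of raising, leaving nlink at 0.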

-cf


On Mon, Nov 2, 2015 at 8:20 PM, Colin Faber <[email protected]> wrote:

> Maybe check to see if the hard links are being recorded correctly? Can you
> update the file with additional hard links and see if nlink is updated to
> the correct count in the db record?
> On Nov 2, 2015 7:58 PM, "Andrew Elwell" <[email protected]> wrote:
>
>> Hi again - I mentioned this a while ago but don't see any checkins
>> that seem to cover it
>>
>> When running a purge containing hard links I see repeated errors in
>> the log along the lines of
>>
>> 2015/11/03 10:41:16 robinhood@esDM012[3381/15] ListMgr | Unhandled
>> error 1690: default conversion to DB_REQUEST_FAILED
>> 2015/11/03 10:41:16 robinhood@esDM012[3381/15] ListMgr | Error 7
>> executing query 'UPDATE ENTRIES set nlink=nlink-1 where
>> id='0x20ef066ea:0x1c58c:0x0'': BIGINT UNSIGNED value is out of range
>> in '(`rbh_scratch`.`ENTRIES`.`nlink` - 1)'
>> 2015/11/03 10:41:16 robinhood@esDM012[3381/15] Purge | Error 7
>> removing entry from database.
>>
>>
>> # lfs fid2path /scratch 0x20ef066ea:0x1c58c:0x0
>>
>> /scratch/y82/agolicz/tmp/maker_8W0Bpj/.NFSLock..NFSLock.cleaned_proteins%2Efasta%2Er.mpi.10.5.NFSLock.tmp.5396.20911.7186.28443169536
>>
>> /scratch/y82/agolicz/tmp/maker_8W0Bpj/.NFSLock.cleaned_proteins%2Efasta%2Er.mpi.10.5.NFSLock.share
>>
>> (yes I can see that NFSlock in there - I can't control* what our users
>> dump on scratch...)
>> running stat on them looks reasonable:
>>
>> # stat
>> /scratch/y82/agolicz/tmp/maker_8W0Bpj/.NFSLock..NFSLock.cleaned_proteins%2Efasta%2Er.mpi.10.5.NFSLock.tmp.5396.20911.7186.28443169536
>>   File:
>> `/scratch/y82/agolicz/tmp/maker_8W0Bpj/.NFSLock..NFSLock.cleaned_proteins%2Efasta%2Er.mpi.10.5.NFSLock.tmp.5396.20911.7186.28443169536'
>>   Size: 45         Blocks: 8          IO Block: 4194304 regular file
>> Device: 4b4ed00eh/1263456270d Inode: 148320162553120140  Links: 2
>> Access: (0600/-rw-------)  Uid: (22735/ agolicz)   Gid: (22735/ agolicz)
>> Access: 2015-08-06 01:23:16.000000000 +0800
>> Modify: 2015-08-06 01:23:16.000000000 +0800
>> Change: 2015-08-06 01:23:16.000000000 +0800
>>  Birth: -
>>
>> # stat
>> /scratch/y82/agolicz/tmp/maker_8W0Bpj/.NFSLock.cleaned_proteins%2Efasta%2Er.mpi.10.5.NFSLock.share
>>   File:
>> `/scratch/y82/agolicz/tmp/maker_8W0Bpj/.NFSLock.cleaned_proteins%2Efasta%2Er.mpi.10.5.NFSLock.share'
>>   Size: 45         Blocks: 8          IO Block: 4194304 regular file
>> Device: 4b4ed00eh/1263456270d Inode: 148320162553120140  Links: 2
>> Access: (0600/-rw-------)  Uid: (22735/ agolicz)   Gid: (22735/ agolicz)
>> Access: 2015-08-06 01:23:16.000000000 +0800
>> Modify: 2015-08-06 01:23:16.000000000 +0800
>> Change: 2015-08-06 01:23:16.000000000 +0800
>>  Birth: -
>>
>>
>> but going over the code, the only reference I can see to that error
>> message is in ./src/list_mgr/mysql_wrapper.c, which git blame points to
>>
>> b4e737a5 (Thomas Leibovici                  2014-05-20 13:39:27 +0200
>> 97)         DisplayLog(verb?LVL_MAJOR:LVL_DEBUG, LISTMGR_TAG,
>> b4e737a5 (Thomas Leibovici                  2014-05-20 13:39:27 +0200
>> 98)                    "Unhandled error %d: default conversion to
>> DB_REQUEST_FAILED", err);
>>
>>
>> According to the MySQL docs (I guess MariaDB is the same)
>> https://dev.mysql.com/doc/refman/5.5/en/out-of-range-and-overflow.html
>> that error (1690) is an out-of-range error.
>>
>> The actual ENTRIES db entry for it is
>>
>> MariaDB [rbh_scratch]> select * from ENTRIES where id =
>> '0x20ef066ea:0x1c58c:0x0' \G
>> *************************** 1. row ***************************
>>            id: 0x20ef066ea:0x1c58c:0x0
>>         owner: agolicz
>>       gr_name: agolicz
>>          size: 45
>>        blocks: 8
>>   last_access: 1438795396
>>      last_mod: 1438795396
>>          type: file
>>          mode: 384
>>         nlink: 0
>>     md_update: 1446515593
>>       invalid: 0
>> release_class: NULL
>> rel_cl_update: NULL
>> 1 row in set (0.00 sec)
>>
>>
>>
>> where nlink is 0
>>
>>
>> Any ideas where the failure is creeping in?
>>
>>
>> Many thanks
>>
>> Andrew
>>
>>
>> * well, technically I can, but that might make me more unpopular...
>>
>>
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> robinhood-support mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/robinhood-support
>>
>