Hello,

Indeed, there is an issue with Lustre changelog UNLINK records: when a file is still open at the time it is deleted, the record never reports the actual deletion (the UNLINK_LAST flag is not set). I don't know whether there is already a Jira ticket about that.

Thomas

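For reference, the open / delete / close sequence discussed in this thread can be sketched in Python as follows. This is only a minimal sketch under assumptions: the function name `open_unlink_close` and its parameters are hypothetical, the one-hour default simply mirrors the "sleep 1h" step quoted below, and nothing here checks what Lustre or Robinhood actually record.

```python
import os
import tempfile
import time


def open_unlink_close(directory, hold_seconds=3600.0):
    """Create a file, unlink it while it is still open, then close it.

    Returns (path, visible_after_unlink, data_read_back).
    """
    fd, path = tempfile.mkstemp(dir=directory)   # open file
    try:
        os.write(fd, b"payload")
        os.unlink(path)                          # delete file: the name is gone
        visible = os.path.exists(path)           # expected to be False on POSIX
        time.sleep(hold_seconds)                 # keep the descriptor open
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 7)                    # contents still reachable via fd
    finally:
        os.close(fd)                             # last close: file is really freed
    return path, visible, data
```

To try it against a Lustre client, `directory` would be a path under the mount point; whether a given Robinhood/Lustre combination then leaves an orphan entry is something to check in the database afterwards, not something the sketch asserts.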
-----Original Message-----
From: Davide Tacchella [mailto:dtacche...@cray.com]
Sent: Thursday, 13 June 2019 09:30
To: gerald.ho...@hpe.com; robinhood-support@lists.sourceforge.net
Subject: Re: [robinhood-support] Missing attribute 'fullpath'

Hello,

We see this regularly on RH 3.1.5. We haven't had time to nail it down, but our workaround is to scan the FS regularly to purge DB orphans.

One possible reproducer is:
- open file
- delete file
- sleep 1h
- close file

The open - delete - close sequence is perfectly acceptable in Unix, as the file is deleted once the last file descriptor is closed.

Another possible cause of these DB orphans is changelog events arriving out of order, causing events to be removed from the CL queue without being processed.

Davide

On Thu, 2019-06-13 at 07:03 +0000, Hofer, Gerald (HPE Pointnext South Pacific Delivery) wrote:
> I have an issue with the lhsm_archive where several FIDs produce this
> error message during the archive:
>
> 2019/06/13 16:27:31 [32655/21] Policy | Missing attribute 'fullpath' for evaluating boolean expression on [0x200128c87:0x1db1d:0x0]
> 2019/06/13 16:27:31 [32655/21] Policy | [0x200128c87:0x1db1d:0x0]: attribute is missing for checking ignore_fileclass rule
> 2019/06/13 16:27:31 [32655/21] lhsm_archive | Warning: cannot determine if entry is whitelisted: skipping it.
> 2019/06/13 16:27:31 [32655/21] Policy | [0x200128c87:0x1db1d:0x0]: attribute is missing for checking fileset 'scratch'
>
> The reason for this error is that the entry exists in the database
> but does not have a path. And because I have a fileclass that is
> based on a tree, the check is failing:
>
> FileClass scratch {
>     definition {
>         tree == "/lustre/scratch"
>     }
> }
>
> The entry is in the database – the only entry that is missing is the
> NAMES entry. But it must have existed at some stage since the STRIPE
> entries are there.
>
> MariaDB [robinhood_lustre]> select * from ENTRIES where id='0x200128c87:0x1db1d:0x0';
> +-------------------------+----------+---------+--------+--------+---------------+-------------+------------+---------------+------+------+-------+------------+---------+-------------+--------------+-------------+-------------+-------------+-------------+-------------+-------------+-----------+
> | id                      | uid      | gid     | size   | blocks | creation_time | last_access | last_mod   | last_mdchange | type | mode | nlink | md_update  | invalid | fileclass   | class_update | lhsm_status | lhsm_archid | lhsm_norels | lhsm_noarch | lhsm_lstarc | lhsm_lstrst | lhsm_uuid |
> +-------------------------+----------+---------+--------+--------+---------------+-------------+------------+---------------+------+------+-------+------------+---------+-------------+--------------+-------------+-------------+-------------+-------------+-------------+-------------+-----------+
> | 0x200128c87:0x1db1d:0x0 | n9614532 | default | 815808 |   1600 |    1560317119 |  1560317125 | 1560317113 |    1560317119 | file |  484 |     0 | 1560323027 |       0 | +std_files+ |   1560407251 | modified    |           1 |           0 |           0 |           0 |           0 | NULL      |
> +-------------------------+----------+---------+--------+--------+---------------+-------------+------------+---------------+------+------+-------+------------+---------+-------------+--------------+-------------+-------------+-------------+-------------+-------------+-------------+-----------+
> 1 row in set (0.00 sec)
>
> MariaDB [robinhood_lustre]> select * from NAMES where id='0x200128c87:0x1db1d:0x0';
> Empty set (0.00 sec)
>
> MariaDB [robinhood_lustre]> select * from STRIPE_INFO where id='0x200128c87:0x1db1d:0x0';
> +-------------------------+-----------+--------------+-------------+-----------+
> | id                      | validator | stripe_count | stripe_size | pool_name |
> +-------------------------+-----------+--------------+-------------+-----------+
> | 0x200128c87:0x1db1d:0x0 |         0 |            1 |     1048576 |           |
> +-------------------------+-----------+--------------+-------------+-----------+
> 1 row in set (0.00 sec)
>
> MariaDB [robinhood_lustre]> select * from STRIPE_ITEMS where id='0x200128c87:0x1db1d:0x0';
> +-------------------------+--------------+--------+----------------------+
> | id                      | stripe_index | ostidx | details              |
> +-------------------------+--------------+--------+----------------------+
> | 0x200128c87:0x1db1d:0x0 |            0 |     12 | �                    |
> +-------------------------+--------------+--------+----------------------+
> 1 row in set (0.00 sec)
>
> That particular file does not exist any more in Lustre:
> [root@robinhood robinhood]# lfs fid2path /lustre 0x200128c87:0x1db1d:0x0
> fid2path: error on FID 0x200128c87:0x1db1d:0x0: No such file or directory
>
> So something did go wrong that did not delete the entry out of the
> database.
>
> This happens now fairly regularly. I seem to accumulate these errors
> regularly. A restart of robinhood does seem to clear out these
> errors, but new ones accumulate.
>
> I am running robinhood 3.1.5 on RHEL 7.6 and Lustre 2.10.5.
>
> My assumption at that stage is that I seem to be hitting some timing
> issue, where a file gets deleted while it is processed by the
> changelog – resulting in an incomplete database entry.
> I have not seen that in the past, but I have not changed anything for
> a while, so it is odd that this appeared.
>
> Is this anything someone has seen before?
> Any theory how that could happen?
>
> Thanks,
> Gerald
>
> Gerald Hofer
> Technical Consultant
> High Performance Computing
> HPE Pointnext South Pacific Delivery
>
> +61 418 888 567 Mobile
>
> Brisbane/Queensland
> hpe.com/pointnext
>
> _______________________________________________
> robinhood-support mailing list
> robinhood-support@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/robinhood-support
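
The inconsistency shown by the queries in this thread (a row in ENTRIES with no matching row in NAMES) can be listed for all entries with an anti-join. The sketch below is only illustrative: it uses an in-memory SQLite database as a stand-in for the MariaDB robinhood database, the two-column schemas are simplified assumptions, and the second FID and its name are made up. Only the table names ENTRIES and NAMES and the `id` column come from the thread.

```python
import sqlite3


def find_entries_without_names(conn):
    """Return ids present in ENTRIES that have no matching row in NAMES."""
    rows = conn.execute(
        "SELECT e.id FROM ENTRIES e "
        "LEFT JOIN NAMES n ON n.id = e.id "
        "WHERE n.id IS NULL ORDER BY e.id"
    )
    return [row[0] for row in rows]


def demo():
    # Simplified stand-in schema (assumption, not the real robinhood schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ENTRIES (id TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE NAMES (id TEXT, name TEXT)")
    # FID from the thread: present in ENTRIES, absent from NAMES.
    conn.execute("INSERT INTO ENTRIES VALUES ('0x200128c87:0x1db1d:0x0')")
    # Hypothetical healthy entry that still has a name.
    conn.execute("INSERT INTO ENTRIES VALUES ('0x1:0x2:0x0')")
    conn.execute("INSERT INTO NAMES VALUES ('0x1:0x2:0x0', 'kept_file')")
    return find_entries_without_names(conn)
```

Calling `demo()` returns only the FID that has no NAMES row. The same SELECT could be tried read-only against the real database, but entries found this way are not necessarily safe to purge without also checking them with `lfs fid2path`.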