Hi, Davide Tacchella <dtacchella@cr...> writes:
> Hello, > > On 07/28/17 09:25, Loris Bennett wrote: > > Hi, > > > > The scan on one of my file systems seems to be stuck > > > > Last filesystem scan: > > status: running > > start: 2017/07/26 08:29:14 (2d 48min 33s ago) > > last action: 2017/07/26 08:30:37 (2d 47min 10s ago) > > > > The log file repeatedly shows the same file being processed throughout > > this period: > > > > 2017/07/28 09:14:14 [5738/1] STATS | ======== General statistics > > ========= > > 2017/07/28 09:14:14 [5738/1] STATS | Daemon start time: 2017/07/26 > > 08:29:14 > > 2017/07/28 09:14:14 [5738/1] STATS | Started modules: > > scan,policy_run(cleanup) > > 2017/07/28 09:14:14 [5738/1] STATS | ======== FS scan statistics > > ========= > > 2017/07/28 09:14:14 [5738/1] STATS | current scan interval = 1.0d > > 2017/07/28 09:14:14 [5738/1] STATS | scan is running: > > 2017/07/28 09:14:14 [5738/1] STATS | started at : 2017/07/26 > > 08:29:14 (2.0d ago) > > 2017/07/28 09:14:14 [5738/1] STATS | last action: 2017/07/26 > > 08:30:37 (2.0d ago) > > 2017/07/28 09:14:14 [5738/1] STATS | progress : 387679 entries > > scanned (0 errors) > > 2017/07/28 09:14:14 [5738/1] STATS | inst. speed (potential): > > 490.02 entries/sec (12.24 ms/entry/thread) > > 2017/07/28 09:14:14 [5738/1] STATS | avg. speed (effective): > > 2.21 entries/sec (1.22 ms/entry/thread) > > 2017/07/28 09:14:14 [5738/1] STATS | ==== EntryProcessor Pipeline Stats > > === > > 2017/07/28 09:14:14 [5738/1] STATS | Idle threads: 0 > > 2017/07/28 09:14:14 [5738/1] STATS | Id constraints count: 10000 (hash > > min=0/max=6/avg=0.6) > > 2017/07/28 09:14:14 [5738/1] STATS | Name constraints count: 10000 (hash > > min=0/max=6/avg=0.6) > > 2017/07/28 09:14:14 [5738/1] STATS | Stage | Wait | Curr | > > Done | Total | ms/op | > > 2017/07/28 09:14:14 [5738/1] STATS | 0: GET_FID | 0 | 0 | > > 0 | 0 | 0.00 | > > 2017/07/28 09:14:14 [5738/1] STATS | 1: GET_INFO_DB | 9438 | 9 | > > 3 | 396274 | 1.85 | > > 2017/07/28 09:14:14 [5738/1] STATS | 2: GET_INFO_FS | 0 | 0 | > > 0 | 396271 | 0.00 | > > 2017/07/28 09:14:14 [5738/1] STATS | 3: PRE_APPLY | 0 | 0 | > > 0 | 396271 | 0.00 | > > 2017/07/28 09:14:14 [5738/1] STATS | 4: DB_APPLY | 379 | 171 | > > 0 | 395721 | 0.21 | 100.00% batched (avg batch size: 116.2) > > 2017/07/28 09:14:14 [5738/1] STATS | 5: RM_OLD_ENTRIES | 0 | 0 | > > 0 | 0 | 0.00 | > > 2017/07/28 09:14:14 [5738/1] STATS | DB ops: > > get=293/ins=1/upd=395720/rm=0 > > 2017/07/28 09:14:14 [5738/1] STATS | --- Pipeline stage details --- > > 2017/07/28 09:14:14 [5738/1] STATS | GET_INFO_DB : first: > > /home/user01/A/B/run02/wrfV2756-0003-00478-000478-01.hdr, status=processing > > 2017/07/28 09:14:14 [5738/1] STATS | GET_INFO_DB : last: > > /home/user01/A/B/run03/swrafV3158-0004-00623-000623-01.img, status=waiting > > 2017/07/28 09:14:14 [5738/1] STATS | DB_APPLY : first: > > 63BBC16B/16033637, status=processing > > 2017/07/28 09:14:14 [5738/1] STATS | DB_APPLY : last: > > 63BBC16B/397701125, status=waiting > > > > Restarting mysql hasn't solved the problem. The scan on my other file > > system runs fine. > > > > Any ideas what the problem is and how it can be addressed? > > > Are you sure the problem is RH ? As the scan on one file system works and the one on the other file system doesn't, my assumption is that there is some problem with one or more of the database tables related to the second file system. > Current STATS snapshot shows DB is slowing you down, 9400 of 10k actions > are queued in get_db_info stage > > for better understanding of the problem would be nice if you could add: > - at least 2 subsequent STATS showing no progress, you only provided > one Here are the last two entries from today: 2017/07/31 09:44:15 [5738/1] STATS | ==================== Dumping stats at 2017/07/31 09:44:15 ===================== 2017/07/31 09:44:15 [5738/1] STATS | ======== General statistics ========= 2017/07/31 09:44:15 [5738/1] STATS | Daemon start time: 2017/07/26 08:29:14 2017/07/31 09:44:15 [5738/1] STATS | Started modules: scan,policy_run(cleanup) 2017/07/31 09:44:15 [5738/1] STATS | ======== FS scan statistics ========= 2017/07/31 09:44:15 [5738/1] STATS | current scan interval = 1.0d 2017/07/31 09:44:15 [5738/1] STATS | scan is running: 2017/07/31 09:44:15 [5738/1] STATS | started at : 2017/07/26 08:29:14 (5.1d ago) 2017/07/31 09:44:15 [5738/1] STATS | last action: 2017/07/26 08:30:37 (5.1d ago) 2017/07/31 09:44:15 [5738/1] STATS | progress : 387679 entries scanned (0 errors) 2017/07/31 09:44:15 [5738/1] STATS | inst. speed (potential): 490.02 entries/sec (12.24 ms/entry/thread) 2017/07/31 09:44:15 [5738/1] STATS | avg. speed (effective): 0.89 entries/sec (1.22 ms/entry/thread) 2017/07/31 09:44:15 [5738/1] STATS | ==== EntryProcessor Pipeline Stats === 2017/07/31 09:44:15 [5738/1] STATS | Idle threads: 0 2017/07/31 09:44:15 [5738/1] STATS | Id constraints count: 10000 (hash min=0/max=6/avg=0.6) 2017/07/31 09:44:15 [5738/1] STATS | Name constraints count: 10000 (hash min=0/max=6/avg=0.6) 2017/07/31 09:44:15 [5738/1] STATS | Stage | Wait | Curr | Done | Total | ms/op | 2017/07/31 09:44:15 [5738/1] STATS | 0: GET_FID | 0 | 0 | 0 | 0 | 0.00 | 2017/07/31 09:44:15 [5738/1] STATS | 1: GET_INFO_DB | 9438 | 9 | 3 | 396274 | 1.85 | 2017/07/31 09:44:15 [5738/1] STATS | 2: GET_INFO_FS | 0 | 0 | 0 | 396271 | 0.00 | 2017/07/31 09:44:15 [5738/1] STATS | 3: PRE_APPLY | 0 | 0 | 0 | 396271 | 0.00 | 2017/07/31 09:44:15 [5738/1] STATS | 4: DB_APPLY | 379 | 171 | 0 | 395721 | 0.21 | 100.00% batched (avg batch size: 116.2) 2017/07/31 09:44:15 [5738/1] STATS | 5: RM_OLD_ENTRIES | 0 | 0 | 0 | 0 | 0.00 | 2017/07/31 09:44:15 [5738/1] STATS | DB ops: get=293/ins=1/upd=395720/rm=0 2017/07/31 09:44:15 [5738/1] STATS | --- Pipeline stage details --- 2017/07/31 09:44:15 [5738/1] STATS | GET_INFO_DB : first: /home/user01/A/B/run02/wrfV2756-0003-00478-000478-01.hdr, status=processing 2017/07/31 09:44:15 [5738/1] STATS | GET_INFO_DB : last: /home/user01/A/B/run03/swrafV3158-0004-00623-000623-01.img, status=waiting 2017/07/31 09:44:15 [5738/1] STATS | DB_APPLY : first: 63BBC16B/16033637, status=processing 2017/07/31 09:44:15 [5738/1] STATS | DB_APPLY : last: 63BBC16B/397701125, status=waiting 2017/07/31 09:44:15 [5738/1] STATS | ======= cleanup policy: trigger stats ====== 2017/07/31 09:44:15 [5738/1] STATS | Trigger #0 (scheduled) : running. 2017/07/31 09:44:15 [5738/1] STATS | last run: none 2017/07/31 09:44:15 [5738/1] STATS | total : none 2017/07/31 09:44:15 [5738/1] STATS | ======= cleanup policy: action stats ====== 2017/07/31 09:44:15 [5738/1] STATS | idle threads = 5 2017/07/31 09:44:15 [5738/1] STATS | queued actions = 0 2017/07/31 09:44:15 [5738/1] STATS | action status: 2017/07/31 09:44:15 [5738/1] STATS | action successful = 0 2017/07/31 09:44:15 [5738/1] STATS | no matching rule = 8832 2017/07/31 09:44:15 [5738/1] STATS | 0 actions successful/8832, 0 (0 blocks) 2017/07/31 09:44:15 [5738/1] STATS | last action queued 436418 s ago 2017/07/31 09:44:15 [5738/1] STATS | last action started 436418 s ago 2017/07/31 09:44:15 [5738/1] STATS | last action completed 436418 s ago 2017/07/31 09:59:15 [5738/1] STATS | ==================== Dumping stats at 2017/07/31 09:59:15 ===================== 2017/07/31 09:59:15 [5738/1] STATS | ======== General statistics ========= 2017/07/31 09:59:15 [5738/1] STATS | Daemon start time: 2017/07/26 08:29:14 2017/07/31 09:59:15 [5738/1] STATS | Started modules: scan,policy_run(cleanup) 2017/07/31 09:59:15 [5738/1] STATS | ======== FS scan statistics ========= 2017/07/31 09:59:15 [5738/1] STATS | current scan interval = 1.0d 2017/07/31 09:59:15 [5738/1] STATS | scan is running: 2017/07/31 09:59:15 [5738/1] STATS | started at : 2017/07/26 08:29:14 (5.1d ago) 2017/07/31 09:59:15 [5738/1] STATS | last action: 2017/07/26 08:30:37 (5.1d ago) 2017/07/31 09:59:15 [5738/1] STATS | progress : 387679 entries scanned (0 errors) 2017/07/31 09:59:15 [5738/1] STATS | inst. speed (potential): 490.02 entries/sec (12.24 ms/entry/thread) 2017/07/31 09:59:15 [5738/1] STATS | avg. speed (effective): 0.89 entries/sec (1.22 ms/entry/thread) 2017/07/31 09:59:15 [5738/1] STATS | ==== EntryProcessor Pipeline Stats === 2017/07/31 09:59:15 [5738/1] STATS | Idle threads: 0 2017/07/31 09:59:15 [5738/1] STATS | Id constraints count: 10000 (hash min=0/max=6/avg=0.6) 2017/07/31 09:59:15 [5738/1] STATS | Name constraints count: 10000 (hash min=0/max=6/avg=0.6) 2017/07/31 09:59:15 [5738/1] STATS | Stage | Wait | Curr | Done | Total | ms/op | 2017/07/31 09:59:15 [5738/1] STATS | 0: GET_FID | 0 | 0 | 0 | 0 | 0.00 | 2017/07/31 09:59:15 [5738/1] STATS | 1: GET_INFO_DB | 9438 | 9 | 3 | 396274 | 1.85 | 2017/07/31 09:59:15 [5738/1] STATS | 2: GET_INFO_FS | 0 | 0 | 0 | 396271 | 0.00 | 2017/07/31 09:59:15 [5738/1] STATS | 3: PRE_APPLY | 0 | 0 | 0 | 396271 | 0.00 | 2017/07/31 09:59:15 [5738/1] STATS | 4: DB_APPLY | 379 | 171 | 0 | 395721 | 0.21 | 100.00% batched (avg batch size: 116.2) 2017/07/31 09:59:15 [5738/1] STATS | 5: RM_OLD_ENTRIES | 0 | 0 | 0 | 0 | 0.00 | 2017/07/31 09:59:15 [5738/1] STATS | DB ops: get=293/ins=1/upd=395720/rm=0 2017/07/31 09:59:15 [5738/1] STATS | --- Pipeline stage details --- 2017/07/31 09:59:15 [5738/1] STATS | GET_INFO_DB : first: /home/user01/A/B/run02/wrfV2756-0003-00478-000478-01.hdr, status=processing 2017/07/31 09:59:15 [5738/1] STATS | GET_INFO_DB : last: /home/user01/A/B/run03/swrafV3158-0004-00623-000623-01.img, status=waiting 2017/07/31 09:59:15 [5738/1] STATS | DB_APPLY : first: 63BBC16B/16033637, status=processing 2017/07/31 09:59:15 [5738/1] STATS | DB_APPLY : last: 63BBC16B/397701125, status=waiting 2017/07/31 09:59:15 [5738/1] STATS | ======= cleanup policy: trigger stats ====== 2017/07/31 09:59:15 [5738/1] STATS | Trigger #0 (scheduled) : running. 2017/07/31 09:59:15 [5738/1] STATS | last run: none 2017/07/31 09:59:15 [5738/1] STATS | total : none 2017/07/31 09:59:15 [5738/1] STATS | ======= cleanup policy: action stats ====== 2017/07/31 09:59:15 [5738/1] STATS | idle threads = 5 2017/07/31 09:59:15 [5738/1] STATS | queued actions = 0 2017/07/31 09:59:15 [5738/1] STATS | action status: 2017/07/31 09:59:15 [5738/1] STATS | action successful = 0 2017/07/31 09:59:15 [5738/1] STATS | no matching rule = 8832 2017/07/31 09:59:15 [5738/1] STATS | 0 actions successful/8832, 0 (0 blocks) 2017/07/31 09:59:15 [5738/1] STATS | last action queued 437318 s ago 2017/07/31 09:59:15 [5738/1] STATS | last action started 437318 s ago 2017/07/31 09:59:15 [5738/1] STATS | last action completed 437318 s ago As you can see, the GET_INFO_DB lines still refer to the same files. > - check processes in MySQL (show processlist) $ mysqladmin processlist | grep home | 555 | Query | 437722 | Locked | SELECT id FROM ENTRIES WHERE id='63BBC16B:F39B83' | | 556 | Query | 437722 | Locked | SELECT id FROM ENTRIES WHERE id='63BBC16B:F4A805' | | 557 | Query | 437722 | Locked | SELECT id FROM ENTRIES WHERE id='63BBC16B:F4A806' | | 558 | Query | 437722 | Locked | SELECT id FROM ENTRIES WHERE id='63BBC16B:F42F31' | | 559 | Query | 437722 | Locked | SELECT id FROM ENTRIES WHERE id='63BBC16B:F3CB45' | | 560 | Query | 437722 | Locked | SELECT id FROM ENTRIES WHERE id='63BBC16B:F42F32' | | 561 | Query | 437722 | Locked | SELECT id FROM ENTRIES WHERE id='63BBC16B:F4A837' | | 562 | Query | 437722 | Locked | SELECT id FROM ENTRIES WHERE id='63BBC16B:F39B82' | | 563 | Query | 437722 | Locked | SELECT id FROM ENTRIES WHERE id='63BBC16B:17B3807D' | | 564 | Query | 437722 | Locked | INSERT INTO NAMES(id,parent_id,name,path_update,pkn) VALUES ('63BBC16B:F4A765','63BBC16B:F46FDD','wr | | 567 | Query | 437722 | Locked | UPDATE ENTRIES SET uid='501',gid='staff',size=3983,blocks=8,last_access=1487848194,last_mod=14572268 | | 571 | Query | 437722 | statistics | SELECT parent_id, name INTO pid, n from NAMES WHERE id= NAME_CONST('pid',_binary'63BBC16B:E563E8' CO | | 572 | Sleep | 404 | | | | 21103 | Query | 194638 | Locked | SELECT ENTRIES.id AS id FROM ENTRIES WHERE ENTRIES.type='file' AND (ENTRIES.last_access<1498701721 O | On subsequent calls only process 571 seems to change, everything else is locked: | 571 | Query | 437832 | Sorting result | SELECT parent_id, name INTO pid, n from NAMES WHERE id= NAME_CONST('pid',_binary'63BBC16B:E563E8' CO | | 571 | Query | 437873 | Sorting result | SELECT parent_id, name INTO pid, n from NAMES WHERE id= NAME_CONST('pid',_binary'63BBC16B:E563DE' CO | | 571 | Query | 437888 | statistics | SELECT parent_id, name INTO pid, n from NAMES WHERE id= NAME_CONST('pid',_binary'63BBC16B:E563DE' CO | | 571 | Query | 437915 | | SELECT CONCAT( NAME_CONST('n',_binary'RAxML_log.At5g20170_mafftus_besttree.RUN.0' COLLATE 'binary') | > - in RH scan configuration do you have any of these ? > scan_op_timeout = 60min; > exit_on_timeout = TRUE; I don't have any timeouts set. I don't think a change in configuration caused the problem. I think it is more to do with a very large change in the number of files at some point (the file system was filling up and we convinced a number of users to delete stuff). I'm not particularly attached to the data, so what would be the best way to just drop the existing data and rescan? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de ------------------------------------------------------------------------------ Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot _______________________________________________ robinhood-support mailing list robinhood-support@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/robinhood-support