Hi,

Davide Tacchella <dtacchella@cr...> writes:

> Hello,
> 
> On 07/28/17 09:25, Loris Bennett wrote:
> > Hi,
> >
> > The scan on one of my file systems seems to be stuck
> >
> >    Last filesystem scan:
> >            status:          running
> >            start:           2017/07/26 08:29:14 (2d 48min 33s ago)
> >            last action:     2017/07/26 08:30:37 (2d 47min 10s ago)
> >
> > The log file repeatedly shows the same file being processed throughout
> > this period:
> >
> >    2017/07/28 09:14:14 [5738/1] STATS | ======== General statistics 
> > =========
> >    2017/07/28 09:14:14 [5738/1] STATS | Daemon start time: 2017/07/26 
> > 08:29:14
> >    2017/07/28 09:14:14 [5738/1] STATS | Started modules: 
> > scan,policy_run(cleanup)
> >    2017/07/28 09:14:14 [5738/1] STATS | ======== FS scan statistics 
> > =========
> >    2017/07/28 09:14:14 [5738/1] STATS | current scan interval = 1.0d
> >    2017/07/28 09:14:14 [5738/1] STATS | scan is running:
> >    2017/07/28 09:14:14 [5738/1] STATS |      started at : 2017/07/26 
> > 08:29:14 (2.0d ago)
> >    2017/07/28 09:14:14 [5738/1] STATS |      last action: 2017/07/26 
> > 08:30:37 (2.0d ago)
> >    2017/07/28 09:14:14 [5738/1] STATS |      progress   : 387679 entries 
> > scanned (0 errors)
> >    2017/07/28 09:14:14 [5738/1] STATS |      inst. speed (potential):    
> > 490.02 entries/sec (12.24 ms/entry/thread)
> >    2017/07/28 09:14:14 [5738/1] STATS |      avg. speed  (effective):      
> > 2.21 entries/sec (1.22 ms/entry/thread)
> >    2017/07/28 09:14:14 [5738/1] STATS | ==== EntryProcessor Pipeline Stats 
> > ===
> >    2017/07/28 09:14:14 [5738/1] STATS | Idle threads: 0
> >    2017/07/28 09:14:14 [5738/1] STATS | Id constraints count: 10000 (hash 
> > min=0/max=6/avg=0.6)
> >    2017/07/28 09:14:14 [5738/1] STATS | Name constraints count: 10000 (hash 
> > min=0/max=6/avg=0.6)
> >    2017/07/28 09:14:14 [5738/1] STATS | Stage              | Wait | Curr | 
> > Done |     Total | ms/op |
> >    2017/07/28 09:14:14 [5738/1] STATS |  0: GET_FID        |    0 |    0 |  
> >   0 |         0 |  0.00 |
> >    2017/07/28 09:14:14 [5738/1] STATS |  1: GET_INFO_DB    | 9438 |    9 |  
> >   3 |    396274 |  1.85 |
> >    2017/07/28 09:14:14 [5738/1] STATS |  2: GET_INFO_FS    |    0 |    0 |  
> >   0 |    396271 |  0.00 |
> >    2017/07/28 09:14:14 [5738/1] STATS |  3: PRE_APPLY      |    0 |    0 |  
> >   0 |    396271 |  0.00 |
> >    2017/07/28 09:14:14 [5738/1] STATS |  4: DB_APPLY       |  379 |  171 |  
> >   0 |    395721 |  0.21 | 100.00% batched (avg batch size: 116.2)
> >    2017/07/28 09:14:14 [5738/1] STATS |  5: RM_OLD_ENTRIES |    0 |    0 |  
> >   0 |         0 |  0.00 |
> >    2017/07/28 09:14:14 [5738/1] STATS | DB ops: 
> > get=293/ins=1/upd=395720/rm=0
> >    2017/07/28 09:14:14 [5738/1] STATS | --- Pipeline stage details ---
> >    2017/07/28 09:14:14 [5738/1] STATS | GET_INFO_DB   : first: 
> > /home/user01/A/B/run02/wrfV2756-0003-00478-000478-01.hdr, status=processing
> >    2017/07/28 09:14:14 [5738/1] STATS | GET_INFO_DB   : last: 
> > /home/user01/A/B/run03/swrafV3158-0004-00623-000623-01.img, status=waiting
> >    2017/07/28 09:14:14 [5738/1] STATS | DB_APPLY      : first: 
> > 63BBC16B/16033637, status=processing
> >    2017/07/28 09:14:14 [5738/1] STATS | DB_APPLY      : last: 
> > 63BBC16B/397701125, status=waiting
> >
> > Restarting mysql hasn't solved the problem.  The scan on my other file
> > system runs fine.
> >
> > Any ideas what the problem is and how it can be addressed?
> >
> Are you sure the problem is RH ?

As the scan on one file system works and the one on the other file
system doesn't, my assumption is that there is some problem with one or
more of the database tables related to the second file system.

> Current STATS snapshot shows DB is slowing you down, 9400 of 10k actions 
> are queued in get_db_info stage
> 
> for better understanding of the problem would be nice if you could add:
> - at least 2 subsequent STATS showing no progress, you only provided
> one

Here are the last two entries from today:

  2017/07/31 09:44:15 [5738/1] STATS | ==================== Dumping stats at 
2017/07/31 09:44:15 =====================
  2017/07/31 09:44:15 [5738/1] STATS | ======== General statistics =========
  2017/07/31 09:44:15 [5738/1] STATS | Daemon start time: 2017/07/26 08:29:14
  2017/07/31 09:44:15 [5738/1] STATS | Started modules: scan,policy_run(cleanup)
  2017/07/31 09:44:15 [5738/1] STATS | ======== FS scan statistics =========
  2017/07/31 09:44:15 [5738/1] STATS | current scan interval = 1.0d
  2017/07/31 09:44:15 [5738/1] STATS | scan is running:
  2017/07/31 09:44:15 [5738/1] STATS |      started at : 2017/07/26 08:29:14 
(5.1d ago)
  2017/07/31 09:44:15 [5738/1] STATS |      last action: 2017/07/26 08:30:37 
(5.1d ago)
  2017/07/31 09:44:15 [5738/1] STATS |      progress   : 387679 entries scanned 
(0 errors)
  2017/07/31 09:44:15 [5738/1] STATS |      inst. speed (potential):    490.02 
entries/sec (12.24 ms/entry/thread)
  2017/07/31 09:44:15 [5738/1] STATS |      avg. speed  (effective):      0.89 
entries/sec (1.22 ms/entry/thread)
  2017/07/31 09:44:15 [5738/1] STATS | ==== EntryProcessor Pipeline Stats ===
  2017/07/31 09:44:15 [5738/1] STATS | Idle threads: 0
  2017/07/31 09:44:15 [5738/1] STATS | Id constraints count: 10000 (hash 
min=0/max=6/avg=0.6)
  2017/07/31 09:44:15 [5738/1] STATS | Name constraints count: 10000 (hash 
min=0/max=6/avg=0.6)
  2017/07/31 09:44:15 [5738/1] STATS | Stage              | Wait | Curr | Done 
|     Total | ms/op |
  2017/07/31 09:44:15 [5738/1] STATS |  0: GET_FID        |    0 |    0 |    0 
|         0 |  0.00 |
  2017/07/31 09:44:15 [5738/1] STATS |  1: GET_INFO_DB    | 9438 |    9 |    3 
|    396274 |  1.85 |
  2017/07/31 09:44:15 [5738/1] STATS |  2: GET_INFO_FS    |    0 |    0 |    0 
|    396271 |  0.00 |
  2017/07/31 09:44:15 [5738/1] STATS |  3: PRE_APPLY      |    0 |    0 |    0 
|    396271 |  0.00 |
  2017/07/31 09:44:15 [5738/1] STATS |  4: DB_APPLY       |  379 |  171 |    0 
|    395721 |  0.21 | 100.00% batched (avg batch size: 116.2)
  2017/07/31 09:44:15 [5738/1] STATS |  5: RM_OLD_ENTRIES |    0 |    0 |    0 
|         0 |  0.00 |
  2017/07/31 09:44:15 [5738/1] STATS | DB ops: get=293/ins=1/upd=395720/rm=0
  2017/07/31 09:44:15 [5738/1] STATS | --- Pipeline stage details ---
  2017/07/31 09:44:15 [5738/1] STATS | GET_INFO_DB   : first: 
/home/user01/A/B/run02/wrfV2756-0003-00478-000478-01.hdr, status=processing
  2017/07/31 09:44:15 [5738/1] STATS | GET_INFO_DB   : last: 
/home/user01/A/B/run03/swrafV3158-0004-00623-000623-01.img, status=waiting
  2017/07/31 09:44:15 [5738/1] STATS | DB_APPLY      : first: 
63BBC16B/16033637, status=processing
  2017/07/31 09:44:15 [5738/1] STATS | DB_APPLY      : last: 
63BBC16B/397701125, status=waiting
  2017/07/31 09:44:15 [5738/1] STATS | ======= cleanup policy: trigger stats 
======
  2017/07/31 09:44:15 [5738/1] STATS | Trigger #0 (scheduled)        : running.
  2017/07/31 09:44:15 [5738/1] STATS |     last run: none
  2017/07/31 09:44:15 [5738/1] STATS |     total   : none
  2017/07/31 09:44:15 [5738/1] STATS | ======= cleanup policy: action stats 
======
  2017/07/31 09:44:15 [5738/1] STATS | idle threads       = 5
  2017/07/31 09:44:15 [5738/1] STATS | queued actions     = 0
  2017/07/31 09:44:15 [5738/1] STATS | action status:
  2017/07/31 09:44:15 [5738/1] STATS |     action successful              = 0
  2017/07/31 09:44:15 [5738/1] STATS |     no matching rule               = 8832
  2017/07/31 09:44:15 [5738/1] STATS | 0 actions successful/8832, 0 (0 blocks)
  2017/07/31 09:44:15 [5738/1] STATS | last action queued    436418 s ago
  2017/07/31 09:44:15 [5738/1] STATS | last action started   436418 s ago
  2017/07/31 09:44:15 [5738/1] STATS | last action completed 436418 s ago
  2017/07/31 09:59:15 [5738/1] STATS | ==================== Dumping stats at 
2017/07/31 09:59:15 =====================
  2017/07/31 09:59:15 [5738/1] STATS | ======== General statistics =========
  2017/07/31 09:59:15 [5738/1] STATS | Daemon start time: 2017/07/26 08:29:14
  2017/07/31 09:59:15 [5738/1] STATS | Started modules: scan,policy_run(cleanup)
  2017/07/31 09:59:15 [5738/1] STATS | ======== FS scan statistics =========
  2017/07/31 09:59:15 [5738/1] STATS | current scan interval = 1.0d
  2017/07/31 09:59:15 [5738/1] STATS | scan is running:
  2017/07/31 09:59:15 [5738/1] STATS |      started at : 2017/07/26 08:29:14 
(5.1d ago)
  2017/07/31 09:59:15 [5738/1] STATS |      last action: 2017/07/26 08:30:37 
(5.1d ago)
  2017/07/31 09:59:15 [5738/1] STATS |      progress   : 387679 entries scanned 
(0 errors)
  2017/07/31 09:59:15 [5738/1] STATS |      inst. speed (potential):    490.02 
entries/sec (12.24 ms/entry/thread)
  2017/07/31 09:59:15 [5738/1] STATS |      avg. speed  (effective):      0.89 
entries/sec (1.22 ms/entry/thread)
  2017/07/31 09:59:15 [5738/1] STATS | ==== EntryProcessor Pipeline Stats ===
  2017/07/31 09:59:15 [5738/1] STATS | Idle threads: 0
  2017/07/31 09:59:15 [5738/1] STATS | Id constraints count: 10000 (hash 
min=0/max=6/avg=0.6)
  2017/07/31 09:59:15 [5738/1] STATS | Name constraints count: 10000 (hash 
min=0/max=6/avg=0.6)
  2017/07/31 09:59:15 [5738/1] STATS | Stage              | Wait | Curr | Done 
|     Total | ms/op |
  2017/07/31 09:59:15 [5738/1] STATS |  0: GET_FID        |    0 |    0 |    0 
|         0 |  0.00 |
  2017/07/31 09:59:15 [5738/1] STATS |  1: GET_INFO_DB    | 9438 |    9 |    3 
|    396274 |  1.85 |
  2017/07/31 09:59:15 [5738/1] STATS |  2: GET_INFO_FS    |    0 |    0 |    0 
|    396271 |  0.00 |
  2017/07/31 09:59:15 [5738/1] STATS |  3: PRE_APPLY      |    0 |    0 |    0 
|    396271 |  0.00 |
  2017/07/31 09:59:15 [5738/1] STATS |  4: DB_APPLY       |  379 |  171 |    0 
|    395721 |  0.21 | 100.00% batched (avg batch size: 116.2)
  2017/07/31 09:59:15 [5738/1] STATS |  5: RM_OLD_ENTRIES |    0 |    0 |    0 
|         0 |  0.00 |
  2017/07/31 09:59:15 [5738/1] STATS | DB ops: get=293/ins=1/upd=395720/rm=0
  2017/07/31 09:59:15 [5738/1] STATS | --- Pipeline stage details ---
  2017/07/31 09:59:15 [5738/1] STATS | GET_INFO_DB   : first: 
/home/user01/A/B/run02/wrfV2756-0003-00478-000478-01.hdr, status=processing
  2017/07/31 09:59:15 [5738/1] STATS | GET_INFO_DB   : last: 
/home/user01/A/B/run03/swrafV3158-0004-00623-000623-01.img, status=waiting
  2017/07/31 09:59:15 [5738/1] STATS | DB_APPLY      : first: 
63BBC16B/16033637, status=processing
  2017/07/31 09:59:15 [5738/1] STATS | DB_APPLY      : last: 
63BBC16B/397701125, status=waiting
  2017/07/31 09:59:15 [5738/1] STATS | ======= cleanup policy: trigger stats 
======
  2017/07/31 09:59:15 [5738/1] STATS | Trigger #0 (scheduled)        : running.
  2017/07/31 09:59:15 [5738/1] STATS |     last run: none
  2017/07/31 09:59:15 [5738/1] STATS |     total   : none
  2017/07/31 09:59:15 [5738/1] STATS | ======= cleanup policy: action stats 
======
  2017/07/31 09:59:15 [5738/1] STATS | idle threads       = 5
  2017/07/31 09:59:15 [5738/1] STATS | queued actions     = 0
  2017/07/31 09:59:15 [5738/1] STATS | action status:
  2017/07/31 09:59:15 [5738/1] STATS |     action successful              = 0
  2017/07/31 09:59:15 [5738/1] STATS |     no matching rule               = 8832
  2017/07/31 09:59:15 [5738/1] STATS | 0 actions successful/8832, 0 (0 blocks)
  2017/07/31 09:59:15 [5738/1] STATS | last action queued    437318 s ago
  2017/07/31 09:59:15 [5738/1] STATS | last action started   437318 s ago
  2017/07/31 09:59:15 [5738/1] STATS | last action completed 437318 s ago

As you can see, the GET_INFO_DB lines still refer to the same files.

> - check processes in MySQL (show processlist)

  $ mysqladmin processlist | grep home
  | 555   | Query   | 437722 | Locked         | SELECT id FROM ENTRIES WHERE 
id='63BBC16B:F39B83'                                                    |
  | 556   | Query   | 437722 | Locked         | SELECT id FROM ENTRIES WHERE 
id='63BBC16B:F4A805'                                                    |
  | 557   | Query   | 437722 | Locked         | SELECT id FROM ENTRIES WHERE 
id='63BBC16B:F4A806'                                                    |
  | 558   | Query   | 437722 | Locked         | SELECT id FROM ENTRIES WHERE 
id='63BBC16B:F42F31'                                                    |
  | 559   | Query   | 437722 | Locked         | SELECT id FROM ENTRIES WHERE 
id='63BBC16B:F3CB45'                                                    |
  | 560   | Query   | 437722 | Locked         | SELECT id FROM ENTRIES WHERE 
id='63BBC16B:F42F32'                                                    |
  | 561   | Query   | 437722 | Locked         | SELECT id FROM ENTRIES WHERE 
id='63BBC16B:F4A837'                                                    |
  | 562   | Query   | 437722 | Locked         | SELECT id FROM ENTRIES WHERE 
id='63BBC16B:F39B82'                                                    |
  | 563   | Query   | 437722 | Locked         | SELECT id FROM ENTRIES WHERE 
id='63BBC16B:17B3807D'                                                  |
  | 564   | Query   | 437722 | Locked         | INSERT INTO 
NAMES(id,parent_id,name,path_update,pkn) VALUES 
('63BBC16B:F4A765','63BBC16B:F46FDD','wr |
  | 567   | Query   | 437722 | Locked         | UPDATE ENTRIES SET 
uid='501',gid='staff',size=3983,blocks=8,last_access=1487848194,last_mod=14572268
 |
  | 571   | Query   | 437722 | statistics     | SELECT parent_id, name INTO 
pid, n from NAMES WHERE id= NAME_CONST('pid',_binary'63BBC16B:E563E8' CO |
  | 572   | Sleep   | 404    |                |                                 
                                                                     |
  | 21103 | Query   | 194638 | Locked         | SELECT ENTRIES.id AS id FROM 
ENTRIES WHERE ENTRIES.type='file' AND (ENTRIES.last_access<1498701721 O |

On subsequent calls only process 571 seems to change, everything else is locked:

  | 571   | Query   | 437832 | Sorting result | SELECT parent_id, name INTO 
pid, n from NAMES WHERE id= NAME_CONST('pid',_binary'63BBC16B:E563E8' CO |

  | 571   | Query   | 437873 | Sorting result | SELECT parent_id, name INTO 
pid, n from NAMES WHERE id= NAME_CONST('pid',_binary'63BBC16B:E563DE' CO |

  | 571   | Query   | 437888 | statistics     | SELECT parent_id, name INTO 
pid, n from NAMES WHERE id= NAME_CONST('pid',_binary'63BBC16B:E563DE' CO |

  | 571   | Query   | 437915 |                | SELECT CONCAT(  
NAME_CONST('n',_binary'RAxML_log.At5g20170_mafftus_besttree.RUN.0' COLLATE 
'binary') |

> - in RH scan configuration do you have any of these ?
>      scan_op_timeout        =    60min;
>      exit_on_timeout        =    TRUE;

I don't have any timeouts set.

I don't think a change in configuration caused the problem.  I think it
is more to do with a very large change in the number of files at some
point (the file system was filling up and we convinced a number of users
to delete stuff).

I'm not particularly attached to the data, so what would be the best way
to just drop the existing data and rescan?

Cheers,

Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email loris.benn...@fu-berlin.de

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
robinhood-support mailing list
robinhood-support@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/robinhood-support

Reply via email to