Hi All,
I've just moved our scanning service to a new box and was having real
trouble keeping a high scan rate -- typical stats looked like this:
==================== Dumping stats at 2016/02/19 04:09:31 =====================
======== General statistics =========
Daemon start time: 2016/02/18 14:09:29
Started modules: scan
======== FS scan statistics =========
current scan interval = 180.0d
scan is running:
started at : 2016/02/18 14:09:29 (14.0h ago)
last action: 2016/02/19 04:09:30 (01s ago)
progress : 6697182 entries scanned (0 errors)
inst. speed (potential): 18.82 entries/sec (425.15 ms/entry/thread)
avg. speed (effective): 132.88 entries/sec (57.54 ms/entry/thread)
==== EntryProcessor Pipeline Stats ===
Idle threads: 12
Id constraints count: 0 (hash min=0/max=0/avg=0.0)
Name constraints count: 0 (hash min=0/max=0/avg=0.0)
Stage | Wait | Curr | Done | Total | ms/op |
0: GET_FID | 0 | 0 | 0 | 41487 | 1.75 |
1: GET_INFO_DB | 0 | 0 | 0 | 6738285 | 0.30 |
2: GET_INFO_FS | 0 | 0 | 0 | 6738285 | 0.01 |
3: REPORTING | 0 | 0 | 0 | 6738285 | 0.01 |
4: PRE_APPLY | 0 | 0 | 0 | 6738285 | 0.00 |
5: DB_APPLY | 0 | 0 | 0 | 6738285 | 0.71 | 95.46%
batched (avg batch size: 70.0)
6: CHGLOG_CLR | 0 | 0 | 0 | 0 | 0.00 |
7: RM_OLD_ENTRIES | 0 | 0 | 0 | 0 | 0.00 |
DB ops: get=0/ins=6698052/upd=40227/rm=0
Despite the low timings (< 2 ms per stage), those numbers just didn't
explain why we were seeing such a terrible inst. speed
(~425 ms/entry/thread), or why the scan was taking ages.
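For context, summing the per-stage ms/op figures above accounts for only a
few ms of the ~425 ms/entry/thread, so nearly all of the time was being
spent somewhere the pipeline stats don't measure. A quick back-of-the-envelope
check (Python, numbers copied from the dump above):

# Rough sanity check: how much of the per-entry time do the pipeline
# stages actually explain? (GET_FID only runs for a subset of entries,
# so if anything this overstates the stages' share.)
stage_ms = [1.75, 0.30, 0.01, 0.01, 0.00, 0.71, 0.00, 0.00]  # ms/op column
inst_ms_per_entry = 425.15                                   # inst. speed (potential)

accounted = sum(stage_ms)
print("stages account for %.2f ms of %.2f ms/entry (%.1f%%)"
      % (accounted, inst_ms_per_entry, 100.0 * accounted / inst_ms_per_entry))
# -> stages account for 2.78 ms of 425.15 ms/entry (0.7%)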
However, after one trivial change (re-enabling nscd), lo and behold:
==================== Dumping stats at 2016/02/19 07:29:31 =====================
======== General statistics =========
Daemon start time: 2016/02/18 14:09:29
Started modules: scan
======== FS scan statistics =========
current scan interval = 180.0d
scan is running:
started at : 2016/02/18 14:09:29 (17.3h ago)
last action: 2016/02/19 07:29:31 (00s ago)
progress : 8630417 entries scanned (0 errors)
inst. speed (potential): 2575.56 entries/sec (3.11 ms/entry/thread)
avg. speed (effective): 138.30 entries/sec (57.84 ms/entry/thread)
==== EntryProcessor Pipeline Stats ===
Idle threads: 8
Id constraints count: 7726 (hash min=0/max=4/avg=0.5)
Name constraints count: 7727 (hash min=0/max=5/avg=0.5)
Stage | Wait | Curr | Done | Total | ms/op |
0: GET_FID | 0 | 0 | 0 | 60243 | 1.78 |
1: GET_INFO_DB | 0 | 2 | 0 | 8630592 | 0.30 |
2: GET_INFO_FS | 0 | 1 | 7 | 8630591 | 0.01 |
3: REPORTING | 0 | 0 | 0 | 8630584 | 0.01 |
4: PRE_APPLY | 0 | 0 | 0 | 8630584 | 0.00 |
5: DB_APPLY | 6718 | 1000 | 0 | 8622866 | 0.73 | 93.85%
batched (avg batch size: 69.8)
6: CHGLOG_CLR | 0 | 0 | 0 | 0 | 0.00 |
7: RM_OLD_ENTRIES | 0 | 0 | 0 | 0 | 0.00 |
DB ops: get=61/ins=8548235/upd=74625/rm=0
--- Pipeline stage details ---
GET_FID : (1 op):
/group/mwaops/PULSAR/src/tempo2/plugin/.libs/findCWs_linux_plug.la,
status=waiting
GET_INFO_DB : (1 op):
/group/mwaops/PULSAR/src/psrfits_utils/config/ltsugar.m4,
status=waiting
GET_INFO_FS : (1 op):
/group/mwaops/PULSAR/src/psrfits_utils/.deps/fold_psrfits-fold_psrfits.Po,
status=waiting
REPORTING : (1 op): /group/mwaops/PULSAR/src/tempo2/sofa/zp.for,
status=waiting
DB_APPLY : first: [0x20000fc74:0x2216:0x0], status=processing
DB_APPLY : last: [0x20000fc74:0xff04:0x0], status=waiting
Here I can see that the bottleneck is now back on the database rather
than on "something" that isn't shown in the logs.
It may be worth adding this as a recommendation in the wiki (hence why
I'm mailing the list).
Andrew