Re: [lustre-discuss] some clients dmesg filled up with "dirty page discard"
No. On the OSS we found only the client that reported "dirty page discard" being evicted. We hit this again last night; on the OSS we can see logs like:

[Tue Aug 25 23:40:12 2020] LustreError: 14278:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 100s: evicting client at 10.10.3.223@o2ib ns: filter-public1-OST_UUID lock: 9f1f91cba880/0x3fcc67dad1c65842 lrc: 3/0,0 mode: PR/PR res: [0xde2db83:0x0:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->270335) flags: 0x6400020020 nid: 10.10.3.223@o2ib remote: 0xd713b7b417045252 expref: 7081 pid: 25923 timeout: 21386699 lvb_type: 0
[Tue Aug 25 23:40:12 2020] LustreError: 14278:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages
[Tue Aug 25 23:40:14 2020] LustreError: 26000:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@9f13259a6300 x1653628454261296/t0(0) o106->public1-OST@10.10.3.223@o2ib:15/16 lens 296/280 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ rc 0/-1
[Tue Aug 25 23:40:14 2020] LustreError: 26000:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 14 previous similar messages
[Tue Aug 25 23:40:26 2020] LustreError: 25917:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@9f1339a5c800 x1653628454263632/t0(0) o106->public1-OST0002@10.10.3.223@o2ib:15/16 lens 296/280 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ rc 0/-1
[Tue Aug 25 23:40:26 2020] LustreError: 25917:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 2 previous similar messages
[Tue Aug 25 23:44:59 2020] LustreError: 32485:0:(tgt_grant.c:750:tgt_grant_check()) public1-OST: cli 3a021350-bbe4-b05e-7ddf-95009f8dff7b claims 28672 GRANT, real grant 0
[Tue Aug 25 23:44:59 2020] LustreError: 32485:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 5755 previous similar messages
[Tue Aug 25 23:49:18 2020] Lustre: public1-OST0002: Connection restored to 87ca2182-98a3-25dd-7d30-989d822381c6 (at 10.10.5.6@o2ib)
[Tue Aug 25 23:49:18 2020] Lustre: Skipped 102 previous similar messages
[Tue Aug 25 23:55:00 2020] LustreError: 32485:0:(tgt_grant.c:750:tgt_grant_check()) public1-OST0004: cli 3a021350-bbe4-b05e-7ddf-95009f8dff7b claims 577536 GRANT, real grant 0
[Tue Aug 25 23:55:00 2020] LustreError: 32485:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 1121 previous similar messages
[Tue Aug 25 23:59:25 2020] Lustre: public1-OST: Connection restored to d45ad9f4-8903-7c80-7b35-bd32037de660 (at 10.10.7.131@o2ib)
[Tue Aug 25 23:59:25 2020] Lustre: Skipped 50 previous similar messages
[Tue Aug 25 23:59:49 2020] LustreError: 14278:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 156s: evicting client at 10.10.3.223@o2ib ns: filter-public1-OST_UUID lock: 9f130863a880/0x3fcc67dad1cff1d5 lrc: 3/0,0 mode: PR/PR res: [0xde2db83:0x0:0x0].0x0 rrc: 4 type: EXT [0->18446744073709551615] (req 3911680->4173823) flags: 0x620020 nid: 10.10.3.223@o2ib remote: 0xd713b7b417354237 expref: 11891 pid: 26099 timeout: 21387847 lvb_type: 0
[Tue Aug 25 23:59:49 2020] LustreError: 14278:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages
[Wed Aug 26 00:00:40 2020] LustreError: 14278:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 100s: evicting client at 10.10.3.223@o2ib ns: filter-public1-OST0004_UUID lock: 9f2df4a10d80/0x3fcc67dad1d50925 lrc: 3/0,0 mode: PR/PR res: [0xdc95179:0x0:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->266239) flags: 0x640020 nid: 10.10.3.223@o2ib remote: 0xd713b7b417549c43 expref: 14594 pid: 26181 timeout: 21387927 lvb_type: 0
[Wed Aug 26 00:00:40 2020] LustreError: 14278:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message
[Wed Aug 26 00:02:37 2020] LustreError: 14278:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 100s: evicting client at 10.10.3.223@o2ib ns: filter-public1-OST_UUID lock: 9f1359e94a40/0x3fcc67dad1dacd8b lrc: 3/0,0 mode: PR/PR res: [0xde609f1:0x0:0x0].0x0 rrc: 4 type: EXT [0->18446744073709551615] (req 1941504->2097151) flags: 0x6400020020 nid: 10.10.3.223@o2ib remote: 0xd713b7b417780209 expref: 5626 pid: 26134 timeout: 21388044 lvb_type: 0
[Wed Aug 26 00:02:37 2020] LustreError: 14278:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message
[Wed Aug 26 00:05:00 2020] LustreError: 26199:0:(tgt_grant.c:750:tgt_grant_check()) public1-OST0004: cli 3a021350-bbe4-b05e-7ddf-95009f8dff7b claims 28672 GRANT, real grant 0
[Wed Aug 26 00:05:00 2020] LustreError: 26199:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 14028 previous similar messages
[Wed Aug 26 00:09:30 2020] Lustre: public1-OST: Connection restored to 956559c4-4e7c-e6a5-3867-83ab85699688 (at 10.10.6.91@o2ib)
[Wed Aug 26 00:09:30 2020] Lustre: Skipped 39 previous similar messages
[Wed Aug 26 00:10:27 2020] LustreError: 14278:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 147s: evicting
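As a quick way to tally which clients an OSS is evicting, the NID can be pulled out of the expired_lock_main() lines; a small sketch (the sample line below is abbreviated from the log above):

```shell
# Extract the evicted client's NID from an expired_lock_main() line.
# The sample line is abbreviated from the OSS log above.
line='LustreError: 14278:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 100s: evicting client at 10.10.3.223@o2ib ns: filter-public1-OST_UUID'
echo "$line" | grep -o 'evicting client at [^ ]*' | awk '{print $4}'
# -> 10.10.3.223@o2ib
# On a live OSS, the same pipeline can summarize evictions per client:
#   dmesg | grep -o 'evicting client at [^ ]*' | awk '{print $4}' | sort | uniq -c
```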
[lustre-discuss] Complete list of rules for PCC
I am looking for the various policy rules that can be applied for Lustre Persistent Client Cache (PCC). In the docs I see the example below using projid, fname and uid. Where can I find a complete list of supported rules? Also, is there a way for PCC to cache only the contents of a few folders?

http://doc.lustre.org/lustre_manual.xhtml#pcc.design.rules

The following command adds a PCC backend on a client:

client# lctl pcc add /mnt/lustre /mnt/pcc --param "projid={500,1000}&fname={*.h5},uid=1001 rwid=2"

The first substring of the config parameter is the auto-cache rule, where "&" represents the logical AND operator while "," represents the logical OR operator. The example rule means that new files are only auto-cached if either of the following conditions is satisfied:
- The project ID is either 500 or 1000 and the suffix of the file name is "h5";
- The user ID is 1001.

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
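For what it's worth, the AND/OR semantics the manual excerpt describes can be sketched as a toy shell function (this is an illustration only, not Lustre code; the function name and yes/no output are invented):

```shell
# Toy illustration (not Lustre code) of the quoted rule semantics:
# "," is logical OR between alternatives, "&" is logical AND.
# Rule modeled: projid={500,1000}&fname={*.h5},uid=1001
auto_cache() {
    # usage: auto_cache PROJID UID FNAME ; prints "yes" or "no"
    projid=$1; uid=$2; fname=$3
    cond1=no
    case "$projid" in
        500|1000) case "$fname" in *.h5) cond1=yes ;; esac ;;
    esac
    if [ "$cond1" = yes ] || [ "$uid" = 1001 ]; then
        echo yes
    else
        echo no
    fi
}
auto_cache 500 0 data.h5    # yes: projid and file-name suffix both match
auto_cache 500 0 data.txt   # no: suffix mismatch and uid != 1001
auto_cache 7 1001 data.txt  # yes: uid matches on its own
```

The point is that everything joined by "&" before the first "," must hold together, while each ","-separated alternative is sufficient on its own.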
Re: [lustre-discuss] error mounting client
Your output shows InfiniBand NIDs (@o2ib). If you are mounting via @tcp, what is your TCP access path to the InfiniBand file system? Multihomed servers? An LNet router?

--Jeff

On Tue, Aug 25, 2020 at 8:32 AM Peeples, Heath wrote:
> We have just built a 2.12.5 cluster. When trying to mount the fs (via
> tcp), I get the following errors. Would anyone have an idea what the
> problem might be? Thanks in advance.
> [... mount error log trimmed; see the original message below ...]

--
Jeff Johnson
Co-Founder
Aeon Computing
jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001  f: 858-412-3845
m: 619-204-9061
4170 Morena Boulevard, Suite C - San Diego, CA 92117
High-Performance Computing / Lustre Filesystems / Scale-out Storage
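If an LNet router turns out to be the intended path, the TCP-only client would need a route to the o2ib network. A minimal sketch of the relevant /etc/lnet.conf fragment, assuming a hypothetical router at 192.168.8.250@tcp (that NID is invented for illustration):

```yaml
# Sketch of /etc/lnet.conf on the TCP-only client.
# The gateway NID below is hypothetical.
route:
    - net: o2ib
      gateway: 192.168.8.250@tcp
```

The equivalent can be added at runtime with `lnetctl route add --net o2ib --gateway <router NID>`, and the router itself must have LNet routing enabled on both networks.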
Re: [lustre-discuss] error mounting client
Was this an initial mount of a new file system or a new TCP client being introduced to an existing file system? Can you describe your setup a little more? On Tue, Aug 25, 2020 at 9:32 AM Peeples, Heath wrote: > We have just build a 2.12.5 cluster. When trying to mount the fs (via > tcp). I get the following errors. Would anyone have an idea what the > problem might be? Thanks in advance > > > > > > [10680.535157] LustreError: 15c-8: MGC192.168.8.8@tcp: The configuration > from log 'ldata-client' failed (-2). This may be the result of > communication errors between this node and the MGS, a bad configuration, or > other errors. See the syslog for more information. > > [10680.883649] LustreError: 12634:0:(lov_obd.c:839:lov_cleanup()) > ldata-clilov-91b118df1000: lov tgt 0 not cleaned! deathrow=0, lovrc=1 > > [10680.886610] LustreError: 12634:0:(lov_obd.c:839:lov_cleanup()) Skipped > 4 previous similar messages > > [10680.890298] LustreError: 12634:0:(obd_config.c:610:class_cleanup()) > Device 9 not setup > > [10680.891816] Lustre: Unmounted ldata-client > > [10680.895178] LustreError: 12634:0:(obd_mount.c:1608:lustre_fill_super()) > Unable to mount (-2) > > [10763.516841] LustreError: 12732:0:(ldlm_lib.c:494:client_obd_setup()) > can't add initial connection > > [10763.518368] LustreError: 12732:0:(obd_config.c:559:class_setup()) setup > ldata-OST0006-osc-91b125029800 failed (-2) > > [10763.519806] LustreError: > 12732:0:(obd_config.c:1835:class_config_llog_handler()) MGC192.168.8.8@tcp: > cfg command failed: rc = -2 > > [10763.522603] Lustre:cmd=cf003 0:ldata-OST0006-osc > 1:ldata-OST0006_UUID 2:172.23.0.116@o2ib > > > > Heath > ___ > lustre-discuss mailing list > lustre-discuss@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org > ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] some clients dmesg filled up with "dirty page discard"
The I/O was not fully committed after close() from the client. Are you experiencing high numbers of evictions?

On Tue, Aug 25, 2020 at 9:12 AM 肖正刚 wrote:
> Hi, all
>
> We found that some clients' dmesg filled up with messages like:
>
> Aug 24 19:54:34 ln5 kernel: Lustre: 13565:0:(llite_lib.c:2759:ll_dirty_page_discard_warn()) public1: dirty page discard: 10.10.2.11@o2ib:10.10.2.12@o2ib:/public1/fid: [0x27a82:0x1680f:0x0]/ may get corrupted (rc -108)
> [... many similar "dirty page discard" lines trimmed; see the original message below ...]
>
> Then we checked the disk array, SAS links, and multipath, but no error was found.
> Has anyone ever met the same problem?
> Any suggestions will help!
>
> Regards.
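For reference, the rc -108 in these messages is -ESHUTDOWN, i.e. the client's import to the server was shut down (consistent with evictions), so cached dirty pages could not be written back and were discarded. A quick way to decode it (assumes Linux errno numbering and python3 on the node):

```shell
# Decode errno 108, as seen in "(rc -108)" above (Linux errno numbering).
python3 -c 'import errno, os; print(errno.errorcode[108], "-", os.strerror(108))'
# -> ESHUTDOWN - Cannot send after transport endpoint shutdown
```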
[lustre-discuss] error mounting client
We have just built a 2.12.5 cluster. When trying to mount the fs (via tcp), I get the following errors. Would anyone have an idea what the problem might be? Thanks in advance.

[10680.535157] LustreError: 15c-8: MGC192.168.8.8@tcp: The configuration from log 'ldata-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
[10680.883649] LustreError: 12634:0:(lov_obd.c:839:lov_cleanup()) ldata-clilov-91b118df1000: lov tgt 0 not cleaned! deathrow=0, lovrc=1
[10680.886610] LustreError: 12634:0:(lov_obd.c:839:lov_cleanup()) Skipped 4 previous similar messages
[10680.890298] LustreError: 12634:0:(obd_config.c:610:class_cleanup()) Device 9 not setup
[10680.891816] Lustre: Unmounted ldata-client
[10680.895178] LustreError: 12634:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount (-2)
[10763.516841] LustreError: 12732:0:(ldlm_lib.c:494:client_obd_setup()) can't add initial connection
[10763.518368] LustreError: 12732:0:(obd_config.c:559:class_setup()) setup ldata-OST0006-osc-91b125029800 failed (-2)
[10763.519806] LustreError: 12732:0:(obd_config.c:1835:class_config_llog_handler()) MGC192.168.8.8@tcp: cfg command failed: rc = -2
[10763.522603] Lustre:cmd=cf003 0:ldata-OST0006-osc 1:ldata-OST0006_UUID 2:172.23.0.116@o2ib

Heath
[lustre-discuss] some clients dmesg filled up with "dirty page discard"
Hi, all

We found that some clients' dmesg filled up with messages like:

Aug 24 19:54:34 ln5 kernel: Lustre: 13565:0:(llite_lib.c:2759:ll_dirty_page_discard_warn()) public1: dirty page discard: 10.10.2.11@o2ib:10.10.2.12@o2ib:/public1/fid: [0x27a82:0x1680f:0x0]/ may get corrupted (rc -108)
Aug 24 19:54:34 ln5 kernel: Lustre: 13547:0:(llite_lib.c:2759:ll_dirty_page_discard_warn()) public1: dirty page discard: 10.10.2.11@o2ib:10.10.2.12@o2ib:/public1/fid: [0x27a82:0x14246:0x0]/ may get corrupted (rc -108)
Aug 24 19:54:34 ln5 kernel: Lustre: 13545:0:(llite_lib.c:2759:ll_dirty_page_discard_warn()) public1: dirty page discard: 10.10.2.11@o2ib:10.10.2.12@o2ib:/public1/fid: [0x27a82:0x12018:0x0]/ may get corrupted (rc -108)
Aug 24 19:54:34 ln5 kernel: Lustre: 13567:0:(llite_lib.c:2759:ll_dirty_page_discard_warn()) public1: dirty page discard: 10.10.2.11@o2ib:10.10.2.12@o2ib:/public1/fid: [0x27a82:0x12c86:0x0]/ may get corrupted (rc -108)
Aug 24 19:54:34 ln5 kernel: Lustre: 13566:0:(llite_lib.c:2759:ll_dirty_page_discard_warn()) public1: dirty page discard: 10.10.2.11@o2ib:10.10.2.12@o2ib:/public1/fid: [0x27a82:0x12c76:0x0]/ may get corrupted (rc -108)
Aug 24 19:54:34 ln5 kernel: Lustre: 13550:0:(llite_lib.c:2759:ll_dirty_page_discard_warn()) public1: dirty page discard: 10.10.2.11@o2ib:10.10.2.12@o2ib:/public1/fid: [0x27a82:0x12c8e:0x0]/ may get corrupted (rc -108)
Aug 24 19:54:34 ln5 kernel: Lustre: 13568:0:(llite_lib.c:2759:ll_dirty_page_discard_warn()) public1: dirty page discard: 10.10.2.11@o2ib:10.10.2.12@o2ib:/public1/fid: [0x27a82:0x12c66:0x0]/ may get corrupted (rc -108)
Aug 24 19:54:34 ln5 kernel: Lustre: 13569:0:(llite_lib.c:2759:ll_dirty_page_discard_warn()) public1: dirty page discard: 10.10.2.11@o2ib:10.10.2.12@o2ib:/public1/fid: [0x27a82:0x12c7e:0x0]/ may get corrupted (rc -108)
Aug 24 19:54:34 ln5 kernel: Lustre: 13548:0:(llite_lib.c:2759:ll_dirty_page_discard_warn()) public1: dirty page discard: 10.10.2.11@o2ib:10.10.2.12@o2ib:/public1/fid: [0x27a82:0x12c6e:0x0]/ may get corrupted (rc -108)
Aug 24 19:54:34 ln5 kernel: Lustre: 13570:0:(llite_lib.c:2759:ll_dirty_page_discard_warn()) public1: dirty page discard: 10.10.2.11@o2ib:10.10.2.12@o2ib:/public1/fid: [0x27a82:0x12ca6:0x0]/ may get corrupted (rc -108)
Aug 24 19:54:34 ln5 kernel: Lustre: 13549:0:(llite_lib.c:2759:ll_dirty_page_discard_warn()) public1: dirty page discard: 10.10.2.11@o2ib:10.10.2.12@o2ib:/public1/fid: [0x27a82:0x12cbe:0x0]/ may get corrupted (rc -108)
Aug 24 19:54:34 ln5 kernel: Lustre: 13571:0:(llite_lib.c:2759:ll_dirty_page_discard_warn()) public1: dirty page discard: 10.10.2.11@o2ib:10.10.2.12@o2ib:/public1/fid: [0x27a82:0x12cb6:0x0]/ may get corrupted (rc -108)
Aug 24 19:54:34 ln5 kernel: Lustre: 13551:0:(llite_lib.c:2759:ll_dirty_page_discard_warn()) public1: dirty page discard: 10.10.2.11@o2ib:10.10.2.12@o2ib:/public1/fid: [0x27a82:0x12cae:0x0]/ may get corrupted (rc -108)
Aug 24 19:54:34 ln5 kernel: Lustre: 13572:0:(llite_lib.c:2759:ll_dirty_page_discard_warn()) public1: dirty page discard: 10.10.2.11@o2ib:10.10.2.12@o2ib:/public1/fid: [0x27a82:0x12cce:0x0]/ may get corrupted (rc -108)
Aug 24 19:54:34 ln5 kernel: Lustre: 13573:0:(llite_lib.c:2759:ll_dirty_page_discard_warn()) public1: dirty page discard: 10.10.2.11@o2ib:10.10.2.12@o2ib:/public1/fid: [0x27a82:0x12cc6:0x0]/ may get corrupted (rc -108)
Aug 24 19:54:34 ln5 kernel: Lustre: 13574:0:(llite_lib.c:2759:ll_dirty_page_discard_warn()) public1: dirty page discard: 10.10.2.11@o2ib:10.10.2.12@o2ib:/public1/fid: [0x27a82:0x12d56:0x0]/ may get corrupted (rc -108)
Aug 24 19:54:34 ln5 kernel: Lustre: 13575:0:(llite_lib.c:2759:ll_dirty_page_discard_warn()) public1: dirty page discard: 10.10.2.11@o2ib:10.10.2.12@o2ib:/public1/fid: [0x27a82:0x12d36:0x0]/ may get corrupted (rc -108)
Aug 24 19:54:34 ln5 kernel: Lustre: 13576:0:(llite_lib.c:2759:ll_dirty_page_discard_warn()) public1: dirty page discard: 10.10.2.11@o2ib:10.10.2.12@o2ib:/public1/fid: [0x27a82:0x1429e:0x0]/ may get corrupted (rc -108)

Then we checked the disk array, SAS links, and multipath, but no error was found.
Has anyone ever met the same problem?
Any suggestions will help!

Regards.
Re: [lustre-discuss] Disk quota exceeded while quota is not filled
Hi,

Still hoping for a reply... It seems to me that old groups are more affected by the issue than new ones created after a major disk migration. It seems that quota enforcement is based on a counter other than the accounting, since the accounting produces the same numbers as du. If quota is calculated separately from accounting, it is possible that quota is broken and keeps values from the removed disks while the accounting is correct.

Following that suspicion, I tried to force the FS to recalculate quota. I tried:
lctl conf_param technion.quota.ost=none
and back to:
lctl conf_param technion.quota.ost=ugp
I tried running on the MDS and all OSTs:
tune2fs -O ^quota
and on again:
tune2fs -O quota
and after each attempt, also:
lctl lfsck_start -A -t all -o -e continue
But the problem persists, and groups under their quota usage get blocked with "quota exceeded".

Best,
David

On Sun, Aug 16, 2020 at 8:41 AM David Cohen wrote:
> Hi,
> Adding some more information.
> A few months ago the data on the Lustre fs was migrated to new physical storage.
> After successful migration the old OSTs were marked inactive
> (lctl conf_param technion-OST0001.osc.active=0).
> Since then all the clients were unmounted and remounted, and
> tunefs.lustre --writeconf was executed on the MGS/MDT and all OSTs.
> lctl dl doesn't show the old OSTs anymore, but when querying the quota they still appear.
> As I see that new users are less affected by the "quota exceeded" problem
> (blocked from writing while quota is not filled),
> I suspect that quota calculation is still summing values from the old OSTs:
>
> lfs quota -g -v md_kaplan /storage/
> Disk quotas for grp md_kaplan (gid 10028):
>     Filesystem   kbytes      quota   limit       grace   files    quota   limit   grace
>     /storage/    4823987000  0       5368709120  -       143596   0       0       -
> technion-MDT_UUID      37028       -   0           -   143596   -   0   -
> quotactl ost0 failed.
> quotactl ost1 failed.
> quotactl ost2 failed.
> quotactl ost3 failed.
> quotactl ost4 failed.
> quotactl ost5 failed.
> quotactl ost6 failed.
> quotactl ost7 failed.
> quotactl ost8 failed.
> quotactl ost9 failed.
> quotactl ost10 failed.
> quotactl ost11 failed.
> quotactl ost12 failed.
> quotactl ost13 failed.
> quotactl ost14 failed.
> quotactl ost15 failed.
> quotactl ost16 failed.
> quotactl ost17 failed.
> quotactl ost18 failed.
> quotactl ost19 failed.
> quotactl ost20 failed.
> technion-OST0015_UUID  114429464*  -   114429464   -   -        -   -   -
> technion-OST0016_UUID  92938588    -   92938592    -   -        -   -   -
> technion-OST0017_UUID  128496468*  -   128496468   -   -        -   -   -
> technion-OST0018_UUID  191478704*  -   191478704   -   -        -   -   -
> technion-OST0019_UUID  107720552   -   107720560   -   -        -   -   -
> technion-OST001a_UUID  165631952*  -   165631952   -   -        -   -   -
> technion-OST001b_UUID  460714156*  -   460714156   -   -        -   -   -
> technion-OST001c_UUID  157182900*  -   157182900   -   -        -   -   -
> technion-OST001d_UUID  102945952*  -   102945952   -   -        -   -   -
> technion-OST001e_UUID  175840980*  -   175840980   -   -        -   -   -
> technion-OST001f_UUID  142666872*  -   142666872   -   -        -   -   -
> technion-OST0020_UUID  188147548*  -   188147548   -   -        -   -   -
> technion-OST0021_UUID  125914240*  -   125914240   -   -        -   -   -
> technion-OST0022_UUID  186390800*  -   186390800   -   -        -   -   -
> technion-OST0023_UUID  115386876   -   115386884   -   -        -   -   -
> technion-OST0024_UUID  127139556*  -   127139556   -   -        -   -   -
> technion-OST0025_UUID  179666580*  -   179666580   -   -        -   -   -
> technion-OST0026_UUID  147837348   -   147837356   -   -        -   -   -
> technion-OST0027_UUID  129823528   -   129823536   -   -        -   -   -
> technion-OST0028_UUID  158270776   -   158270784   -   -        -   -   -
> technion-OST0029_UUID  168762120   -   168763104   -   -        -   -   -
> technion-OST002a_UUID  164235684   -   164235688   -   -        -   -   -
> technion-OST002b_UUID  147512200   -   147512204   -   -        -   -   -
> technion-OST002c_UUID  158046652