Re: [lustre-discuss] Robinhood exhausting RPC resources against 2.5.5 lustre file systems
Hi Megan (et al.),

I don't understand the behavior, either... I've worked successfully with changelogs in the past, and indeed it is very lightweight. (Since robinhood has not been running anywhere, I'd already removed all the changelog readers from the various MDTs for the reasons you noted.) Whatever my problem is does not manifest as a load issue, on either client or MDT side. It manifests rather as some sort of connection failure.

Here's the most recent example, which maybe will generate more ideas as to cause. On our third lustre fs (one we use for backups), I was able to complete a file system scan to populate the database, but then when I activated changelogs, the client almost immediately experienced the disconnections we've seen on the other two systems.

Here's the log from the MDT (heinlein, 10.7.17.126). The robinhood client is akebono (10.7.17.122):

May 16 16:05:51 heinlein kernel: Lustre: lard-MDD: changelog on
May 16 16:05:51 heinlein kernel: Lustre: Modifying parameter general.mdd.lard-MDT*.changelog_mask in log params
May 16 16:13:16 heinlein kernel: Lustre: lard-MDT: Client 2d1aedc0-1f5e-2741-689a-169922a2593b (at 10.7.17.122@o2ib) reconnecting
May 16 16:13:17 heinlein kernel: Lustre: lard-MDT: Client 2d1aedc0-1f5e-2741-689a-169922a2593b (at 10.7.17.122@o2ib) reconnecting
May 16 16:13:17 heinlein kernel: Lustre: Skipped 7458 previous similar messages

Here's what akebono (10.7.17.122) reported:

May 16 16:13:16 akebono kernel: LustreError: 11-0: lard-MDT-mdc-880fd68d7000: Communicating with 10.7.17.126@o2ib, operation llog_origin_handle_destroy failed with -19.
May 16 16:13:16 akebono kernel: Lustre: lard-MDT-mdc-880fd68d7000: Connection to lard-MDT (at 10.7.17.126@o2ib) was lost; in progress operations using this service will wait for recovery to complete
May 16 16:13:16 akebono kernel: Lustre: lard-MDT-mdc-880fd68d7000: Connection restored to lard-MDT (at 10.7.17.126@o2ib)
May 16 16:13:17 akebono kernel: LustreError: 11-0: lard-MDT-mdc-880fd68d7000: Communicating with 10.7.17.126@o2ib, operation llog_origin_handle_destroy failed with -19.
May 16 16:13:17 akebono kernel: LustreError: Skipped 7458 previous similar messages
May 16 16:13:17 akebono kernel: Lustre: lard-MDT-mdc-880fd68d7000: Connection to lard-MDT (at 10.7.17.126@o2ib) was lost; in progress operations using this service will wait for recovery to complete
May 16 16:13:17 akebono kernel: Lustre: Skipped 7458 previous similar messages
May 16 16:13:17 akebono kernel: Lustre: lard-MDT-mdc-880fd68d7000: Connection restored to lard-MDT (at 10.7.17.126@o2ib)
May 16 16:13:17 akebono kernel: Lustre: Skipped 7458 previous similar messages
May 16 16:13:18 akebono kernel: LustreError: 11-0: lard-MDT-mdc-880fd68d7000: Communicating with 10.7.17.126@o2ib, operation llog_origin_handle_destroy failed with -19.
May 16 16:13:18 akebono kernel: LustreError: Skipped 14924 previous similar messages

Jessica

On 5/19/17 8:58 AM, Ms. Megan Larko wrote:

Greetings Jessica,

I'm not sure I am correctly understanding the behavior "robinhood activity floods the MDT". The robinhood program as you (and I) are using it is consuming the MDT CHANGELOG via a reader_id which was assigned when the CHANGELOG was enabled on the MDT. You can check the MDS for these readers via "lctl get_param mdd.*.changelog_users". Each CHANGELOG reader must either be consumed by a process or destroyed; otherwise the CHANGELOG will grow until it consumes sufficient space to stop the MDT from functioning correctly. So robinhood should consume and then clear the CHANGELOG via this reader_id.
This implementation of robinhood is actually a rather light-weight process as far as the MDS is concerned. The load issues I encountered were on the robinhood server itself, which is a separate server from the Lustre MGS/MDS server.

Just curious, have you checked for multiple reader_ids on your MDS for this Lustre file system?

P.S. My robinhood configuration file is using nb_threads = 8, just for a data point.

Cheers,
megan

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
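As a side note on the "lctl get_param mdd.*.changelog_users" check above: the output lists each registered reader ID against the current changelog index, so a stale reader shows up as one whose index lags far behind. The sketch below parses a made-up sample of that output with awk (the reader IDs and indices are hypothetical); on a real MDS you would feed it the actual command output, and deregister a stale reader with "lctl --device <MDT name> changelog_deregister <id>".

```shell
# Hypothetical sample of `lctl get_param -n mdd.*.changelog_users` output;
# replace with the real command's output on the MDS.
sample='current index: 1234
ID    index
cl1   120
cl2   1234'

# Report how far each registered reader lags behind the current index.
echo "$sample" | awk '
  /current index:/ { cur = $3 }
  /^cl/            { printf "%s lags by %d records\n", $1, cur - $2 }'
```

A reader that lags and never catches up is the one pinning changelog records on the MDT.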
Re: [lustre-discuss] Robinhood exhausting RPC resources against 2.5.5 lustre file systems
I think that may be a red herring related to rsyslog? When we most recently rebooted the MDT, this is the log (still on the box, not on the log server):

May 3 14:24:22 asimov kernel: LNet: HW CPU cores: 12, npartitions: 4
May 3 14:24:30 asimov kernel: LNet: Added LNI 10.7.17.8@o2ib [8/256/0/180]

And lctl list_nids gives it once:

[root@asimov ~]# lctl list_nids
10.7.17.8@o2ib

Jessica

On 5/19/17 10:13 AM, Jeff Johnson wrote:

Jessica,

You are getting a NID registering twice. Doug noticed and pointed it out. I'd look to see if that is one machine doing something twice or two machines with the same NID.

--Jeff

On Fri, May 19, 2017 at 05:58 Ms. Megan Larko <dobsonu...@gmail.com> wrote:

Greetings Jessica,

I'm not sure I am correctly understanding the behavior "robinhood activity floods the MDT". The robinhood program as you (and I) are using it is consuming the MDT CHANGELOG via a reader_id which was assigned when the CHANGELOG was enabled on the MDT. You can check the MDS for these readers via "lctl get_param mdd.*.changelog_users". Each CHANGELOG reader must either be consumed by a process or destroyed; otherwise the CHANGELOG will grow until it consumes sufficient space to stop the MDT from functioning correctly. So robinhood should consume and then clear the CHANGELOG via this reader_id.

This implementation of robinhood is actually a rather light-weight process as far as the MDS is concerned. The load issues I encountered were on the robinhood server itself, which is a separate server from the Lustre MGS/MDS server.

Just curious, have you checked for multiple reader_ids on your MDS for this Lustre file system?

P.S. My robinhood configuration file is using nb_threads = 8, just for a data point.

Cheers,
megan

On Thu, May 18, 2017 at 2:36 PM, Jessica Otey <jo...@nrao.edu> wrote:

Hi Megan,

Thanks for your input. We use percona, a drop-in replacement for mysql...
The robinhood activity floods the MDT, but it does not seem to produce any excessive load on the robinhood box... Anyway, FWIW...

~]# mysql --version
mysql Ver 14.14 Distrib 5.5.54-38.6, for Linux (x86_64) using readline 5.1

Product: robinhood
Version: 3.0-1
Build: 2017-03-13 10:29:26
Compilation switches:
    Lustre filesystems
    Lustre Version: 2.5
    Address entries by FID
    MDT Changelogs supported
Database binding: MySQL
RPM: robinhood-lustre-3.0-1.lustre2.5.el6.x86_64
Lustre rpms:
    lustre-client-2.5.5-2.6.32_642.15.1.el6.x86_64_g22a210f.x86_64
    lustre-client-modules-2.5.5-2.6.32_642.15.1.el6.x86_64_g22a210f.x86_64

On 5/18/17 11:55 AM, Ms. Megan Larko wrote:

With regards to (WRT) the Subject "Robinhood exhausting RPC resources against 2.5.5 lustre file systems", what version of robinhood and what version of MySQL database? I mention this because I have been working with robinhood-3.0-0.rc1 and initially MySQL-5.5.32 and Lustre 2.5.42.1 on kernel-2.6.32-573, and had issues in which the robinhood server consumed more than the total of the 32 CPU cores on the robinhood server (with 128 G RAM) and would functionally hang the robinhood server. The issue was solved for me by changing to MySQL-5.6.35. It was the "sort" command in robinhood that was not working well with the MySQL-5.5.32.

Cheers,
megan

--
Jeff Johnson
Co-Founder
Aeon Computing
jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001  f: 858-412-3845  m: 619-204-9061
4170 Morena Boulevard, Suite D - San Diego, CA 92117
High-Performance Computing / Lustre Filesystems / Scale-out Storage
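On the duplicate-NID question Jeff raises above: with logs aggregated on a log server, one way to tell "one machine doing something twice" from "two machines with the same NID" is to check whether any NID's "Added LNI" line appears under more than one hostname. A sketch over made-up sample lines (the hostnames bradbury and clarke are hypothetical):

```shell
# Hypothetical aggregated syslog lines; on a real log server you would
# grep the message files for "Added LNI" instead.
sample='May  3 14:24:30 asimov kernel: LNet: Added LNI 10.7.17.8@o2ib [8/256/0/180]
May  3 14:25:02 bradbury kernel: LNet: Added LNI 10.7.17.8@o2ib [8/256/0/180]
May  3 14:26:10 clarke kernel: LNet: Added LNI 10.7.17.9@o2ib [8/256/0/180]'

# Print any NID announced by more than one distinct host ($4 is the
# hostname, next-to-last field is the NID).
echo "$sample" | awk '
  /Added LNI/ { nid = $(NF-1); if (!seen[nid "," $4]++) hosts[nid]++ }
  END         { for (n in hosts) if (hosts[n] > 1) print n }'
```

If nothing prints, the duplication is coming from a single host.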
Re: [lustre-discuss] Robinhood exhausting RPC resources against 2.5.5 lustre file systems
Hi Megan,

Thanks for your input. We use percona, a drop-in replacement for mysql... The robinhood activity floods the MDT, but it does not seem to produce any excessive load on the robinhood box... Anyway, FWIW...

~]# mysql --version
mysql Ver 14.14 Distrib 5.5.54-38.6, for Linux (x86_64) using readline 5.1

Product: robinhood
Version: 3.0-1
Build: 2017-03-13 10:29:26
Compilation switches:
    Lustre filesystems
    Lustre Version: 2.5
    Address entries by FID
    MDT Changelogs supported
Database binding: MySQL
RPM: robinhood-lustre-3.0-1.lustre2.5.el6.x86_64
Lustre rpms:
    lustre-client-2.5.5-2.6.32_642.15.1.el6.x86_64_g22a210f.x86_64
    lustre-client-modules-2.5.5-2.6.32_642.15.1.el6.x86_64_g22a210f.x86_64

On 5/18/17 11:55 AM, Ms. Megan Larko wrote:

With regards to (WRT) the Subject "Robinhood exhausting RPC resources against 2.5.5 lustre file systems", what version of robinhood and what version of MySQL database? I mention this because I have been working with robinhood-3.0-0.rc1 and initially MySQL-5.5.32 and Lustre 2.5.42.1 on kernel-2.6.32-573, and had issues in which the robinhood server consumed more than the total of the 32 CPU cores on the robinhood server (with 128 G RAM) and would functionally hang the robinhood server. The issue was solved for me by changing to MySQL-5.6.35. It was the "sort" command in robinhood that was not working well with the MySQL-5.5.32.

Cheers,
megan
Re: [lustre-discuss] Robinhood exhausting RPC resources against 2.5.5 lustre file systems
Update #1. Robinhood changelog consumption is also producing the same effect against a native 2.x file system instance. So the 'legacy' aspect of our two production instances does not seem to be a factor...

Update #2. Currently running, per Colin Faber's suggestion:

find /mnt/lustre -exec lfs path2fid {} \;

This does not (so far) provoke a disconnection.

Jessica

On 5/17/17 2:04 PM, Jessica Otey wrote:

We also have a third Lustre file system that originated as 2.4.3, and has since been upgraded to 2.5.5, against which Robinhood is currently operating as expected. This leads me to suppose that the issue may have to do with the interaction between Robinhood and a legacy-1.8.x-now-lustre-2.5.5 system. But I don't know.
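Incidentally, the -exec form above forks one process per file; batching paths with xargs cuts that overhead considerably on a large tree. A sketch that runs anywhere, with `echo` standing in for `lfs path2fid` and a throwaway temp directory standing in for the Lustre mount:

```shell
# Build a small demo tree; on a real system the target would be the Lustre
# mount point and the batched command would be `lfs path2fid`.
demo=$(mktemp -d)
touch "$demo/file1" "$demo/file2"

# One process per (up to) 100 paths instead of one process per path.
find "$demo" -type f -print0 | xargs -0 -n 100 echo

rm -rf "$demo"
```

The -print0/-0 pairing keeps paths with spaces or newlines intact.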
[lustre-discuss] Robinhood exhausting RPC resources against 2.5.5 lustre file systems
All,

We have observed an unfortunate interaction between Robinhood and two Lustre 2.5.5 file systems (both of which originated as 1.8.9 file systems). Robinhood was used successfully against these file systems when they were both 1.8.9, 2.4.3, and then 2.5.3 (a total time span of about 11 months). We also have a third Lustre file system that originated as 2.4.3, and has since been upgraded to 2.5.5, against which Robinhood is currently operating as expected. This leads me to suppose that the issue may have to do with the interaction between Robinhood and a legacy-1.8.x-now-lustre-2.5.5 system. But I don't know.

The problem manifests itself as follows: Either a Robinhood file scan or the initiation of the consumption of changelogs results in the consumption of all the available RPC resources on the MDT. This in turn leads to the MDT not being able to satisfy any other requests from clients, which in turn leads to client disconnections (the MDT thinks they are dead and evicts them). Meanwhile, Robinhood itself is unable to traverse the file system to gather the information it seeks, and so its scans either hang (due to the client disconnect) or run at a rate such that they would never complete (less than 1 file per second). If we don't run robinhood at all, the file system performs (after a remount of the MDT) as expected.

Initially, we thought that the difficulty might be that we neglected to activate the FID-in-dirent feature when we upgraded to 2.4.3. We did so on one of these systems, and ran an lfsck oi_scrub, but that did not ameliorate the problem.

Any thoughts on this matter would be appreciated. (We miss using Robinhood!)

Thanks,
Jessica

More data for those who cannot help themselves:

April 2016 - Robinhood comes into production use against both our 1.8.9 file systems.
July 2016 - Upgrade to 2.4.3 (on both production lustre file systems) -- Robinhood rebuilt against 2.4.3 client; changelog consumption now included.
Lustre "reconnects" (from /var/log/messages on one of the MDTs):

July 2016: 4
Aug 2016: 20
Sept 2016: 8
Oct 2016: 8

Nov 4-6, 2016 - Upgrade to 2.5.3 (on both production lustre file systems) -- Robinhood rebuilt against 2.5.3 client.

Lustre "reconnects":

Nov. 2016: 180
Dec. 2016: 62
Jan. 2017: 96
Feb 1-24, 2017: 2

Feb 24, 2017 - Upgrade to 2.5.5 (on both production lustre file systems)

NAASC-Lustre MDT coming back:

Feb 24 20:46:44 10.7.7.8 kernel: Lustre: Lustre: Build Version: 2.5.5-g22a210f-CHANGED-2.6.32-642.6.2.el6_lustre.2.5.5.x86_64
Feb 24 20:46:44 10.7.7.8 kernel: Lustre: Lustre: Build Version: 2.5.5-g22a210f-CHANGED-2.6.32-642.6.2.el6_lustre.2.5.5.x86_64
Feb 24 20:46:44 10.7.7.8 kernel: LNet: Added LNI 10.7.17.8@o2ib [8/256/0/180]
Feb 24 20:46:44 10.7.7.8 kernel: LNet: Added LNI 10.7.17.8@o2ib [8/256/0/180]
Feb 24 20:46:45 10.7.7.8 kernel: LDISKFS-fs (md127): mounted filesystem with ordered data mode. quota=off. Opts:
Feb 24 20:46:45 10.7.7.8 kernel: LDISKFS-fs (md127): mounted filesystem with ordered data mode. quota=off. Opts:
Feb 24 20:46:46 10.7.7.8 kernel: Lustre: MGC10.7.17.8@o2ib: Connection restored to MGS (at 0@lo)
Feb 24 20:46:46 10.7.7.8 kernel: Lustre: MGC10.7.17.8@o2ib: Connection restored to MGS (at 0@lo)
Feb 24 20:46:47 10.7.7.8 kernel: Lustre: naaschpc-MDT: used disk, loading
Feb 24 20:46:47 10.7.7.8 kernel: Lustre: naaschpc-MDT: used disk, loading

The night after this upgrade, a regular rsync to the backup Lustre system provokes a failure/client disconnect. (Unfortunately, I don't have the logs to look at Robinhood activity from this time, but I believe I restarted the service after the system came back.)
Feb 25 02:14:24 10.7.7.8 kernel: LustreError: 25103:0:(service.c:2020:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.7.17.123@o2ib: deadline 6:11s ago
Feb 25 02:14:24 10.7.7.8 kernel: LustreError: 25103:0:(service.c:2020:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.7.17.123@o2ib: deadline 6:11s ago
Feb 25 02:14:24 10.7.7.8 kernel: req@88045b3a2050 x1560271381909936/t0(0) o103->bb228923-4216-cc59-d847-38b543af1ae2@10.7.17.123@o2ib:0/0 lens 3584/0 e 0 to 0 dl 1488006853 ref 1 fl Interpret:/0/ rc 0/-1
Feb 25 02:14:24 10.7.7.8 kernel: req@88045b3a2050 x1560271381909936/t0(0) o103->bb228923-4216-cc59-d847-38b543af1ae2@10.7.17.123@o2ib:0/0 lens 3584/0 e 0 to 0 dl 1488006853 ref 1 fl Interpret:/0/ rc 0/-1
Feb 25 02:14:24 10.7.7.8 kernel: Lustre: 25111:0:(service.c:2052:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:11s); client may timeout. req@88045b3a2850 x1560271381909940/t0(0) o103->bb228923-4216-cc59-d847-38b543af1ae2@10.7.17.123@o2ib:0/0 lens 3584/0 e 0 to 0 dl 1488006853 ref 1 fl Interpret:/0/ rc 0/-1
Feb 25 02:14:24 10.7.7.8 kernel:
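The monthly "reconnects" tallies above can be produced with a short pipeline over /var/log/messages on the MDS. A sketch, with an inline sample (hypothetical hostnames and NIDs) standing in for the real log file:

```shell
# Count "reconnecting" messages per month from syslog-style lines; $1 is
# the month field. Replace the sample with `cat /var/log/messages`.
sample='Nov  3 01:00:00 asimov kernel: Lustre: fs-MDT0000: Client abc (at 10.0.0.1@o2ib) reconnecting
Nov  7 02:00:00 asimov kernel: Lustre: fs-MDT0000: Client abc (at 10.0.0.1@o2ib) reconnecting
Dec  1 03:00:00 asimov kernel: Lustre: fs-MDT0000: Client def (at 10.0.0.2@o2ib) reconnecting'

echo "$sample" | awk '/reconnecting/ { count[$1]++ }
                      END { for (m in count) print m, count[m] }'
```

Note that syslog months have no year, so logs spanning a year boundary need the files split first.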
[lustre-discuss] using an lnet router for certain connections... but not for others
We have three lustre systems, two "production" and one "recovery" system, which is in essence a (partial) backup of ONE of the production systems. The main difference between the two production systems is that one uses IB and one uses 10G ethernet among the OSSes.

I am in the process of moving my robinhood instance from one box to another. The previous box was a normal "production" client, meaning it accessed both production file systems via our lnet routers (which are identical, and are used by both production lustre file systems). The new box is a "data mover" for the backup lustre file system, which was configured WITHOUT an lnet router. Because this box has IB, I am able to connect directly to the production lustre file system that also has IB.

The thing is, I would also like to keep using robinhood for the other file system, the one using 10gig ethernet. Is there a way to specify in the lustre.conf configuration a setup whereby the lnet routers could be used to access only the production file systems but not the backup file system?

Any leads appreciated.

Thanks,
Jessica

--
Jessica Otey
System Administrator II
North American ALMA Science Center (NAASC)
National Radio Astronomy Observatory (NRAO)
Charlottesville, Virginia (USA)
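For what it's worth, LNet routes are declared per destination network rather than per file system, so a client can reach one network through a router while talking directly on its own network. A sketch of the module options involved (the interface names and the router NID 10.7.17.1@o2ib0 are hypothetical placeholders, not values from this site):

```
# /etc/modprobe.d/lustre.conf on the data-mover client (illustrative only).
# The client lives on o2ib0; traffic for servers on tcp0 goes via an LNet
# router, while traffic to o2ib0 servers (the backup fs) stays direct.
options lnet networks="o2ib0(ib0)" routes="tcp0 10.7.17.1@o2ib0"
```

Whether this cleanly separates "production via router" from "backup direct" depends on which LNet networks the backup servers actually sit on, so treat it as a starting point rather than a recipe.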
Re: [lustre-discuss] replacing empty OST
Patrick,

If I'm understanding you correctly, I think you just need to pass --replace when you run mkfs.lustre. From man mkfs.lustre:

--replace
    Used to initialize a target with the same --index as a previously used target if the old target was permanently lost for some reason (e.g. multiple disk failure or massive corruption). This avoids having the target try to register as a new target with the MGS.

This will allow you to specify the index number you used previously. You will probably need --reformat as well if it detects a filesystem already there. I've done this and it doesn't require unmounting anything.

Jessica

On 01/19/2017 05:35 PM, Patrick Shopbell wrote:

Thank you very much for the reply, Marion. But I did indeed use that option. In fact, I think that option is a safety that keeps one from inadvertently overwriting a Lustre volume. If mkfs.lustre detects a Lustre file system on the volume being formatted, it notes that you have to specify "--reformat" to force the format to overwrite the old filesystem. It doesn't seem to have reset the info on the MGS, from what I can tell. I was hoping there is some way to do this without having to unmount the entire filesystem everywhere... But perhaps there is not. Maybe I will just skip OST 9 and move on to 10...

Thanks anyway.
Patrick

On 1/18/17 5:58 PM, Marion Hakanson wrote:

Patrick,

I'm no guru, but there's a "--reformat" option to the mkfs.lustre command which you can use when re-creating a lost/destroyed OSS. That should tell the MGS that you intend to re-use the index.

Regards,
Marion

To: <lustre-discuss@lists.lustre.org>
From: Patrick Shopbell <p...@astro.caltech.edu>
Date: Wed, 18 Jan 2017 17:18:05 -0800
Subject: [lustre-discuss] replacing empty OST

Hi Lustre gurus -

Quick question about replacing a new, empty OST: I installed an OST (#9) briefly from a specific machine, and then ended up aborting that install. (It didn't work due to some version mismatch errors.)
I've since solved all those problems, reinstalled the OSS, and reformatted the OST. (Maybe I should not have done that...) Anyway, now I can add the OST, and it sort of works, except it notes that OST #9 is already assigned. So I get an error like this:

[date] astrolmgs kernel: LustreError: 140-5: Server lustre-OST0009 requested index 9, but that index is already in use. Use --writeconf to force.

Since I don't care about the data on there (because there isn't any), is there any shortcut to getting this to work? Or do I just need to shut everything down, run writeconf on the MGS and OSS units, then start everything back up? Is there any way to make the system think that this OST #9 volume is the same as the earlier failed volume - since it really is the same thing, meaning a new empty OST. I know I could just disable OST 9 everywhere and call this one OST 10, but I'd rather not... I am running the old Lustre 2.5.2.

Thanks a lot,
Patrick

--
| Patrick Shopbell          Department of Astronomy       |
| p...@astro.caltech.edu    Mail Code 249-17              |
| (626) 395-4097            California Institute of Technology |
| (626) 568-9352 (FAX)      Pasadena, CA 91125            |
| WWW: http://www.astro.caltech.edu/~pls/                 |

--
Jessica Otey
System Administrator II
North American ALMA Science Center (NAASC)
National Radio Astronomy Observatory (NRAO)
Charlottesville, Virginia (USA)
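Putting Jessica's suggestion above into a concrete command line, the reformat would look roughly like this (illustrative sketch only: the device name /dev/sdX, fsname, and MGS NID are hypothetical and must be replaced with the site's own values; check the options against your mkfs.lustre man page before running):

```
# Reformat the device as OST index 9, telling the MGS this target replaces
# the previously registered index rather than registering as a new target.
mkfs.lustre --ost --reformat --replace --index=9 \
    --fsname=lustre --mgsnode=<mgs-nid> /dev/sdX
```

The --replace flag is what avoids the "index already in use" registration error, and --reformat overrides the safety check against the existing Lustre filesystem on the device.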
[lustre-discuss] Round robin allocation (in general and in buggy 2.5.3)
All,

I am looking for a more complete understanding of how the two settings qos_prio_free and qos_threshold_rr function together. My current understanding, which may be inaccurate, is the following:

qos_prio_free
This setting controls how much Lustre prioritizes free space (versus location for the sake of performance) in allocation. The higher this number, the more Lustre takes empty space on an OST into consideration for its allocation. When set to 100%, Lustre uses ONLY empty space as the deciding factor for writes.

qos_threshold_rr
This setting controls how much consideration should be given to QoS in allocation. The higher this number, the more QoS is taken into consideration. When set to 100%, Lustre ignores the QoS variable and hits all OSTs equally.

I'm looking for several answers:

1) Is my basic understanding of the above settings correct?

2) How does lustre deal with OSTs that are 100% full? I'm curious about this under two conditions.

2a) When you set qos_threshold_rr=100 -- meaning, go and hit all the OSTs the same amount. On one of our 2.5.3 lustre filesystems, the allocator is not working (a known bug, but why it seems to be behaving fine on the other one, I couldn't say...) and so we have configured qos_threshold_rr=100. Since our OSTs are pretty dramatically unbalanced, it has happened that attempts to write to full OSTs have caused write failures. Data deletes have gotten us below 90% on all OSTs now, and while I can certainly take the fullest OSTs out of write mode if that is needed, it would seem to me that lustre should, no matter what your qos_threshold_rr setting, treat OSTs that are 100% full differently, meaning, it should no longer attempt to write to them. In short, this seems like a bug to me... although, granted, I suppose if you are overriding the allocator, it's caveat user at that point.
2b) When you set qos_threshold_rr != 100 -- meaning, the allocator is working. On the other lustre 2.5.3 system, the system defaults (qos_prio_free=91%; qos_threshold_rr=17%) are hitting all the OSTs when I run my test*, so I have not changed them. Several of the OSTs in this file system are at 100%. I get that we are not seeing write failures because the allocator is not allocating to these OSTs as frequently, based on how full they are. But I know from my test that these OSTs are still in the mix... so that implies to me that it would be possible, although less likely, to see a write failure if a write stream is opened on one of the 100% OSTs. I'd love to be able to quantify that "less likely".

Basically, I guess my question is: is taking an OST out of write mode the only (or best) way of preventing the fs from attempting to write to it when it is nearly full?

Thanks,
Jessica

--
*To test file allocation on your lustre system, you can use this one-liner from a lustre client. USE IT IN ITS OWN, NEW DIRECTORY!

touch t.{1..2000}; lfs getstripe t.*|fgrep -A1 obdidx|fgrep -v obdidx|fgrep -v -- --|awk '{ print $1 }'|sort|uniq -c; rm -f t.*

--
Jessica Otey
System Administrator II
North American ALMA Science Center (NAASC)
National Radio Astronomy Observatory (NRAO)
Charlottesville, Virginia (USA)
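The counting half of that one-liner can be seen in isolation against canned `lfs getstripe`-style output, which also makes it easy to sanity-check the pipeline without touching a Lustre client (the obdidx/objid values below are made up):

```shell
# Simulated `lfs getstripe` output: a header line followed by one stripe
# line ("obdidx objid objid group") per file. The pipeline tallies how
# many stripes landed on each obdidx.
sample='	obdidx	 objid	objid	 group
	     2	  3102	0xc1e	     0
	obdidx	 objid	objid	 group
	     5	  9204	0x23f4	     0
	obdidx	 objid	objid	 group
	     2	  3103	0xc1f	     0'

echo "$sample" | fgrep -A1 obdidx | fgrep -v obdidx | fgrep -v -- -- \
  | awk '{ print $1 }' | sort | uniq -c
```

Any OST index missing from the tally is one the allocator is skipping entirely.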
[lustre-discuss] syntax/guidance on lfsck
All,

If there is anyone who has run an lfsck on a 2.5.3 system, I would appreciate some guidance. The manual has only been of limited help.

The situation: We were draining an OST, unaware that a bug in 2.5.3 causes the inodes to remain--thus rendering the df readout inaccurate (and therefore not useful).

The solution: This should be fixable via lfsck--it would simply compare the inodes and the files and bring them back into the correct alignment (i.e., delete the inodes that no longer correspond to data on the OST).

The specific issue: I don't know:
1) What exact lfsck command to issue
2) Where to issue it (mds, oss)
3) What to expect as far as output/how to interact with it

What I have done: I had issued (on the oss) the following command:

[root@naasc-oss-6 ~]# lctl lfsck_start --device naaschpc-OST0014
Started LFSCK on the device naaschpc-OST0014.

But the prompt just returns, as if it is either done or doing something in the background? I don't understand how to tell what it is doing. When I tried to stop it, I got:

[root@naasc-oss-6 ~]# lctl lfsck_stop --device naaschpc-OST0014
Fail to stop LFSCK: Operation already in progress

Any help interpreting these messages as well as coming up with the proper command to run to correct our inode issue would be helpful. Please keep in mind that many features described in the 2.x manual aren't available because we are only using 2.5.3. (For instance, the --type layout option, which seems to be what we want (check and repair MDT-OST inconsistency), is not available until 2.6.)

Thanks,
Jessica

--
Jessica Otey
System Administrator II
North American ALMA Science Center (NAASC)
National Radio Astronomy Observatory (NRAO)
Charlottesville, Virginia (USA)
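On the "prompt just returns" point above: lfsck_start does return immediately and the scan runs in the background; its progress is exposed through procfs parameters. A sketch of the status check, assuming an ldiskfs OSD (the parameter name is taken from the 2.x manual; verify it exists on your 2.5.3 build):

```
# On the server where lfsck_start was issued, dump the scrub status
# (look for the "status:" and "current position" fields):
lctl get_param -n osd-ldiskfs.naaschpc-OST0014.oi_scrub
```

"Operation already in progress" from lfsck_stop is consistent with the scan still running rather than having finished.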
[lustre-discuss] FOLLOW UP: MDT filling up with 4 MB files
All,

My colleagues in Chile now believe that both of their 2.5.3 file systems are experiencing this same problem with the MDTs filling up with files. We have also come across a report from another user from early 2015 denoting the same issue, also with a 2.5.3 system. See:
https://www.mail-archive.com/search?l=lustre-discuss@lists.lustre.org=subject:%22Re%5C%3A+%5C%5Blustre%5C-discuss%5C%5D+MDT+partition+getting+full%22=newest

We are confident that these files are not related to the changelog feature. Does anyone have any other suggestions as to what the cause of this problem could be? I'm intrigued that the Lustre version involved in all 3 reports is 2.5.3. Could this be a bug?

Thanks,
Jessica

On Thu, Sep 29, 2016 at 8:58 AM, Jessica Otey <jo...@nrao.edu> wrote:

Hello all,

I write on behalf of my colleagues in Chile, who are experiencing a bizarre problem with their MDT, namely, it is filling up with 4 MB files. There is no issue with the number of inodes, of which there are hundreds of millions unused.

[root@jaopost-mds ~]# tune2fs -l /dev/sdb2 | grep -i inode
device /dev/sdb2 mounted by lustre
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery flex_bg dirdata sparse_super large_file huge_file uninit_bg dir_nlink quota
Inode count:              239730688
Free inodes:              223553405
Inodes per group:         32768
Inode blocks per group:   4096
First inode:              11
Inode size:               512
Journal inode:            8
Journal backup:           inode blocks
User quota inode:         3
Group quota inode:        4

Has anyone ever encountered such a problem? The only thing unusual about this cluster is that it is using 2.5.3 MDS/OSSes while still using 1.8.9 clients, something I didn't actually believe was possible, as I thought the last version to work effectively with 1.8.9 clients was 2.4.3. However, for all I know, the version gap may have nothing to do with this phenomenon.
Any and all advice is appreciated. Any general information on the structure of the MDT also welcome, as such info is in short supply on the internet.

Thanks,
Jessica

Below is a look inside the O folder at the root of the MDT, where there are about 48,000 4MB files:

[root@jaopost-mds O]# pwd
/lustrebackup/O
[root@jaopost-mds O]# tree -L 1
.
├── 1
├── 10
└── 20003
3 directories, 0 files

[root@jaopost-mds O]# ls -l 1
total 2240
drwx------ 2 root root 69632 sep 16 16:25 d0
drwx------ 2 root root 69632 sep 16 16:25 d1
drwx------ 2 root root 61440 sep 16 17:46 d10
drwx------ 2 root root 69632 sep 16 17:46 d11
drwx------ 2 root root 69632 sep 16 18:04 d12
drwx------ 2 root root 65536 sep 16 18:04 d13
drwx------ 2 root root 65536 sep 16 18:04 d14
drwx------ 2 root root 69632 sep 16 18:04 d15
drwx------ 2 root root 61440 sep 16 18:04 d16
drwx------ 2 root root 61440 sep 16 18:04 d17
drwx------ 2 root root 69632 sep 16 18:04 d18
drwx------ 2 root root 69632 sep 16 18:04 d19
drwx------ 2 root root 65536 sep 16 16:25 d2
drwx------ 2 root root 69632 sep 16 18:04 d20
drwx------ 2 root root 69632 sep 16 18:04 d21
drwx------ 2 root root 61440 sep 16 18:04 d22
drwx------ 2 root root 69632 sep 16 18:04 d23
drwx------ 2 root root 61440 sep 16 16:11 d24
drwx------ 2 root root 69632 sep 16 16:11 d25
drwx------ 2 root root 69632 sep 16 16:11 d26
drwx------ 2 root root 69632 sep 16 16:11 d27
drwx------ 2 root root 69632 sep 16 16:25 d28
drwx------ 2 root root 69632 sep 16 16:25 d29
drwx------ 2 root root 69632 sep 16 16:25 d3
drwx------ 2 root root 65536 sep 16 16:25 d30
drwx------ 2 root root 65536 sep 16 16:25 d31
drwx------ 2 root root 69632 sep 16 16:25 d4
drwx------ 2 root root 61440 sep 16 16:25 d5
drwx------ 2 root root 69632 sep 16 16:25 d6
drwx------ 2 root root 73728 sep 16 16:25 d7
drwx------ 2 root root 65536 sep 16 17:46 d8
drwx------ 2 root root 69632 sep 16 17:46 d9
-rw-r--r-- 1 root root 8 ene 4 2016 LAST_ID

[root@jaopost-mds d0]# ls -ltr | more
total 5865240
-rw-r--r-- 1 root root 252544 ene 4 2016 32
-rw-r--r-- 1 root root 2396224 ene 9 2016 2720
-rw-r--r-- 1 root root 4153280 ene 9 2016 2752
-rw-r--r-- 1 root root 4153280 ene 10 2016 2784
-rw-r--r-- 1 root root 4153280 ene 10 2016 2816
-rw-r--r-- 1 root root 4153280 ene 10 2016 2848
-rw-r--r-- 1 root root 4153280 ene 10 2016 2880
-rw-r--r-- 1 root root 4153280 ene 10 2016 2944
-rw-r--r-- 1 root root 4153280 ene 10 2016 2976
-rw-r--r-- 1 root root 4153280 ene 10 2016 3008
-rw-r--r-- 1 root root 4153280 ene 10 2016 3040
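For anyone digging into listings like the one above: the numbered objects under O/1/dN on an MDT are Lustre internal log (llog) files, and their record contents can be dumped with the llog_reader utility shipped with Lustre. A sketch using a path from the listing above (this assumes the MDT backing filesystem is accessible at /lustrebackup as shown; the object number 2752 is just one example from the listing):

```
# Dump the llog records of one of the 4 MB objects to see what kind of
# log (changelog, config, unlink, etc.) is accumulating:
llog_reader /lustrebackup/O/1/d0/2752
```

If the records turn out to be changelog entries, that points back at an unconsumed changelog reader; if not, the record type should at least narrow down what is producing them.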
[lustre-discuss] Fwd: Re: MDT filling up with 4 MB files
[Sent on behalf of maxs.simmo...@alma.cl]

Colin,

We cleared the changelogs on the MDT, but see no space clearance. Any idea how the 4MB files are produced?

Thanks.

On 29/09/16 13:25, Colin Faber wrote:

Yes, if you're not consuming the records, you're going to see them eat up space on the MDT.

On Thu, Sep 29, 2016 at 10:04 AM, Jessica Otey <jo...@nrao.edu> wrote:

On 9/29/16 12:36 PM, Colin Faber wrote:

Is the changelogs feature enabled?

Yes, and... the output of lfs changelogs gives us 360,000 lines... Do you think that is the source of all the 'extra' data?

On Thu, Sep 29, 2016 at 8:58 AM, Jessica Otey <jo...@nrao.edu> wrote:

Hello all,

I write on behalf of my colleagues in Chile, who are experiencing a bizarre problem with their MDT, namely, it is filling up with 4 MB files. There is no issue with the number of inodes, of which there are hundreds of millions unused.

[root@jaopost-mds ~]# tune2fs -l /dev/sdb2 | grep -i inode
device /dev/sdb2 mounted by lustre
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery flex_bg dirdata sparse_super large_file huge_file uninit_bg dir_nlink quota
Inode count:              239730688
Free inodes:              223553405
Inodes per group:         32768
Inode blocks per group:   4096
First inode:              11
Inode size:               512
Journal inode:            8
Journal backup:           inode blocks
User quota inode:         3
Group quota inode:        4

Has anyone ever encountered such a problem? The only thing unusual about this cluster is that it is using 2.5.3 MDS/OSSes while still using 1.8.9 clients, something I didn't actually believe was possible, as I thought the last version to work effectively with 1.8.9 clients was 2.4.3. However, for all I know, the version gap may have nothing to do with this phenomenon. Any and all advice is appreciated. Any general information on the structure of the MDT also welcome, as such info is in short supply on the internet.
Thanks, Jessica Below is a look inside the O folder at the root of the MDT, where there are about 48,000 4MB files: [root@jaopost-mds O]# pwd /lustrebackup/O [root@jaopost-mds O]# tree -L 1 . ├── 1 ├── 10 └── 20003 3 directories, 0 files [root@jaopost-mds O]# ls -l 1 total 2240 drwx-- 2 root root 69632 sep 16 16:25 d0 drwx-- 2 root root 69632 sep 16 16:25 d1 drwx-- 2 root root 61440 sep 16 17:46 d10 drwx-- 2 root root 69632 sep 16 17:46 d11 drwx-- 2 root root 69632 sep 16 18:04 d12 drwx-- 2 root root 65536 sep 16 18:04 d13 drwx-- 2 root root 65536 sep 16 18:04 d14 drwx-- 2 root root 69632 sep 16 18:04 d15 drwx-- 2 root root 61440 sep 16 18:04 d16 drwx-- 2 root root 61440 sep 16 18:04 d17 drwx-- 2 root root 69632 sep 16 18:04 d18 drwx-- 2 root root 69632 sep 16 18:04 d19 drwx-- 2 root root 65536 sep 16 16:25 d2 drwx-- 2 root root 69632 sep 16 18:04 d20 drwx-- 2 root root 69632 sep 16 18:04 d21 drwx-- 2 root root 61440 sep 16 18:04 d22 drwx-- 2 root root 69632 sep 16 18:04 d23 drwx-- 2 root root 61440 sep 16 16:11 d24 drwx-- 2 root root 69632 sep 16 16:11 d25 drwx-- 2 root root 69632 sep 16 16:11 d26 drwx-- 2 root root 69632 sep 16 16:11 d27 drwx-- 2 root root 69632 sep 16 16:25 d28 drwx-- 2 root root 69632 sep 16 16:25 d29 drwx-- 2 root root 69632 sep 16 16:25 d3 drwx-- 2 root root 65536 sep 16 16:25 d30 drwx-- 2 root root 65536 sep 16 16:25 d31 drwx-- 2 root root 69632 sep 16 16:25 d4 drwx-- 2 root root 61440 sep 16 16:25 d5 drwx-- 2 root root 69632 sep 16 16:25 d6 drwx-- 2 root root 73728 sep 16 16:25 d7 drwx-- 2 root root 65536 sep 16 17:46 d8 drwx-- 2 root root 69632 sep 16 17:46 d9 -rw-r--r-- 1 root root 8 ene 4 2016 LAST_ID [root@jaopost-mds d0]# ls -ltr | more total 5865240 -rw-r--r-- 1 root root 252544 ene 4 2016 32 -rw-r--r-- 1 root root 2396224 ene 9 2016 2720 -rw-r--r-- 1 root root 4153280 ene 9 2016 2752 -rw-r--r-- 1 root root 4153280 ene 10 2016 2784 -rw-r--r-- 1 root root 4153280 ene 10 2016 2816 -rw-r--r-- 1 root root 4153280 ene 10 2016 2848 
-rw-r--r-- 1 root root 4153280 ene 10 2016 2880 -rw-r--r-- 1 root root 4153280 ene 10 20
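Colin's point that unconsumed changelog records eat up space on the MDT can be checked from the MDS by comparing each registered reader's index against the current index reported by `lctl get_param -n mdd.*.changelog_users`. Below is a small sketch; the heredoc sample and the reader ids are illustrative stand-ins, and the exact output shape is assumed here. On a real MDS you would pipe the lctl command in instead of the heredoc:

```shell
# Report how far each registered changelog reader lags behind the MDT's
# current index, given `lctl get_param -n mdd.*.changelog_users` output.
report_changelog_lag() {
    awk '
        /current index:/  { cur = $NF }
        $1 ~ /^cl[0-9]+$/ { printf "%s lags by %d records\n", $1, cur - $2 }'
}

# Sample input mimicking the assumed changelog_users output format.
report_changelog_lag <<'EOF'
current index: 12345
ID    index
cl1   11000
cl2   12345
EOF
```

A reader that keeps lagging and no longer belongs to a live consumer can be deregistered (e.g. `lctl --device <MDT device> changelog_deregister cl1`, with the device and id substituted for your site), which lets the MDT purge the records it was holding for that reader.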
Re: [lustre-discuss] MDT filling up with 4 MB files
On 9/29/16 12:36 PM, Colin Faber wrote:

Is the changelogs feature enabled?

Yes, and... the output of lfs changelog gives us 360,000 lines... Do you think that is the source of all the 'extra' data?

On Thu, Sep 29, 2016 at 8:58 AM, Jessica Otey <jo...@nrao.edu> wrote:

Hello all,

I write on behalf of my colleagues in Chile, who are experiencing a bizarre problem with their MDT: it is filling up with 4 MB files. There is no issue with the number of inodes, of which hundreds of millions are unused.

[root@jaopost-mds ~]# tune2fs -l /dev/sdb2 | grep -i inode
device /dev/sdb2 mounted by lustre
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery flex_bg dirdata sparse_super large_file huge_file uninit_bg dir_nlink quota
Inode count: 239730688
Free inodes: 223553405
Inodes per group: 32768
Inode blocks per group: 4096
First inode: 11
Inode size: 512
Journal inode: 8
Journal backup: inode blocks
User quota inode: 3
Group quota inode: 4

Has anyone ever encountered such a problem? The only thing unusual about this cluster is that it is running 2.5.3 MDS/OSSes while still using 1.8.9 clients, something I didn't actually believe was possible, as I thought the last version to work effectively with 1.8.9 clients was 2.4.3. However, for all I know, the version gap may have nothing to do with this phenomenon. Any and all advice is appreciated. Any general information on the structure of the MDT is also welcome, as such info is in short supply on the internet.

Thanks, Jessica

Below is a look inside the O folder at the root of the MDT, where there are about 48,000 4 MB files:

[root@jaopost-mds O]# pwd
/lustrebackup/O
[root@jaopost-mds O]# tree -L 1
.
├── 1
├── 10
└── 20003

3 directories, 0 files
[root@jaopost-mds O]# ls -l 1
total 2240
drwx------ 2 root root 69632 sep 16 16:25 d0
drwx------ 2 root root 69632 sep 16 16:25 d1
drwx------ 2 root root 61440 sep 16 17:46 d10
drwx------ 2 root root 69632 sep 16 17:46 d11
drwx------ 2 root root 69632 sep 16 18:04 d12
drwx------ 2 root root 65536 sep 16 18:04 d13
drwx------ 2 root root 65536 sep 16 18:04 d14
drwx------ 2 root root 69632 sep 16 18:04 d15
drwx------ 2 root root 61440 sep 16 18:04 d16
drwx------ 2 root root 61440 sep 16 18:04 d17
drwx------ 2 root root 69632 sep 16 18:04 d18
drwx------ 2 root root 69632 sep 16 18:04 d19
drwx------ 2 root root 65536 sep 16 16:25 d2
drwx------ 2 root root 69632 sep 16 18:04 d20
drwx------ 2 root root 69632 sep 16 18:04 d21
drwx------ 2 root root 61440 sep 16 18:04 d22
drwx------ 2 root root 69632 sep 16 18:04 d23
drwx------ 2 root root 61440 sep 16 16:11 d24
drwx------ 2 root root 69632 sep 16 16:11 d25
drwx------ 2 root root 69632 sep 16 16:11 d26
drwx------ 2 root root 69632 sep 16 16:11 d27
drwx------ 2 root root 69632 sep 16 16:25 d28
drwx------ 2 root root 69632 sep 16 16:25 d29
drwx------ 2 root root 69632 sep 16 16:25 d3
drwx------ 2 root root 65536 sep 16 16:25 d30
drwx------ 2 root root 65536 sep 16 16:25 d31
drwx------ 2 root root 69632 sep 16 16:25 d4
drwx------ 2 root root 61440 sep 16 16:25 d5
drwx------ 2 root root 69632 sep 16 16:25 d6
drwx------ 2 root root 73728 sep 16 16:25 d7
drwx------ 2 root root 65536 sep 16 17:46 d8
drwx------ 2 root root 69632 sep 16 17:46 d9
-rw-r--r-- 1 root root 8 ene 4 2016 LAST_ID
[root@jaopost-mds d0]# ls -ltr | more
total 5865240
-rw-r--r-- 1 root root 252544 ene 4 2016 32
-rw-r--r-- 1 root root 2396224 ene 9 2016 2720
-rw-r--r-- 1 root root 4153280 ene 9 2016 2752
-rw-r--r-- 1 root root 4153280 ene 10 2016 2784
-rw-r--r-- 1 root root 4153280 ene 10 2016 2816
-rw-r--r-- 1 root root 4153280 ene 10 2016 2848
-rw-r--r-- 1 root root 4153280 ene 10 2016 2880
-rw-r--r-- 1 root root 4153280 ene 10 2016 2944
-rw-r--r-- 1 root root 4153280 ene 10 2016 2976
-rw-r--r-- 1 root root 4153280 ene 10 2016 3008
-rw-r--r-- 1 root root 4153280 ene 10 2016 3040
-rw-r--r-- 1 root root 4153280 ene 10 2016 3072
-rw-r--r-- 1 root root 4153280 ene 10 2016 3104
-rw-r--r-- 1 root root 4153280 ene 10 2016 3136
-rw-r--r-- 1 root root 4153280 ene 10 2016 3168
-rw-r--r-- 1 root root 4153280 ene 10 2016 3200
-rw-r--r-- 1 root root 4153280 ene 10 2016 3232
-rw-r--r-- 1 root root 4153280 ene 10 2016 3264
-rw-r--r-- 1 root root 4153280 ene 10 2016 3296
-rw-r--r-- 1 root root 4153280 ene 10 2016 3328

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
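One way to quantify how much space object files like the ones in the listing above are consuming is a find/awk one-liner over the MDT's O/ directory. A sketch; /lustrebackup/O is the path from the listing, and you would run this against the mounted (or snapshotted) MDT backing filesystem:

```shell
# Count and total the numerically-named object files under an O/ tree.
objtotal() {
    find "$1" -type f -name '[0-9]*' -printf '%s\n' 2>/dev/null |
        awk '{ n++; s += $1 } END { printf "%d files, %.1f GiB\n", n, s / 2^30 }'
}

# OBJDIR defaults to the path from the listing above; override as needed.
objtotal "${OBJDIR:-/lustrebackup/O}"
```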
[lustre-discuss] MDT filling up with 4 MB files
Hello all,

I write on behalf of my colleagues in Chile, who are experiencing a bizarre problem with their MDT: it is filling up with 4 MB files. There is no issue with the number of inodes, of which hundreds of millions are unused.

[root@jaopost-mds ~]# tune2fs -l /dev/sdb2 | grep -i inode
device /dev/sdb2 mounted by lustre
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery flex_bg dirdata sparse_super large_file huge_file uninit_bg dir_nlink quota
Inode count: 239730688
Free inodes: 223553405
Inodes per group: 32768
Inode blocks per group: 4096
First inode: 11
Inode size: 512
Journal inode: 8
Journal backup: inode blocks
User quota inode: 3
Group quota inode: 4

Has anyone ever encountered such a problem? The only thing unusual about this cluster is that it is running 2.5.3 MDS/OSSes while still using 1.8.9 clients, something I didn't actually believe was possible, as I thought the last version to work effectively with 1.8.9 clients was 2.4.3. However, for all I know, the version gap may have nothing to do with this phenomenon. Any and all advice is appreciated. Any general information on the structure of the MDT is also welcome, as such info is in short supply on the internet.

Thanks, Jessica

Below is a look inside the O folder at the root of the MDT, where there are about 48,000 4 MB files:

[root@jaopost-mds O]# pwd
/lustrebackup/O
[root@jaopost-mds O]# tree -L 1
.
├── 1
├── 10
└── 20003

3 directories, 0 files
[root@jaopost-mds O]# ls -l 1
total 2240
drwx------ 2 root root 69632 sep 16 16:25 d0
drwx------ 2 root root 69632 sep 16 16:25 d1
drwx------ 2 root root 61440 sep 16 17:46 d10
drwx------ 2 root root 69632 sep 16 17:46 d11
drwx------ 2 root root 69632 sep 16 18:04 d12
drwx------ 2 root root 65536 sep 16 18:04 d13
drwx------ 2 root root 65536 sep 16 18:04 d14
drwx------ 2 root root 69632 sep 16 18:04 d15
drwx------ 2 root root 61440 sep 16 18:04 d16
drwx------ 2 root root 61440 sep 16 18:04 d17
drwx------ 2 root root 69632 sep 16 18:04 d18
drwx------ 2 root root 69632 sep 16 18:04 d19
drwx------ 2 root root 65536 sep 16 16:25 d2
drwx------ 2 root root 69632 sep 16 18:04 d20
drwx------ 2 root root 69632 sep 16 18:04 d21
drwx------ 2 root root 61440 sep 16 18:04 d22
drwx------ 2 root root 69632 sep 16 18:04 d23
drwx------ 2 root root 61440 sep 16 16:11 d24
drwx------ 2 root root 69632 sep 16 16:11 d25
drwx------ 2 root root 69632 sep 16 16:11 d26
drwx------ 2 root root 69632 sep 16 16:11 d27
drwx------ 2 root root 69632 sep 16 16:25 d28
drwx------ 2 root root 69632 sep 16 16:25 d29
drwx------ 2 root root 69632 sep 16 16:25 d3
drwx------ 2 root root 65536 sep 16 16:25 d30
drwx------ 2 root root 65536 sep 16 16:25 d31
drwx------ 2 root root 69632 sep 16 16:25 d4
drwx------ 2 root root 61440 sep 16 16:25 d5
drwx------ 2 root root 69632 sep 16 16:25 d6
drwx------ 2 root root 73728 sep 16 16:25 d7
drwx------ 2 root root 65536 sep 16 17:46 d8
drwx------ 2 root root 69632 sep 16 17:46 d9
-rw-r--r-- 1 root root 8 ene 4 2016 LAST_ID
[root@jaopost-mds d0]# ls -ltr | more
total 5865240
-rw-r--r-- 1 root root 252544 ene 4 2016 32
-rw-r--r-- 1 root root 2396224 ene 9 2016 2720
-rw-r--r-- 1 root root 4153280 ene 9 2016 2752
-rw-r--r-- 1 root root 4153280 ene 10 2016 2784
-rw-r--r-- 1 root root 4153280 ene 10 2016 2816
-rw-r--r-- 1 root root 4153280 ene 10 2016 2848
-rw-r--r-- 1 root root 4153280 ene 10 2016 2880
-rw-r--r-- 1 root root 4153280 ene 10 2016 2944
-rw-r--r-- 1 root root 4153280 ene 10 2016 2976
-rw-r--r-- 1 root root 4153280 ene 10 2016 3008
-rw-r--r-- 1 root root 4153280 ene 10 2016 3040
-rw-r--r-- 1 root root 4153280 ene 10 2016 3072
-rw-r--r-- 1 root root 4153280 ene 10 2016 3104
-rw-r--r-- 1 root root 4153280 ene 10 2016 3136
-rw-r--r-- 1 root root 4153280 ene 10 2016 3168
-rw-r--r-- 1 root root 4153280 ene 10 2016 3200
-rw-r--r-- 1 root root 4153280 ene 10 2016 3232
-rw-r--r-- 1 root root 4153280 ene 10 2016 3264
-rw-r--r-- 1 root root 4153280 ene 10 2016 3296
-rw-r--r-- 1 root root 4153280 ene 10 2016 3328
[lustre-discuss] resolution of LU-4397 [Permanently disabled OST causes clients to hang on df (statfs)]
All,

This is a bit of a complex scenario, but I am hoping that someone out there can provide some relevant experience. We have a production lustre system whose servers we have just recently upgraded from 1.8.9 to 2.4.3. In testing a few clients (before upgrading them all), we encountered this (known) bug: https://jira.hpdd.intel.com/browse/LU-4397

This bug was actually discovered by one of my NRAO colleagues, Wolfgang, who works in Green Bank, WV (whereas I work in Charlottesville, VA). There are two things with which I would appreciate the list's help:

1) Identifying a version where this bug is FOR SURE fixed. If you read the ticket, it appears that the change landed for 2.5.2 and 2.6, but users have reported the bug existing (still?/again?) in 2.5.3. We absolutely need to upgrade beyond 2.4.3, but it would be nice to know how far we need to go in order to have functional clients, ideally out of the box. (In addition to the df command, we have critical software that uses statfs.)

2) Identifying another person who has experienced this bug IN A LEGACY ENVIRONMENT, i.e., in a system that started at 1.8.x, in which OSTs were made permanently inactive, and in which the system was then upgraded to 2.5.3. In that case, I'd be curious whether the workaround described in the ticket by Wolfgang (who used it on 2.5.3) works for you, too. (It DOES NOT work for us on 2.4.3.) At least this way, if the bug still exists in 2.5.3, we'll be more confident that we can use the workaround successfully.

Thanks for your time,
Jessica

--
Jessica Otey
System Administrator II
North American ALMA Science Center (NAASC)
National Radio Astronomy Observatory (NRAO)
Charlottesville, Virginia (USA)
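For anyone wanting to check a client for this symptom without risking a wedged shell, a bounded probe along the following lines may help. This is a sketch, not from the thread; /mnt/lustre is a placeholder mount point:

```shell
# Bounded probe for the LU-4397 symptom (df/statfs hanging against a
# permanently deactivated OST).  `timeout` caps the call so a hung df
# cannot wedge the probe itself.
MNT="${MNT:-/mnt/lustre}"
if timeout 10 df "$MNT" >/dev/null 2>&1; then
    echo "statfs on $MNT answered within 10s"
else
    echo "statfs on $MNT hung or failed"
fi
```

The same wrapper works around any statfs-based tool you want to test (df, or the critical software mentioned above, if it can be pointed at the mount in a dry run).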
Re: [lustre-discuss] difficulties mounting client via an lnet router--SOLVED!
All,

Thanks to the repliers who contributed to the solution. Here's a rundown:

First, here's a way to see if you have connectivity via the router between client and MDT, etc., using NIDs (to list NIDs, use lctl list_nids):

lctl ping <remote NID>

If the ping between the client and the MDT works, you have connectivity. And indeed, in our case, the problem wasn't router configs or connectivity, but rather our lustre filesystem, which needed a --writeconf because the client had previously connected without the router (meaning directly via tcp rather than o2ib).

On 07/11/2016 12:11 PM, Oucharek, Doug S wrote:

The router is not the issue and would be working fine "if" the file system were using the correct NID. So, for example, before having an IB network for the servers, I suspect you had an MDT accessed via NID 10.7.29.130@tcp. When moving it to the IB network, it should become something like 10.7.129.130@o2ib0. The file system configuration will still try to get the clients to use 10.7.29.130@tcp until it is updated to the new NID. But I can say that your LNet configurations are correct and will work once the file system starts using the correct NIDs.

Here's how to do that (from this link: http://wiki.old.lustre.org/manual/LustreManual20_HTML/LustreMaintenance.html#50438199_31353):

Section 14.5 Changing a Server NID

To change a server NID:

1. Update the LNET configuration in the /etc/modprobe.conf file so the list of server NIDs (lctl list_nids) is correct. The lctl list_nids command indicates which network(s) are configured to work with Lustre.
2. Shut down the file system in this order:
   a. Unmount the clients.
   b. Unmount the MDT.
   c. Unmount all OSTs.
3. Run the writeconf command on all servers. Run writeconf on the MDT first, and then the OSTs.
   a. On the MDT, run: $ tunefs.lustre --writeconf <device>
   b. On each OST, run: $ tunefs.lustre --writeconf <device>
   c. If the NID on the MGS was changed, communicate the new MGS location to each server. Run: tunefs.lustre --erase-param --mgsnode=<new_nid(s)> --writeconf /dev/..
4. Restart the file system in this order:
   a. Mount the MGS (or the combined MGS/MDT).
   b. Mount the MDT.
   c. Mount the OSTs.
   d. Mount the clients.

After the writeconf command is run, the configuration logs are re-generated as servers restart, and the server NIDs in the updated list_nids output are used.

This worked for us, and we were at last able to mount the client via the router! Thanks, lustre experts, for being there!!!

Jessica

On 07/11/2016 10:34 AM, Jessica Otey wrote:

All,

I am, as before, working on a small test lustre setup (RHEL 6.8, lustre v. 2.4.3) to prepare for upgrading a 1.8.9 lustre production system to 2.4.3 (first the servers and lnet routers, then at a subsequent time, the clients). Lustre servers have IB connections, but the clients are 1G ethernet only. For the life of me, I cannot get the client to mount via the router on this test system. (The client will mount fine when the router is taken out of the equation.)
This is the error I am seeing in the syslog from the mount attempt:

Jul 11 10:15:37 tlclient kernel: Lustre: 3605:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1468246532/real 1468246532] req@88032a3f9400 x1539566484848752/t0(0) o38->tlustre-MDT-mdc-88032ad20400@10.7.29.130@tcp:12/10 lens 400/544 e 0 to 1 dl 1468246537 ref 1 fl Rpc:XN/0/ rc 0/-1
Jul 11 10:16:07 tlclient kernel: Lustre: 3605:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1468246557/real 1468246557] req@880629819000 x1539566484848764/t0(0) o38->tlustre-MDT-mdc-88032ad20400@10.7.29.130@tcp:12/10 lens 400/544 e 0 to 1 dl 1468246567 ref 1 fl Rpc:XN/0/ rc 0/-1
Jul 11 10:16:37 tlclient kernel: Lustre: 3605:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1468246582/real 1468246582] req@88062a371000 x1539566484848772/t0(0) o38->tlustre-MDT-mdc-88032ad20400@10.7.29.130@tcp:12/10 lens 400/544 e 0 to 1 dl 1468246597 ref 1 fl Rpc:XN/0/ rc 0/-1
Jul 11 10:16:44 tlclient kernel: LustreError: 2511:0:(lov_obd.c:937:lov_cleanup()) lov tgt 0 not cleaned! deathrow=0, lovrc=1
Jul 11 10:16:44 tlclient kernel: Lustre: Unmounted tlustre-client
Jul 11 10:16:44 tlclient kernel: LustreError: 4881:0:(obd_mount.c:1289:lustre_fill_super()) Unable to mount (-4)

More than one pair of eyes has looked at the configs and confirmed they look okay. But frankly, we've got to be missing something, since this should (like lustre on a good day) 'just work'. If anyone has seen this issue before and could give some advice, it'd be appreciated. One major question I have is whether the problem is a configuration issue or a procedure issue; perhaps the order in which I am doing things is causing the failure? The order I'm following currently is: 1)
[lustre-discuss] difficulties mounting client via an lnet router
live_router_check_interval="60" dead_router_check_interval="60"

tloss ifconfig
[root@tloss ~]# ifconfig
#lo omitted
em1  Link encap:Ethernet  HWaddr 78:2B:CB:4A:7A:F8
     inet addr:10.7.29.131  Bcast:10.7.29.255  Mask:255.255.255.0
     UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
     RX packets:7939328 errors:0 dropped:0 overruns:0 frame:0
     TX packets:4920595 errors:0 dropped:0 overruns:0 carrier:0
     collisions:0 txqueuelen:1000
     RX bytes:7016088640 (6.5 GiB)  TX bytes:447490407 (426.7 MiB)
ib0  Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
     inet addr:10.7.129.131  Bcast:10.7.129.255  Mask:255.255.255.0
     UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
     RX packets:484688 errors:0 dropped:0 overruns:0 frame:0
     TX packets:62465 errors:0 dropped:0 overruns:0 carrier:0
     collisions:0 txqueuelen:256
     RX bytes:845062706 (805.9 MiB)  TX bytes:919378780 (876.7 MiB)

tlmds ifconfig
[root@tlmds ~]# ifconfig
#lo omitted
em1  Link encap:Ethernet  HWaddr 78:2B:CB:28:1D:00
     inet addr:10.7.29.130  Bcast:10.7.29.255  Mask:255.255.255.0
     UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
     RX packets:7849519 errors:0 dropped:0 overruns:0 frame:0
     TX packets:4847566 errors:0 dropped:0 overruns:0 carrier:0
     collisions:0 txqueuelen:1000
     RX bytes:7049031324 (6.5 GiB)  TX bytes:484594569 (462.1 MiB)
ib0  Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
     inet addr:10.7.129.130  Bcast:10.7.129.255  Mask:255.255.255.0
     UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
     RX packets:532171 errors:0 dropped:0 overruns:0 frame:0
     TX packets:64114 errors:0 dropped:0 overruns:0 carrier:0
     collisions:0 txqueuelen:256
     RX bytes:946230130 (902.3 MiB)  TX bytes:821297144 (783.2 MiB)

--
Jessica Otey
System Administrator II
North American ALMA Science Center (NAASC)
National Radio Astronomy Observatory (NRAO)
Charlottesville, Virginia (USA)
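For reference, the kind of lnet module options this routed topology implies looks roughly like the fragment below. The interface names and NIDs are taken from the ifconfig output above, but this is a sketch, not the verbatim site configs; file paths and net names will vary:

```
# /etc/modprobe.d/lustre.conf on the router (tloss):
# attached to both networks, with forwarding enabled
options lnet networks="tcp0(em1),o2ib0(ib0)" forwarding="enabled"

# on the ethernet-only client: route to o2ib0 via the router's tcp NID
options lnet networks="tcp0(em1)" routes="o2ib0 10.7.29.131@tcp0" live_router_check_interval=60 dead_router_check_interval=60

# on the IB-side servers (e.g. tlmds): route to tcp0 via the router's IB NID
options lnet networks="o2ib0(ib0)" routes="tcp0 10.7.129.131@o2ib0" live_router_check_interval=60 dead_router_check_interval=60
```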