[Lustre-discuss] Multihome question : unable to mount lustre over tcp.
Dear Lustre users,

I am using a Lustre file system (lustre-1.8.1) over IB. Now I need Lustre over both IB and Ethernet. I modified modprobe.conf on the client, MDS (MDT), and OSS (OST) nodes, adding the line:

    options lnet networks=o2ib(ib0),tcp1(eth1)

I am able to mount Lustre over IB on the client, but not over Ethernet. I get the following error on stdout:

    mount.lustre: mount 172.29.2...@tcp1:/home at /mnt failed: No such file or directory
    Is the MGS specification correct?
    Is the file system name correct?
    If upgrading, is the copied client log valid? (see upgrade docs)

and /var/log/messages shows:

    kernel: LustreError: 6943:0:(ldlm_lib.c:329:client_obd_setup()) can't add initial connection
    kernel: LustreError: 6943:0:(obd_config.c:370:class_setup()) setup home-MDT-mdc-810f39c8ac00 failed (-2)
    kernel: LustreError: 6943:0:(obd_config.c:1197:class_config_llog_handler()) Err -2 on cfg command:
    kernel: Lustre:cmd=cf003 0:home-MDT-mdc 1:home-MDT_UUID 2:172.31.65...@o2ib
    kernel: LustreError: 15c-8: mgc172.29.2...@tcp1: The configuration from log 'home-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
    kernel: LustreError: 6933:0:(llite_lib.c:1171:ll_fill_super()) Unable to process log: -2
    kernel: LustreError: 6933:0:(obd_config.c:441:class_cleanup()) Device 2 not setup
    kernel: LustreError: 6933:0:(ldlm_request.c:1030:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
    kernel: LustreError: 6933:0:(ldlm_request.c:1533:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
    kernel: Lustre: client 810f39c8ac00 umount complete
    kernel: LustreError: 6933:0:(obd_mount.c:1997:lustre_fill_super()) Unable to mount (-2)

I am able to ping the MGS from the client. The lctl list_nids command on the MGS gives the following output:
    172.31.65...@o2ib
    172.29.2...@tcp1

I am also able to mount the Lustre client on the MGS itself over Ethernet. The tunefs.lustre output follows.

1) MGS tunefs.lustre output:

    checking for existing Lustre data: found CONFIGS/mountdata
    Reading CONFIGS/mountdata
    Read previous values:
    Target:     MGS
    Index:      unassigned
    Lustre FS:  scratch
    Mount type: ldiskfs
    Flags:      0x174 (MGS needs_index first_time update writeconf )
    Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
    Parameters:

    Permanent disk data:
    Target:     MGS
    Index:      unassigned
    Lustre FS:  scratch
    Mount type: ldiskfs
    Flags:      0x174 (MGS needs_index first_time update writeconf )
    Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr

2) MDS (MDT) tunefs.lustre output:

    Read previous values:
    Target:     home-MDT
    Index:      0
    Lustre FS:  home
    Mount type: ldiskfs
    Flags:      0x1 (MDT )
    Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
    Parameters: mgsnode=172.31.65...@o2ib mdt.group_upcall=/usr/sbin/l_getgroups

    Permanent disk data:
    Target:     home-MDT
    Index:      0
    Lustre FS:  home
    Mount type: ldiskfs
    Flags:      0x1 (MDT )
    Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
    Parameters: mgsnode=172.31.65...@o2ib mdt.group_upcall=/usr/sbin/l_getgroups

3) OSS (OST) tunefs.lustre output:

    Read previous values:
    Target:     home-OST0003
    Index:      3
    Lustre FS:  home
    Mount type: ldiskfs
    Flags:      0x2 (OST )
    Persistent mount opts: errors=remount-ro,extents,mballoc
    Parameters: mgsnode=172.31.65...@o2ib mdt.quota_type=ug

    Permanent disk data:
    Target:     home-OST0003
    Index:      3
    Lustre FS:  home
    Mount type: ldiskfs
    Flags:      0x2 (OST )
    Persistent mount opts: errors=remount-ro,extents,mballoc
    Parameters: mgsnode=172.31.65...@o2ib mdt.quota_type=ug

    exiting before disk write.

I am not able to figure out what the exact problem is.

Thanks and regards
Vaibhi

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
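[For reference, a minimal sketch of the multihomed LNet setup being attempted in this message; the addresses below are placeholders, not the poster's elided ones, and interface names must match your hardware:]

```shell
# /etc/modprobe.conf (or /etc/modprobe.d/lustre.conf) on every server and client:
# expose LNet on both the IB interface (ib0) and the Ethernet interface (eth1).
options lnet networks=o2ib(ib0),tcp1(eth1)

# After reloading the lnet/lustre modules, verify that both NIDs are active:
lctl list_nids

# A client mount over the Ethernet network would then look like
# (192.0.2.10 is a placeholder for the real MGS Ethernet address):
mount -t lustre 192.0.2.10@tcp1:/home /mnt
```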
Re: [Lustre-discuss] Multihome question : unable to mount lustre over tcp.
Hi,

You need to run tunefs.lustre on all the servers to add the new @tcp NIDs.

Thanks,
Jason

From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of vaibhav pol
Sent: Friday, 10 December 2010 09:36
To: lustre-discuss@lists.lustre.org
Subject: [Lustre-discuss] Multihome question : unable to mount lustre over tcp.
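[A sketch of the procedure Jason suggests, not from the original thread; the NIDs below are placeholders for the real IB and Ethernet MGS addresses. This assumes tunefs.lustre as in Lustre 1.8: --erase-params drops all old parameters (so the existing ones must be re-specified), and --writeconf regenerates the configuration logs, so the targets must be unmounted first:]

```shell
# On the MDT (unmounted): register both MGS NIDs so clients on either
# network can reach the MGS, and re-add the existing group_upcall parameter.
tunefs.lustre --erase-params \
    --param mgsnode=192.0.2.1@o2ib,192.0.2.1@tcp1 \
    --param mdt.group_upcall=/usr/sbin/l_getgroups \
    --writeconf /dev/mdt_device        # placeholder block device

# On each OST (unmounted), similarly re-add its parameters:
tunefs.lustre --erase-params \
    --param mgsnode=192.0.2.1@o2ib,192.0.2.1@tcp1 \
    --param mdt.quota_type=ug \
    --writeconf /dev/ost_device        # placeholder block device

# Then remount the MGS first, followed by the MDT and the OSTs, so the
# configuration logs are rewritten with both NIDs.
```

Note that comma-separated NIDs in a single mgsnode value mean one MGS reachable at multiple NIDs, which is what a multihomed setup needs.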
Re: [Lustre-discuss] Patchless client on RHEL6
On Fri, Dec 10, 2010 at 2:29 PM, Sébastien Buisson sebastien.buis...@bull.net wrote:
> Maybe you've hit the problem addressed by attachment
> https://bugzilla.lustre.org/attachment.cgi?id=30289 from bug 22375.
> It was initially designed for Lustre 2.0, but maybe you could adapt it for 1.8.

Hi,

I have rediffed the patch for Lustre 1.8.5, ran the auto* tools, and after that I got the same build error as above. By the way, git master of Lustre 2.x fails in the same way.

Regards,
Götz Waschk
Re: [Lustre-discuss] finding performance issues
On 12/10/2010 11:42 AM, Brock Palen wrote:
> We have a Lustre 1.6.x filesystem,

1.6 has been dead for well over a year. End Of Life.

> 4 OSS, 3 x4500 and 1 DDN S2A6620. Each OSS has 4 1-gig interfaces bonded,
> or 1 10-gig interface. I have a user who is running a few hundred serial
> jobs that are all accessing the same 16GB file. We striped the file over
> all the OSTs, and are capped at 500-600MB/s no matter the number of hosts
> running. IO per OST is around 15-20MB/s (31 total OSTs). This set of jobs
> keeps reading in the same data set, and has been running for about 24
> hours (the group of about 900 total jobs).
> * Is there a recommendation of a better way to do these sorts of jobs?

Upgrade to the latest release of Lustre.

> The compute nodes have 48GB of RAM; he does not use much RAM for the job,
> just all the IO.
> * Is there a better way to tune?

Yes, you upgrade to the code that has all the tuning fixes/enhancements - Lustre 1.8.

> What should I be looking for to tune?

You are wasting your time tuning here. 1.8 supports many things, including cache on OSTs, which would likely help bunches in your case.

cliffw

> Thanks!
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> bro...@umich.edu
> (734)936-1985
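[The OSS read cache Cliff mentions is controlled through tunables on the OSS nodes; a hedged sketch, assuming the obdfilter parameter names as they appear in Lustre 1.8 (where the cache is enabled by default):]

```shell
# On each OSS (Lustre 1.8+): check and enable the OSS read cache.
lctl get_param obdfilter.*.read_cache_enable
lctl set_param obdfilter.*.read_cache_enable=1

# Let writes also populate the cache, useful when the same data is re-read:
lctl set_param obdfilter.*.writethrough_cache_enable=1

# Optionally cap which files are cached, e.g. only files up to 32MB,
# to avoid a single huge file evicting everything else:
lctl set_param obdfilter.*.readcache_max_filesize=32M
```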
Re: [Lustre-discuss] finding performance issues
On 2010-12-10, at 12:42, Brock Palen wrote:
> We have a Lustre 1.6.x filesystem, 4 OSS, 3 x4500 and 1 DDN S2A6620. Each
> OSS has 4 1-gig interfaces bonded, or 1 10-gig interface. I have a user
> who is running a few hundred serial jobs that are all accessing the same
> 16GB file. We striped the file over all the OSTs, and are capped at
> 500-600MB/s no matter the number of hosts running. IO per OST is around
> 15-20MB/s (31 total OSTs).

How big is the IO size? Are all the clients both reading and writing this same file? Presumably you see better performance when so many jobs are not running against the filesystem?

> This set of jobs keeps reading in the same data set, and has been running
> for about 24 hours (the group of about 900 total jobs).
> * Is there a recommendation of a better way to do these sorts of jobs?
> The compute nodes have 48GB of RAM; he does not use much RAM for the job,
> just all the IO.

I agree with Cliff that the 1.8 OSS read cache will probably help the performance in this case. OSS read cache does not need a client-side upgrade to work, though of course I'd suggest upgrading the clients anyway. 1.8.5 was just released this week...

> * Is there a better way to tune? What should I be looking for to tune?

Start by looking at /proc/fs/lustre/obdfilter/*/brw_stats on the OSTs. It should be reset before the job (echo 0 to each file) so you get stats relevant to that job only. You can also check iostat on the OSS nodes to see how busy the disks are. They may be imbalanced due to being different hardware, and will only go as fast as the slowest OSTs.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
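[The reset-then-inspect cycle Andreas describes can be scripted; a small sketch to run on each OSS, assuming the /proc paths as in Lustre 1.6/1.8:]

```shell
# Clear the per-OST I/O histograms before the job starts, so brw_stats
# reflects only this workload.
for f in /proc/fs/lustre/obdfilter/*/brw_stats; do
    echo 0 > "$f"
done

# After (or during) the run, inspect the I/O size distribution; a large
# fraction of small (<1MB) I/Os here usually explains low per-OST bandwidth.
cat /proc/fs/lustre/obdfilter/*/brw_stats

# Also watch per-disk utilization on the OSS nodes to spot imbalanced or
# saturated devices (5-second intervals):
iostat -x 5
```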