[Lustre-discuss] Multihome question : unable to mount lustre over tcp.

2010-12-10 Thread vaibhav pol
Dear Lustre users,
   I am using the Lustre file system (lustre-1.8.1) over IB. Now I
need Lustre over both IB and Ethernet.
I modified modprobe.conf on the client, the MDS (MDT), and the OSS (OST).
I added the line below to modprobe.conf:
options lnet networks=o2ib(ib0),tcp1(eth1)
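
For reference, whether both LNET networks actually come up on a node can
be checked with lctl; a minimal sketch, assuming the modules are loaded
and run as root:

    modprobe lnet
    lctl network up    # brings up the networks from the modprobe.conf line
    lctl list_nids     # should now print both an @o2ib and a @tcp1 NID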
I am able to mount Lustre over IB on the client, but I am not able to
mount over Ethernet.
I get the following error on stdout:

mount.lustre: mount  172.29.2...@tcp1:/home at /mnt failed: No such file
or directory
Is the MGS specification correct?
Is the file system name correct?
If upgrading, is the copied client log valid? (see upgrade docs)

and /var/log/messages shows:

kernel: LustreError: 6943:0:(ldlm_lib.c:329:client_obd_setup()) can't
add initial connection
kernel: LustreError: 6943:0:(obd_config.c:370:class_setup()) setup
home-MDT-mdc-810f39c8ac00 failed (-2)
kernel: LustreError:
6943:0:(obd_config.c:1197:class_config_llog_handler()) Err -2 on cfg
command:
kernel: Lustre:cmd=cf003 0:home-MDT-mdc  1:home-MDT_UUID
2:172.31.65...@o2ib
kernel: LustreError: 15c-8: mgc172.29.2...@tcp1: The configuration from
log 'home-client' failed (-2). This may be the result of communication
errors between this node and the MGS, a bad configuration, or other errors.
See the syslog for more information.
kernel: LustreError: 6933:0:(llite_lib.c:1171:ll_fill_super()) Unable to
process log: -2
kernel: LustreError: 6933:0:(obd_config.c:441:class_cleanup()) Device 2
not setup
kernel: LustreError: 6933:0:(ldlm_request.c:1030:ldlm_cli_cancel_req())
Got rc -108 from cancel RPC: canceling anyway
kernel: LustreError: 6933:0:(ldlm_request.c:1533:ldlm_cli_cancel_list())
ldlm_cli_cancel_list: -108
kernel: Lustre: client 810f39c8ac00 umount complete
kernel: LustreError: 6933:0:(obd_mount.c:1997:lustre_fill_super())
Unable to mount  (-2)

I am able to ping the MGS from the client.
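
Note that an ICMP ping only tests the IP layer, not LNET itself. A
minimal LNET-level check over tcp1 from the client (with <mgs-tcp-ip> as
a placeholder for the MGS's Ethernet address) would be:

    lctl ping <mgs-tcp-ip>@tcp1   # on success prints the NIDs the MGS answers with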

The lctl list_nids command on the MGS gives the following output:

  172.31.65...@o2ib
  172.29.2...@tcp1

I am also able to mount the Lustre client on the MGS itself over Ethernet.

Following is the tunefs.lustre output:


1) mgs tunefs.lustre output

checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target: MGS
Index:  unassigned
Lustre FS:  scratch
Mount type: ldiskfs
Flags:  0x174
  (MGS needs_index first_time update writeconf )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters:


   Permanent disk data:
Target: MGS
Index:  unassigned
Lustre FS:  scratch
Mount type: ldiskfs
Flags:  0x174
  (MGS needs_index first_time update writeconf )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr

2) mds(mdt) tunefs.lustre output
 Read previous values:
Target: home-MDT
Index:  0
Lustre FS:  home
Mount type: ldiskfs
Flags:  0x1
  (MDT )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters:
mgsnode=172.31.65...@o2ib mdt.group_upcall=/usr/sbin/l_getgroups


   Permanent disk data:
Target: home-MDT
Index:  0
Lustre FS:  home
Mount type: ldiskfs
Flags:  0x1
  (MDT )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters:
mgsnode=172.31.65...@o2ib mdt.group_upcall=/usr/sbin/l_getgroups

 3) oss(ost) tunefs.lustre output
   Read previous values:
Target: home-OST0003
Index:  3
Lustre FS:  home
Mount type: ldiskfs
Flags:  0x2
  (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=172.31.65...@o2ib mdt.quota_type=ug


   Permanent disk data:
Target: home-OST0003
Index:  3
Lustre FS:  home
Mount type: ldiskfs
Flags:  0x2
  (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=172.31.65...@o2ib mdt.quota_type=ug

exiting before disk write.

I am not able to figure out what the exact problem is.


Thanks and regards
Vaibhi


Re: [Lustre-discuss] Multihome question : unable to mount lustre over tcp.

2010-12-10 Thread Temple Jason
Hi,

You need to run tunefs.lustre on all the servers to add the new @tcp1 NIDs.
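
A minimal sketch of what that might look like for the MDT (the device
path and NIDs are placeholders, the target must be unmounted first, and
--writeconf regenerates the configuration logs, so check the manual
before running it):

    tunefs.lustre --erase-params \
      --mgsnode=<mgs-ib-nid>@o2ib,<mgs-tcp-nid>@tcp1 \
      --param mdt.group_upcall=/usr/sbin/l_getgroups \
      --writeconf /dev/<mdt-device>

The comma between the two NIDs tells clients they are interfaces of the
same MGS node; each OST would need the same --mgsnode change, with its
own parameters re-added after --erase-params.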

Thanks,

Jason


Re: [Lustre-discuss] Patchless client on RHEL6

2010-12-10 Thread Götz Waschk
On Fri, Dec 10, 2010 at 2:29 PM, Sébastien Buisson
sebastien.buis...@bull.net wrote:
 Maybe you've hit the problem addressed by attachment
 https://bugzilla.lustre.org/attachment.cgi?id=30289 from bug 22375.
 It was initially designed for Lustre 2.0, but maybe you could adapt it for
 1.8.
Hi,

I have rediffed the patch for Lustre 1.8.5 and re-run the auto* tools,
but I got the same build error as above. By the way, git master of
Lustre 2.x fails in the same way.

Regards, Götz Waschk


Re: [Lustre-discuss] finding performance issues

2010-12-10 Thread Cliff White
On 12/10/2010 11:42 AM, Brock Palen wrote:
 We have a Lustre 1.6.x filesystem,

1.6 has been dead for well over a year. End Of Life.

 4 OSSes: 3 x4500 and 1 DDN S2A6620

 Each OSS has four 1GigE interfaces bonded, or one 10GigE interface.

 I have a user who is running a few hundred serial jobs that are all accessing
 the same 16GB file. We striped the file over all the OSTs, and are capped at
 500-600MB/s no matter the number of hosts running. IO per OST is around
 15-20MB/s (31 OSTs total).

 This set of jobs keeps reading in the same data set, and has been running for 
 about 24 hours (the group of about 900 total jobs).

 *  Is there a recommendation for a better way to do these sorts of jobs?

Upgrade to the latest release of Lustre.

  The compute nodes have 48GB of RAM; he does not use much RAM for the
job, just all the IO.

 * Is there a better way to tune?

Yes: upgrade to the code that has all the tuning fixes and enhancements,
Lustre 1.8.

  What should I be looking for to tune?
You are wasting your time tuning here.
1.8 supports many things, including cache on OSTs, which would likely
help bunches in your case.
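
For example, the OSS read cache tunables can be inspected and adjusted
with lctl (a sketch; the parameter names are as I recall them for 1.8,
so verify against the release):

    lctl get_param obdfilter.*.read_cache_enable            # 1 = read cache on
    lctl set_param obdfilter.*.readcache_max_filesize=17G   # hypothetical limit, big enough to cache a 16GB file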

cliffw


 Thanks!

 Brock Palen
 www.umich.edu/~brockp
 Center for Advanced Computing
 bro...@umich.edu
 (734)936-1985





Re: [Lustre-discuss] finding performance issues

2010-12-10 Thread Andreas Dilger
On 2010-12-10, at 12:42, Brock Palen wrote:
 We have a Lustre 1.6.x filesystem,
 
 4 OSSes: 3 x4500 and 1 DDN S2A6620
 
 Each OSS has four 1GigE interfaces bonded, or one 10GigE interface.
 
 I have a user who is running a few hundred serial jobs that are all accessing
 the same 16GB file. We striped the file over all the OSTs, and are capped at
 500-600MB/s no matter the number of hosts running. IO per OST is around
 15-20MB/s (31 OSTs total).

How big is the IO size?  Are all the clients both reading and writing this same 
file?  Presumably you see better performance when so many jobs are not running 
against the filesystem?

 This set of jobs keeps reading in the same data set, and has been running for 
 about 24 hours (the group of about 900 total jobs).
 
 *  Is there a recommendation for a better way to do these sorts of jobs?  The
 compute nodes have 48GB of RAM; he does not use much RAM for the job, just
 all the IO.

I agree with Cliff that the 1.8 OSS read cache will probably help the 
performance in this case.  OSS read cache does not need a client-side upgrade 
to work, though of course I'd suggest upgrading the clients anyway.

1.8.5 was just released this week...

 * Is there a better way to tune?  What should I be looking for to tune?

Start by looking at /proc/fs/lustre/obdfilter/*/brw_stats on the OSTs.  It 
should be reset before the job (echo 0 to each file) so you get stats relevant 
to that job only.  You can also check iostat on the OSS nodes to see how busy 
the disks are.  They may be imbalanced due to being different hardware, and 
will only go as fast as the slowest OSTs.
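
A minimal sketch of that workflow on an OSS (run as root; proc paths as
in 1.8):

    # reset the per-OST I/O histograms before the job starts
    for f in /proc/fs/lustre/obdfilter/*/brw_stats; do echo 0 > $f; done
    # ... run the job ...
    cat /proc/fs/lustre/obdfilter/*/brw_stats   # per-OST I/O size distribution
    iostat -x 5                                 # per-device utilization on the OSS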

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
