[Lustre-discuss] Lustre crashes periodically

2013-10-09 Thread Arya Mazaheri
Hi everyone,
Lately I have had a problem with our Lustre 1.8 deployment. It crashes
periodically in a way that the nodes can mount the storage and I can't access
the Lustre server machine either. So I have to restart the machine manually
every time to get everything working again. I checked the logs, memory usage,
and lock counts to see whether any of these might be the cause of the problem,
but I don't think they account for this issue.
An interesting symptom I see every time this problem happens is that the
activity lights on the InfiniBand switch blink very fast. I think heavy
traffic on the InfiniBand network to the Lustre server may be causing the
server crash. Does this connection seem plausible?

Anyway, I hope some of you have experienced this problem before and can help
me understand what is happening and how to keep the server from crashing again!

Thanks,
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
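One way to test the traffic theory above is to measure the load on the server's InfiniBand port around the time of a hang, instead of judging it from the switch LEDs. This is a minimal sketch, not a Lustre tool: it assumes the standard Linux sysfs counters under /sys/class/infiniband/<hca>/ports/<port>/counters/, where port_rcv_data and port_xmit_data count 4-byte words. The HCA name mthca0 and the IB_SYSFS override are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: estimate InfiniBand port throughput from sysfs counters.
# Assumption: port_rcv_data / port_xmit_data count 4-byte words, as on
# standard Linux IB drivers. IB_SYSFS can be overridden (e.g. for testing).

# ib_rate_bytes BEFORE AFTER SECONDS -> average bytes per second
ib_rate_bytes() {
    echo $(( ($2 - $1) * 4 / $3 ))
}

# sample_ib_port HCA PORT SECONDS -> prints "rx_bytes_per_s tx_bytes_per_s"
sample_ib_port() {
    dir=${IB_SYSFS:-/sys/class/infiniband}/$1/ports/$2/counters
    rx0=$(cat "$dir/port_rcv_data") || return 1
    tx0=$(cat "$dir/port_xmit_data") || return 1
    sleep "$3"
    rx1=$(cat "$dir/port_rcv_data")
    tx1=$(cat "$dir/port_xmit_data")
    echo "$(ib_rate_bytes "$rx0" "$rx1" "$3") $(ib_rate_bytes "$tx0" "$tx1" "$3")"
}
```

For example, `sample_ib_port mthca0 1 10` prints the average receive and transmit rates over ten seconds. On older HCAs these counters are 32 bits wide and stop at their maximum instead of wrapping, so a value stuck at 4294967295 means the reading is not meaningful. Comparing the result with the link's data rate (16 Gbit/s for 4x DDR) shows whether the link is actually saturated.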


Re: [Lustre-discuss] Lustre crashes periodically

2013-10-09 Thread Arya Mazaheri
Sorry, I have to correct this: the nodes CANNOT mount the storage and I can't
access the Lustre server machine either.




[Lustre-discuss] Kernel Panic error while running lustre 2.0 with infiniband

2011-02-21 Thread Arya Mazaheri
Hi there,
I have configured and run Lustre 2.0 over TCP (OSS and MDS on the same
server) without problems. Now I am trying to run Lustre with InfiniBand
support, but whenever I mount the MDT storage on the server, the process ends
with the following error:
kernel panic - not syncing: fatal exception

My /etc/modprobe.conf is:
options lnet networks=o2ib0(ib0)

Last lines of dmesg:

kjournald starting.  Commit interval 5 seconds
LDISKFS-fs warning: maximal mount count reached, running e2fsck is recommended
LDISKFS FS on sda2, internal journal
LDISKFS-fs: recovery complete.
LDISKFS-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
LDISKFS-fs warning: maximal mount count reached, running e2fsck is recommended
LDISKFS FS on sda2, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
LustreError: 6041:0:(o2iblnd.c:2501:kiblnd_startup()) Can't query IPoIB interface ib0: it's down
LustreError: 6041:0:(o2iblnd.c:2501:kiblnd_startup()) Skipped 1 previous similar message
eth0: no IPv6 routers present
LustreError: 105-4: Error -100 starting up LNI o2ib
LustreError: Skipped 1 previous similar message
LustreError: 6041:0:(events.c:731:ptlrpc_init_portals()) network initialisation failed
LustreError: 158-c: Can't load module 'mgs'
LustreError: 6035:0:(genops.c:286:class_newdev()) OBD: unknown type: mgs
LustreError: 6035:0:(obd_config.c:300:class_attach()) Cannot create device MGS of type mgs : -19
LustreError: 6035:0:(obd_mount.c:502:lustre_start_simple()) MGS attach error -19
LustreError: 15e-a: Failed to start MGS 'MGS' (-19). Is the 'mgs' module loaded?
LustreError: 6035:0:(obd_mount.c:1492:server_put_super()) no obd lustre-MDT
LustreError: 6035:0:(obd_mount.c:137:server_deregister_mount()) lustre-MDT not registered
Lustre: server umount lustre-MDT complete
LustreError: 6035:0:(obd_mount.c:2136:lustre_fill_super()) Unable to mount (-19)
kjournald starting.  Commit interval 5 seconds
LDISKFS-fs warning: maximal mount count reached, running e2fsck is recommended
LDISKFS FS on sdb1, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
LDISKFS-fs warning: maximal mount count reached, running e2fsck is recommended
LDISKFS FS on sdb1, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
LDISKFS-fs: file extents enabled
LDISKFS-fs: mballoc enabled
LustreError: 6117:0:(events.c:731:ptlrpc_init_portals()) network initialisation failed
LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
LDISKFS-fs: mballoc: 0 generated and it took 0
LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
kjournald starting.  Commit interval 5 seconds
LDISKFS-fs warning: maximal mount count reached, running e2fsck is recommended
LDISKFS FS on sdb2, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
LDISKFS-fs warning: maximal mount count reached, running e2fsck is recommended
LDISKFS FS on sdb2, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
LDISKFS-fs: file extents enabled
LDISKFS-fs: mballoc enabled
LustreError: 6193:0:(events.c:731:ptlrpc_init_portals()) network initialisation failed
LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
LDISKFS-fs: mballoc: 0 generated and it took 0
LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
kjournald starting.  Commit interval 5 seconds
LDISKFS-fs warning: maximal mount count reached, running e2fsck is recommended
LDISKFS FS on sdb3, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
LDISKFS-fs warning: maximal mount count reached, running e2fsck is recommended
LDISKFS FS on sdb3, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
LDISKFS-fs: file extents enabled
LDISKFS-fs: mballoc enabled
LustreError: 6269:0:(o2iblnd.c:2501:kiblnd_startup()) Can't query IPoIB interface ib0: it's down
LustreError: 6269:0:(o2iblnd.c:2501:kiblnd_startup()) Skipped 2 previous similar messages
LustreError: 105-4: Error -100 starting up LNI o2ib
LustreError: Skipped 2 previous similar messages
LustreError: 6269:0:(events.c:731:ptlrpc_init_portals()) network initialisation failed
LustreError: 158-c: Can't load module 'mgc'
LustreError: Skipped 2 previous similar messages
LustreError: 6263:0:(genops.c:286:class_newdev()) OBD: unknown type: mgc
LustreError: 6263:0:(genops.c:286:class_newdev()) Skipped 2 previous similar messages
LustreError: 6263:0:(obd_config.c:300:class_attach()) Cannot create device MGC0@lo of type mgc : -19
LustreError: 6263:0:(obd_config.c:300:class_attach()) Skipped 2 previous similar messages
LustreError: 6263:0:(obd_mount.c:502:lustre_start_simple()) MGC0@lo attach error -19
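The log above shows the o2ib LND giving up because ib0 was down ("Can't query IPoIB interface ib0: it's down"), after which the mgs and mgc devices could not be created. As the thread later confirms, the fix was to give ib0 an address and bring it up before mounting. A minimal pre-flight sketch, assuming the Linux sysfs layout (/sys/class/net/<iface>/operstate) and the iproute2 `ip` command; the function names and the SYSFS_NET override are hypothetical:

```shell
#!/bin/sh
# Sketch: refuse to mount a Lustre target until the IPoIB interface named in
# "options lnet networks=o2ib0(ib0)" is up and has an IPv4 address.
# SYSFS_NET may be overridden (e.g. for testing); default is /sys/class/net.

# ipoib_is_up IFACE -> 0 if operstate is "up", 1 if not, 2 if no such interface
ipoib_is_up() {
    state_file=${SYSFS_NET:-/sys/class/net}/$1/operstate
    if [ ! -r "$state_file" ]; then
        echo "$1: no such interface" >&2
        return 2
    fi
    state=$(cat "$state_file")
    if [ "$state" = up ]; then
        return 0
    fi
    echo "$1: operstate is '$state', not 'up'" >&2
    return 1
}

# ipoib_has_ipv4 IFACE -> 0 if iproute2 reports at least one IPv4 address
ipoib_has_ipv4() {
    ip -4 -o addr show dev "$1" 2>/dev/null | grep -q ' inet '
}

# lnet_preflight IFACE -> 0 only if both checks pass
lnet_preflight() {
    ipoib_is_up "$1" || return 1
    ipoib_has_ipv4 "$1" || { echo "$1: no IPv4 address" >&2; return 1; }
}
```

Typical use would be `lnet_preflight ib0 && mount -t lustre /dev/sda2 /mnt/mdt`, where /mnt/mdt is a placeholder mount point. The check is deliberately conservative: any operstate other than "up" is treated as a failure.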

Re: [Lustre-discuss] Installing Lustre client on 2.6.18-194 kernel

2011-02-21 Thread Arya Mazaheri
Thanks Albert, I really appreciate it.
Now everything is working.


On Mon, Feb 21, 2011 at 7:44 PM, Albert Everett aeever...@ualr.edu wrote:

 Here's what's in our /etc/modprobe.conf related to IB and lustre:

 options ib_mthca msi_x=1
 options lnet networks=o2ib0(ib0)
 options ko2iblnd ipif_name=ib0

 We have Mellanox Infinihost (III?) DDR cards and IPs defined for them.

 $ /sbin/ifconfig ib0
 ib0   Link encap:InfiniBand  HWaddr 80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
  inet addr:192.168.2.1  Bcast:192.168.2.255  Mask:255.255.255.0
  inet6 addr: fe80::202:c902:29:b341/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
  RX packets:1719 errors:0 dropped:0 overruns:0 frame:0
  TX packets:30 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:256
  RX bytes:96564 (94.3 KiB)  TX bytes:2420 (2.3 KiB)

 Albert
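The lnet line above names a single network. On a server that should be reachable over both InfiniBand and Ethernet, LNet's networks option takes a comma-separated list such as o2ib0(ib0),tcp0(eth0). A small sketch that assembles that line from NET:IFACE pairs; the helper name and the eth0 interface in the example are hypothetical, and the generated line still has to be placed in /etc/modprobe.conf by hand:

```shell
#!/bin/sh
# Sketch: build an "options lnet networks=..." line from NET:IFACE pairs,
# e.g. o2ib0:ib0 tcp0:eth0 -> options lnet networks=o2ib0(ib0),tcp0(eth0)

lnet_networks_line() {
    if [ $# -eq 0 ]; then
        echo "usage: lnet_networks_line NET:IFACE..." >&2
        return 1
    fi
    list=
    for pair in "$@"; do
        net=${pair%%:*}
        iface=${pair#*:}
        # ${list:+$list,} inserts a comma only after the first entry
        list="${list:+$list,}$net($iface)"
    done
    printf 'options lnet networks=%s\n' "$list"
}
```

`lnet_networks_line o2ib0:ib0` reproduces Albert's line exactly. Changing the option only takes effect once the LNet modules are reloaded.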


 On Feb 20, 2011, at 2:20 PM, Arya Mazaheri wrote:

  I have done what you said. I will test my client against the server tomorrow.
 But could you tell me which tweaks you made to /etc/modprobe.conf?

 On Sun, Feb 20, 2011 at 1:56 AM, Albert Everett aeever...@ualr.edu
 wrote:
 For lustre client, we did not need to alter our kernel at all. We just
 made and installed lustre-1.8.5 and lustre-modules-1.8.5 rpms.
 /etc/modprobe.conf needs a tweak.

 For lustre server, I believe you will need to deal with a patched kernel.
 We have not been down this road yet since our vendor includes lustre server
 software with their hardware.

 Albert


 On Feb 19, 2011, at 12:18 PM, Arya Mazaheri wrote:

 Hi Albert,
 It seems that you have made a new kernel in order to run lustre on
 clients. Am I right?
 I don't want to change the kernel on the clients at all.

 On Sat, Feb 19, 2011 at 8:57 PM, Albert Everett aeever...@ualr.edu
 wrote:
 Our kernel is also 2.6.18-194.17.4.el5.

 We installed OFED 1.5.2 from source, following this guide:


 https://wiki.rocksclusters.org/wiki/index.php/Install_OFED_1.5.x_on_a_Rocks_5.3_cluster

 ... which left us, among other things, a folder /usr/src/ofa_kernel.

 Lustre on the server side is handled by our vendor, so all we needed to
 worry about is the client.

 To build a lustre client, we then installed lustre-1.8.5.tar.gz from
 source, not from rpms. Our first compile produced the error you show below.
 # ./configure --with-linux=/lib/modules/`uname -r`/build
 # make rpms

 To get the lustre installation to use our new OFED, we tried this and it
 worked.

 # ./configure --with-o2ib=/usr/src/ofa_kernel \
   --with-linux=/lib/modules/`uname -r`/build
 # make rpms

 RPMs showed up in /usr/src/redhat/RPMS/x86_64, and we are using
 lustre-1.8.5*.rpm and lustre-modules-*.rpm on our client machines.
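The two-step build Albert describes (configure against the OFED tree, then make rpms) can be wrapped in a script that first checks that both source trees exist, which avoids a confusing configure failure when the OFED or kernel-devel tree is missing. A sketch only: the defaults (/usr/src/ofa_kernel and the running kernel's build directory) come from the message above, while KVER, OFA_DIR, MODULES_ROOT and DRY_RUN are hypothetical knobs added for illustration. It is meant to be run from the unpacked Lustre source directory.

```shell
#!/bin/sh
# Sketch: build Lustre client RPMs against a source-built OFED tree.
# Must be run from an unpacked Lustre source tree (e.g. lustre-1.8.5/).
# Overridable: KVER (kernel version), OFA_DIR, MODULES_ROOT, DRY_RUN=1.

build_lustre_client_rpms() {
    kver=${KVER:-$(uname -r)}
    ofa=${OFA_DIR:-/usr/src/ofa_kernel}
    build=${MODULES_ROOT:-/lib/modules}/$kver/build
    for d in "$ofa" "$build"; do
        if [ ! -d "$d" ]; then
            echo "missing directory: $d" >&2
            return 1
        fi
    done
    # With DRY_RUN=1, print the commands instead of executing them.
    run=
    if [ "${DRY_RUN:-0}" = 1 ]; then run=echo; fi
    $run ./configure --with-o2ib="$ofa" --with-linux="$build" &&
        $run make rpms
}
```

`DRY_RUN=1 build_lustre_client_rpms` shows the commands for review; without DRY_RUN the RPMs should land in /usr/src/redhat/RPMS/x86_64 as described above.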

 Albert


 On Feb 19, 2011, at 8:34 AM, Arya Mazaheri wrote:

 Hi,
 I have installed lustre client packages on a client node. But it doesn't
 mount the lustre file system from lustre server. It gets the following
 famous error:

 $ mount -t lustre 192.168.0.1:/lustre /mnt/lustre
 mount.lustre: mount 172.16.113.232:/lustre at /mnt/lustre failed: No such
 device
 Are the lustre modules loaded?
 Check /etc/modprobe.conf and /proc/filesystems
 Note 'alias lustre llite' should be removed from modprobe.conf


 As I was searching through the mailing list, I have noticed that lustre.ko
 should be present in this directory:
 /lib/modules/2.6.18-194.17.4.el5/kernel/fs/lustre/lustre.ko

 My current kernel is 2.6.18-194.17.4.el5, but lustre.ko is under
 2.6.18-164.11.1.el5 instead. So I guessed that this may be the source of the
 problem.

 Any ideas?

 Thanks


Re: [Lustre-discuss] Kernel Panic error while running lustre 2.0 with infiniband

2011-02-21 Thread Arya Mazaheri
Yep! You're right.

On Mon, Feb 21, 2011 at 11:19 PM, Albert Everett aeever...@ualr.edu wrote:

 Here's all we needed. Yours is probably similar.

 # cat /etc/sysconfig/network-scripts/ifcfg-ib0
 DEVICE=ib0
 IPADDR=192.168.2.1
 NETMASK=255.255.255.0
 BOOTPROTO=static
 ONBOOT=yes
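Writing the interface file, as Albert shows, makes the address part of the normal network configuration, so it is applied at every boot rather than only by a one-off ifconfig command. A sketch that generates such a file with the same five keys; the helper name is hypothetical, the values are examples, and on CentOS/RHEL 5 the target directory is /etc/sysconfig/network-scripts:

```shell
#!/bin/sh
# Sketch: write a static ifcfg file for an IPoIB interface, using the
# same keys as the ifcfg-ib0 example in the thread.
# write_ifcfg_ib DIR DEVICE IPADDR NETMASK

write_ifcfg_ib() {
    if [ $# -ne 4 ]; then
        echo "usage: write_ifcfg_ib DIR DEVICE IPADDR NETMASK" >&2
        return 1
    fi
    cat > "$1/ifcfg-$2" <<EOF
DEVICE=$2
IPADDR=$3
NETMASK=$4
BOOTPROTO=static
ONBOOT=yes
EOF
}
```

For example: `write_ifcfg_ib /etc/sysconfig/network-scripts ib0 192.168.1.1 255.255.255.0 && ifup ib0`.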


 On Feb 21, 2011, at 1:17 PM, Arya Mazaheri wrote:

  Problem solved. I was trying to set the IP address of ib0 with this command:
 ifconfig ib0 192.168.1.1 netmask 255.255.255.0 up

 but that led to the kernel panic. So I set the IP address through
 network-scripts instead, and now it works.
 I really don't know why setting the IP with ifconfig doesn't work. Very strange.


 On Mon, Feb 21, 2011 at 7:31 PM, Albert Everett aeever...@ualr.edu
 wrote:
 What's the output of

 # ifconfig ib0

 Albert



Re: [Lustre-discuss] Installing Lustre client on 2.6.18-194 kernel

2011-02-20 Thread Arya Mazaheri
I have done what you said. I will test my client against the server tomorrow.
But could you tell me which tweaks you made to /etc/modprobe.conf?



Re: [Lustre-discuss] Running MGS and OSS on the same machine

2011-02-19 Thread Arya Mazaheri
Yep, I have fixed it with this command instead:
mkfs.lustre --fsname lustre --ost --mgsnode=0@lo /dev/sdb1
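The fix reported here, pointing --mgsnode at the loopback NID 0@lo when the MGS runs on the same machine, can be folded into a small helper that picks the NID and prints the mkfs.lustre command for review rather than running it (mkfs.lustre reformats the device). This is a sketch under stated assumptions: the function names are hypothetical, the remote case assumes the tcp0 network used elsewhere in this thread, and the caller supplies the list of local addresses.

```shell
#!/bin/sh
# Sketch: choose the MGS NID for an OST and print (not run) mkfs.lustre.
# Assumption: a co-located MGS is reached via 0@lo (as reported in this
# thread); a remote MGS via IP@tcp0.

# mgs_nid MGS_IP LOCAL_IP... -> prints 0@lo or MGS_IP@tcp0
mgs_nid() {
    mgs=$1
    shift
    for ip in "$@"; do
        if [ "$ip" = "$mgs" ]; then
            echo 0@lo
            return 0
        fi
    done
    echo "$mgs@tcp0"
}

# ost_mkfs_cmd FSNAME MGS_NID DEVICE -> prints the command line only
ost_mkfs_cmd() {
    echo "mkfs.lustre --fsname $1 --ost --mgsnode=$2 $3"
}
```

`ost_mkfs_cmd lustre "$(mgs_nid 192.168.0.10 192.168.0.10)" /dev/sdb1` prints the same command used above; on a separate OSS whose addresses do not include 192.168.0.10, it yields --mgsnode=192.168.0.10@tcp0 instead.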

2011/2/18 Charland, Denis denis.charl...@imi.cnrc-nrc.gc.ca

 Arya,

 I have the MGS, the MDT and the OST all on the same machine and everything
 works fine. It should not be a problem to have the MGS and the OST on the
 same machine.

 Are your MGS and MDT mounted when you execute mkfs.lustre for the OST?

 Denis

 Denis Charland, ing. | P. Eng.
 Administrateur de Systèmes UNIX | UNIX Systems Administrator
 Tél. | tel. (450) 641-5078 Fax (450) 641-5106
 Courriel | E-mail : denis.charl...@cnrc-nrc.gc.ca

 Institut des matériaux industriels | Industrial Materials Institute
 Conseil national de recherches Canada | National Research Council Canada
 75, de Mortagne, Boucherville, Québec, Canada, J4B 6Y4
 Gouvernement du Canada | Government of Canada



Re: [Lustre-discuss] Running MGS and OSS on the same machine

2011-02-19 Thread Arya Mazaheri
Thanks Michael,
It worked with 0@lo.

Thanks again for your suggestion.

On Fri, Feb 18, 2011 at 6:55 PM, Michael Kluge michael.kl...@tu-dresden.de wrote:

 Hi Arya,

 If I remember correctly, Lustre uses 0@lo for the localhost address. Does
 using the other NID, 192.168.0.10@tcp0, give any error message?


 Michael

 Am 18.02.2011 16:10, schrieb Arya Mazaheri:
  Hi again,
  I plan to use one server as both MGS and OSS. But how can I format the
  OSTs as a Lustre file system?
  For example, the line below tells the OST that its MGS node is at
  192.168.0.10@tcp0:
  mkfs.lustre --fsname lustre --ost --mgsnode=192.168.0.10@tcp0 /dev/vg00/ost1
 
  But now the MGS node is the same machine. I tried putting localhost instead
  of the IP address, but it didn't work.
 
  What should I do?
 
  Arya
 
 
 


 --
 Michael Kluge, M.Sc.

 Technische Universität Dresden
 Center for Information Services and
 High Performance Computing (ZIH)
 D-01062 Dresden
 Germany

 Contact:
 Willersbau, Room WIL A 208
 Phone:  (+49) 351 463-34217
 Fax:    (+49) 351 463-37773
 e-mail: michael.kl...@tu-dresden.de
 WWW:    http://www.tu-dresden.de/zih


Re: [Lustre-discuss] Installing Lustre client on 2.6.18-194 kernel

2011-02-19 Thread Arya Mazaheri
Well, I want to install the client modules on Rocks 5.5 x86_64.
There are some packages in the source section of the Lustre download area, and
I am confused about which one to choose. What are the differences between them?
lustre-client-source-2.0.0.1-2.6.18_164.11.1.0.1.el5_lustre.2.0.0.1.i686.rpm
lustre-client-source-2.0.0.1-2.6.16_60_0.42.8_lustre.2.0.0.1_smp.x86_64.rpm
lustre-client-source-2.0.0.1-2.6.27_23_0.1_lustre.2.0.0.1_default.x86_64.rpm
lustre-client-source-2.0.0.1-2.6.18_164.11.1.0.1.el5_lustre.2.0.0.1.x86_64.rpm
lustre-client-source-2.0.0.1-2.6.16_60_0.42.8_lustre.2.0.0.1_bigsmp.i686.rpm
lustre-client-source-2.0.0.1-2.6.27_23_0.1_lustre.2.0.0.1_default.i686.rpm
lustre-client-source-2.0.0.1-2.6.18_164.11.1.el5_lustre.2.0.0.1.i686.rpm
lustre-client-source-2.0.0.1-2.6.18_164.11.1.el5_lustre.2.0.0.1.x86_64.rpm

And one other thing: where is the source for lustre-client-modules?
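The file names in that list encode the kernel each package was built for (with '-' turned into '_') and the CPU architecture, and Brian's reply below makes the key point that the package has to match the running kernel. A sketch that filters such names for a given `uname -r` / `uname -m` pair; it assumes the el5 naming pattern shown (kernel string followed by _lustre), does not handle the SLES-style names, and the function name is hypothetical:

```shell
#!/bin/sh
# Sketch: pick the Lustre client package(s) built for an exact kernel and arch.
# Assumes el5-style names such as
#   lustre-client-source-2.0.0.1-2.6.18_164.11.1.el5_lustre.2.0.0.1.x86_64.rpm
# pick_client_rpm KVER ARCH FILE... -> prints matches; returns 1 if none

pick_client_rpm() {
    # RPM names use '_' where `uname -r` has '-'
    kver=$(printf '%s' "$1" | sed 's/-/_/g')
    arch=$2
    shift 2
    status=1
    for f in "$@"; do
        # Quoted parts of the pattern match literally, so '.' is not a wildcard
        case $f in
            *"-${kver}_lustre"*".$arch.rpm")
                echo "$f"
                status=0
                ;;
        esac
    done
    return $status
}
```

`pick_client_rpm "$(uname -r)" "$(uname -m)" lustre-client-*.rpm` on a 2.6.18-194.17.4.el5 x86_64 node matches nothing in the list above, which is the situation where the client modules have to be built from source.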

On Sat, Feb 19, 2011 at 6:08 PM, Brian J. Murrell br...@whamcloud.com wrote:

 On 11-02-19 09:34 AM, Arya Mazaheri wrote:
 
  As I was searching through the mailing list, I have noticed that
 lustre.ko
  should be present in this directory:
  /lib/modules/2.6.18-194.17.4.el5/kernel/fs/lustre/lustre.ko
 
  My current kernel is 2.6.18-194.17.4.el5. but lustre.ko is in
  2.6.18-164.11.1.el5 instead.

 You need the client modules package that matches your kernel.  If one is
 not available you will have to build it from the source.

 b.
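Brian's point can be checked directly: the "No such device" mount failure is consistent with there being no lustre.ko for the running kernel, even though one exists under another kernel's module tree, as in the 2.6.18-164 vs 2.6.18-194 mismatch above. A sketch that looks for lustre.ko under a given kernel and, if it is missing, lists the kernels that do have one; MODULES_ROOT is a hypothetical override and the function name is illustrative:

```shell
#!/bin/sh
# Sketch: check whether a lustre.ko exists for a given kernel version.
# MODULES_ROOT may be overridden (default /lib/modules).

# check_lustre_client_modules [KVER] -> prints the module path(s) and returns 0,
# or returns 1 after listing (on stderr) the kernels that do have lustre.ko
check_lustre_client_modules() {
    root=${MODULES_ROOT:-/lib/modules}
    kver=${1:-$(uname -r)}
    found=$(find "$root/$kver" -name lustre.ko 2>/dev/null)
    if [ -n "$found" ]; then
        echo "$found"
        return 0
    fi
    echo "no lustre.ko for kernel $kver" >&2
    # Reduce each hit to its kernel directory name
    find "$root" -name lustre.ko 2>/dev/null | sed "s|^$root/||; s|/.*||" | sort -u >&2
    return 1
}
```

If lustre.ko is present but mounting still fails, the next step suggested by the mount error itself is to load the module and look for lustre in /proc/filesystems.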




Re: [Lustre-discuss] Installing Lustre client on 2.6.18-194 kernel

2011-02-19 Thread Arya Mazaheri
Hi Albert,
It seems that you have made a new kernel in order to run lustre on clients.
Am I right?
I don't want to change the kernel on the clients at all.



Re: [Lustre-discuss] Kernel Panic error after lustre 2.0 installation

2011-02-18 Thread Arya Mazaheri
Wow! Thanks for your suggestion. All I needed to do was compile the driver from
'arcmsr.c' and 'arcmsr.h' and then rebuild the RAM disk.
Now everything is working smoothly.

Thanks again ;)

On Fri, Feb 18, 2011 at 1:16 AM, Kevin Van Maren kevin.van.ma...@oracle.com
 wrote:

 Yep.  All you have to do is rebuild the driver for the Lustre kernel.

 First, bring the system back up with the non-Lustre kernel.



 See the bottom of the readme:

   # cd /usr/src/linux/drivers/scsi/arcmsr
   (suppose /usr/src/linux is the soft-link for /usr/src/kernel/2.6.23.1-42.fc8-i386)
   # make -C /lib/modules/`uname -r`/build CONFIG_SCSI_ARCMSR=m SUBDIRS=$PWD modules
   # insmod arcmsr.ko

 Except instead of uname -r substitute the lustre kernel's 'uname -r', as
 you want to build for the Lustre kernel.  Be sure you have the Lustre
 kernel-devel RPM installed.

 Note that the insmod will not work (you already have it for the running
 kernel, and the one you built for the Lustre kernel will not work).  You
 will need to rebuild the initrd for the Lustre kernel (see the other
 instructions in the readme, using the Lustre kernel).

 Kevin
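Kevin's steps, aimed at the Lustre kernel rather than the running one, can be collected into one function so the Lustre kernel version is typed only once. This is a sketch, not Areca's or Lustre's documented procedure. It assumes the Lustre kernel-devel package is installed, the driver sources sit in the readme's directory, a module copied to /lib/modules/<kver>/updates/ takes precedence over any in-tree copy, and the CentOS 5 mkinitrd `--with=` option is used to force the driver into the initrd. DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: rebuild the arcmsr RAID driver for the (not running) Lustre kernel
# and regenerate that kernel's initrd. Assumptions: kernel-devel for KVER is
# installed; modules in /lib/modules/KVER/updates take precedence over the
# in-tree copy; CentOS 5 mkinitrd syntax. DRY_RUN=1 only prints commands.

# rebuild_arcmsr_for KVER [SRC_DIR]
rebuild_arcmsr_for() {
    kver=$1
    src=${2:-/usr/src/linux/drivers/scsi/arcmsr}
    if [ -z "$kver" ]; then
        echo "usage: rebuild_arcmsr_for KVER [SRC_DIR]" >&2
        return 1
    fi
    run=
    if [ "${DRY_RUN:-0}" = 1 ]; then run=echo; fi
    # Build against the Lustre kernel's build tree, not `uname -r`
    $run make -C "/lib/modules/$kver/build" CONFIG_SCSI_ARCMSR=m SUBDIRS="$src" modules &&
        $run install -D -m 644 "$src/arcmsr.ko" "/lib/modules/$kver/updates/arcmsr.ko" &&
        $run depmod -a "$kver" &&
        $run mkinitrd -f --with=arcmsr "/boot/initrd-$kver.img" "$kver"
}
```

Run it while booted into the stock CentOS kernel, passing the Lustre kernel's version string, and reboot into the Lustre kernel only after the initrd has been regenerated; this matches the report above that rebuilding the driver and the RAM disk fixed the boot.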


 Arya Mazaheri wrote:

 The driver name is arcmsr.ko and I extracted it from driver.img on the
 RAID controller's CD. The following text file may clarify things:


 ftp://areca.starline.de/RaidCards/AP_Drivers/Linux/DRIVER/RedHat/FedoraCore/Redhat-Fedora-core8/1.20.0X.15/Intel/readme.txt

 Please let me know if you need more information about this issue.

 On Thu, Feb 17, 2011 at 11:33 PM, Brian J. Murrell br...@whamcloud.com wrote:

On Thu, 2011-02-17 at 23:26 +0330, Arya Mazaheri wrote:
 Hi there,

Hi,

 Unable to access resume device (LABEL=SWAP-sda3)
 mount: could not find filesystem 'dev/root'
 setuproot: moving /dev failed: No such file or directory
 setuproot: error mounting /proc: No such file or directory
 setuproot: error mounting /sys: No such file or directory
 switchroot: mount failed: No such file or directory
 Kernel Panic - not syncing: Attempted to kill init!

 I have no problem with the original kernel installed by centos. I
 guessed this may be related to RAID controller card driver which may
 not loaded by the patched lustre kernel.

That seems like a reasonable conclusion given the information
available.

 so I have added the driver into the initrd.img file.

Where did you get the driver from?  What is the name of the driver?

 But it didn't solve the problem.

Depending on where it came from, yes, it might not.

 Should I install the lustre by building the source?

That may be required, but not necessarily required.  We need more
information.

b.





[Lustre-discuss] Running MGS and OSS on the same machine

2011-02-18 Thread Arya Mazaheri
Hi again,
I plan to use one server as both MGS and OSS. But how can I format the OSTs
as a Lustre file system?
For example, the line below tells the OST that its MGS node is at
192.168.0.10@tcp0:
mkfs.lustre --fsname lustre --ost --mgsnode=192.168.0.10@tcp0 /dev/vg00/ost1

But now the MGS node is the same machine. I tried putting localhost instead of
the IP address, but it didn't work.

What should I do?

Arya


[Lustre-discuss] Kernel Panic error after lustre 2.0 installation

2011-02-17 Thread Arya Mazaheri
Hi there,
I got an error after installing Lustre 2.0 on the MGS server, which has a RAID
controller card.
The server OS is CentOS 5.4 x86_64, with 1.2 TB of storage configured as
RAID 1+0.
After installing the Lustre RPM packages and rebooting the machine, I get the
errors below at Linux startup:

Unable to access resume device (LABEL=SWAP-sda3)
mount: could not find filesystem 'dev/root'
setuproot: moving /dev failed: No such file or directory
setuproot: error mounting /proc: No such file or directory
setuproot: error mounting /sys: No such file or directory
switchroot: mount failed: No such file or directory
Kernel Panic - not syncing: Attempted to kill init!

I have no problem with the original kernel installed by CentOS. I guessed this
may be related to the RAID controller card driver, which may not be loaded by
the patched Lustre kernel, so I added the driver to the initrd.img file. But
it didn't solve the problem.

Should I install Lustre by building it from source? Or is there any other clue
to this problem?

Thanks in advance...


Re: [Lustre-discuss] Kernel Panic error after lustre 2.0 installation

2011-02-17 Thread Arya Mazaheri
The driver name is arcmsr.ko and I extracted it from driver.img on the RAID
controller's CD. The following text file may clarify things:

ftp://areca.starline.de/RaidCards/AP_Drivers/Linux/DRIVER/RedHat/FedoraCore/Redhat-Fedora-core8/1.20.0X.15/Intel/readme.txt

Please let me know if you need more information about this issue.
