Re: [Lustre-discuss] Lustre and iSCSI

2009-08-04 Thread Klaus Steden

Hi David,

I did some experiments last year with Lustre 1.6.x and a Dell iSCSI
enclosure. It was a little slow (proof of concept mainly) due to sharing MDT
and OST traffic on a single GigE strand, but as long as the operating system
presents a valid block device, Lustre works fine.

hth
Klaus

On 7/31/09 11:13 AM, Cliff White cliff.wh...@sun.com etched on stone
tablets:

 David Pratt wrote:
 Hi. I am exploring possibilities for pooled storage for virtual
 machines. Lustre looks quite interesting for both tolerance and speed. I
 have a couple of basic questions:
 
 1) Can Lustre present an iSCSI target
 
 Lustre doesn't present a target; we use targets, and we should work fine
 with iSCSI. We don't have a lot of iSCSI users, due to performance
 concerns.
 
 2) I am looking at physical machines with 4 x 1 TB 24x7 drives in each. How
 many machines will I need to cluster to create a solution that provides a
 good level of speed and fault tolerance?
 
 'It depends' - what is a 'good level of speed' for your app?
 
 Lustre IO scales as you add servers. Basically, if the IO is big enough,
 the client 'sees' the bandwidth of multiple servers.  So, if you know
 the  bandwidth of 1 server (sgp_dd or other raw IO tools helps) then
 your total bandwidth is going to be that figure, times the number of
 servers. This assumes whatever network you have is capable of sinking
 this bandwidth.
 
 So, if you know the IO you need, and you know the IO one server can
 drive, you just divide the one by the other.
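 
 (As a rough worked example with made-up numbers: if sgp_dd shows a single
 OSS sustaining ~400 MB/s and you need ~3 GB/s aggregate, you would need on
 the order of 3000/400 ≈ 8 OSS nodes, network permitting.)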
 
 Fault tolerance at the disk level == RAID.
 Fault tolerance at the server level is done with shared storage
 failover, using linux-ha or other packages.
 hope this helps,
 cliffw
 
 Many thanks.
 
 Regards,
 David
 
 
 
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] mounting via cname

2009-06-03 Thread Klaus Steden

I believe so ... I was using host names with Lustre 1.6.2 and had no issues.
Just make sure DNS/name resolution is working properly and consistently.
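
For illustration (the host and file system names here are made up), a client
mount using a name instead of an IP would look something like:

  mount -t lustre mds-alias@tcp:/fsname /mnt/lustre

as long as 'mds-alias' resolves the same way on every node.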

Klaus

On 6/2/09 9:04 AM, Robert Olson ol...@mcs.anl.gov etched on stone
tablets:

 I'm curious if it's possible / wise to configure client mounts to use
 a cname instead of an IP address (server-cname@tcp:/fsname).
 
 thanks,
 --bob
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] tcp network load balancing understanding lustre 1.8

2009-05-08 Thread Klaus Steden

Hi Michael,

Just want to throw my two cents in with Isaac's posting, as I spent a great
deal of time working with these kinds of features over the course of the
last two years.

In my experience with Lustre 1.6, where multiple NICs were available, Lustre
would default to using the first one exclusively until it detected a failure,
and then switch over to the next available one. It also does not distinguish
between NIC types; i.e. IB, GigE, etc. are picked based on discovery order,
not speed or some other metric.

I didn't even touch Lustre bonding, because as you both remark, it's a
little convoluted. I spent a lot of time experimenting with Lustre over
802.3ad (LACP) aggregated links using the Linux bonding driver, and my OSS
nodes produced very respectable to very good numbers. Across a pair of OSS
nodes each with 2 x GigE NICs, I was able to sustain ~ 350 MB/s write speed
when running sandbox tests, so it appears that although the LACP driver
doesn't balance a connection across multiple links (i.e. a 2 GigE LACP bond
doesn't give you 2 Gbit throughput for a single network I/O), the Lustre
implementation somehow manages to squeeze more data through the pipe.

To get it set up, simply configure NIC bonding of whatever flavour suits
your needs on the OSS nodes, and then assign 'bond0' to your tcp networks,
something like this:

options lnet networks=tcp0(bond0)

and you should be off to the races.
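
For reference, a rough sketch of the OS-side piece on a RHEL-style system
(mode, addresses and interface names are illustrative only; adjust to taste):

  # /etc/modprobe.conf
  alias bond0 bonding
  options bond0 mode=802.3ad miimon=100
  options lnet networks=tcp0(bond0)

  # /etc/sysconfig/network-scripts/ifcfg-bond0
  DEVICE=bond0
  IPADDR=10.0.0.10
  NETMASK=255.255.255.0
  ONBOOT=yes
  BOOTPROTO=none

  # /etc/sysconfig/network-scripts/ifcfg-eth0 (and likewise eth1)
  DEVICE=eth0
  MASTER=bond0
  SLAVE=yes
  ONBOOT=yes
  BOOTPROTO=none

The switch ports need to be configured for LACP as well.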

hth,
Klaus

On 5/7/09 12:57 PM, Isaac Huang he.hu...@sun.com etched on stone
tablets:

 On Thu, May 07, 2009 at 02:50:13PM +0200, Michael Ruepp wrote:
 Hi there,
 ..
 I give every NID an IP in the same subnet, e.g. 10.111.20.35-38 for oss0
 and 10.111.20.39-42 for oss1
 
 Do I have to make modprobe.conf.local look like this to force lustre
 to use all four interfaces parallel:
 
 options lnet networks=tcp0(eth0,eth1,eth2,eth3)
 Because on Page 138 the 1.8 Manual says:
 Note: In the case of TCP-only clients, the first available non-loopback
 IP interface is used for tcp0 since the interfaces are not specified.
 
 Correct.
 
 or do I have to specify it like this:
 options lnet networks=tcp
 Because on Page 112 the lustre 1.6 Manual says:
 Note: In the case of TCP-only clients, all available IP interfaces
 are used for tcp0
 
 Wrong. It needs to be updated as well, Sheila?
 
 ..
 My goal is to let Lustre utilize all four Gb links in parallel. And my
 Lustre clients are equipped with two Gb links, which should also be
 utilized (eth0, eth1).
 
 Or is bonding the better solution in terms of performance?
 
 I don't have any performance comparisons between the two approaches,
 but I'd suggest going with Linux bonding instead (let's call the
 tcp0(eth0,...ethN) approach Lustre bonding), because:
 1. With Lustre bonding it's rather tricky to get routing right,
 especially when all NICs reside in the same IP subnet. The Lustre tcp
 network driver, as its name suggests, works at the TCP layer, and the
 decision as to which outgoing interface to use depends on Linux IP-layer
 routing. When all NICs live in the same IP subnet, it's very
 possible that all outgoing packets would go through the interface of
 the 1st route in the Linux routing table, unless some tweaking has
 been done to also take source IPs into account. Incoming packets could
 also come in via unexpected NICs, depending on your settings in
 /proc/sys/net/ipv4/conf/*/arp_ignore and your ethernet topology.
 
 2. Linux bonding does a good job of detecting link status via either
 the ARP monitor or the MII monitor, but no such mechanism exists in
 Lustre bonding.
 
 In fact, Lustre bonding is an officially obsoleted feature, if I
 remember correctly.
 
 Thanks,
 Isaac
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Launch of the redesigned lustre.org wiki

2009-04-16 Thread Klaus Steden

Looks good! Glad to see Sun is keeping this resource online.

Klaus

On 4/16/09 10:21 AM, Sheila Barthel sheila.bart...@sun.com etched on
stone tablets:

 The Lustre Group is pleased to announce the launch of the redesigned
 lustre.org wiki, which includes a new top-level design, added pages, and
 an updated color scheme and logo. Check it out at: http://www.lustre.org.
 Over the next few months, more pages will be added and content on
 existing pages will be refreshed.
 
 We are interested in your feedback on the look and usability of the
 redesigned site, and ways we can improve it. To submit comments, use the
 feedback link on the home page.
 
 
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Recovery fails if clients not connected

2009-01-21 Thread Klaus Steden

Hi Roger,

I believe you can connect the OSSs once the MDS has booted, and I'm pretty
sure that the five in the 'connected_clients: 0/5' are in fact your OSS
nodes. Each OST maintains a connection to the MDS while the file system is
mounted, so they will be included in the connection count on the MDS.

However, regardless of the state, if your MDS is online and the MDT is
mounted, you can start up the OSS nodes and corresponding OSTs at any time;
clients attempting to make transactions will have their I/O operations block
(or fail, depending on the MDS config) until the missing nodes come back
online.
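
(If recovery ever does wedge and you know no more clients are coming back, my
understanding is that the usual escape hatch is to remount the target with the
abort_recov option, along the lines of 'mount -t lustre -o abort_recov
/dev/sdb /mnt/mdt'; the device and mount point here are placeholders, so check
the manual for your release before relying on it.)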

hth,
Klaus


On 1/20/09 3:05 PM, Roger Spellman ro...@terascala.com etched on stone
tablets:

 I have 2 MDS, configured as an active/standby pair.  I have 5 OSTs that are
 NOT active/standby.  I
 have 5 clients.
  
 I am using Lustre 1.6.5, due to bug 18232
 https://bugzilla.lustre.org/show_bug.cgi?id=18232  which only affects 1.6.6.
 Using Lustre 1.6.5, when I
 reset my active node, the standby takes over.  This is quite reliable.
  
 Today, I did the following in this order:
   Unmounted all the clients
   Rebooted all the clients
   Stopped Linux HA from running
   Unmounted the OSTs
   Unmounted the MDS
   Rebooted the OSTs
   Rebooted both MDSes
  
 When the MDSes started up, Linux HA chose one to be active.  That system
 mounted the MDT.
  
 I looked at the file  /proc/fs/lustre/mds/tacc-MDT/recovery_status, and it
 showed:
  
 [r...@ts-tacc-01 ~]# cat /proc/fs/lustre/mds/tacc-MDT/recovery_status
 status: RECOVERING
 recovery_start: 0
 time_remaining: 0
 connected_clients: 0/5
 completed_clients: 0/5
 replayed_requests: 0/??
 queued_requests: 0
 next_transno: 17768
  
  
 * Note that recovery_start and time_remaining are both zero. *
  
 I waited several minutes, and this file was the same.
  
 I was waiting for recovery to complete before trying to mount the OSTs.
 However, it appears that
 this would never occur!
  
 Does this look like a bug?
  
 ---
  
 I format my MDT using the following command.  The command is run from
 10.2.43.1, and the failnode
 is 10.2.43.2:
  
 mkfs.lustre --reformat --fsname tacc --mdt --mgs --device-size=1000
 --mkfsoptions=' -m 0 -O
 mmp' --failnode=10.2.4...@o2ib0 /dev/sdb
  
 I format the OSTs using the following command:
  
 /usr/bin/time -p mkfs.lustre --reformat --ost --mkfsoptions='-J
 device=/dev/sdc1 -m 0' --fsname
 tacc --device-size=4 --mgsnode=10.2.4...@o2ib0
 --mgsnode=10.2.4...@o2ib0 /dev/sdb
  
 I mount the clients using:
  
 mount -t lustre 10.2.4...@o2ib:10.2.4...@o2ib:/tacc /mnt/lustre
  
 
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre locking

2009-01-18 Thread Klaus Steden

Hi Mag,

If I'm not mistaken, only the qmaster writes to the DB; the execd processes
relay queries via RPC through a listening daemon on the qmaster host, which
speaks BDB on the back end.

hth,
Klaus

On 1/16/09 4:22 PM, Mag Gam magaw...@gmail.com etched on stone tablets:

 Thanks Andreas.
 
 We also run Sun Grid Engine for our engineering department. Our setup
 is basically like this:
 
 Master -- QMASTER (1 server)
 Slaves -- EXECD (300 servers)
 
 
 They share a filesystem which is running on Lustre. Grid Engine
 has a Berkeley Database as its backend. I am wondering if I need to
 change all of my slaves and master to distributed locking or local
 locking.
 
 Any thoughts?
 
 TIA
 
 On Fri, Jan 16, 2009 at 10:10 AM, Andreas Dilger adil...@sun.com wrote:
 On Jan 16, 2009  00:52 -0500, Mag Gam wrote:
 At our university many of our students and professors use SQLite and
 Berkeley DB for their projects. Probably BDB more than SQLite. Would
 we need to have Lustre mounted a certain way to avoid corruption
 via file locking? Any thoughts about this?
 
 That depends on how they use it.  Mounting Lustre with -o localflock
 will provide locking on a single node without any performance impact,
 which is enough for single-node databases like SQLite and Berkeley DB.
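 
 (For illustration, the client mount line would then be something like
 'mount -t lustre -o localflock mgsnode@tcp0:/fsname /mnt/lustre', with the
 node and fsname as placeholders; the '-o flock' variant gives coherent
 locking across clients, at a performance cost.)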
 
 Cheers, Andreas
 --
 Andreas Dilger
 Sr. Staff Engineer, Lustre Group
 Sun Microsystems of Canada, Inc.
 
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] tcp0 for maximum effect

2009-01-12 Thread Klaus Steden

Hi folks,

Lustre doesn't support any inherent link aggregation; it simply uses the
device node the OS presents. If this is a bonded NIC, it will use it without
problem, but the underlying device driver takes care of load balancing and
distribution.

I've used Lustre 1.6.x quite successfully with load-balanced 802.3ad
configurations; in some of my tests I was able to get about 350 MB/s
aggregate sustained across two OSS nodes with 2 x GigE bonded each.

802.3ad link aggregation is a standard NIC bonding protocol, and is
supported on all good quality L3 switches and by vendors like Cisco,
Foundry, Extreme, and Juniper.

cheers,
Klaus

On 1/11/09 11:37 AM, Peter Grandi pg_...@lus.for.sabi.co.uk etched on
stone tablets:

 I have two boxes that have this:
 
 [r...@lustrethree Desktop]# ifconfig
 eth0  Link encap:Ethernet  HWaddr 00:1B:21:2A:17:76
   inet addr:192.168.0.19  Bcast:192.168.0.255  Mask:255.255.255.0
   RX bytes:120168321 (114.6 MiB)  TX bytes:5300070662 (4.9 GiB)
 [ ... ]
 eth1  Link encap:Ethernet  HWaddr 00:1B:21:2A:1C:DC
   inet addr:192.168.0.20  Bcast:192.168.0.255  Mask:255.255.255.0
   RX bytes:55673426 (53.0 MiB)  TX bytes:846 (846.0 b)
 [ ... another 4 like that, 192.168.0.21-24 ... ]
 
 That's a very bizarre network configuration, you have 5
 interfaces on the same subnet (presumably all plugged into the
 same switch) with no load balancing, as all the outgoing traffic
 goes via 'eth0'.
 
 You have some better alternatives:
 
  * Use bonding (if the switch supports it) to tie together the 5
    interfaces as one virtual interface with a single IP address.
 
 * Use something like 'nexthop' routing (and a couple other
   tricks) to split the load across the several interfaces. This
   is easier for the outgoing traffic than the incoming traffic,
   but it seems you have a lot more outgoing traffic.
 
 * Use 1 10Gb/s card per server and a 1Gb/s switch with 2 10Gb/s
   ports. 10Gb/s cards and switches have fallen in price a lot
   recently (check Myri.com), and a server that can do several
   hundred MB/s really deserves a nice 10Gb/s interface.
 
 IIRC 'lnet' has something like bonding built in, but I am not
 sure that it handles multiple addresses in the same subnet well.
 
 Would it be better to have these two boxes as OSS's or as MDT
 or MGS machines?  Currently they are configured 1 as a MGS and
 the other as the MDT.
 
 If these are the two servers with gigantic disk arrays, I'd have
 on each both MDS and OSS. Possibly with the OSTs replicated
 across both machines in an active/passive configuration.
 
 The question is does LNET use the available tcp0 connections
 different from the OSS perspective as opposed to the MDT or
 MGS perspective?
 
  Not sure what the question means.
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Using ib0 and tcp

2008-11-13 Thread Klaus Steden

Hi Joseph,

Lustre attaches itself to the first listed/available interface, in your
case, ib0.

I don't remember off the top of my head whether there is a way (or what the
right way is) to do what you want.

hth,
Klaus

On 11/13/08 4:18 PM, Joseph Farran [EMAIL PROTECTED] etched on
stone tablets:

 Howdy.
 
 Newbie here. I have Lustre 1.6.6 set up and running. For the
 network, we have eth0 and eth1 bonded (channel bonding), and we also have
 InfiniBand (SilverStorm).
 
  /etc/modprobe.conf file looks like this:
 alias bond0 bonding
 options bond0 mode=balance-alb miimon=100
 options lnet networks=o2ib(ib0),tcp(bond0)
 
 Lctl on one of my OSS's shows my network as:
 # lctl list_nids
 [EMAIL PROTECTED]
 [EMAIL PROTECTED]
 
 How can I get Lustre to use both ib0 and bond0 (eth0 / eth1) for the
 data network? Currently it only uses InfiniBand (ib0) and not bond0.
 
 Thanks,
 Joseph
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Network aliasing and HA

2008-09-25 Thread Klaus Steden

Hi Timh,

If you're using Linux-HA, you can configure how quickly failover takes
place. I have mine set to 90 seconds before the primary is marked dead and
the secondary takes over.

When this occurs, any Lustre transactions not yet in flight will block until
the ones that were in progress at the time of the failure have either had a
chance to complete or have timed out.

I'm not sure how to modify Lustre-specific settings for recovery time,
though.
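
(The heartbeat side of that timing lives in /etc/ha.d/ha.cf; a minimal sketch
with illustrative values only:

  keepalive 2
  deadtime 90
  initdead 120

Lustre's own recovery and timeout behaviour is separate from these knobs.)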

cheers,
Klaus


On 9/25/08 1:54 PM, Timh Bergström [EMAIL PROTECTED]did etch on
stone tablets:

 To follow up on this matter, I've currently set up ha/drbd as suggested,
 formatted the OSTs with double mgsserver directives and also mounted
 with double addresses on the clients, as [EMAIL PROTECTED]:[EMAIL
 PROTECTED]:/fsname -
 though, if I fail mgs/mdt 1 it does not recover (in a reasonable time);
 what kinds of tuning/settings will affect this?
 
 //Timh
 
 2008/9/23 Timh Bergström [EMAIL PROTECTED]:
 Thank you, that's the path i've taken from the last message on this
 list, as I misunderstood some of the drbd/ha setups before. However,
 using 4 mgsnode paths, is that recommended or should I use one
 mgspath per node and use the other as some sort of manual failover?
 
 Regards,
 Timh
 
 2008/9/23 Kevin Van Maren [EMAIL PROTECTED]:
 Note that you do not normally use IP takeover with Lustre/Heartbeat: you set
 the failover IP addresses with the mkfs.lustre command, and Lustre
 reconnects to the _other_ address when it is disconnected.
 
 In your case, you would have 2 fixed addresses for each node (w/o heartbeat
 - do NOT use the heartbeat virtual IP addresses), and specify both those
 failover NIDs (rather than just 1).
 
 Lustre1.6 is a bit different from a lot of HA/Heartbeat users: Lustre
 _knows_ about the multiple paths/addresses, and simply requires Heartbeat to
 ensure it is mounted on exactly one node in the failover pair: it does NOT
 rely on IP takeover for HA.
 
 Kevin Van Maren
 
 
 Timh Bergström wrote:
 
 2008/9/23 Brian J. Murrell [EMAIL PROTECTED]:
 
 
 On Tue, 2008-09-23 at 15:06 +0200, Timh Bergström wrote:
 
 
 Hi,
 
 
 Hi,
 
 
 Hi again, and thanks for the quick reply!
 
 
 
 My (current) modprobe:
 
 options lnet networks=tcp0(eth0)10.4.21.50,tcp1(eth1)10.4.22.50
 
 
 This syntax is incorrect.  For some examples of multi-homed
 configurations see the manual at
 
  http://manual.lustre.org/manual/LustreManual16_HTML/MoreComplicatedConfigurations.html#50642998_20213
 
 
 Yes that's the link i've been consulting, perhaps im not looking hard
 enough.
 
 
 
 This is the errors i get:
 LustreError: 10f-e: Error parsing
 'networks=tcp0(eth0)10.4.21.50,tcp1(eth1)10.4.22.50'
 
 
  When you specify networks, because you specify the interfaces to use,
  you don't need to specify the IP address. I think you are confusing the
  networks and ip2nets options.
 
 
  The problem here is exactly that the physical interfaces are there, but
  not with the IP addresses I want the MDT to listen on (the NIDs);
  they are added later through heartbeat as aliases (IPaddr2::10.4.21.50
  IPaddr2::10.4.22.50), but before mounting the mdt resource (drbd).
 
 
 
 LustreError: 110-0: here...|-|
 LustreError: 4527:0:(events.c:707:ptlrpc_init_portals()) network
 initialisation failed
 (along with a bunch of errors since this module does not load)
  I've tried with tcp0(eth0:0) which fails with about the same error,
 i've tried tcp0(eth0,eth1) which gives me the wrong addresses (machine
 ones) but works.
 
 
 What is the topology exactly?  Are there two nics or one nic with two
 addresses?  Are the two nics on the same physical network or separate
 physical networks?
 
 
  eth0 and eth1 are physical interfaces; they have statically assigned
  IPs (for management, supervision, etc.), and heartbeat then adds addresses
  to these two interfaces if the node is primary.
  
  If it matters: eth0 and eth1 have separate physical paths to
  everything; this is because we want to survive a physical failure on the
  network before failing over to another physical server.
 
  As I read the manual, I format my OSTs with more than one --mgsnode
  option, which in turn will make the OST know about both paths to
  the MDS/MGS server(s). As in, if the first MGS does not work (physical
  network failure on side A), try the second (physical side B).
 
 What we healthcheck on is the data/disks/server hardware which will
 tell heartbeat to fail over to server 2 which takes over network path
 A and network path B (on 10.4.[21,22].50), and the OST's/clients
 should continue working without noticing.
 
 
 
 b.
 
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss
 
 
 
 
 
 
 
 
 
 
 
 
 --
 Timh Bergström
 System Administrator
 Diino AB - www.diino.com
 :wq
 
 
 

___
Lustre-discuss mailing list

Re: [Lustre-discuss] Lustre 1.6.5.1 on the box with 2 interfaces

2008-09-24 Thread Klaus Steden

Hi Lukas,

Do you want to use these interfaces together as a logical unit? The syntax
below would construct two separate LNET networks.

Additionally, you would need to qualify the mount path on the client side to
bind Lustre to a specific LNET instance, i.e. 'mount /mnt/lustre
[EMAIL PROTECTED]:/lustre', or else it will default to the first available
LNET instance. The order doesn't really matter in the module declaration,
though.

Looking at my OSS nodes, I can see entries in /proc/fs/lustre/devices, as
well as in the output of 'df'. Did you specify a file system type? Normally
when mounting either an OST or an MDT, you need to provide a '-t lustre'
argument to 'mount', or list that as the fstype in /etc/fstab. Otherwise,
I'm assuming it mounts the device as ext3, which works, but isn't usable by
Lustre.
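
In other words (devices and names taken from your message, and the client
line is only illustrative), the mounts would look something like:

  mount -t lustre /dev/Scratch_VG/Scratch_1 /mnt/lustre/mdt
  mount -t lustre /dev/Scratch_VG/Scratch_2 /mnt/lustre/ost0
  mount -t lustre 192.168.1.1@tcp0:/l_smaug2 /mnt/lustre    # on a client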

hth,
Klaus

On 9/24/08 12:41 PM, Lukas Hejtmanek [EMAIL PROTECTED]did etch on
stone tablets:

 Hello,
 
 I have some difficulties setting up a server acting as both metadata server
 and OST server. The box has two interfaces, eth0 and eth2; eth0 is the
 primary interface and has the IP which is associated with the hostname.
 
 However, I would like to use the eth2 interface for Lustre.
 In an ideal case, I would like to use Lustre on both interfaces. But for now,
 it would be sufficient to use just the non-primary one.
 
 I use
 options lnet networks=tcp0(eth2),tcp1(eth0)
 
 I created the metadata:
 /usr/local/lustre/sbin/mkfs.lustre --fsname=l_smaug2 --reformat --mdt --mgs
 /dev/Scratch_VG/Scratch_1
 
 I mounted the mdt:
 mount /dev/Scratch_VG/Scratch_1 /mnt/lustre/mdt
 
 Then I created the OST
 /usr/local/lustre/sbin/mkfs.lustre --fsname=l_smaug2 --reformat --ost
 [EMAIL PROTECTED] /dev/Scratch_VG/Scratch_2
 
 I mounted it:
 mount /dev/Scratch_VG/Scratch_2 /mnt/lustre/ost0
 
 However, cat /proc/fs/lustre/devices shows that no OST is attached at all.
 
 What am I missing?
 
 eth0 has a public address,
 eth2 has the 192.168.1.1 address.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] OST failover

2008-08-27 Thread Klaus Steden

Hi,

You interconnect OSS1 and OSS2, OSS2 and OSS3, OSS3 and OSS4, and OSS4 and
OSS1 via an out-of-band communication fabric, install Linux-HA heartbeat
software to monitor mount points, and configure the nodes to take over each
other's mounts in case of failure.

The implementation I have here uses only single point-to-point links between
OSS pairs (there is no crossover between failover partners) but the principle
is the same; your implementation should work pretty much the same way, with
a few minor differences. I'm assuming you've got unused NICs on your OSSes,
so if you connect those to a standard GigE edge switch protected by UPS, you
can create an HA fabric which will give you the coverage you need.

Lustre itself doesn't support failover natively; you need to implement any
failover using a third-party package, and most of us out there use Linux-HA to
great effect.
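
(As a rough sketch, with heartbeat v1-style haresources and made-up node and
device names, each failover pair would carry entries along these lines:

  oss1 Filesystem::/dev/sdb2::/mnt/ost2::lustre
  oss2 Filesystem::/dev/sdb1::/mnt/ost3::lustre

so the surviving partner mounts its neighbour's OST once heartbeat declares
the owner dead.)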

cheers,
Klaus

On 8/27/08 2:30 AM, sd a [EMAIL PROTECTED]did etch on stone tablets:

 Hi the list,
 
 I have 4 OSS servers , each OSS has 1 gigabit network interface and 2 OSTs,
 first OST use /dev/sdb1, second OST use /dev/sdb2.
 
 OSS1: OST1   OST2
 
 OSS2: OST3  OST4
 
 OSS3: OST5  OST6
 
 OSS4: OST7  OST8
 
 
 Each file is striped across 4 OSTs : OST1,2,3,4
 
 And I want to setup failover as:
 
 OST2 failover with OST3
 
 OST4 failover with OST5
 
 OST6 failover with OST7
 
 OST8 failover with OST1
 
 
 How do I do?
 
 Thanks.
 
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] multiple NLI's/interfaces to listen on?

2008-08-25 Thread Klaus Steden

Hi Robert,

You'd need to adjust the lnet options line in /etc/modprobe.conf to force
Lustre to bind to all your NICs; I believe it binds to the first one
available unless instructed otherwise.

Try this:

-- cut --
options lnet networks=tcp0(eth1),tcp1(eth2),tcp2(eth3)
-- cut --

which will serve up the Lustre volume on each of your interfaces via three
different LNET networks. You may be able to organize them all under a single
LNET network, but without the right routing in place, it gets very
complicated.

hth,
Klaus

On 8/25/08 5:23 AM, Robert Hassing [EMAIL PROTECTED]did etch on stone
tablets:

 Hi All,
 
 Got this little problem wich is driving me nuts.
 
 I have a small network with a combined Lustre MDT/MGS/OST server and 3
 clients
 
 I am trying to let the server listen on 3 different NICs in a separate
 network.
 
 e.g.:
 
 eth1: 10.1.34.50
 eth2: 10.1.35.50
 eth3: 10.1.36.50
 
 I want to connect each client directly to one of the interfaces but when
 trying that only eth1 works.
 
 from dmesg:
 Lustre: Added LNI [EMAIL PROTECTED] [8/256]
 Lustre: Accept secure, port 988
 
 it appears the server is only listening on eth1 and not the other NICs.
 Is there a possibility to let the other NICs also listen so I can use them?
 
 Kind regards
 Robert H

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Log file opens/reads/etc?

2008-08-19 Thread Klaus Steden

Hello Andreas,

My apologies for not explaining myself. :-)

The trusted computing standards I'm talking about (there are a few, some
good, some not so much) are effectively based on US Department of Defense C2
(aka Orange Book) security standards:

http://en.wikipedia.org/wiki/TCSEC

The best audit trail implementations I've seen are based on Sun's BSM,
adopted and implemented by both FreeBSD and Apple in their auditing code.

http://docs.sun.com/app/docs/doc/806-1789

http://www.apple.com/support/security/commoncriteria/

http://www.freebsd.org/doc/en/books/handbook/audit.html

BSM-based auditing systems define classes of system calls, users, and groups
of users that are of interest -- file create, file read, login, socket
opens, people in the 'wheel' group, etc. -- and record a realtime log of
events as they occur within the kernel. This information is stored in a
packed binary format, and can be exploded into ASCII for parsing and
analysis using built-in tools, allowing you to establish a complete audit
trail of the operations of interest.

How Lustre would implement this I'm not sure, since it's object-based and
BSM auditing records file names ... but the idea is important, especially in
digital media where auditability keeps lawyers from the MPAA and the big
studios at bay.

cheers,
Klaus

On 8/18/08 9:13 PM, Andreas Dilger [EMAIL PROTECTED]did etch on stone
tablets:

 On Aug 18, 2008  17:18 -0700, Klaus Steden wrote:
 Hrm. Who should I contact to find out more, then?
 
 Nathan is working on the Changelog code, but I think the main issue
 is that neither of us know what compliant with Trusted Computing standards
 really means.
 
 On 8/18/08 4:44 PM, Andreas Dilger [EMAIL PROTECTED]did etch on stone
 tablets:
 
 On Aug 18, 2008  12:53 -0700, Klaus Steden wrote:
 Will this be compliant with Trusted Computing standards? i.e. will it be
 possible to use this information for auditing purposes?
 
 I don't know enough about that to make a useful answer, sorry.
 
 On 8/18/08 3:43 AM, Andreas Dilger [EMAIL PROTECTED]did etch on stone
 tablets:
 
 On Aug 09, 2008  05:06 -0700, daledude wrote:
 Is there is a tool that shows what files are being accessed? Sort of
 like inotify, but not inotify? I'd like to compile file access
 statistics to try and balance the most accessed files across the OST's
 better.
 
 There is a feature being worked on for Lustre 2.0 called Changelogs
 that will allow recording all files that are modified.
 
 Cheers, Andreas
 --
 Andreas Dilger
 Sr. Staff Engineer, Lustre Group
 Sun Microsystems of Canada, Inc.
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss
 
 Cheers, Andreas
 --
 Andreas Dilger
 Sr. Staff Engineer, Lustre Group
 Sun Microsystems of Canada, Inc.
 
 
 Cheers, Andreas
 --
 Andreas Dilger
 Sr. Staff Engineer, Lustre Group
 Sun Microsystems of Canada, Inc.
 

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Log file opens/reads/etc?

2008-08-18 Thread Klaus Steden

Hi Andreas,

Will this be compliant with Trusted Computing standards? i.e. will it be
possible to use this information for auditing purposes?

thanks,
Klaus

On 8/18/08 3:43 AM, Andreas Dilger [EMAIL PROTECTED]did etch on stone
tablets:

 On Aug 09, 2008  05:06 -0700, daledude wrote:
 Is there is a tool that shows what files are being accessed? Sort of
 like inotify, but not inotify? I'd like to compile file access
 statistics to try and balance the most accessed files across the OST's
 better.
 
 There is a feature being worked on for Lustre 2.0 called Changelogs
 that will allow recording all files that are modified.
 
 Cheers, Andreas
 --
 Andreas Dilger
 Sr. Staff Engineer, Lustre Group
 Sun Microsystems of Canada, Inc.
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Log file opens/reads/etc?

2008-08-18 Thread Klaus Steden

Hi Andreas,

Hrm. Who should I contact to find out more, then?

thanks,
Klaus

On 8/18/08 4:44 PM, Andreas Dilger [EMAIL PROTECTED]did etch on stone
tablets:

 On Aug 18, 2008  12:53 -0700, Klaus Steden wrote:
 Will this be compliant with Trusted Computing standards? i.e. will it be
 possible to use this information for auditing purposes?
 
 I don't know enough about that to make a useful answer, sorry.
 
 On 8/18/08 3:43 AM, Andreas Dilger [EMAIL PROTECTED]did etch on stone
 tablets:
 
 On Aug 09, 2008  05:06 -0700, daledude wrote:
 Is there is a tool that shows what files are being accessed? Sort of
 like inotify, but not inotify? I'd like to compile file access
 statistics to try and balance the most accessed files across the OST's
 better.
 
 There is a feature being worked on for Lustre 2.0 called Changelogs
 that will allow recording all files that are modified.
 
 Cheers, Andreas
 --
 Andreas Dilger
 Sr. Staff Engineer, Lustre Group
 Sun Microsystems of Canada, Inc.
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss
 
 Cheers, Andreas
 --
 Andreas Dilger
 Sr. Staff Engineer, Lustre Group
 Sun Microsystems of Canada, Inc.
 

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] mounting an OST from another node attached to a fibre channel switch

2008-08-15 Thread Klaus Steden

Yes.

The MGS contains all the configuration information for the file system it
serves, including the locations and network paths for all the MDS and OSS
nodes within the file system.

What you want to do is tell the MGS that your OSS(es) have been moved to new
addresses and have it update its records. At that point, your file system
will be usable in the new configuration.

hth,
Klaus

On 8/15/08 2:18 PM, Ron [EMAIL PROTECTED]did etch on stone tablets:

 Which section of which manual? Please.  I.e are you talking about:
 Part No. 820-3681-10
 Lustre manual version: Lustre_1.6_man_v1.10
 December 2007
 
 4.2.3.2 Running the Writeconf Command
 and/or 4.2.3.3 Changing a Server NID
 
 Our MGS is not changing nodes, but the OSTs are.
 Is there really just one simple MGS only operation?
 Those sections do not quite fit (to my way of thinking, which is
 in the process of being adjusted :}
 
 Thanks,
 Ron
 
 On Aug 14, 4:33 pm, Klaus Steden [EMAIL PROTECTED] wrote:
 Yes. There is an entry in the manual on this topic. You'll have to stop
 Lustre and update the MGS configuration, but it's a pretty quick operation.
 
 cheers,
 Klaus
 
 On 8/14/08 2:29 PM, Ron [EMAIL PROTECTED]did etch on stone tablets:
 
 Hi,
  We have set up a couple of test systems where the OSTs are mounted on
  the same node as the MDT. The OSTs are LUNs on a SATA Beast
  controller accessible from multiple systems attached to a fibre
  channel switch. We would like to umount an OST from the MDT system
  and mount it on another system. We've tried doing this, and even though
  there is network traffic between the new system and the mds system,
  the mds system seems to be ignoring the OST mount. Can an OST's
  OSS change?
 Thanks,
 Ron
 ___
 Lustre-discuss mailing list
 [EMAIL PROTECTED]
 http://lists.lustre.org/mailman/listinfo/lustre-discuss
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] mounting an OST from another node attached to a fibre channel switch

2008-08-14 Thread Klaus Steden

Yes. There is an entry in the manual on this topic. You'll have to stop
Lustre and update the MGS configuration, but it's a pretty quick operation.

cheers,
Klaus

On 8/14/08 2:29 PM, Ron [EMAIL PROTECTED]did etch on stone tablets:

 Hi,
 We have set up a couple of test systems where the OSTs are mounted on
 the same node as the MDT. The OSTs are LUNs on a SATA Beast
 controller accessible from multiple systems attached to a fibre
 channel switch. We would like to umount an OST from the MDT system
 and mount it on another system. We've tried doing this, and even though
 there is network traffic between the new system and the mds system,
 the mds system seems to be ignoring the OST mount. Can an OST's
 OSS change?
 Thanks,
 Ron
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] NFS and lustre

2008-08-06 Thread Klaus Steden

Hello Mag,

Some people on-list have tried it before, but it generally performs poorly.
I believe HP contributed some optimizations, but the consensus as recently
as the start of this year was that it will generally suck. Check the list
archives for more information, I believe CIFS was a better performer.

If you're exporting to Linux clients, going with native Lustre clients is
definitely a better option than using NFS (although you may not be doing
so).

cheers,
Klaus

On 8/5/08 7:25 PM, Mag Gam [EMAIL PROTECTED]did etch on stone tablets:

 Is there a guide for Lustre and NFS? At our university, we have a
 lustre server which also exports NFS data. We are using Centos, and I
 was wondering if there are any tuning and best practices guides
 available.
 
 TIA
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Does how to know OST to have data?

2008-08-06 Thread Klaus Steden

Hi Johnyla,

Use 'lfs getstripe file', i.e.

root# lfs getstripe /mnt/lustre/testfile1
OBDS:
0: lustre-OST0000_UUID ACTIVE
1: lustre-OST0001_UUID ACTIVE
/mnt/lustre/testfile1
     obdidx     objid     objid    group
          0    315726   0x4d14e        0
root#

This tells me that 'testfile1' is stored entirely on OST0.

good luck,
Klaus

On 8/6/08 6:09 AM, Johnlya [EMAIL PROTECTED]did etch on stone tablets:

 How do I know which OST holds a file's data once the Lustre OSTs are mounted?
 My disk is FC SAN.
 Thank you!
 
 [EMAIL PROTECTED] ~]# uname -a
 Linux OSS2_MASTER 2.6.9-67.0.7.EL_lustre.1.6.5smp #1 SMP Mon May 12
 22:02:50 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] lustre 1.6.5.1 panic on failover

2008-07-31 Thread Klaus Steden

Hi Brock,

I've been using Sun X2200s with Lustre in a similar configuration (IPMI,
STONITH, Linux-HA, FC storage) and haven't had any issues like this
(although I would typically panic the primary node during testing using
Sysrq) ... is the behaviour consistent?
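
(For reference, the Sysrq crash test is typically something like
'echo 1 > /proc/sys/kernel/sysrq' followed by 'echo c > /proc/sysrq-trigger'
on the node you want to kill; obviously only on a test system.)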

Klaus

On 7/31/08 1:57 PM, Brock Palen [EMAIL PROTECTED]did etch on stone
tablets:

 I have two machines I am setting up as my first mds failover pair.
 
 The two sun x4100's  are connected to a FC disk array.  I have set up
 heartbeat with IPMI for STONITH.
 
 The problem is, when I run a test on the host that currently has the mds/mgs
 mounted ('killall -9 heartbeat'), I see the IPMI shutdown, and when
 the second 4100 tries to mount the filesystem it hits a kernel panic.
 
 Has anyone else seen this behavior?  Is there something I am running
 into?  If I do a 'hb_takeover' or shut down heartbeat cleanly all is
 well.  Only if I simulate heartbeat failing does this happen.  Note I
 have not tried yanking power yet, but I want to simulate an MDS in a
 semi-dead state and ran into this.
 
 
 Brock Palen
 www.umich.edu/~brockp
 Center for Advanced Computing
 [EMAIL PROTECTED]
 (734)936-1985
 
 
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] lustre 1.6.5.1 panic on failover

2008-07-31 Thread Klaus Steden

netdump is indeed good for this, but you may have to take two or three
cracks at it ... it doesn't always dump the complete core image, and you
can't really do a whole lot with the incomplete version.

Klaus

On 7/31/08 5:50 PM, Kilian CAVALOTTI [EMAIL PROTECTED]did etch on
stone tablets:

 On Thursday 31 July 2008 17:22:28 Brock Palen wrote:
 Whats a good tool to grab this? Its more than one page long, and the
 machine does not have serial ports.
 
 If your servers do IPMI, you probably can configure Serial-over-LAN to get
 a console and capture the logs.
 
 But a way more convenient solution is netdump. As long as the network
 connection is working on the panicking machine, you should be able to
 transmit the kernel panic info, as well as a stack trace, to a
 netdump server, which will store it in a file.
 
 See http://www.redhat.com/support/wpapers/redhat/netdump/
 
 
 Cheers,

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] multiple OSTs accessing the same shared storage simultaneously?

2008-07-29 Thread Klaus Steden



On 7/29/08 6:44 AM, Aaron Knister [EMAIL PROTECTED]did etch on
stone tablets:

 
 Oh, and to answer your question - an OST cannot be mounted twice simultaneously.
 

... well, you *can* mount it from two locations, you're just inevitably
going to corrupt the heck out of the volume in question ... so don't do
that. ;-)

I would also echo Aaron's remarks about running Lustre servers on VMs ...
big waste of compute power, and you will run into significant
latency/contention issues.

cheers,
Klaus
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre Performance Data for Simultaneous Reads and Writes from Multiple Clients

2008-07-25 Thread Klaus Steden

Hi Daniel,

I don't believe so.

Various people have posted informal results from their own tests in the
field, but none have ever been formally collated. There are some rough
numbers on the Wikipedia page for CFS for GigE, IB, and 10GigE, but they
assume particular things about configuration, disk throughput, striping, OST
counts, etc.

Because Lustre is so versatile, information like this can be hard to nail
down: running a GigE network with ATA drives is obviously not going to get
the best performance compared to 8 Gb Fibre Channel, but both are equally
valid Lustre configurations.

hth,
Klaus

On 7/25/08 9:03 AM, Daniel Ferber [EMAIL PROTECTED]did etch on stone
tablets:

 
 I'm working with someone who is modeling a customer system, and wants to
 partially model Lustre performance as part of that.
 
 What they would like is the following data, or similar, for a given network. I
 say "given" in that you can pick any network config, any stripe size and any
 file size (it can be 250 MB to 1 GB), and then supply the following data:
 
 * From a single client, the read I/O start and stop time, or "bandwidth"
 * From a single client, the write I/O start and stop time, or "bandwidth"
 * Then introduce additional clients doing reads or writes and study the
 impact, as in one client writing and four clients reading simultaneously, and
 their start/stop times for IO, or bandwidth, and then one client reading and
 four clients writing simultaneously, and their individual IO bandwidths
 
 The objective really is to know how concurrent reads and writes impact Lustre
 performance. 
 
 Does this data exist, or would someone need to go do and collect this data?
 
 Thanks,
 Dan
 


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] multihomed OST's configuration

2008-07-10 Thread Klaus Steden

Hi Mario,

Lustre will, if not instructed otherwise, bind to all available NICs on the
system. I've used Lustre extensively with LACP aggregate groups, and it
performs quite well.

Configuring multiple NICs from the same host into the same VLAN is something
of a nonsensical configuration unless you're running some kind of bizarre
failover scenario, but if they're all going to the same switch, that's an
impossibility. This kind of configuration would also make ordinary TCP/IP
routing somewhat funky.

Use NIC bonding, and configure your switch as appropriate to do likewise.
Cisco, Foundry, Extreme, Juniper, Alcatel, Netgear and a number of others
all support LACP in their L3 edge switches, and it's a standard feature of
any core switch.

Once you've set up the switch and the OS, instruct Lustre to use the bond by
putting options lnet networks=tcp(bond0) in your /etc/modprobe.conf and it
will take care of the rest.

cheers,
Klaus

On 7/9/08 5:07 AM, mdavid [EMAIL PROTECTED]did etch on stone tablets:

 Hi Brian,
 I was misled by what it says in the ops manual, chapter 12.1:
 
 Lustre can use multiple NICs without bonding. There is a difference in
 performance when Lustre uses multiple NICs versus when it uses bonding
 NICs.
 
 though here it says multiple NICS not multihomed configurations.
 
 Anyway, I still don't know how to configure multiple NICs, both from
 the point of view of the OS and of Lustre.
 Note all the ethX interfaces are in the same LAN, and connected to the
 same card in the switch.
 if on the Lustre OST's I put
 options lnet networks=tcp(eth0,eth1,eth2,eth3)
 
 how is each ethX configured?
 In principle I would have a single IP for the server.
 
 cheers
 
 Mario David
 
 On Jul 8, 1:25 pm, Brian J. Murrell [EMAIL PROTECTED] wrote:
 On Mon, 2008-07-07 at 03:13 -0700, mdavid wrote:
 hi list
 I am a new to lustre (1 week old) and this list.
 I have some Dell PE1950 servers with MD1000 enclosures (scientific
 linux 5 == RHEL5 x86_54) on them and lustre 1.6.5, with lustre patched
 kernels on them
 
 on a first try (indeed it was the second), I managed to have a lustre
 up and running OK, now
 
 each dell server has 4 times 1Gb interfaces, and I want to take profit
 from them all
 either I try bonding them, or go for multihomed (which is my first
 try)
 
 If what you want is to get the bandwidth of all 4 interfaces to the
 Lustre servers then you really do want bonding.
 
 Can you explain why you think you want multihoming vs. bonding?  Maybe
 I'm misunderstanding your goal.
 
 b.
 
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Correct way to unmount a lustre client?

2008-07-08 Thread Klaus Steden

Hello Hans-Juergen,

I usually try a combination of searching the process table for any running
tasks that are blocking the umount request and killing them, then doing a
'umount -k', and then using 'lctl modules |awk '{print $2}' |xargs rmmod -v'
to deactivate the kernel modules. This last step sometimes leads to
complaints that the network module is busy, so you have to tell it to shut
down LNET in order to completely unload the remaining modules.
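
Spelled out as a rough sequence (the mount point is a placeholder):

  fuser -vm /mnt/lustre     # find the processes keeping the mount busy
  umount /mnt/lustre
  lctl network down         # shut LNET down if module unloading complains
  lustre_rmmod              # unload the remaining Lustre/LNET modules

lustre_rmmod ships with the Lustre utilities; if it isn't present, the lctl
modules/rmmod pipeline above does the same job.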

However, I don't find this is a problem very frequently with my nodes ...
maybe you've got an uncooperative application that needs to be looked at?

cheers,
Klaus

On 7/8/08 9:30 AM, Hans-Juergen Schnitzer [EMAIL PROTECTED]did
etch on stone tablets:

 
 Hello,
 
 what is the correct way to unmount a Lustre client
 when a simple 'umount <filesystem>' responds with
 'device is busy'? A 'umount -l <filesystem>' does
 unmount the filesystem; however, when I subsequently
 reboot the machine, the shutdown process hangs.
 The last messages on the console are:
 
 LustreError: 131-3: Received notification of device removal
 Please shutdown LNET to allow this to proceed
 
 Best regards,
 Hans Schnitzer
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Client and OST on Same Blade

2008-07-08 Thread Klaus Steden

You mean ... you have a FUSE client and a FUSE server for the same file
system running on the same node?

Klaus

On 7/8/08 11:38 AM, Kevin Fox [EMAIL PROTECTED]did etch on stone
tablets:

 FUSE did it some how, didn't they?
 
 On Mon, 2008-07-07 at 14:12 -0700, Brian J. Murrell wrote:
 On Mon, 2008-07-07 at 17:00 -0400, Roger Spellman wrote:
 Is there any problem having a Client and OST on the same blade?
 
 If by same blade you mean on the same kernel sharing the same memory
 pool, yes, the problems that there were still are.  They are inherent
 problems in which the client and OST share the same memory pool and an
 effort to relieve memory pressure (by the client) requires memory be
 available to the OST.  Of course if the client is experiencing memory
 pressure so is the OST and the OST might not get the memory it needs
 to
 help the client get the memory it needs since it's all one pool of
 memory.  Indeed it's a deadlock.
 
 I've discussed this with one of the engineers more knowledgeable about the
 VM than I am, and IIRC his feeling is that there really is no fool-proof fix
 for this. Perhaps somebody more expert in the VM wants to explain further.
 
 b.
 
 
 
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Failover Setup MDS/MDT

2008-06-26 Thread Klaus Steden

Hello Heiko,

If I'm not mistaken, 'MDS' refers to the metadata _server_, while 'MDT'
refers to the metadata _target_, i.e. the distinction is akin to that
between 'OSS' and 'OST'. The MDS is a server node; the MDT is the volume
where all the metadata for your volume is stored.

The handbook recommends an MDT size of about 1-2% of your total volume size,
i.e. if your total CFS volume is 10 TB, the MDT would be about 200 GB. This is
fairly conservative, so you may want to err on the side of growth by using a
larger volume than that. If you can spare the disk, you're certainly not
sacrificing anything by over-provisioning your MDS.

hope this helps,
Klaus

On 6/25/08 5:29 AM, Heiko Schroeter [EMAIL PROTECTED]did
etch on stone tablets:

 Am Mittwoch, 25. Juni 2008 14:19:11 schrieb Brian J. Murrell:
 On Wed, 2008-06-25 at 07:36 +0200, Heiko Schroeter wrote:
 How can one determine the size for the MDT partition or is that the same
 as the MDS device ?
 (As far as i can see the MDT takes the DIR info etc. So it should be
 larger than the MDS.)
 
 An MDT is the device (i.e. the disk) that Lustre in an MDS (the server)
 uses to manage the metadata.  Maybe that clears it up?
 
 Well yes ok, but what about the sizes of the partitions ?
 
 The docs present an example calculating the inode space needed on an 'MDS'.
 (3.2.2 Calculating MDS Size)
 
 That what actually confuses me a bit.
 
 So when the MDS partition holds the inodes of the Lustre system, what will be
 the partition size of the MDT device?
 Or should it read 'MDT' partition size in the docs, and the MDS partition size
 doesn't matter at all?
 
 Thanks and Regards
 Heiko
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Confusion with failover

2008-06-26 Thread Klaus Steden

Hopefully this isn't a stupid question ... but have you considered using
Lustre 1.6 instead, if it's an option? It's much, much easier to work with.

As for failover itself ... Lustre provides redundancy in the design, i.e.
you can have a secondary MDS that comes online when the primary has failed,
and OSSes can assume control of one another's OSTs in case of host failure,
but it does not implement any of this functionality. You'll have to use
something like Linux-HA to get your server nodes to gracefully resume
service for failed nodes.

Again, I'd strongly recommend using 1.6, as the latest version of Linux-HA
has native support for Lustre, which can use standard Linux 'mount'
commands. If you go with 1.4, you'll have to noodle with XML and scripts
(which will still work) which is more of a hassle, while 1.6 just works.

cheers,
Klaus

On 6/25/08 11:09 PM, Dhruv [EMAIL PROTECTED]did etch on stone
tablets:

 Hello Everybody,
 I am a novice in using Lustre. Wanted some help.
 I am using luster 1.4.5.1 on RHEL4 update2 with kernel 2.6.9-22. Am
 facing some problems.
 
 Case1:
 I tried the ost failover . Following is the config file to generate
 xml file.
 
 rm -f failover_2node.xml
 ./lmc -m failover_2node.xml --add net --node node-mds --nid sm01 --
 nettype tcp
 ./lmc -m failover_2node.xml --add net --node node-ost1 --nid sm02 --
 nettype tcp
 ./lmc -m failover_2node.xml --add net --node node-ost2 --nid sm06 --
 nettype tcp
 ./lmc -m failover_2node.xml --add net --node client --nid '*' --
 nettype tcp
 
 # Cofigure MDS
 ./lmc -m failover_2node.xml --add mds --node node-mds --mds mds_test --
 fstype ldiskfs --dev /dev/sdb5
 
 # Cofigure LOV
 ./lmc -m failover_2node.xml --add lov --lov lov_test --mds mds_test --
 stripe_sz 1048576 --stripe_cnt 2 --stripe_pattern 0
 
 # Configures OSTs
 ./lmc -m failover_2node.xml --add ost --node node-ost1 --lov lov_test
 --ost ost1 --failover --fstype ldiskfs --dev /dev/sdb1
 ./lmc -m failover_2node.xml --add ost --node node-ost2 --lov lov_test
 --ost ost1 --failover --fstype ldiskfs --dev /dev/sdb7
 
 # Configure client (this is a 'generic' client used for all client
 mounts)
 ./lmc -m failover_2node.xml --add mtpt --node client --path /mnt/
 lustre --mds mds_test --lov lov_test
 
 .
 Following were my lconf commands.
 
 1. lconf --reformat --node node-ost1 failover_2node.xml    on sm02
 2. lconf --reformat --node node-ost2 --service=ost1
 failover_2node.xml  on sm06
 3. lconf --reformat --node node-mds failover_2node.xml on sm01
 4. lconf --node client failover_2node.xml ... on sm02 and sm06
 
 So my intention is to keep a failover OST node in case one fails. The MDS
 is on a separate node. I tried different scenarios where one OST goes
 down and data can still be retrieved from the other. New files can be
 created and old ones deleted on the failover OST. Data was available
 most of the time.
 
 So my question is whether Linux HA is required to configure such a failover
 scenario?
 
 Case2:
 I tried the same sort of formula as shown above for a failover MDS.
 But when the main MDS fails, it doesn't switch to the new MDS. Also, when
 the main MDS comes up again, the file system doesn't recover. I brought
 the client down and up again; then it was working.
 
 So is Linux HA or a similar program necessary for configuring failover?
 
 Dhruv
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] MGS disk size and activity

2008-06-17 Thread Klaus Steden


 I don't think it would
 be much,  such that could it share spindles with the journal for the
 MDS file system?
 
 Hrm.  Given it's relatively low use, I'd think that would be fine.
 
I have a question ... if the MGS is used so infrequently relative to the use
of the MDS, why is it (is it?) problematic to locate it on the same volume
as the MDT?

thanks,
Klaus

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Rule of thumb for setting up lustre resources...

2008-06-16 Thread Klaus Steden

Hi Mark,

See my comments inline below.

cheers,
Klaus

On 6/14/08 11:22 AM, Mark True [EMAIL PROTECTED]did etch on stone
tablets:

 
 Hello!
 
 I am new to the list, but I have been researching Lustre for quite some time
 and finally have an occasion to use it.  I am trying to do some capacity
 planning and I am wondering if there are some general rules of thumb for
 configuring a Lustre environment.
 
 Specifically:
 
 A If increasing the number of OSTs increases throughput, is there a
 relationship that can be used to determine how many OSTs we're likely to need
 at the outset to establish a baseline minimum throughput? For example, if I
 want to get 3 GB/s sustained throughput, how many OSTs will facilitate this?
 
 B Do the MGS and MDS have to be separate for best performance, or can they
 be consolidated into one server without causing too much hardship?
 

 C  Right now I am looking at a model where I am connecting all the OSTs, and
 the MDS/MGS together using infiniband, and connecting the storage via
 fibrechannel.   Is this the ideal solution or am I going in the wrong
 direction.  

This is a good solution, and will give you good performance overall,
although you can mix different storage technologies and network technologies
within the same storage environment and it should remain relatively
transparent. I've got a cluster that handles both FC storage and iSCSI
storage, but I know there are people out there using DRBD, and I'm dying to
try Infiniband-based storage as well. Anything that presents a block device
to an OSS should be suitable for use with Lustre, but some will perform
better than others.

Bottom line, I think, is pick the best technology for your price range and
performance needs. Infiniband + FC is pretty much the top of the mountain,
though.
 
 D Just wondering what clustering software people use on the front end with
 Lustre typically, if they are going to be using this as a filesystem for some
 kind of HPC environment, what is the most popular clustering technology for
 this.
 
Our CFS clusters are all organized as part of ROCKS clusters. I know a
number of people on this list are on the ROCKS list, so there's good
cross-pollination between technologies. It's a mature cluster architecture
designed for HPC, and bundles a number of useful solutions and tools onboard
(MPI, SGE, Torque, distributed compilers, visualization, etc.). It's also
relatively easy to integrate with Lustre, as you can simply drop in the
pre-built Lustre RPMs into the cluster installer and be ready to go in a few
minutes.

 E Does Heartbeat install next to whatever HPC clustering technology you have?

I'm using Linux-HA, and it wasn't built into my cluster software distro, but
it was easy enough to drop into the mix, and as of late last year had native
disk support for Lustre file systems.
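
To make that concrete, a minimal heartbeat v1 sketch for an OST pair looks
something like the line below -- the host name, device, and mount point are
invented, so substitute your own:

# /etc/ha.d/haresources (identical copy on both OSS nodes)
# oss1 is the preferred owner; heartbeat mounts the OST there as fstype
# 'lustre' and migrates the mount to the peer node on failure
oss1.example.com Filesystem::/dev/sdb1::/mnt/lustre/ost0::lustre

The MDS pair gets an analogous line for the MDT device; ha.cf and authkeys
are the usual heartbeat boilerplate.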
 
 Thanks, and I hope that I can soon be someone who contributes rather than just
 asking questions :)
 
 --Mark T.
 
 
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] CIFS

2008-06-12 Thread Klaus Steden

Yes, if you export the CFS volume from one of the client nodes using Samba,
you'll be able to access it via CIFS from a Windows client.

Performance will be well below customary Lustre performance, though, so
don't expect miracles. :-)
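
The Samba side is nothing exotic -- a rough sketch, assuming the Lustre
client mount is already in place at /mnt/lustre and with the share name
made up:

# /etc/samba/smb.conf on the Lustre client acting as the CIFS gateway
[lustre]
    path = /mnt/lustre
    browseable = yes
    read only = no
    # oplocks are risky when the same files are also touched directly
    # through Lustre on other nodes
    oplocks = no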

cheers,
Klaus

On 6/11/08 12:50 PM, [EMAIL PROTECTED] [EMAIL PROTECTED]did etch
on stone tablets:

 
 Hi all, 
 
 Can Windows clients access the data under the Lustre file system via
 CIFS? 
 
 Thank you. 
 Regards, 
 Harutyun 
 
 
 
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] lustre file system with failover

2008-06-12 Thread Klaus Steden

Hi Trupti,

It depends on how your failover is implemented. The bottom line is that if
you have a transaction in-flight when your MDT is disconnected, all new
transactions will block until queued and in-flight transactions either
complete or time out. If your failover window is a few seconds or less, you
shouldn't notice more than a minor blip as the failover MDS recovers state
for the downed MDS and finishes any transactions in progress.

Your results will vary depending on the hardware you use and the settings of
your failover; in one of my clusters, my failover window is actually quite
long (about a minute and a half) due to the way the storage is implemented
(the FC buses spend considerable time polling each visible LUN looking for
the Lustre ones and I never bothered with device multi-pathing), but my
transactions complete as expected once the MDT becomes active on the
failover MDS.

cheers,
Klaus

On 6/11/08 5:07 AM, trupti shete [EMAIL PROTECTED]did etch on stone
tablets:

 Hi
 I am very new to the Lustre file system.
 I want to know how I can test whether failover is working or not.
 
 I have the following Lustre file system scenario:
 MGS - /dev/sdb on node1, with failnode node2
 MDT - /dev/sdc on node1, with failnode node2
 (I am using iSCSI to share the discs /dev/sdb and /dev/sdc between node1 and
 node2)
 OST1 - /dev/sda1 on node2
 OST2 - /dev/sda2 on node2
 And there are 3 clients.
 
 If client1 opens a file to write, and at that time I umount the MDT (which
 will be mounted on node1), will node2 take over? Will client1 experience any
 difference?
 
 -Trupti
 
   
 
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] lustre and multi path

2008-06-05 Thread Klaus Steden

Hi Brock,

I've got a Sun StorageTek array hooked up to one of our clusters, and I'm
using labels instead of multi-pathing. We've got it hooked up in a similar
fashion as Stuart; it's a bit slow and sloppy when initializing, but it
works well enough and there are no problems once OSTs are online.

Klaus

On 6/5/08 3:57 PM, Brock Palen [EMAIL PROTECTED]did etch on stone
tablets:

 Our new Lustre hardware arrived from Sun today.  Looking at the dual
 MDS and FC disk array for it.  We will need multipath.
 Has anyone ever used multipath with Lustre?  Are there any issues?  If
 we set up regular multipath via LVM, Lustre won't care, as far as I can
 tell from browsing the archives.
 
 What about multipath without LVM?  Our StorageTek array has dual
 controllers with dual ports going to dual-port FC cards in the
 MDSes.  Each MDS has a connection to both controllers, so we will need
 multipath to get any advantage from this.
 
 Comments?
 
 
 Brock Palen
 www.umich.edu/~brockp
 Center for Advanced Computing
 [EMAIL PROTECTED]
 (734)936-1985
 
 
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] lustre and multi path

2008-06-05 Thread Klaus Steden

Hi Brock,

Yeah, that's likely to be an issue if each host has more than one path ...

What about using HA to force one path to be inactive at the device level? I
know QLogic FC cards support this functionality, although it requires
changing the options used by the driver kernel module ... mind you,
comparing that to a solution using the multipath daemon, that's six of one,
half a dozen of the other, I'd think.

Klaus

On 6/5/08 5:09 PM, Brock Palen [EMAIL PROTECTED]did etch on stone
tablets:

 This would be for the MDS/MGS only, but that's good to know.  The problem
 is our two MDS servers (active/passive) will have two connections
 each to the same LUN, so there could be issues.
 
 Brock Palen
 www.umich.edu/~brockp
 Center for Advanced Computing
 [EMAIL PROTECTED]
 (734)936-1985
 
 
 
 On Jun 5, 2008, at 7:52 PM, Klaus Steden wrote:
 
 Hi Brock,
 
 I've got a Sun StorageTek array hooked up to one of our clusters,
 and I'm
 using labels instead of multi-pathing. We've got it hooked up in a
 similar
 fashion as Stuart; it's a bit slow and sloppy when initializing,
 but it
 works well enough and there are no problems once OSTs are online.
 
 Klaus
 
 On 6/5/08 3:57 PM, Brock Palen [EMAIL PROTECTED]did etch on stone
 tablets:
 
 Our new lustre hardware arrived from sun today.  Looking at the duel
 MDS and FC disk array for it.  We will need multipath.
 Has anyone ever used multipath with lustre?  Is there any issues?  If
 we set up regular multipath via LVM lustre won't care as far as I can
 tell and browsing archives.
 
 What about multipath without LVM?  Our StorageTek array has dual
 controllers with dual ports going to dual port FC cards in the
 MDS's.  Each MDS has a connection to both controllers so we will need
 multipath to get any advantage to this.
 
 Comments?
 
 
 Brock Palen
 www.umich.edu/~brockp
 Center for Advanced Computing
 [EMAIL PROTECTED]
 (734)936-1985
 
 
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss
 
 
 
 

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Planning a Lustre Quick Start Guide

2008-05-23 Thread Klaus Steden

Hi Brock,

I'm experimenting with a Dell iSCSI array in one of our labs here ... so
far, it behaves pretty typically for a Lustre, although the performance
isn't blazing -- but that's due to limitations of the network
infrastructure.

I didn't notice anything in the iSCSI gear that would indicate that it
couldn't do failover ... on my OSS nodes, I'm using disk labels rather than
device IDs, and the OSTs are interchangeable on both OSSes.
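
For anyone wanting to copy the label approach, it boils down to something
like this (device names and label are hypothetical -- the label is whatever
mkfs.lustre stamped on the OST at format time):

# the ldiskfs label follows the LUN around, regardless of discovery order
e2label /dev/sdc1
# (prints something like lustre-OST0003)

# so mount by label rather than by device path, on whichever OSS owns it
mount -t lustre LABEL=lustre-OST0003 /mnt/lustre/ost3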

Just fyi, I don't know if there's value in it. I hadn't planned on testing
failover with this config as it was mostly proof-of-concept of iSCSI Lustre
for management, but I could make it happen at some point.

Klaus

On 5/23/08 8:53 AM, Brock Palen [EMAIL PROTECTED]did etch on stone
tablets:

 
 
 This is not quite a HOWTO but has some interesting suggestions
 for HA and backup (even if I think that the only sensible way to
 backup a large Lustre storage pool is another Lustre storage
 pool):
 
    http://indico.cern.ch/contributionDisplay.py?contribId=24&sessionId=12&confId=27391
 
 Interesting that they are using DRBD.  We thought about this, and there is
 a request about it in bugzilla, but nothing appears to have been done
 about it.  I have used it before for NFS and LVM with Xen virtual
 machines without issue.
 
 We also asked Sun about using an iSCSI array for the shared storage
 for failover with the MDT/MGS.  We were told it had not been tested
 and to use FC in its place.  Somewhat disappointing, as FC more than doubled
 the cost of the MDT/MGS setup once you put in the FC adapters and the
 cabinet.  We wanted to be safe, though.
 
 On your replacing the Thumper sata_mv driver with the one from Sun, I
 hope this fixes the lacking performance.
 
 Brock Palen
 www.umich.edu/~brockp
 Center for Advanced Computing
 [EMAIL PROTECTED]
 (734)936-1985
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss
 
 
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Help reviving a 1.4.x volume with a destroyed OST

2008-05-16 Thread Klaus Steden

Hello there,

We had a bit of an accident in one of our labs earlier today, and it
effectively destroyed one of the OSTs in the Lustre file system. From what I
can figure (I wasn't there at the time), one of the OSSes re-provisioned
itself accidentally, and installed its OS information on one of the OSTs in
the cluster. So now we've got a file system with 16 OSTs, one of which is
actually a regular Linux OS install.

We're not quite so worried about the data that's been lost, but it would be
good to bring the file system back online with the hole in place to inspect
it for damage, and then subsequently reformat the damaged piece and
re-insert it into the existing file system.

I've tried doing an 'lctl --inactive UUID config.xml' on the OSS in
question, but it always errors out. I can't pull the UUID off the disk
itself presumably because it was destroyed when the disk was rewritten. From
the config.xml, the UUIDs all look pretty generic -- 'ost2_UUID',
'ost7_UUID', etc. -- but if I use 'blkid' on any of the corresponding LUNs,
I get strings that resemble actual real-world UUIDs.

Is there any place I can extract the
previously-generated-and-now-sadly-destroyed UUID for the damaged OST?

Is the generic-looking UUID field in the XML file an actual UUID?

When it comes time to re-insert the OST in question back into the file
system, is it simply a matter of adding it the same way as adding a new OST,
or will I have to remove information about the previous OST if I want to
replace it inline?

I looked through the manual and Google fairly extensively, but I couldn't
quite find the information I was looking for.

Any help would be greatly appreciated!

thanks,
Klaus

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Lustre with Juniper switches?

2008-04-23 Thread Klaus Steden

Hi there,

Has anyone out in the Lustre community set theirs up in an environment that
used Juniper switches? We're testing one out in the lab with ours, and
something about its configuration isn't working. The same setup has been
tested and operated successfully with Extreme, Cisco, and Alcatel, with no
changes to the Lustre setup itself.

I'm thinking it's more than likely not the Lustre setup, but something about
the switch configuration - VLAN membership, IGMP config, etc.

Is there a checklist of required network protocols/behaviours we can use to
verify our switch configuration?

cheers,
Klaus

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] LMT 2.6.0

2008-04-09 Thread Klaus Steden

Hi Jim,

Out of curiosity (and bear with me if this is a stupid question), do you
know if anyone's integrated LMT with Ganglia? The clusters we have that use
Lustre are all ROCKS-based, which uses Ganglia for monitoring ... I'm not
familiar with Cerebro, so I'm not sure how you're using it, but it would be
nice to be able to integrate LMT's statistics into an existing monitoring
solution rather than implementing a second one.

Other than that, I'm looking forward to screenshots -- this sounds like a
nice add-on.

cheers,
Klaus

On 4/9/08 8:59 AM, Jim Garlick [EMAIL PROTECTED]did etch on stone
tablets:

 Hi Chris,
 
 I think Herb is reporting for jury duty today so I'll speak up and let him
 follow up later if I missed anything.
 
 There are text clients (ltop and lstat).  I haven't tried running
 the GUI remotely and tunneling the X session, or running the client
 locally and tunneling mysql protocol, but I don't know of anything
 architectural that would prevent either from working.
 
 We'll get some screenshots out there.  Herb was reticent to publish
 LMT without good documentation, but I encouraged him because there were
 a couple of people anxious to try it out.  So the lack of screenshots/
 documentation is my bad - we'll try to fix that as time permits.
 
 Regarding the dependencies: it's always a tradeoff between reinventing the
 wheel and adding dependencies.  Sigh.  In this case, we leveraged MySQL for
 the database of historical data; we managed to get a graphics group here
 which is a Java shop to code the clients, so they interface directly to
 MySQL using the Java MySQL bindings; and we leveraged the Cerebro multicast
 based monitoring tool for the data collection because it introduced no
 new dependencies on lustre servers at our site (since we already use it
 to monitor other things), and is very lightweight.
 
 I think this is a good architecture: the clients and data collection
 parts are independent with interfaces defined by the database schema,
 so they could be independently replaced, for example if someone wanted to
 write text clients in C/curses/MySQL, or replace the backend with some
 other data collection infrastructure that they are already using or feel
 is superior.
 
 Jim
 
 On Wed, Apr 09, 2008 at 08:47:39AM -0600, Chris Worley wrote:
 Are there any screenshots/FAQs?
 
 I looked at the package, and the dependency list is long.
 
 As most of my work is remote, does the GUI run remotely over an SSH tunnel?
 
 Are there any similar console/text-based utilities w/ a shorter
 dependency list and lighter weight for remote access?
 
 Thanks,
 
 Chris
 On Mon, Apr 7, 2008 at 5:54 PM, Herb Wartens [EMAIL PROTECTED] wrote:
 
 
  Hi All,
  We have a new and improved version of LMT released
  on Sourceforge.  For those who have never used it, LMT is the Lustre
  Monitoring Tool developed at LLNL.  It provides realtime monitoring of a
  Lustre filesystem (or multiple filesystems).  It also graphs data over time
  for a set of attributes specified by the user.  Please feel free to try it
  out.
 
  Below are the release notes.
  (please forgive this if it is a repost as it looks like lustre-discuss
  may have finally come back up)
 
 
  http://sourceforge.net/projects/lmt
 
  
  Release Notes for LMT 2.6.0  07 Apr 2008
  
 
  * This version of LMT 2 has been tested to work properly with Lustre 1.6.x
   It is no longer necessary to keep the old lustre *.xml configuration and
   there are no more xwatch-lustre.conf or lmt.conf files to set up.
 
  * LMT now uses Cerebro to transport the data collected from the Lustre
   filesystem being monitored. For more info see:
   http://sourceforge.net/projects/cerebro
 
  * Building the rpms requires ibm-java since this is what we use in Chaos 4.
   This can be changed in the specfile to require Sun Java if that is
   preferred (the main issue is that we need a full fledged JVM that
   supports swing).
 
  * There is now a dependency on mysql to store the collected data. Since the
   resolution of the collection is ~5 seconds this can quickly consume a lot
   of space.  The clients and servers are completely decoupled.
 
  * Please refer to the documentation for instructions on how to install and
   configure LMT 2.
 
 
 ___
  Lustre-discuss mailing list
  Lustre-discuss@lists.lustre.org
  http://lists.lustre.org/mailman/listinfo/lustre-discuss
 
 ___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss

Re: [Lustre-discuss] lustre cross IP network routing

2008-03-07 Thread Klaus Steden

Hi Andrew,

1. No.

2. Not sure, check the Lustre manual for info on routing. Assuming TCP/IP
can see the whole path, it should work once the configuration for Lustre is
correct.
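
For what it's worth, a typical LNET routing setup has roughly the shape below
(modprobe.conf fragments; the addresses, interfaces, and network names are
invented, so treat this as a template only):

# on the router node, which has an interface on both networks:
options lnet networks="tcp0(eth0),tcp1(eth1)" forwarding=enabled

# on clients living on the tcp1 network (192.168.1.1@tcp1 standing in for
# the router's NID on that network):
options lnet networks="tcp1(eth0)" routes="tcp0 192.168.1.1@tcp1"

# on the MGS/OSS nodes on tcp0 (10.0.0.1@tcp0 standing in for the router's
# NID on the server-side network):
options lnet networks="tcp0(eth0)" routes="tcp1 10.0.0.1@tcp0"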

On 3/7/08 12:55 PM, Lundgren, Andrew [EMAIL PROTECTED]did etch
on stone tablets:

 Is there any restriction that Lustre nodes on TCP must be on the same IP
 subnet?
  
 Is there anything special that needs to be done to make a client on one
 network see an MGS/OSSes on another network?
  
 Thanks!
  
 --
 Andrew
 
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] configuration question 1.6.4; multiple NICs on OSS

2008-03-03 Thread Klaus Steden

Hi Jim,

I use bonding in one of our configurations here (LACP-based, to an Extreme
Summit series switch), and the overhead is not bad. My best performance test
so far provided about 340-350 MB/s sustained read performance across two OSS
nodes, each with two GigE striped together using LACP for a total of 4 GigE
from the file system.

Single link performance with the same equipment was about 200 MB/s (a single
NIC on each OSS), so for me, the overhead of LACP is worth it, since the
overall performance goes up significantly. With the right switch, you can
get some pretty impressive results using plain ol' vanilla GigE.

However ... that's just a suggestion.
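
For reference, the OSS side of that setup amounts to something like the
fragments below (RHEL-style configuration; interface names and the rest are
examples, and your switch has to speak LACP for mode 802.3ad to do anything
useful):

# /etc/modprobe.conf
alias bond0 bonding
options bond0 mode=802.3ad miimon=100
options lnet networks=tcp0(bond0)

# /etc/sysconfig/network-scripts/ifcfg-eth0  (and the same for eth1)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none

ifcfg-bond0 then carries the IP address for the bonded pair, and LNET just
sees one fat interface.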

From my experience, in order to do what I think you want to do ...

Each OSS would communicate on either eth0 or eth1, and thus its LNET config
would look like this in /etc/modprobe.conf:

options lnet networks=tcp0(eth0),tcp1(eth1)

On the client side, in order to take advantage of the split networking, your
LNET config would look like this in /etc/modprobe.conf:

options lnet networks=tcp0(eth0)

or this:

options lnet networks=tcp1(eth1)

since with what you're attempting, Lustre will push all its traffic over the
first available link in the case of multiple paths -- so if your clients
were able to choose between one or the other, you'd simply saturate the tcp0
path and nothing would really happen on the tcp1 path.

This gets to be a bit of a hassle to manage, as the administrator has to
take a hand in the load balancing aspect, determining which clients use
which LNET network. This can be handled relatively trivially with some modulo
arithmetic in a Kickstart file (where you'd generate the LNET entries your
client node would use, as in the sketch below), but really ... it's extra
work and extra hassle.
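
A minimal sketch of that Kickstart trick, assuming hosts are named with a
trailing number (node001, node002, ...) -- adjust the parsing to your own
naming scheme:

# %post fragment: pick an LNET network from the numeric part of the hostname
NODENUM=$(hostname -s | tr -cd '0-9')
if [ $(( 10#$NODENUM % 2 )) -eq 0 ]; then
    echo 'options lnet networks=tcp0(eth0)' >> /etc/modprobe.conf
else
    echo 'options lnet networks=tcp1(eth1)' >> /etc/modprobe.conf
fi

(The "10#" forces base ten, so hostnames with leading zeros don't get read
as octal.)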

Using bonding on the OSSes, you would see balanced usage of all the
participating NICs and respectable overall throughput, but you don't have to
fool around with multiple LNETs or IP subnetting.

That's just my two cents, and I'm happy to be proven wrong, but for my money
(and labour), it is easier to implement Lustre using a solid NIC bonding
framework than it was to attempt to split up multiple LNETs and keep it all
sorted in my head and on paper.

cheers,
Klaus

On 3/3/08 11:37 AM, Jim Albin [EMAIL PROTECTED]did etch on stone
tablets:

 Hello,
   We're trying to see if we can use multiple NICs on a pair of OSSes
 without bonding. We're trying to decipher the Multi-Home example in the
 Operations Manual 1.6_v1.10, Chapter 7, and I must be missing something. I
 have not attempted bonding yet; the manual seems to suggest you can use
 multiple NICs without bonding and so avoid the overhead of bonding. We're
 looking for either failover or load balancing advantages over a single
 NIC in the OSS.
 
  Could someone please post an example of a configuration similar to
 this:
 
 mdt - eth0 only
 oss1,oss2 - eth0 and eth1
 client configuration
 
 If you could include the modprobe.conf entry, mount commands and
 anything else to try or verify with I'd appreciate it very much.
 Thanks in advance.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] how do you mount mountconf (i.e. 1.6) lustre on your servers?

2008-02-14 Thread Klaus Steden

At this point, recovery of the primary server is a manual process -- so in
the case of a failure, the secondary would assume service for the failed
node, which gets powered off; an administrator is required to intervene to
recover the primary.

Klaus

On 2/14/08 2:51 PM, Andreas Dilger [EMAIL PROTECTED]did etch on stone
tablets:

 On Feb 14, 2008  11:17 -0800, Klaus Steden wrote:
 Here's a mount line from our first OSS node:
 
  LABEL=lustre-OST0000  /mnt/lustreost0  lustre  defaults         0 0
  LABEL=lustre-OST0001  /mnt/lustreost1  lustre  defaults,noauto  0 0
 
 It has a partner, and the lines in that fstab swap the 'noauto' flag.
 
 Klaus, if you have the backup node mounting the filesystem because of
 primary server failure, how do you prevent the primary server from mounting
 the filesystem again as soon as it boots?
 
 Cheers, Andreas
 --
 Andreas Dilger
 Sr. Staff Engineer, Lustre Group
 Sun Microsystems of Canada, Inc.
 

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Off topic -- official list address?

2008-02-14 Thread Klaus Steden

Hi there,

Just a quick question ... what is the official sending address of this list?
I'm trying to write filter rules to automatically route messages from it
into my Lustre-list folder ... but some messages are sent by
[EMAIL PROTECTED], while others come in from
[EMAIL PROTECTED]

Is there an address that is consistently present in list mail that I can use
as a filter parameter? I'm using Outlook, which is impossibly lame compared
to something like procmail, so the more obvious a parameter, the better ...

thanks,
Klaus

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Lustre process/system tuning ...

2008-01-31 Thread Klaus Steden

Hello,

I'm seeing some interesting behaviour from one of the nodes in our cluster
when two applications attempt to read from the Lustre. Specifically, one
application is a real time video player, and the other is just interacting
with the file system conventionally ... but if the player is running, and
another process attempts to walk the same directory, a dozen or more ldlm
kernel threads start, and the machine's CPU load skyrockets (up to 40 or 50,
sometimes).

Are there hooks to limit the number of ldlm threads that get launched, or
ways to lower their priority, or raise the player's priority so that it
maintains its real time performance?

This is for Lustre 1.4.7.

thanks,
Klaus

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss