Re: [Lustre-discuss] Lustre and iSCSI
Hi David, I did some experiments last year with Lustre 1.6.x and a Dell iSCSI enclosure. It was a little slow (proof of concept mainly) due to sharing MDT and OST traffic on a single GigE strand, but as long as the operating system presents a valid block device, Lustre works fine. hth Klaus On 7/31/09 11:13 AM, Cliff White cliff.wh...@sun.com etched on stone tablets: David Pratt wrote: Hi. I am exploring possibilities for pooled storage for virtual machines. Lustre looks quite interesting for both tolerance and speed. I have a couple of basic questions: 1) Can Lustre present an iSCSI target? Lustre doesn't present a target; we use targets, and we should work fine with iSCSI. We don't have a lot of iSCSI users, due to performance concerns. 2) I am looking at physical machines with 4 1TB 24x7 drives in each. How many machines will I need to cluster to create a solution that provides a good level of speed and fault tolerance? 'It depends' - what is a 'good level of speed' for your app? Lustre IO scales as you add servers. Basically, if the IO is big enough, the client 'sees' the bandwidth of multiple servers. So, if you know the bandwidth of 1 server (sgp_dd or other raw IO tools help) then your total bandwidth is going to be that figure, times the number of servers. This assumes whatever network you have is capable of sinking this bandwidth. So, if you know the IO you need, and you know the IO one server can drive, you just divide the one by the other. Fault tolerance at the disk level == RAID. Fault tolerance at the server level is done with shared storage failover, using linux-ha or other packages. hope this helps, cliffw Many thanks. Regards, David ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
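Since all Lustre needs is a valid block device, an iSCSI LUN can be turned into an OST once the initiator has logged in. A minimal sketch with the open-iscsi initiator follows; the portal address 192.168.0.10, MGS NID 192.168.0.5@tcp0, file system name, and device name /dev/sdb are all hypothetical and will differ on your setup.

-- cut --
# discover and log in to the iSCSI portal (open-iscsi initiator)
iscsiadm -m discovery -t sendtargets -p 192.168.0.10
iscsiadm -m node --login

# the LUN shows up as a new SCSI disk; confirm which one
cat /proc/partitions

# format the LUN as an OST and mount it on the OSS
mkfs.lustre --fsname=testfs --ost --mgsnode=192.168.0.5@tcp0 /dev/sdb
mkdir -p /mnt/testfs-ost0
mount -t lustre /dev/sdb /mnt/testfs-ost0
-- cut --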
Re: [Lustre-discuss] mounting via cname
I believe so ... I was using host names with Lustre 1.6.2 and had no issues. Just make sure DNS/name resolution is working properly and consistently. Klaus On 6/2/09 9:04 AM, Robert Olson ol...@mcs.anl.gov etched on stone tablets: I'm curious if it's possible / wise to configure client mounts to use a cname instead of an IP address (server-cname@tcp:/fsname). thanks, --bob ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] tcp network load balancing understanding lustre 1.8
Hi Michael, Just want to throw my two cents in with Isaac's posting, as I spent a great deal of time working with these kinds of features over the course of the last two years. In my experience with Lustre 1.6, in the case where multiple NICs were available, Lustre defaults to using the first one exclusively until it detects a failure, and then switches over to the next available. It will also not distinguish between different NIC types, i.e. IB, GigE, etc. will be picked based on discovery order, not speed or some other metric. I didn't even touch Lustre bonding, because as you both remark, it's a little convoluted. I spent a lot of time experimenting with Lustre over 802.3ad (LACP) aggregated links using the Linux bonding driver, and my OSS nodes produced very respectable to very good numbers. Across a pair of OSS nodes, each with 2 x GigE NICs, I was able to sustain ~ 350 MB/s write speed when running sandbox tests, so it appears that although the LACP driver doesn't balance a connection across multiple links (i.e. a 2 x GigE LACP bond doesn't give you 2 Gbit throughput for a single network I/O), the Lustre implementation somehow manages to squeeze more data through the pipe. To get it set up, simply configure NIC bonding of whatever flavour suits your needs on the OSS nodes, and then assign 'bond0' to your tcp networks, something like this: options lnet networks=tcp0(bond0) and you should be off to the races. hth, Klaus On 5/7/09 12:57 PM, Isaac Huang he.hu...@sun.com etched on stone tablets: On Thu, May 07, 2009 at 02:50:13PM +0200, Michael Ruepp wrote: Hi there, .. I give every NID an IP in the same subnet, eg: 10.111.20.35-38 - oss0 and 10.111.20.39-42 - oss1 Do I have to make modprobe.conf.local look like this to force lustre to use all four interfaces in parallel: options lnet networks=tcp0(eth0,eth1,eth2,eth3) Because on page 138 the 1.8 manual says: Note - In the case of TCP-only clients, the first available non-loopback IP interface is used for tcp0 since the interfaces are not specified. Correct. Or do I have to specify it like this: options lnet networks=tcp Because on page 112 the Lustre 1.6 manual says: Note - In the case of TCP-only clients, all available IP interfaces are used for tcp0. Wrong. It needs to be updated as well, Sheila? .. My goal is to let Lustre utilize all four Gb links in parallel. And my Lustre clients are equipped with two Gb links which should be utilized by the Lustre clients as well (eth0, eth1). Or is bonding the better solution in terms of performance? I don't have any performance comparisons between the two approaches, but I'd suggest going with Linux bonding instead (let's call the tcp0(eth0,...ethN) approach Lustre bonding), because: 1. With Lustre bonding it's rather tricky to get routing right, especially when all NICs reside in the same IP subnet. The Lustre tcp network driver, as its name suggests, works at the TCP layer, and the decision as to which outgoing interface to use depends on Linux IP layer routing. When all NICs live in the same IP subnet, it's very possible that all outgoing packets would go through the interface of the 1st route in the Linux routing table, unless some tweaking has been done to also take source IPs into account. Incoming packets could also come in via unexpected NICs, depending on your settings in /proc/sys/net/ipv4/conf/*/arp_ignore and your ethernet topology. 2. Linux bonding does a good job of detecting link status via either the ARP monitor or the MII monitor, but no such mechanism exists in Lustre bonding.
In fact, the Lustre bonding is an officially obsoleted feature if I remember correctly. Thanks, Isaac ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
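To make the 802.3ad recipe above concrete, here is a minimal sketch for a RHEL-style OSS; the interface names, addresses, and bonding options are assumptions (and the switch ports must also be configured for LACP), so treat it as a starting point rather than a recipe.

-- cut: /etc/modprobe.conf (or modprobe.conf.local) on the OSS --
alias bond0 bonding
options bond0 mode=802.3ad miimon=100
# hand the bonded interface to LNET as tcp0
options lnet networks=tcp0(bond0)
-- cut --

-- cut: /etc/sysconfig/network-scripts/ifcfg-bond0 --
DEVICE=bond0
IPADDR=10.111.20.35
NETMASK=255.255.255.0
BOOTPROTO=none
ONBOOT=yes
-- cut --

-- cut: /etc/sysconfig/network-scripts/ifcfg-eth0 (eth1 is identical apart from DEVICE) --
DEVICE=eth0
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes
-- cut --

After bringing the bond up, 'cat /proc/net/bonding/bond0' should show both slaves active in the LACP aggregate, and 'lctl list_nids' on the OSS should report a single NID on the bond's address.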
Re: [Lustre-discuss] Launch of the redesigned lustre.org wiki
Looks good! Glad to see Sun is keeping this resource online. Klaus On 4/16/09 10:21 AM, Sheila Barthel sheila.bart...@sun.com etched on stone tablets: The Lustre Group is pleased to announce the launch of the redesigned lustre.org wiki, which includes a new top-level design, added pages, and an updated color scheme and logo. Check it out at: http:/www.lustre.org. Over the next few months, more pages will be added and content on existing pages will be refreshed. We are interested in your feedback on the look and usability of the redesigned site, and ways we can improve it. To submit comments, use the feedback link on the home page. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Recovery fails if clients not connected
Hi Roger, I believe you can connect the OSSs once the MDS has booted, and in fact, I'm pretty sure that the five in 'connected_clients: 0/5' are in fact your OSS nodes. Each OST maintains a connection to the MDS while the file system is mounted, so they will be included in the connection count on the MDS. However, regardless of the recovery state, if your MDS is online and the MDT is mounted, you can start up the OSS nodes and corresponding OSTs at any time; clients attempting to make transactions will have their I/O operations block (or fail, depending on the MDS config) until the missing nodes come back online. hth, Klaus On 1/20/09 3:05 PM, Roger Spellman ro...@terascala.com etched on stone tablets: I have 2 MDS, configured as an active/standby pair. I have 5 OSTs that are NOT active/standby. I have 5 clients. I am using Lustre 1.6.5, due to bug 18232 https://bugzilla.lustre.org/show_bug.cgi?id=18232 which only affects 1.6.6. Using Lustre 1.6.5, when I reset my active node, the standby takes over. This is quite reliable. Today, I did the following, in this order: unmounted all the clients, rebooted all the clients, stopped Linux HA from running, unmounted the OSTs, unmounted the MDS, rebooted the OSTs, rebooted both MDSes. When the MDSes started up, Linux HA chose one to be active. That system mounted the MDT. I looked at the file /proc/fs/lustre/mds/tacc-MDT/recovery_status, and it showed:
[r...@ts-tacc-01 ~]# cat /proc/fs/lustre/mds/tacc-MDT/recovery_status
status: RECOVERING
recovery_start: 0
time_remaining: 0
connected_clients: 0/5
completed_clients: 0/5
replayed_requests: 0/??
queued_requests: 0
next_transno: 17768
* Note that recovery_start and time_remaining are both zero. * I waited several minutes, and this file was the same. I was waiting for recovery to complete before trying to mount the OSTs. However, it appears that this would never occur! Does this look like a bug? --- I format my MDT using the following command. The command is run from 10.2.43.1, and the failnode is 10.2.43.2: mkfs.lustre --reformat --fsname tacc --mdt --mgs --device-size=1000 --mkfsoptions=' -m 0 -O mmp' --failnode=10.2.4...@o2ib0 /dev/sdb I format the OSTs using the following command: /usr/bin/time -p mkfs.lustre --reformat --ost --mkfsoptions='-J device=/dev/sdc1 -m 0' --fsname tacc --device-size=4 --mgsnode=10.2.4...@o2ib0 --mgsnode=10.2.4...@o2ib0 /dev/sdb I mount the clients using: mount -t lustre 10.2.4...@o2ib:10.2.4...@o2ib:/tacc /mnt/lustre ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Lustre locking
Hi Mag, If I'm not mistaken, only qmaster writes to the DB, the execd process relays queries through a listening daemon using RPC on the qmaster host which speaks BDB on the back end. hth, Klaus On 1/16/09 4:22 PM, Mag Gam magaw...@gmail.com etched on stone tablets: Thanks Andreas. We also run Sun Grid Engine for our engineering department. Out setup is basically like this: Master -- QMASTER (1 server) Slaves -- EXECD (300 servers) They are share a filesystem which is running of Lustre. Grid Engine has a Berkeley Database as its backend. I am wondering if I need to change all of my slaves and master to distributed locking or local locking. Any thoughts? TIA On Fri, Jan 16, 2009 at 10:10 AM, Andreas Dilger adil...@sun.com wrote: On Jan 16, 2009 00:52 -0500, Mag Gam wrote: At our university many of our students and professors use SQLite and Berkley DB for their projects. Probally, BDB more than SQLite. Would I we need to have Lustre mounted up a certain way to avoid corruption via file locking? Any thoughts about this? That depends on how they use it. Mounting Lustre with -o localflock will provide locking on a single node without any performance impact, which is enough for single-node databases like SQLite and Berkley DB. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
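For reference, the localflock behaviour Andreas describes is just a client-side mount option; a minimal sketch, with the MGS NID, file system name, and mount point as placeholders.

-- cut --
# node-local flock/fcntl locking, enough for single-node BDB or SQLite use
mount -t lustre -o localflock 10.0.0.5@tcp0:/testfs /mnt/testfs

# equivalent /etc/fstab entry
10.0.0.5@tcp0:/testfs  /mnt/testfs  lustre  localflock,_netdev  0 0
-- cut --

If processes on different clients ever need to see each other's locks on the same files, the coherent but slower '-o flock' option would be the one to use instead.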
Re: [Lustre-discuss] tcp0 for maximum effect
Hi folks, Lustre doesn't support any inherent link aggregation; it simply utilizes the device node the OS presents. If this is a bonded NIC, it will use it no problem, but the underlying device driver takes care of load balancing and distribution. I've used Lustre 1.6.x quite successfully with load-balanced 802.3ad configurations; in some of my tests I was able to get about 350 MB/s aggregate sustained across two OSS nodes with 2 x GigE bonded each. 802.3ad link aggregation is a standard NIC bonding protocol, and is supported on all good quality L3 switches from vendors like Cisco, Foundry, Extreme, and Juniper. cheers, Klaus On 1/11/09 11:37 AM, Peter Grandi pg_...@lus.for.sabi.co.uk etched on stone tablets: I have two boxes that have this: [r...@lustrethree Desktop]# ifconfig eth0 Link encap:Ethernet HWaddr 00:1B:21:2A:17:76 inet addr:192.168.0.19 Bcast:192.168.0.255 Mask:255.255.255.0 RX bytes:120168321 (114.6 MiB) TX bytes:5300070662 (4.9 GiB) [ ... ] eth1 Link encap:Ethernet HWaddr 00:1B:21:2A:1C:DC inet addr:192.168.0.20 Bcast:192.168.0.255 Mask:255.255.255.0 RX bytes:55673426 (53.0 MiB) TX bytes:846 (846.0 b) [ ... another 4 like that, 192.168.0.21-24 ... ] That's a very bizarre network configuration: you have 5 interfaces on the same subnet (presumably all plugged into the same switch) with no load balancing, as all the outgoing traffic goes via 'eth0'. You have some better alternatives: * Use bonding (if the switch supports it) to tie together the 5 interfaces as one virtual interface with a single IP address. * Use something like 'nexthop' routing (and a couple other tricks) to split the load across the several interfaces. This is easier for the outgoing traffic than the incoming traffic, but it seems you have a lot more outgoing traffic. * Use one 10Gb/s card per server and a 1Gb/s switch with 2 10Gb/s ports. 10Gb/s cards and switches have fallen in price a lot recently (check Myri.com), and a server that can do several hundred MB/s really deserves a nice 10Gb/s interface. IIRC 'lnet' has something like bonding built in, but I am not sure that it handles multiple addresses in the same subnet well. Would it be better to have these two boxes as OSSs or as MDT or MGS machines? Currently they are configured one as the MGS and the other as the MDT. If these are the two servers with gigantic disk arrays, I'd run both MDS and OSS on each, possibly with the OSTs replicated across both machines in an active/passive configuration. The question is: does LNET use the available tcp0 connections differently from the OSS perspective as opposed to the MDT or MGS perspective? Not sure what the question means. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Using ib0 and tcp
Hi Joseph, Lustre attaches itself to the first listed/available interface, in your case, ib0. I don't remember off the top of my head if there is a way (or what the right way is) to do what you want to do. hth, Klaus On 11/13/08 4:18 PM, Joseph Farran [EMAIL PROTECTED] etched on stone tablets: Howdy. Newbie here. I have Lustre 1.6.6 set up and running. For the network, we have eth0 and eth1 bonded (channel bonding) and we also have Infiniband (SilverStorm). The /etc/modprobe.conf file looks like this: alias bond0 bonding options bond0 mode=balance-alb miimon=100 options lnet networks=o2ib(ib0),tcp(bond0) lctl on one of my OSS's shows my network as: # lctl list_nids [EMAIL PROTECTED] [EMAIL PROTECTED] How can I get Lustre to use both ib0 and bond0 (eth0 / eth1) for the data network? Currently it only uses Infiniband (ib0) and not bond0. Thanks, Joseph ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Network aliasing and HA
Hi Timh, If you're using Linux-HA, you can configure how quickly failover takes place. I have mine set to 90 seconds before the primary is marked dead and the secondary takes over. When this occurs, any Lustre transactions not yet in flight will block until the ones that were in progress at the time of the failure have either had a chance to complete or have timed out. I'm not sure how to modify Lustre-specific settings for recovery time, though. cheers, Klaus On 9/25/08 1:54 PM, Timh Bergström [EMAIL PROTECTED]did etch on stone tablets: To follow up on this matter, i've currently set ha/drbd as suggested, formatted the ost's with double mgsserver directives and also mounted with double addresses on the clients, as [EMAIL PROTECTED]:[EMAIL PROTECTED]:/fsname - though, if i fail mgs/mdt 1 it does not recover (in a resonable time), what kinds of tuning/settings will affect this? //Timh 2008/9/23 Timh Bergström [EMAIL PROTECTED]: Thank you, that's the path i've taken from the last message on this list, as I misunderstood some of the drbd/ha setups before. However, using 4 mgsnode paths, is that recommended or should I use one mgspath per node and use the other as some sort of manual failover? Regards, Timh 2008/9/23 Kevin Van Maren [EMAIL PROTECTED]: Note that you do not normally use IP takeover with Lustre/Heartbeat: you set the failover IP addresses with the mkfs.lustre command, and Lustre reconnects to the _other_ address when it is disconnected. In your case, you would have 2 fixed addresses for each node (w/o heartbeat - do NOT use the heartbeat virtual IP addresses), and specify both those failover NIDs (rather than just 1). Lustre1.6 is a bit different from a lot of HA/Heartbeat users: Lustre _knows_ about the multiple paths/addresses, and simply requires Heartbeat to ensure it is mounted on exactly one node in the failover pair: it does NOT rely on IP takeover for HA. Kevin Van Maren Timh Bergström wrote: 2008/9/23 Brian J. Murrell [EMAIL PROTECTED]: On Tue, 2008-09-23 at 15:06 +0200, Timh Bergström wrote: Hi, Hi, Hi again, and thanks for the quick reply! My (current) modprobe: options lnet networks=tcp0(eth0)10.4.21.50,tcp1(eth1)10.4.22.50 This syntax is incorrect. For some examples of multi-homed configurations see the manual at http://manual.lustre.org/manual/LustreManual16_HTML/MoreComplicatedConfigu rations.html#50642998_20213 Yes that's the link i've been consulting, perhaps im not looking hard enough. This is the errors i get: LustreError: 10f-e: Error parsing 'networks=tcp0(eth0)10.4.21.50,tcp1(eth1)10.4.22.50' When you specify networks because you specify the interfaces to use, you don't need to specify the ip address. I think you are confusing the networks and ipnets options. The problem here exactly is that the physical interfaces is there, but not with the ip-addresses i want the mdt to listen on - the NIDs, they are added later through heartbeat as aliases (IPaddr2::10.4.21.50 IPaddr2::10.4.22.50), but before mounting the mdt-resource (drbd). LustreError: 110-0: here...|-| LustreError: 4527:0:(events.c:707:ptlrpc_init_portals()) network initialisation failed (along with a bunch of errors since this module does not load) I've tried with tcp0(eth0:0) which fails with about the same error, i've tried tcp0(eth0,eth1) which gives me the wrong addresses (machine ones) but works. What is the topology exactly? Are there two nics or one nic with two addresses? Are the two nics on the same physical network or separate physical networks? 
eth0 and eth1 are physical interfaces, they have statically assigned ip's (for management, supervision etc), heartbeat then adds addresses to theese two interfaces if the node is primary. If it matters - eth0 and eth1 has separated physical paths to everything, this is because we want to survive a physical fail on the network before failing over to another physical server. As I read the manual, i format my OST's with more than one --mgsnode option, which in turn will make the OST know about both path's to the MDS/MGS server(s). As in, if first MGS does not work (physical network failure on side A) - try second (Physical side B). What we healthcheck on is the data/disks/server hardware which will tell heartbeat to fail over to server 2 which takes over network path A and network path B (on 10.4.[21,22].50), and the OST's/clients should continue working without noticing. b. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss -- Timh Bergström System Administrator Diino AB - www.diino.com :wq ___ Lustre-discuss mailing list
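Pulling the pieces of this thread together, a rough sketch of the dual-MGS-NID setup being described, following Kevin's advice to use the fixed addresses of each MDS node rather than the heartbeat virtual IPs; all addresses and device names here are hypothetical placeholders, and the deadtime value is just the 90-second figure mentioned earlier in the thread, not a recommendation.

-- cut --
# each OST is formatted knowing both fixed MGS/MDS addresses (no virtual IPs)
mkfs.lustre --fsname=fsname --ost \
  --mgsnode=10.4.21.10@tcp0 --mgsnode=10.4.21.11@tcp0 /dev/sdX

# clients likewise list both NIDs; the second is tried when the first is unreachable
mount -t lustre 10.4.21.10@tcp0:10.4.21.11@tcp0:/fsname /mnt/fsname
-- cut --

-- cut: /etc/ha.d/ha.cf excerpt on both MDS nodes --
keepalive 2
deadtime 90        # how long before the peer is declared dead and takeover starts
initdead 120
-- cut --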
Re: [Lustre-discuss] Lustre 1.6.5.1 on the box with 2 interfaces
Hi Lukas, Do you want to use these interfaces together as a logical unit? The syntax below would construct two separate LNET networks. Additionally, you would need to qualify the mount path on the client side to bind Lustre to a specific LNET instance, i.e. 'mount /mnt/lustre [EMAIL PROTECTED]:/lustre', or else it will default to the first available LNET instance. The order doesn't really matter in the module declaration, though. Looking at my OSS nodes, I can see entries in /proc/fs/lustre/devices, as well as in the output of 'df'. Did you specify a file system type? Normally when mounting either an OST or an MDT, you need to provide a '-t lustre' argument to 'mount', or list that as the fstype in /etc/fstab. Otherwise, I'm assuming it mounts the device as ext3, which works, but isn't usable by Lustre. hth, Klaus On 9/24/08 12:41 PM, Lukas Hejtmanek [EMAIL PROTECTED] did etch on stone tablets: Hello, I have some difficulties setting up a server acting as both metadata server and OST server. The box has two interfaces, eth0 and eth2. eth0 is the primary interface and has the IP which is associated with the hostname. However, I would like to use the eth2 interface for Lustre. In an ideal case, I would like to use Lustre on both interfaces. But for now, it would be sufficient to use just the non-primary one. I use options lnet networks=tcp0(eth2),tcp1(eth0) I created the metadata: /usr/local/lustre/sbin/mkfs.lustre --fsname=l_smaug2 --reformat --mdt --mgs /dev/Scratch_VG/Scratch_1 I mounted the mdt: mount /dev/Scratch_VG/Scratch_1 /mnt/lustre/mdt Then I created the OST /usr/local/lustre/sbin/mkfs.lustre --fsname=l_smaug2 --reformat --ost [EMAIL PROTECTED] /dev/Scratch_VG/Scratch_2 I mounted it: mount /dev/Scratch_VG/Scratch_2 /mnt/lustre/ost0 However, cat /proc/fs/lustre/devices shows that no OST is attached at all. What am I missing? eth0 has a public address, eth2 has the 192.168.1.1 address. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
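One likely culprit in the example above is the missing mount type; a sketch of the same sequence with the lustre file system type spelled out, and with a client mount pinned to the eth2 (tcp0) NID. The client mount point is a placeholder.

-- cut --
# on the server: mount the MDT and the OST as type lustre
mount -t lustre /dev/Scratch_VG/Scratch_1 /mnt/lustre/mdt
mount -t lustre /dev/Scratch_VG/Scratch_2 /mnt/lustre/ost0

# sanity check: the MDT and OST devices should now be listed
cat /proc/fs/lustre/devices

# on a client: mount explicitly against the eth2/tcp0 NID
mount -t lustre 192.168.1.1@tcp0:/l_smaug2 /mnt/l_smaug2
-- cut --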
Re: [Lustre-discuss] OST failover
Hi, You interconnect OSS1 and OSS2, OSS2 and OSS3, OSS3 and OSS4, and OSS4 and OSS1 together via an out-of-band communication fabric, install Linux-HA heartbeat software to monitor mount points, and configure nodes to take over each other's mounts in case of failure. The implementation I have here uses only single point-to-point links between OSS pairs (there is no crossover between failover partners) but the principle is the same; your implementation should work pretty much the same way, with a few minor differences. I'm assuming you've got unused NICs on your OSSes, so if you connect those to a standard GigE edge switch protected by UPS, you can create an HA fabric which will give you the coverage you need. Lustre itself doesn't support failover natively; you need to implement any failover using a third-party package, and most of us out there use Linux-HA to great effect. cheers, Klaus On 8/27/08 2:30 AM, sd a [EMAIL PROTECTED] did etch on stone tablets: Hi the list, I have 4 OSS servers; each OSS has 1 gigabit network interface and 2 OSTs, the first OST using /dev/sdb1, the second OST using /dev/sdb2. OSS1: OST1 OST2 OSS2: OST3 OST4 OSS3: OST5 OST6 OSS4: OST7 OST8 Each file is striped across 4 OSTs: OST1,2,3,4 And I want to set up failover as: OST2 failover with OST3 OST4 failover with OST5 OST6 failover with OST7 OST8 failover with OST1 How do I do this? Thanks. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
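In practice the ring above is expressed at format time with --failnode, plus shared storage that both members of each pair can actually see; a minimal sketch for one pair (OST2 moving between OSS1 and OSS2), with the MGS NID and all addresses hypothetical.

-- cut --
# OST2 normally runs on OSS1 (10.0.0.11) and fails over to OSS2 (10.0.0.12);
# /dev/sdb2 must be a shared LUN visible from both OSS nodes
mkfs.lustre --fsname=testfs --ost --mgsnode=10.0.0.10@tcp0 \
  --failnode=10.0.0.12@tcp0 /dev/sdb2

# normal operation, on OSS1
mount -t lustre /dev/sdb2 /mnt/testfs-ost2

# on OSS1 failure, heartbeat performs the identical mount on OSS2
-- cut --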
Re: [Lustre-discuss] multiple NLI's/interfaces to listen on?
Hi Robert, You'd need to adjust the lnet options line in /etc/modprobe.conf to force Lustre to bind to all your NICs; I believe it binds to the first one available unless instructed otherwise. Try this: -- cut -- options lnet networks=tcp0(eth1),tcp1(eth2),tcp2(eth3) -- cut -- which will serve up the Lustre volume on each of your interfaces via three different LNET networks. You may be able to organize them all under a single LNET network, but without the right routing in place, it gets very complicated. hth, Klaus On 8/25/08 5:23 AM, Robert Hassing [EMAIL PROTECTED] did etch on stone tablets: Hi All, Got this little problem which is driving me nuts. I have a small network with a combined Lustre MDT/MGS/OST server and 3 clients. I am trying to let the server listen on 3 different NICs in a separate network, e.g.: eth1: 10.1.34.50 eth2: 10.1.35.50 eth3: 10.1.36.50 I want to connect each client directly to one of the interfaces, but when trying that only eth1 works. From dmesg: Lustre: Added LNI [EMAIL PROTECTED] [8/256] Lustre: Accept secure, port 988 it appears the server is only listening on eth1 and not the other NICs. Is there a possibility to let the other NICs also listen so I can use them? Kind regards Robert H ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
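The client side of that configuration matters too; a rough sketch assuming each client is cabled to exactly one of the three server interfaces, with Robert's addresses reused as placeholders and a hypothetical file system name.

-- cut: server /etc/modprobe.conf --
options lnet networks=tcp0(eth1),tcp1(eth2),tcp2(eth3)
-- cut --

-- cut: /etc/modprobe.conf on a client attached to the eth2 / 10.1.35.x leg --
options lnet networks=tcp1(eth0)
-- cut --

# that client then mounts using the server's NID on its own LNET network
mount -t lustre 10.1.35.50@tcp1:/fsname /mnt/fsname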
Re: [Lustre-discuss] Log file opens/reads/etc?
Hello Andreas, My apologies for not explaining myself. :-) The trusted computing standards I'm talking about (there are a few, some good, some not so much) are effectively based on US Department of Defense C2 (aka Orange Book) security standards: http://en.wikipedia.org/wiki/TCSEC The best audit trail implementations I've seen are based on Sun's BSM, adopted and implemented by both FreeBSD and Apple in their auditing code. http://docs.sun.com/app/docs/doc/806-1789 http://www.apple.com/support/security/commoncriteria/ http://www.freebsd.org/doc/en/books/handbook/audit.html BSM-based auditing systems define classes of system calls, users, and groups of users that are of interest -- file create, file read, login, socket opens, people in the 'wheel' group, etc. -- and record a realtime log of events as they occur within the kernel. This information is stored in a packed binary format, and can be exploded into ASCII for parsing and analysis using built-in tools, allowing you to establish a complete audit trail of the operations of interest. How Lustre would implement this I'm not sure, since it's object-based and BSM auditing records file names ... but the idea is important, especially in digital media where auditability keeps lawyers from the MPAA and the big studios at bay. cheers, Klaus On 8/18/08 9:13 PM, Andreas Dilger [EMAIL PROTECTED]did etch on stone tablets: On Aug 18, 2008 17:18 -0700, Klaus Steden wrote: Hrm. Who should I contact to find out more, then? Nathan is working on the Changelog code, but I think the main issue is that neither of us know what compliant with Trusted Computing standards really means. On 8/18/08 4:44 PM, Andreas Dilger [EMAIL PROTECTED]did etch on stone tablets: On Aug 18, 2008 12:53 -0700, Klaus Steden wrote: Will this be compliant with Trusted Computing standards? i.e. will it be possible to use this information for auditing purposes? I don't know enough about that to make a useful answer, sorry. On 8/18/08 3:43 AM, Andreas Dilger [EMAIL PROTECTED]did etch on stone tablets: On Aug 09, 2008 05:06 -0700, daledude wrote: Is there is a tool that shows what files are being accessed? Sort of like inotify, but not inotify? I'd like to compile file access statistics to try and balance the most accessed files across the OST's better. There is a feature being worked on for Lustre 2.0 called Changelogs that will allow recording all files that are modified. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Log file opens/reads/etc?
Hi Andreas, Will this be compliant with Trusted Computing standards? i.e. will it be possible to use this information for auditing purposes? thanks, Klaus On 8/18/08 3:43 AM, Andreas Dilger [EMAIL PROTECTED]did etch on stone tablets: On Aug 09, 2008 05:06 -0700, daledude wrote: Is there is a tool that shows what files are being accessed? Sort of like inotify, but not inotify? I'd like to compile file access statistics to try and balance the most accessed files across the OST's better. There is a feature being worked on for Lustre 2.0 called Changelogs that will allow recording all files that are modified. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Log file opens/reads/etc?
Hi Andreas, Hrm. Who should I contact to find out more, then? thanks, Klaus On 8/18/08 4:44 PM, Andreas Dilger [EMAIL PROTECTED]did etch on stone tablets: On Aug 18, 2008 12:53 -0700, Klaus Steden wrote: Will this be compliant with Trusted Computing standards? i.e. will it be possible to use this information for auditing purposes? I don't know enough about that to make a useful answer, sorry. On 8/18/08 3:43 AM, Andreas Dilger [EMAIL PROTECTED]did etch on stone tablets: On Aug 09, 2008 05:06 -0700, daledude wrote: Is there is a tool that shows what files are being accessed? Sort of like inotify, but not inotify? I'd like to compile file access statistics to try and balance the most accessed files across the OST's better. There is a feature being worked on for Lustre 2.0 called Changelogs that will allow recording all files that are modified. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] mounting an OST from another node attached to a fibre channel switch
Yes. The MGS contains all the configuration information for the file system it serves, including the locations and network paths for all the MDS and OSS nodes within the file system. What you want to do is tell the MGS that your OSS(es) have been moved to new addresses and have it update its records. At that point, your file system will be usable in the new configuration. hth, Klaus On 8/15/08 2:18 PM, Ron [EMAIL PROTECTED]did etch on stone tablets: Which section of which manual? Please. I.e are you talking about: Part No. 820-3681-10 Lustre manual version: Lustre_1.6_man_v1.10 December 2007 4.2.3.2 Running the Writeconf Command and/or 4.2.3.3 Changing a Server NID Our MGS is not changing nodes, but the OSTs are. Is there really just one simple MGS only operation? Those sections do not quite fit (to my way of thinking, which is in the process of being adjusted :} Thanks, Ron On Aug 14, 4:33 pm, Klaus Steden [EMAIL PROTECTED] wrote: Yes. There is an entry in the manual on this topic. You'll have to stop Lustre and update the MGS configuration, but it's a pretty quick operation. cheers, Klaus On 8/14/08 2:29 PM, Ron [EMAIL PROTECTED]did etch on stone tablets: Hi, We have set up a couple of test systems where the OSTs are mounted on the same node as the MDT. The OSTs are LUNS on a SATA Beast controller accessible from multiple systems attached to a fibre channel switch,. We would like to umount an OST from the MDT system and mount it on another system. We've tried doing this and even though there is network traffic between the new system and the mds system, the mds system seems to be ignoring the OST mount. Can the an OSTs OSS change? Thanks, Ron ___ Lustre-discuss mailing list [EMAIL PROTECTED] http://lists.lustre.org/mailman/listinfo/lustre-discuss ___ Lustre-discuss mailing list [EMAIL PROTECTED]://lists.lustre.org/mailman/listinfo/lustr e-discuss ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
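For the record, the procedure being asked about is the writeconf one (section 4.2.3.2 of the 1.6 manual cited above); a rough outline only, with device names as placeholders and the whole file system unmounted first, so double-check the writeconf section of the manual for your release before running it.

-- cut --
# 1. unmount clients, then OSTs, then the MDT

# 2. regenerate the configuration logs: MDT/MGS first, then every OST
tunefs.lustre --writeconf /dev/mdt_device     # on the MGS/MDS node
tunefs.lustre --writeconf /dev/ost_device     # on the OSS that now owns the OST

# 3. remount in order: MGS/MDT first, then the OSTs from their new OSS nodes
mount -t lustre /dev/mdt_device /mnt/mdt
mount -t lustre /dev/ost_device /mnt/ost0

# 4. remount the clients; the regenerated logs record the OSTs' new NIDs
-- cut --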
Re: [Lustre-discuss] mounting an OST from another node attached to a fibre channel switch
Yes. There is an entry in the manual on this topic. You'll have to stop Lustre and update the MGS configuration, but it's a pretty quick operation. cheers, Klaus On 8/14/08 2:29 PM, Ron [EMAIL PROTECTED]did etch on stone tablets: Hi, We have set up a couple of test systems where the OSTs are mounted on the same node as the MDT. The OSTs are LUNS on a SATA Beast controller accessible from multiple systems attached to a fibre channel switch,. We would like to umount an OST from the MDT system and mount it on another system. We've tried doing this and even though there is network traffic between the new system and the mds system, the mds system seems to be ignoring the OST mount. Can the an OSTs OSS change? Thanks, Ron ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] NFS and lustre
Hello Mag, Some people on-list have tried it before, but it generally performs poorly. I believe HP contributed some optimizations, but the consensus as recently as the start of this year was that it will generally suck. Check the list archives for more information, I believe CIFS was a better performer. If you're exporting to Linux clients, going with native Lustre clients is definitely a better option than using NFS (although you may not be doing so). cheers, Klaus On 8/5/08 7:25 PM, Mag Gam [EMAIL PROTECTED]did etch on stone tablets: Is there a guide for Lustre and NFS? At our university, we have a lustre server which also exports NFS data. We are using Centos, and I was wondering if there are any tuning and best practices guides available. TIA ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Does how to know OST to have data?
Hi Johnlya, Use 'lfs getstripe file', i.e.
root# lfs getstripe /mnt/lustre/testfile1
OBDS:
0: lustre-OST0000_UUID ACTIVE
1: lustre-OST0001_UUID ACTIVE
/mnt/lustre/testfile1
     obdidx      objid      objid      group
          0     315726   0x4d14e0          0
root#
This tells me that 'testfile1' is stored entirely on OST0. good luck, Klaus On 8/6/08 6:09 AM, Johnlya [EMAIL PROTECTED] did etch on stone tablets: How do I know which OST has the data when I mount a Lustre OST? My disk is FC SAN. Thank you! [EMAIL PROTECTED] ~]# uname -a Linux OSS2_MASTER 2.6.9-67.0.7.EL_lustre.1.6.5smp #1 SMP Mon May 12 22:02:50 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] lustre 1.6.5.1 panic on failover
Hi Brock, I've been using Sun X2200s with Lustre in a similar configuration (IPMI, STONITH, Linux-HA, FC storage) and haven't had any issues like this (although I would typically panic the primary node during testing using Sysrq) ... is the behaviour consistent? Klaus On 7/31/08 1:57 PM, Brock Palen [EMAIL PROTECTED]did etch on stone tablets: I have two machines I am setting up as my first mds failover pair. The two sun x4100's are connected to a FC disk array. I have set up heartbeat with IPMI for STONITH. Problem is when I run a test on the host that currently has the mds/ mgs mounted 'killall -9 heartbeat' I see the IPMI shutdown and when the second 4100 tries to mount the filesystem it does a kernel panic. Has anyone else seen this behavior? Is there something I am running into? If I do a 'hb_takelover' or shutdown heartbeat cleanly all is well. Only if I simulate heartbeat failing does this happen. Note I have not tired yanking power yet, but I want to simulate a MDS in a semi dead state and ran into this. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] lustre 1.6.5.1 panic on failover
netdump is indeed good for this, but you may have to take two or three cracks at it ... it doesn't always dump the complete core image, and you can't really do a whole lot with the incomplete version. Klaus On 7/31/08 5:50 PM, Kilian CAVALOTTI [EMAIL PROTECTED] did etch on stone tablets: On Thursday 31 July 2008 17:22:28 Brock Palen wrote: What's a good tool to grab this? It's more than one page long, and the machine does not have serial ports. If your servers do IPMI, you probably can configure Serial-over-LAN to get a console and capture the logs. But a way more convenient solution is netdump. As long as the network connection is working on the panicking machine, you should be able to transmit the kernel panic info, as well as a stack trace, to a netdump-server, which will store it in a file. See http://www.redhat.com/support/wpapers/redhat/netdump/ Cheers, ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
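For completeness, the netdump pieces on RHEL-style systems amount to roughly the following; the capture host address is a placeholder and the package and service names can vary by release, so treat this as a sketch rather than a checklist.

-- cut --
# on a separate capture host
yum install netdump-server
chkconfig netdump-server on
service netdump-server start      # received oopses/cores land under /var/crash

# on the Lustre server that panics, point the client at the capture host
#   /etc/sysconfig/netdump:
#     NETDUMPADDR=10.0.0.250
chkconfig netdump on
service netdump start
-- cut --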
Re: [Lustre-discuss] multiple OSTs accessing the same shared storage simultaneously?
On 7/29/08 6:44 AM, Aaron Knister [EMAIL PROTECTED] did etch on stone tablets: Oh, and to answer your question - an OST cannot be mounted twice simultaneously. ... well, you can mount it from two locations; you're just inevitably going to corrupt the heck out of the volume in question ... so don't do that. ;-) I would also echo Aaron's remarks about running Lustre servers on VMs ... big waste of compute power, and you will run into significant latency/contention issues. cheers, Klaus ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Lustre Performance Data for Simultaneous Reads and Writes from Multiple Clients
Hi Daniel, I don't believe so. Various people have posted informal results from their own tests in the field, but none have ever been formally collated. There are some rough numbers on the Wikipedia page for CFS for GigE, IB, and 10GigE, but they assume particular things about configuration, disk throughput, striping, OST counts, etc. Because Lustre is so versatile, information like this can be hard to nail down: running a GigE network with ATA drives is obviously not going to get the best performance compared to 8 Gb Fibre Channel, but both are equally valid Lustre configurations. hth, Klaus On 7/25/08 9:03 AM, Daniel Ferber [EMAIL PROTECTED] did etch on stone tablets: I'm working with someone who is modeling a customer system, and wants to partially model Lustre performance as part of that. What they would like is the following data, or similar, for a given network. I say 'given' in that you can pick any network config and any given stripe size and any given file size (can be 250MB to 1GB), and then supply the following data: * From a single client, the read I/O start and stop time, or "bandwidth" * From a single client, the write I/O start and stop time, or "bandwidth" * Then introduce additional clients doing reads or writes and study the impact, as in one client writing and four clients reading simultaneously, and their start/stop times for IO, or bandwidth, and then one client reading and four clients writing simultaneously, and their individual IO bandwidths. The objective really is to know how concurrent reads and writes impact Lustre performance. Does this data exist, or would someone need to go and collect it? Thanks, Dan ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] multihomed OST's configuration
Hi Mario, Lustre will, if not instructed otherwise, bind to all available NICs on the system. I've used Lustre extensively with LACP aggregate groups, and it performs quite well. Configuring multiple NICs from the same host into the same VLAN is something of a non-sensical configuration unless you're running some kind of bizarre failover scenario, but if they're all going to the same switch, that's an impossibility. This kind of configuration would also make ordinary TCP/IP routing somewhat funky. Use NIC bonding, and configure your switch as appropriate to do likewise. Cisco, Foundry, Extreme, Juniper, Alcatel, Netgear and a number of others all support LACP in their L3 edge switches, and it's a standard feature of any core switch. Once you've set up the switch and the OS, instruct Lustre to use the bond by putting options lnet networks=tcp(bond0) in your /etc/modprobe.conf and it will take care of the rest. cheers, Klaus On 7/9/08 5:07 AM, mdavid [EMAIL PROTECTED]did etch on stone tablets: hi Brian I was mislead by what it says in the ops manual, 12.1 chapter Lustre can use multiple NICs without bonding. There is a difference in performance when Lustre uses multiple NICs versus when it uses bonding NICs. though here it says multiple NICS not multihomed configurations. Anyway I still don't know how to configure multiple NICS both from the point of view of the OS and Lustre note all the ethXX are in the same LAN, and connected to the same card in the switch if on the Lustre OST's I put options lnet networks=tcp(eth0,eth1,eth2,eth3) how is it configured each ethX in principle I would have a single IP for the server cheers Mario David On Jul 8, 1:25 pm, Brian J. Murrell [EMAIL PROTECTED] wrote: On Mon, 2008-07-07 at 03:13 -0700, mdavid wrote: hi list I am a new to lustre (1 week old) and this list. I have some Dell PE1950 servers with MD1000 enclosures (scientific linux 5 == RHEL5 x86_54) on them and lustre 1.6.5, with lustre patched kernels on them on a first try (indeed it was the second), I managed to have a lustre up and running OK, now each dell server has 4 times 1Gb interfaces, and I want to take profit from them all either I try bonding them, or go for multihomed (which is my first try) If what you want is to get the bandwidth of all 4 interfaces to the Lustre servers then you really do want bonding. Can you explain why you think you want multihoming vs. bonding? Maybe I'm misunderstanding your goal. b. signature.asc 1KDownload ___ Lustre-discuss mailing list [EMAIL PROTECTED]://lists.lustre.org/mailman/listinfo/lustr e-discuss ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Correct way to unmount a lustre client?
Hello Hans-Juergen, I usually try a combination of searching the process table for any running tasks that are blocking the umount request and killing them, then doing a 'umount -f', and then using 'lctl modules | awk '{print $2}' | xargs rmmod -v' to deactivate the kernel modules. This last step sometimes leads to complaints that the network module is busy, so you have to tell it to shut down LNET in order to completely unload the remaining modules. However, I don't find this is a problem very frequently with my nodes ... maybe you've got an uncooperative application that needs to be looked at? cheers, Klaus On 7/8/08 9:30 AM, Hans-Juergen Schnitzer [EMAIL PROTECTED] did etch on stone tablets: Hello, what is the correct way to unmount a lustre client when a simple 'umount filesystem' responds with 'device is busy'? A 'umount -l filesystem' does unmount the filesystem, however when I reboot the machine subsequently, the shutdown process hangs. The last messages on console are: LustreError: 131-3: Received notification of device removal Please shutdown LNET to allow this to proceed Best regards, Hans Schnitzer ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
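Strung together, that procedure looks roughly like the sketch below; lustre_rmmod is the module-unload helper shipped with the Lustre RPMs (if it isn't present, rmmod the modules listed by 'lctl modules' by hand), and the mount point is a placeholder.

-- cut --
# see what is holding the mount open, then kill it
fuser -vm /mnt/lustre
fuser -km /mnt/lustre

# force the unmount and unload the Lustre/LNET modules
umount -f /mnt/lustre
lustre_rmmod
-- cut --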
Re: [Lustre-discuss] Client and OST on Same Blade
You mean ... you have a FUSE client and a FUSE server for the same file system running on the same node? Klaus On 7/8/08 11:38 AM, Kevin Fox [EMAIL PROTECTED]did etch on stone tablets: FUSE did it some how, didn't they? On Mon, 2008-07-07 at 14:12 -0700, Brian J. Murrell wrote: On Mon, 2008-07-07 at 17:00 -0400, Roger Spellman wrote: Is there any problem having a Client and OST on the same blade? If by same blade you mean on the same kernel sharing the same memory pool, yes, the problems that there were still are. They are inherent problems in which the client and OST share the same memory pool and an effort to relieve memory pressure (by the client) requires memory be available to the OST. Of course if the client is experiencing memory pressure so is the OST and the OST might not get the memory it needs to help the client get the memory it needs since it's all one pool of memory. Indeed it's a deadlock. I've discussed this with one of the more VM knowledgeable engineers than I and IIRC his feeling that here really is no fool-proof fix for this. Perhaps somebody more expert in the VM wants to explain further. b. plain text document attachment (ATT1031826.txt), ATT1031826.txt ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Failover Setup MDS/MDT
Hello Heiko, If I'm not mistaken, 'MDS' refers to the metadata _server_, while 'MDT' refers to the metadata _target_, i.e. the distinction is akin to that between 'OSS' and 'OST'. The MDS is a server node; the MDT is the volume where all the metadata for your volume is stored. The handbook recommends an MDT size about 1-2% of the size of your total volume size, i.e. if your total CFS volume is 10 TB, the MDT would be about 200 GB. This is fairly conservative, so you may want to err on the side of growth by using a larger volume than that. If you can spare the disk, you're certainly not sacrificing anything by over-provisioning your MDS. hope this helps, Klaus On 6/25/08 5:29 AM, Heiko Schroeter [EMAIL PROTECTED]did etch on stone tablets: Am Mittwoch, 25. Juni 2008 14:19:11 schrieb Brian J. Murrell: On Wed, 2008-06-25 at 07:36 +0200, Heiko Schroeter wrote: How can one determine the size for the MDT partition or is that the same as the MDS device ? (As far as i can see the MDT takes the DIR info etc. So it should be larger than the MDS.) An MDT is the device (i.e. the disk) that Lustre in an MDS (the server) uses to manage the metadata. Maybe that clears it up? Well yes ok, but what about the sizes of the partitions ? The docs present an example calculating the inode space needed on an 'MDS'. (3.2.2 Calculating MDS Size) That what actually confuses me a bit. So when the MDS partitions holds the inodes of the lustre system what will be the partition size of the MDT device ? Or should it read 'MDT' partition size in the docs and the MDS partition size doesn't matter at all ? Thanks and Regards Heiko ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Confusion with failover
Hopefully this isn't a stupid question ... but have you considered using Lustre 1.6 instead, if it's an option? It's much, much easier to work with. As for failover itself ... Lustre provides redundancy in the design, i.e. you can have a secondary MDS that comes online when the primary has failed, and OSSes can assume control of one another's OSTs in case of host failure, but it does not implement any of this functionality. You'll have to use something like Linux-HA to get your server nodes to gracefully resume service for failed nodes. Again, I'd strongly recommend using 1.6, as the latest version of Linux-HA has native support for Lustre, which can use standard Linux 'mount' commands. If you go with 1.4, you'll have to noodle with XML and scripts (which will still work) which is more of a hassle, while 1.6 just works. cheers, Klaus On 6/25/08 11:09 PM, Dhruv [EMAIL PROTECTED]did etch on stone tablets: Hello Everybody, I am a novice in using Lustre. Wanted some help. I am using luster 1.4.5.1 on RHEL4 update2 with kernel 2.6.9-22. Am facing some problems. Case1: I tried the ost failover . Following is the config file to generate xml file. rm -f failover_2node.xml ./lmc -m failover_2node.xml --add net --node node-mds --nid sm01 -- nettype tcp ./lmc -m failover_2node.xml --add net --node node-ost1 --nid sm02 -- nettype tcp ./lmc -m failover_2node.xml --add net --node node-ost2 --nid sm06 -- nettype tcp ./lmc -m failover_2node.xml --add net --node client --nid '*' -- nettype tcp # Cofigure MDS ./lmc -m failover_2node.xml --add mds --node node-mds --mds mds_test -- fstype ldiskfs --dev /dev/sdb5 # Cofigure LOV ./lmc -m failover_2node.xml --add lov --lov lov_test --mds mds_test -- stripe_sz 1048576 --stripe_cnt 2 --stripe_pattern 0 # Configures OSTs ./lmc -m failover_2node.xml --add ost --node node-ost1 --lov lov_test --ost ost1 --failover --fstype ldiskfs --dev /dev/sdb1 ./lmc -m failover_2node.xml --add ost --node node-ost2 --lov lov_test --ost ost1 --failover --fstype ldiskfs --dev /dev/sdb7 # Configure client (this is a 'generic' client used for all client mounts) ./lmc -m failover_2node.xml --add mtpt --node client --path /mnt/ lustre --mds mds_test --lov lov_test . Following were my lconf commands. 1. lconf --reformat --node node-ost1 failover_2node.xml on sm02 2. lconf --reformat --node node-ost2 --service=ost1 failover_2node.xml on sm06 3. lconf --reformat --node node-mds failover_2node.xml on sm01 4. lconf --node client failover_2node.xml ... on sm02 and sm06 So my intention is to keep a failover ost node incase one fails. MDS is on a seperate node. I tried different scenarios where one ost goes down and still data can be retrieved from other. New files can be created and old can be deleted on the failover ost. Data was available most of time. So my question is whether Linux HA required to configure such failover scenario? Case2: I tried with same sort of formula as shown above for a failover MDS. But when the main MDS fails, it doesnt switches to new MDS. Also when the main MDS comes up again, the file system doesnt recover. I brought down the client and again brought up. Then it was working. So is Linux HA or similar program necessary for configuring failover?? Dhruv ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
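With 1.6, the moving parts are just a --failnode option at format time plus a heartbeat resource that mounts the target; a bare-bones sketch using heartbeat v1 haresources syntax, with the node names mds1/mds2, the NIDs, and the device all hypothetical.

-- cut --
# on mds1: format the MDT so clients also know about the standby MDS (mds2)
mkfs.lustre --fsname=testfs --mdt --mgs --failnode=192.168.1.2@tcp0 /dev/sdb

# clients list both NIDs, so either MDS can answer
mount -t lustre 192.168.1.1@tcp0:192.168.1.2@tcp0:/testfs /mnt/testfs
-- cut --

-- cut: /etc/ha.d/haresources (identical on mds1 and mds2) --
mds1 Filesystem::/dev/sdb::/mnt/testfs-mdt::lustre
-- cut --

Heartbeat then only has to guarantee the device is mounted on exactly one node at a time, which is all Lustre needs from it.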
Re: [Lustre-discuss] MGS disk size and activity
I don't think it would be much - could it share spindles with the journal for the MDS file system? Hrm. Given its relatively low use, I'd think that would be fine. I have a question ... if the MGS is used so infrequently relative to the use of the MDS, why is it (is it?) problematic to locate it on the same volume as the MDT? thanks, Klaus ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Rule of thumb for setting up lustre resources...
Hi Mark, See my comments inline below. cheers, Klaus On 6/14/08 11:22 AM, Mark True [EMAIL PROTECTED]did etch on stone tablets: Hello! I am new to the list, but I have been researching Lustre for quite some time and finally have an occasion to use it. I am trying to do some capacity planning and I am wondering if there are some general rules of thumb for configuring a Lustre environment. Specifically: A If increasing the number of OSTs increases throughput, is there a relationship that can be used to determine how many OSTs we're likely to need at the outset to establish a baseline minimum throughput. For examples, if I want to get 3GB sustained throughput how many OSTs will facilitate this. B Does the MGS and MDS have to be separate for best performance, or can they be consolidated into one server without causing too much hardship C Right now I am looking at a model where I am connecting all the OSTs, and the MDS/MGS together using infiniband, and connecting the storage via fibrechannel. Is this the ideal solution or am I going in the wrong direction. This is a good solution, and will give you good performance overall, although you can mix different storage technologies and network technologies within the same storage environment and it should remain relatively transparent. I've got a cluster that handles both FC storage and iSCSI storage, but I know there are people out there using DRBD, and I'm dying to try Infiniband-based storage as well. Anything that presents a block device to an OSS should be suitable for use with Lustre, but some will perform better than others. Bottom line, I think, is pick the best technology for your price range and performance needs. Infiniband + FC is pretty much the top of the mountain, though. D Just wondering what clustering software people use on the front end with Lustre typically, if they are going to be using this as a filesystem for some kind of HPC environment, what is the most popular clustering technology for this. Our CFS clusters are all organized as part of ROCKS clusters. I know a number of people on this list are on the ROCKS list, so there's good cross-pollination between technologies. It's a mature cluster architecture designed for HPC, and bundles a number of useful solutions and tools onboard (MPI, SGE, Torque, distributed compilers, visualization, etc.). It's also relatively easy to integrate with Lustre, as you can simply drop in the pre-built Lustre RPMs into the cluster installer and be ready to go in a few minutes. E Does Heartbeat install next to whatever HPC clustering technology you have? I'm using Linux-HA, and it wasn't built into my cluster software distro, but it was easy enough to drop into the mix, and as of late last year had native disk support for Lustre file systems. Thanks, and I hope that I can soon be someone who contributes rather than just asking questions :) --Mark T. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] CIFS
Yes, if you export the CFS volume from one of the client nodes using Samba, you'll be able to access it via CIFS from a Windows client. Performance will be well below customary Lustre performance, though, so don't expect miracles. :-) cheers, Klaus On 6/11/08 12:50 PM, [EMAIL PROTECTED] [EMAIL PROTECTED] did etch on stone tablets: Hi all, Can Windows clients access the data under the Lustre file system via CIFS? Thank you. Regards, Harutyun ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
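A bare-bones smb.conf share for that kind of gateway, assuming the Lustre client mounts the file system at /mnt/lustre; the share name and the oplock settings are one conservative choice, not gospel.

-- cut: smb.conf excerpt on the Lustre client acting as CIFS gateway --
[lustre]
   path = /mnt/lustre
   writeable = yes
   browseable = yes
   # if more than one Samba gateway exports the same Lustre volume,
   # oplocks can hand out stale caching promises; disabling them is safer
   oplocks = no
   level2 oplocks = no
-- cut --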
Re: [Lustre-discuss] lustre file system with failover
Hi Trupti, It depends on how your failover is implemented. The bottom line is that if you have a transaction in-flight when your MDT is disconnected, all new transactions will block until queued and in-flight transactions either complete or time out. If your failover window is a few seconds or less, you shouldn't notice more than a minor blip as the failover MDS recovers state for the downed MDS and finishes any transactions in progress. Your results will vary depending on the hardware you use and the settings of your failover; in one of my clusters, my failover window is actually quite long (about a minute and a half) due to the way the storage is implemented (the FC buses spend considerable time polling each visible LUN looking for the Lustre ones and I never bothered with device multi-pathing), but my transactions complete as expected once the MDT becomes active on the failover MDS. cheers, Klaus On 6/11/08 5:07 AM, trupti shete [EMAIL PROTECTED] did etch on stone tablets: Hi, I am very new to the Lustre file system. I want to know how I can test whether the failover is working or not. I have the following Lustre file system scenario -- MGS- /dev/sdb on node1 and failnode node2 MDT- /dev/sdc on node1 and failnode node2 (I am using iSCSI to share the disks /dev/sdb and /dev/sdc between node1 and node2) OST1- /dev/sda1 on node2 OST2- /dev/sda2 on node2 And there are 3 clients. If client1 opens a file to write, and at that time I umount the MDT (which will be mounted on node1), will node2 take over? Will client1 experience any difference? -Trupti ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
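A quick sketch of how the failover pairing is usually expressed in the 1.6-era tools, in case it helps with the test plan (the hostnames, fsname and device below are placeholders, not your exact setup):

  mkfs.lustre --fsname=testfs --mdt --mgsnode=node1@tcp0 --failnode=node2@tcp0 /dev/sdc
  mount -t lustre node1@tcp0:node2@tcp0:/testfs /mnt/testfs    # client mount lists both NIDs

With that in place, the test really is as simple as you describe: start a long write on client1, umount the MDT on node1, mount it on node2, and watch the client; you should see it stall, reconnect to node2, and then finish the write once recovery completes.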
Re: [Lustre-discuss] lustre and multi path
Hi Brock, I've got a Sun StorageTek array hooked up to one of our clusters, and I'm using labels instead of multi-pathing. We've got it hooked up in a similar fashion to Stuart's; it's a bit slow and sloppy when initializing, but it works well enough and there are no problems once OSTs are online. Klaus On 6/5/08 3:57 PM, Brock Palen [EMAIL PROTECTED] did etch on stone tablets: Our new Lustre hardware arrived from Sun today. Looking at the dual MDS and FC disk array for it. We will need multipath. Has anyone ever used multipath with Lustre? Are there any issues? If we set up regular multipath via LVM, Lustre won't care, as far as I can tell from browsing the archives. What about multipath without LVM? Our StorageTek array has dual controllers with dual ports going to dual-port FC cards in the MDSes. Each MDS has a connection to both controllers, so we will need multipath to get any advantage from this. Comments? Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
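To expand on the label approach a bit: mkfs.lustre stamps a persistent label onto each target (something like lustre-OST0000 once an index has been assigned), so you can address the device by label and stop caring which /dev/sd* name the kernel hands it on a given boot. A rough sketch, with the device and label purely illustrative:

  e2label /dev/sdc1                                 # reports e.g. lustre-OST0000
  mount -t lustre LABEL=lustre-OST0000 /mnt/ost0    # mount by label instead of device path

The trade-off versus dm-multipath is that labels only buy you naming stability; they don't give you path redundancy, so if the active FC path dies you still take the outage until you switch to the other one.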
Re: [Lustre-discuss] lustre and multi path
Hi Brock, Yeah, that's likely to be an issue if each host has more than one path ... What about using HA to force one path to be inactive at the device level? I know QLogic FC cards support this functionality, although it requires changing the options used by the driver kernel module ... mind you, comparing that to a solution using the multipath daemon, that's six of one, half a dozen of the other, I'd think. Klaus On 6/5/08 5:09 PM, Brock Palen [EMAIL PROTECTED] did etch on stone tablets: This would be for the MDS/MGS only, but that's good to know. Problem is our two MDS servers (active/passive) will have two connections each to the same LUN, so there could be issues. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 On Jun 5, 2008, at 7:52 PM, Klaus Steden wrote: Hi Brock, I've got a Sun StorageTek array hooked up to one of our clusters, and I'm using labels instead of multi-pathing. We've got it hooked up in a similar fashion to Stuart's; it's a bit slow and sloppy when initializing, but it works well enough and there are no problems once OSTs are online. Klaus On 6/5/08 3:57 PM, Brock Palen [EMAIL PROTECTED] did etch on stone tablets: Our new Lustre hardware arrived from Sun today. Looking at the dual MDS and FC disk array for it. We will need multipath. Has anyone ever used multipath with Lustre? Are there any issues? If we set up regular multipath via LVM, Lustre won't care, as far as I can tell from browsing the archives. What about multipath without LVM? Our StorageTek array has dual controllers with dual ports going to dual-port FC cards in the MDSes. Each MDS has a connection to both controllers, so we will need multipath to get any advantage from this. Comments? Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Planning a Lustre Quick Start Guide
Hi Brock, I'm experimenting with a Dell iSCSI array in one of our labs here ... so far, it behaves pretty typically for a Lustre file system, although the performance isn't blazing -- but that's due to limitations of the network infrastructure. I didn't notice anything in the iSCSI gear that would indicate that it couldn't do failover ... on my OSS nodes, I'm using disk labels rather than device IDs, and the OSTs are interchangeable on both OSSes. Just fyi, I don't know if there's value in it. I hadn't planned on testing failover with this config as it was mostly a proof-of-concept of iSCSI Lustre for management, but I could make it happen at some point. Klaus On 5/23/08 8:53 AM, Brock Palen [EMAIL PROTECTED] did etch on stone tablets: This is not quite a HOWTO but has some interesting suggestions for HA and backup (even if I think that the only sensible way to backup a large Lustre storage pool is another Lustre storage pool): http://indico.cern.ch/contributionDisplay.py?contribId=24&sessionId=12&confId=27391 Interesting that they are using DRBD. We thought about this, and there is a request about it in bugzilla, but nothing appears to have been done about it. I have used it before for NFS and LVM with Xen virtual machines without issue. We also asked Sun about using an iSCSI array for the shared storage for failover with the MDT/MGS. We were told it had not been tested and to use FC in its place. Somewhat disappointed, as FC more than doubled the cost of the MDT/MGS setup when you put in the FC adapters and the cabinet. We wanted to be safe though. On your replacing the Thumper sata_mv driver with the one from Sun, I hope this fixes the lacking performance. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
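On the DRBD angle, I haven't run it under Lustre myself, but for a small MDT the shape of it is straightforward enough; a bare-bones resource definition would look something like the following (hostnames, IPs and devices are made up for illustration):

  resource mdt {
    protocol C;
    on mds1 { device /dev/drbd0; disk /dev/sdb; address 192.168.10.1:7788; meta-disk internal; }
    on mds2 { device /dev/drbd0; disk /dev/sdb; address 192.168.10.2:7788; meta-disk internal; }
  }

You'd then format /dev/drbd0 as the MDT and only ever mount it on whichever node DRBD currently considers primary; protocol C (synchronous replication) is the only mode I'd trust for metadata.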
[Lustre-discuss] Help reviving a 1.4.x volume with a destroyed OST
Hello there, We had a bit of an accident in one of our labs earlier today, and it effectively destroyed one of the OSTs in the Lustre file system. From what I can figure (I wasn't there at the time), one of the OSSes re-provisioned itself accidentally, and installed its OS information on one of the OSTs in the cluster. So now we've got a file system with 16 OSTs, one of which is actually a regular Linux OS install. We're not quite so worried about the data that's been lost, but it would be good to bring the file system back online with the hole in place to inspect it for damage, and then subsequently reformat the damaged piece and re-insert it into the existing file system. I've tried doing an 'lctl --inactive UUID config.xml' on the OSS in question, but it always errors out. I can't pull the UUID off the disk itself presumably because it was destroyed when the disk was rewritten. From the config.xml, the UUIDs all look pretty generic -- 'ost2_UUID', 'ost7_UUID', etc. -- but if I use 'blkid' on any of the corresponding LUNs, I get strings that resemble actual real-world UUIDs. Is there any place I can extract the previously-generated-and-now-sadly-destroyed UUID for the damaged OST? Is the generic-looking UUID field in the XML file an actual UUID? When it comes time to re-insert the OST in question back into the file system, is it simply a matter of adding it the same way as adding a new OST, or will I have to remove information about the previous OST if I want to replace it inline? I looked through the manual and Google fairly extensively, but I couldn't quite find the information I was looking for. Any help would be greatly appreciated! thanks, Klaus ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
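For context, what I'm effectively after is the 1.4 equivalent of marking the missing OST inactive so the rest of the file system can be used around it; on the MDS or client side I believe that would be something along these lines, with the device number being whatever 'lctl dl' reports for the OSC pointing at the dead OST (the "11" here is illustrative):

  lctl dl | grep osc             # find the device number of the OSC for the dead OST
  lctl --device 11 deactivate    # stop using it for new objects / allow access to proceed

but without the original UUID I'm not sure the XML config will even load far enough to get there.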
[Lustre-discuss] Lustre with Juniper switches?
Hi there, Has anyone out in the Lustre community set theirs up in an environment that used Juniper switches? We're testing one out in the lab with ours, and something about its configuration isn't working. The same setup has been tested and operated successfully with Extreme, Cisco, and Alcatel, with no changes to the Lustre setup itself. I'm thinking it's more than likely not the Lustre setup, but something about the switch configuration - VLAN membership, IGMP config, etc. Is there a checklist of required network protocols/behaviours we can use to verify our switch configuration? cheers, Klaus ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] LMT 2.6.0
Hi Jim, Out of curiosity (and bear with me if this is a stupid question), do you know if anyone's integrated LMT with Ganglia? The clusters we have that use Lustre are all ROCKS-based, which uses Ganglia for monitoring ... I'm not familiar with Cerebro, so I'm not sure how you're using it, but it would be nice to be able to integrate LMT's statistics into an existing monitoring solution rather than implementing a second one. Other than that, I'm looking forward to screenshots -- this sounds like a nice add-on. cheers, Klaus On 4/9/08 8:59 AM, Jim Garlick [EMAIL PROTECTED]did etch on stone tablets: Hi Chris, I think Herb is reporting for jury duty today so I'll speak up and let him follow up later if I missed anything. There are text clients (ltop and lstat). I haven't tried running the GUI remotely and tunneling the X session, or running the client locally and tunneling mysql protocol, but I don't know of anything architectural that would prevent either from working. We'll get some screenshots out there. Herb was reticent to publish LMT without good documentation, but I encouraged him because there were a couple of people anxious to try it out. So the lack of screenshots/ documentation is my bad - we'll try to fix that as time permits. Regarding the dependencies: it's always a tradeoff between reinventing the wheel and adding dependencies. Sigh. In this case, we leveraged MySQL for the database of historical data; we managed to get a graphics group here which is a Java shop to code the clients, so they interface directly to MySQL using the Java MySQL bindings; and we leveraged the Cerebro multicast based monitoring tool for the data collection because it introduced no new dependencies on lustre servers at our site (since we already use it to monitor other things), and is very lightweight. I think this is a good architecture: the clients and data collection parts are independent with interfaces defined by the database schema, so they could be independently replaced, for example if someone wanted to write text clients in C/curses/MySQL, or replace the backend with some other data collection infrastructure that they are already using or feel is superior. Jim On Wed, Apr 09, 2008 at 08:47:39AM -0600, Chris Worley wrote: Are there any screenshots/FAQs? I looked at the package, and the depandancy list is long. As most of my work is remote, does the GUI run remotely over an SSH tunnel? Are there any similar console/text-based utilities w/ a shorter dependency list and lighter weight for remote access? Thanks, Chris On Mon, Apr 7, 2008 at 5:54 PM, Herb Wartens [EMAIL PROTECTED] wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA512 Hi All, We have a new and improved version of LMT released on Sourceforge. For those who have never used it, LMT is the Lustre Monitoring Tool developed at LLNL. It provides realtime monitoring of a Lustre filesystem (or multiple filesystems). It also graphs data over time for a set of attributes specified by the user. Please feel free to try it out. Below are the release notes. (please forgive this if it is a repost as it looks like lustre-discuss may have finally come back up) http://sourceforge.net/projects/lmt Release Notes for LMT 2.6.0 07 Apr 2008 * This version of LMT 2 has been tested to work properly with Lustre 1.6.x It is no longer necessary to keep the old lustre *.xml configuration and there are no more xwatch-lustre.conf or lmt.conf files to set up. * LMT now uses Cerebro to transport the data collected from the Lustre filesystem being monitored. 
For more info see: http://sourceforge.net/projects/cerebro * Building the rpms requires ibm-java since this is what we use in Chaos 4. This can be changed in the specfile to require Sun Java if that is preferred (the main issue is that we need a full fledged JVM that supports swing). * There is now a dependency on mysql to store the collected data. Since the resolution of the collection is ~5 seconds this can quickly consume a lot of space. The clients and servers are completely decoupled. * Please refer to the documentation for instructions on how to install and configure LMT 2. -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org iD8DBQFH+rQhP/62XqEEbMYRCqY6AJ9woGAmDo+lqEoOrr8fhSqGGL1aXwCg1Ml5 NwuPz3GIhOU0zAeZDrueQhA= =QWFx -END PGP SIGNATURE- ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss ___ Lustre-discuss mailing
Re: [Lustre-discuss] lustre cross IP network routing
Hi Andrew, 1. No, there's no requirement that they be on the same IP subnet. 2. Not sure; check the Lustre manual for info on LNET routing. Assuming TCP/IP can see the whole path, it should work once the configuration for Lustre is correct. On 3/7/08 12:55 PM, Lundgren, Andrew [EMAIL PROTECTED] did etch on stone tablets: Is there any restriction that Lustre nodes on TCP must be on the same IP subnets? Is there anything special that needs to be done to make a client on one network see the MGS/OSSes on another network? Thanks! -- Andrew ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
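If the clients and servers end up on genuinely different LNET networks (rather than just different IP subnets on a single tcp0), you declare an LNET router in the middle and tell each side how to reach the other. A minimal sketch of the modprobe.conf entries, with made-up NIDs and interface names:

  # router node, one leg on each network
  options lnet networks="tcp0(eth0),tcp1(eth1)" forwarding=enabled
  # servers, on tcp0, reaching tcp1 via the router's tcp0 NID
  options lnet networks="tcp0(eth0)" routes="tcp1 10.10.0.1@tcp0"
  # clients, on tcp1, reaching tcp0 via the router's tcp1 NID
  options lnet networks="tcp1(eth0)" routes="tcp0 192.168.1.1@tcp1"

If everything stays on one LNET network and the IP layer can route between the subnets, none of this is needed, which is what the "No" above is getting at.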
Re: [Lustre-discuss] configuration question 1.6.4; multiple NICs on OSS
Hi Jim, I use bonding in one of our configurations here (LACP-based, to an Extreme Summit series switch), and the overhead is not bad. My best performance test so far provided about 340-350 MB/s sustained read performance across two OSS nodes, each with two GigE links striped together using LACP, for a total of 4 GigE from the file system. Single-link performance with the same equipment was about 200 MB/s (a single NIC on each OSS), so for me, the overhead of LACP is worth it, since the overall performance goes up significantly. With the right switch, you can get some pretty impressive results using plain ol' vanilla GigE. However ... that's just a suggestion. From my experience, in order to do what I think you want to do ... Each OSS would communicate on either eth0 or eth1, and thus its LNET config would look like this in /etc/modprobe.conf: options lnet networks=tcp0(eth0),tcp1(eth1) On the client side, in order to take advantage of the split networking, your LNET config would look like this in /etc/modprobe.conf: options lnet networks=tcp0(eth0) or this: options lnet networks=tcp1(eth1) since with what you're attempting, Lustre will push all its traffic over the first available link in the case of multiple paths -- so if your clients were able to choose between one or the other, you'd simply saturate the tcp0 path and nothing would really happen on the tcp1 path. This gets to be a bit of a hassle to manage, as the administrator has to take a hand in the load balancing aspect, determining which clients use which LNET network. This can be handled relatively trivially with some modulo arithmetic in a Kickstart file (where you'd generate the LNET entries your client node would use), but really ... it's extra work and extra hassle. Using bonding on the OSSes, you would see balanced usage of all the participating NICs and respectable overall throughput, and you don't have to fool around with multiple LNETs or IP subnetting. That's just my two cents, and I'm happy to be proven wrong, but for my money (and labour), it is easier to implement Lustre using a solid NIC bonding framework than it was to attempt to split up multiple LNETs and keep it all sorted in my head and on paper. cheers, Klaus On 3/3/08 11:37 AM, Jim Albin [EMAIL PROTECTED] did etch on stone tablets: Hello, We're trying to see if we can use multiple NICs on a pair of OSSes without bonding. Trying to decipher the Multi-Home example in the Operations Manual 1.6_v1.10 Chapter 7, and I must be missing something. I have not attempted bonding yet; the manual seems to suggest you can use multiple NICs without bonding and avoid the overhead of bonding. We're looking for either failover or load balancing advantages over a single NIC in the OSS. Could someone please post an example of a configuration similar to this: mdt - eth0 only oss1,oss2 - eth0 eth1 client configuration If you could include the modprobe.conf entry, mount commands and anything else to try or verify with, I'd appreciate it very much. Thanks in advance. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
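In case it saves anyone some digging, the bonding approach I'm describing boils down to a handful of lines. This is a RHEL-flavoured sketch; the interface names, mode and monitoring interval are just common choices, so adjust for your distro and switch:

  # /etc/modprobe.conf
  alias bond0 bonding
  options bonding mode=802.3ad miimon=100
  options lnet networks="tcp0(bond0)"

  # /etc/sysconfig/network-scripts/ifcfg-eth0 (eth1 looks the same)
  DEVICE=eth0
  MASTER=bond0
  SLAVE=yes
  ONBOOT=yes
  BOOTPROTO=none

plus an ifcfg-bond0 that carries the actual IP address, and the matching LACP aggregation configured on the switch ports the OSS plugs into. The clients then just mount against the OSS addresses as usual; no special LNET config is needed on their side.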
Re: [Lustre-discuss] how do you mount mountconf (i.e. 1.6) lustre on your servers?
At this point, recovery of the primary server is a manual process -- so in the case of a failure, the secondary would assume service for the failed node, which gets powered off; an administrator is required to intervene to recover the primary. Klaus On 2/14/08 2:51 PM, Andreas Dilger [EMAIL PROTECTED] did etch on stone tablets: On Feb 14, 2008 11:17 -0800, Klaus Steden wrote: Here are the mount lines from our first OSS node: LABEL=lustre-OST0000 /mnt/lustreost0 lustre defaults 0 0 LABEL=lustre-OST0001 /mnt/lustreost1 lustre defaults,noauto 0 0 It has a partner, and the lines in that fstab swap the 'noauto' flag. Klaus, if you have the backup node mounting the filesystem because of primary server failure, how do you prevent the primary server from mounting the filesystem again as soon as it boots? Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
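Spelled out, the partner node's fstab carries the mirror image of those two lines (same labels and mount points, flags swapped):

  LABEL=lustre-OST0000 /mnt/lustreost0 lustre defaults,noauto 0 0
  LABEL=lustre-OST0001 /mnt/lustreost1 lustre defaults 0 0

so each node auto-mounts only "its own" OST at boot, and the other target is only mounted by hand (or by the failover tooling) after its owner has been confirmed down and powered off.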
[Lustre-discuss] Off topic -- official list address?
Hi there, Just a quick question ... what is the official sending address of this list? I'm trying to write filter rules to automatically route messages from it into my Lustre-list folder ... but some messages are sent by [EMAIL PROTECTED], while others come in from [EMAIL PROTECTED] Is there an address that is consistently present in list mail that I can use as a filter parameter? I'm using Outlook, which is impossibly lame compared to something like procmail, so the more obvious a parameter, the better ... thanks, Klaus ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] Lustre process/system tuning ...
Hello, I'm seeing some interesting behaviour from one of the nodes in our cluster when two applications attempt to read from the Lustre. Specifically, one application is a real time video player, and the other is just interacting with the file system conventionally ... but if the player is running, and another process attempts to walk the same directory, a dozen or more ldlm kernel threads start, and the machine's CPU load skyrockets (up to 40 or 50, sometimes). Are there hooks to limit the number of ldlm threads that get launched, or ways to lower their priority, or raise the player's priority so that it maintains its real time performance? This is for Lustre 1.4.7. thanks, Klaus ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
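While waiting on an answer about the ldlm side, the stopgap I'm considering is purely at the OS level, i.e. tilting the scheduler toward the player so it doesn't get starved while the directory walk is going on. This is generic Linux, does nothing about the number of ldlm threads themselves, and the process name and priorities are placeholders:

  renice -5 -p $(pidof player)     # give the player a bigger CPU share
  chrt -r -p 50 $(pidof player)    # or, more aggressively, a real-time round-robin slice

I'd be happier with a real knob on the Lustre side if one exists, since pushing the player's priority up treats the symptom rather than the cause.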