[Lustre-discuss] Information regarding the FILE HANDLE
Hi,

I was wondering in which part/structure of the code the file handle information is present in Lustre. Can somebody please point me to it? I tried looking it up in the code but couldn't exactly make it out. Is the file handle part of an EA, i.e. the extended attributes?

Thanks,
Vilobh
[Lustre-discuss] mv_sata module for rhel5 and write through patch
We are (finally) updating our X4500s to RHEL 5 and Lustre 1.8.5 from RHEL 4 and 1.6.7. On RHEL 4 we had used the patch from https://bugzilla.lustre.org/show_bug.cgi?id=14040 for the mv_sata module. Is this still recommended on RHEL 5, i.e. using the mv_sata module over the stock Red Hat sata_mv as well as applying this patch? That patch is quite old; is there a newer one? What are other X4500/Thumper users running?

Also, I will do some digging on the list, but why is Lustre 2.0 not the 'production' version? We are planning on 1.8.x for now, but if 2.0 is stable we would install that one. Can we upgrade directly from 1.6 to 2.0 if we did this?

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734) 936-1985
Re: [Lustre-discuss] mv_sata module for rhel5 and write through patch
Brock Palen wrote:
We are (finally) updating our X4500s to RHEL 5 and Lustre 1.8.5 from RHEL 4 and 1.6.7. On RHEL 4 we had used the patch from https://bugzilla.lustre.org/show_bug.cgi?id=14040 for the mv_sata module. Is this still recommended on RHEL 5, i.e. using the mv_sata module over the stock Red Hat sata_mv as well as applying this patch? That patch is quite old; is there a newer one?

I don't know: the last I heard was that the upcoming RHEL 5.3 was to have an in-tree Marvell driver that worked. If your system is still under support, I'd contact Oracle support for information about running RHEL 5 on the X4500. You do want to ensure the write-back cache is disabled on the drives, but you may be able to do that with udev scripts; see Bug 17462 for an example for the J4400.

What are other X4500/Thumper users running? Also, I will do some digging on the list, but why is Lustre 2.0 not the 'production' version? We are planning on 1.8.x for now, but if 2.0 is stable we would install that one.

Lustre 2.0 is not being widely used and would not be covered by an Oracle support contract. It is strongly recommended to run production systems on 1.8.x rather than 2.0. If you really want to try Lustre 2.x, you will want to use something newer than 2.0: maybe check with lustre...@googlegroups.com for the current status of the Whamcloud git repository?

Can we upgrade directly from 1.6 to 2.0 if we did this?

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734) 936-1985
[Lustre-discuss] mount mdt/mgs - file exists -17
Hi,

After my MDS crashed I was unable to mount the MDT/MGS. The dmesg output is below. I'm unable to remove the Lustre modules (lustre_rmmod) and the target is listed under /proc/fs/lustre/devices but not mounted. Rebooting the system to try again results in a kernel panic. Upon reset I ran fsck, which revealed no problems, so I tried a --writeconf and deleted CATALOGS but still received -17 and was unable to reboot cleanly or unload the modules. Fortunately this is my test system, but I'd like to understand what happened! Running Lustre 1.8.5 on RHEL 5.5.

cat /proc/fs/lustre/devices
7 AT osc test-OST-osc test-mdtlov_UUID 1

Lustre: MGS MGS started
Lustre: MGC192.168.5.100@o2ib: Reactivating import
Lustre: MGC192.168.5.100@o2ib: Reactivating import
Lustre: Enabling user_xattr
Lustre: test-MDT: Now serving test-MDT on /dev/sda1 with recovery enabled
Lustre: 5590:0:(lproc_mds.c:271:lprocfs_wr_group_upcall()) test-MDT: group upcall set to /usr/sbin/l_getgroups
Lustre: test-MDT.mdt: set parameter group_upcall=/usr/sbin/l_getgroups
LustreError: 5590:0:(ldlm_lib.c:331:client_obd_setup()) can't add initial connection
LustreError: 5590:0:(obd_config.c:372:class_setup()) setup test-OST-osc failed (-2)
LustreError: 5590:0:(obd_config.c:1199:class_config_llog_handler()) Err -2 on cfg command: Lustre:cmd=cf003 0:test-OST-osc 1:test-OST_UUID 2:128.174.5.100@tcp
LustreError: 15c-8: MGC192.168.5.100@o2ib: The configuration from log 'test-MDT' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 5453:0:(obd_mount.c:1126:server_start_targets()) failed to start server test-MDT: -2
LustreError: 5453:0:(obd_mount.c:1655:server_fill_super()) Unable to start targets: -2
Lustre: Failing over test-MDT
Lustre: Failing over test-mdtlov
Lustre: test-MDT: shutting down for failover; client state will be preserved.
Lustre: MDT test-MDT has stopped.
Lustre: MGS has stopped.
Lustre: server umount test-MDT complete
LustreError: 5453:0:(obd_mount.c:2050:lustre_fill_super()) Unable to mount (-2)

Thanks,
Dan
[Lustre-discuss] df and du difference on lustre fs
Greetings,

I've got a strange situation. We've installed lustre-client (2.0.0) and Xen (4.0) on Debian squeeze. Then we loaded img files with virtual machine images onto Lustre (the images were striped over 2 OSTs). The VM started but then crashed. After that I tried to remove the img files from Lustre; the client hung and the Xen server hung too. I then removed the img files from another Lustre client without any problems and rebooted the Xen server.

When I run df -h it looks as if the img files are still on Lustre, but ls doesn't show the deleted files and du -sm shows the correct used space:

df -h /lustre/
Filesystem                  Size  Used Avail Use% Mounted on
10.165.175.201@tcp0:/lihep  5.4T   62G  5.1T   2% /lustre

du -sm /lustre/
41630   /lustre/

On the OSTs:

# df -h
Filesystem   Size  Used Avail Use% Mounted on
...
/dev/md10    2.7T   32G  2.6T   2% /ost1

# df -h
Filesystem   Size  Used Avail Use% Mounted on
...
/dev/md13    2.7T   30G  2.6T   2% /ost2

How can I fix it?

With regards,
Ekaterina Popova
Re: [Lustre-discuss] Poor multithreaded I/O performance
Ok, I ran the following tests:

[1] The application spawns 8 threads and I write to Lustre with 8 OSTs. Each thread writes data in blocks of 1 MByte in a round-robin fashion, i.e. T0 writes to offsets 0, 8MB, 16MB, etc., and T1 writes to offsets 1MB, 9MB, 17MB, etc. The stripe size being 1 MByte, every thread ends up writing to only one OST. I see a bandwidth of 280 MBytes/sec, similar to the single-thread performance.

[2] I also ran the same test such that every thread writes data in blocks of 8 MBytes for the same stripe size (thus every thread writes to every OST). I still get similar performance, ~280 MBytes/sec, so essentially I see no difference between each thread writing to a single OST vs. each thread writing to all OSTs. And as I said before, if all threads write to their own separate file, the resulting bandwidth is ~700 MBytes/sec.

I have attached my C file (simple_io_test.c) herewith. Maybe you could run it and see where the bottleneck is. Comments and instructions for compilation are included in the file. Do let me know if you need any clarification on that.

Your help is appreciated,
Kshitij

This is what my application does: each thread has its own file descriptor to the file. I use pwrite to ensure non-overlapping regions, as follows:

Thread 0, data_size: 1MB, offset: 0
Thread 1, data_size: 1MB, offset: 1MB
Thread 2, data_size: 1MB, offset: 2MB
Thread 3, data_size: 1MB, offset: 3MB
(repeat cycle)
Thread 0, data_size: 1MB, offset: 4MB
and so on. (This happens in parallel; I don't wait for one cycle to end before the next one begins.)

I am going to try the following:

a) Instead of a round-robin distribution of offsets, test with sequential offsets:

Thread 0, data_size: 1MB, offset: 0
Thread 0, data_size: 1MB, offset: 1MB
Thread 0, data_size: 1MB, offset: 2MB
Thread 0, data_size: 1MB, offset: 3MB
Thread 1, data_size: 1MB, offset: 4MB
and so on. (I am going to keep these as separate pwrite I/O requests instead of merging them or using writev.)

b) Map the threads to the number of OSTs using some modulo, as suggested in the email below.

c) Experiment with fewer OSTs (I currently have 48).

I shall report back with my findings.

Thanks,
Kshitij

[Moved to Lustre-discuss]

However, if I spawn 8 threads such that all of them write to the same file (non-overlapping locations), without explicitly synchronizing the writes (i.e. I don't lock the file handle)

How exactly does your multi-threaded application write the data? Are you using pwrite to ensure non-overlapping regions, or are they all just doing unlocked write() operations on the same fd (each just transferring size/8)?

If it divides the file into N pieces, and each thread does pwrite on its piece, then what each OST sees are multiple streams at wide offsets to the same object, which could impact performance. If on the other hand the file is written sequentially, where each thread grabs the next piece to be written (locking normally used for the current_offset value, so you know where each chunk is actually going), then you get a more sequential pattern at the OST.

If the number of threads maps to the number of OSTs (or some modulo, like in your case 6 OSTs per thread), and each thread owns the piece of the file that belongs to an OST (i.e. for (offset = thread_num * 6MB; offset < size; offset += 48MB) pwrite(fd, buf, 6MB, offset); ), then you've eliminated the need for application locks (assuming the use of pwrite) and ensured each OST object is being written sequentially. It's quite possible there is some bottleneck on the shared fd.
So perhaps the question is not why you aren't scaling with more threads, but why the single file is not able to saturate the client, or why the file bandwidth is not scaling with more OSTs. It is somewhat common for multiple processes (on different nodes) to write non-overlapping regions of the same file; does performance improve if each thread opens its own file descriptor?

Kevin

Wojciech Turek wrote:
Ok, so it looks like you have 64 OSTs in total and your output file is striped across 48 of them. May I suggest that you limit the number of stripes; a good number to start with would be 8 stripes, and for best results use the OST pools feature to arrange that each stripe goes to an OST owned by a different OSS.

regards,
Wojciech

On 23 May 2011 23:09, kme...@cs.uh.edu wrote:
Actually, 'lfs check servers' returns 64 entries as well, so I presume the system documentation is out of date. Again, I am sorry the basic information had been incorrect.
- Kshitij

Run lfs getstripe your_output_file and paste the output of that command to the mailing list. A stripe count of 48 is not possible if you have at most 11 OSTs (the max stripe count would be 11). If your striping is correct, the bottleneck can be your client network.

regards,
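To make the OST-aligned pattern Kevin describes concrete, here is a minimal sketch of what such a writer could look like. This is not the attached simple_io_test.c; the file path, stripe size, stripe count, file size, and all identifiers below are illustrative assumptions, and it uses the simpler "one thread per OST" mapping rather than the 6-OSTs-per-thread modulo, so adjust the constants to match what lfs getstripe actually reports.

/* ost_aligned_write.c -- minimal sketch of the OST-aligned pwrite pattern
 * discussed above.  Assumptions for illustration only: the path
 * /lustre/testfile, a 1 MB stripe size, 8 OSTs with one writer thread per
 * OST, and a 1 GB file.  Compile with: gcc -pthread ost_aligned_write.c */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define STRIPE_SIZE   (1UL << 20)   /* must match the file's stripe size */
#define STRIPE_COUNT  8             /* one thread per OST */
#define FILE_SIZE     (1UL << 30)

struct task { int fd; int thread_num; };

static void *writer(void *arg)
{
    struct task *t = arg;
    char *buf = malloc(STRIPE_SIZE);
    off_t off;

    memset(buf, 'x', STRIPE_SIZE);
    /* Each thread writes only the stripes that land on "its" OST: start at
     * thread_num * STRIPE_SIZE and step by the full stripe width, so every
     * OST object is written sequentially and no regions overlap. */
    for (off = (off_t)t->thread_num * STRIPE_SIZE; off < (off_t)FILE_SIZE;
         off += (off_t)STRIPE_COUNT * STRIPE_SIZE)
        if (pwrite(t->fd, buf, STRIPE_SIZE, off) != (ssize_t)STRIPE_SIZE)
            perror("pwrite");
    free(buf);
    return NULL;
}

int main(void)
{
    pthread_t tid[STRIPE_COUNT];
    struct task tasks[STRIPE_COUNT];
    int i, fd = open("/lustre/testfile", O_WRONLY | O_CREAT, 0644);

    if (fd < 0) { perror("open"); return 1; }
    for (i = 0; i < STRIPE_COUNT; i++) {
        tasks[i].fd = fd;        /* shared fd; switch to one open() per
                                    thread to test the shared-fd theory */
        tasks[i].thread_num = i;
        pthread_create(&tid[i], NULL, writer, &tasks[i]);
    }
    for (i = 0; i < STRIPE_COUNT; i++)
        pthread_join(tid[i], NULL);
    close(fd);
    return 0;
}

If the file is created with a matching layout (e.g. lfs setstripe with a 1 MB stripe size and a stripe count equal to the thread count), each thread then drives exactly one OST object sequentially, which removes both the application-level locking and the wide-offset write pattern discussed above.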
Re: [Lustre-discuss] df and du difference on lustre fs
Hello!

On May 26, 2011, at 8:19 AM, Ekaterina Popova wrote:
We've installed lustre-client (2.0.0) and Xen (4.0) on Debian squeeze. Then we loaded img files with virtual machine images onto Lustre (the images were striped over 2 OSTs). The VM started but then crashed. After that I tried to remove the img files from Lustre; the client hung and the Xen server hung too.

Kernel messages from the hung instance would be interesting to see.

I then removed the img files from another Lustre client without any problems and rebooted the Xen server. When I run df -h it looks as if the img files are still on Lustre, but ls doesn't show the deleted files and du -sm shows the correct used space.

I assume that the hung client was holding the files open, so when you unlinked them from another client only the names were removed on the MDS and the objects on the OSTs remained.

# df -h
Filesystem   Size  Used Avail Use% Mounted on
...
/dev/md13    2.7T   30G  2.6T   2% /ost2

How can I fix it?

Since you already killed the client that was holding the files open, once the MDS registers the death of that client (due to ping timeout) it will close the descriptors and proceed with destroying the now-orphaned objects. Alternatively, the next time your MDS restarts, such orphaned objects should also be destroyed.

Bye,
Oleg
--
Oleg Drokin
Senior Software Engineer
Whamcloud, Inc.
Re: [Lustre-discuss] Information regarding the FILE HANDLE
Hello!

On May 26, 2011, at 3:13 AM, vilobh meshram wrote:
I was wondering in which part/structure of the code the file handle information is present in Lustre. Can somebody please point me to it? I tried looking it up in the code but couldn't exactly make it out. Is the file handle part of an EA, i.e. the extended attributes?

There are multiple things that could be called a file handle, so it would be great if you could explain a little bit about what it is you are actually looking for.

Bye,
Oleg
--
Oleg Drokin
Senior Software Engineer
Whamcloud, Inc.
Re: [Lustre-discuss] mount mdt/mgs - file exists -17
On Thu, May 26, 2011 at 10:27:26AM -0700, Dan wrote:
Lustre: MGS MGS started
Lustre: MGC192.168.5.100@o2ib: Reactivating import
Lustre: MGC192.168.5.100@o2ib: Reactivating import

So you use InfiniBand ...

[...]
LustreError: 5590:0:(ldlm_lib.c:331:client_obd_setup()) can't add initial connection
LustreError: 5590:0:(obd_config.c:372:class_setup()) setup test-OST-osc failed (-2)
LustreError: 5590:0:(obd_config.c:1199:class_config_llog_handler()) Err -2 on cfg command: Lustre:cmd=cf003 0:test-OST-osc 1:test-OST_UUID 2:128.174.5.100@tcp

But a TCP NID is registered for the OST. Is this intended? If so, have you configured lnet on the MDS to use tcp?

Cheers,
Johann
--
Johann Lombardi
Whamcloud, Inc.
www.whamcloud.com
Re: [Lustre-discuss] Information regarding the FILE HANDLE
Hi Oleg,

Thanks for the reply. What is the equivalent of the NFS file handle in Lustre? Can you give me examples of a few of the things that could be called a file handle?

Thanks,
Vilobh

On Thu, May 26, 2011 at 4:30 PM, Oleg Drokin gr...@whamcloud.com wrote:
Hello!

On May 26, 2011, at 3:13 AM, vilobh meshram wrote:
I was wondering in which part/structure of the code the file handle information is present in Lustre. Can somebody please point me to it? I tried looking it up in the code but couldn't exactly make it out. Is the file handle part of an EA, i.e. the extended attributes?

There are multiple things that could be called a file handle, so it would be great if you could explain a little bit about what it is you are actually looking for.

Bye,
Oleg
--
Oleg Drokin
Senior Software Engineer
Whamcloud, Inc.
Re: [Lustre-discuss] Information regarding the FILE HANDLE
Hello!

Well, the closest thing to an NFS file handle is probably the Lustre FID (inode number and generation in 1.8). The multiple things are: the open handle returned by the MDS on open, the FIDs, the NFS file handle constructed by the lustre_nfs layer, the file descriptors returned from open(2), ...

Bye,
Oleg

On May 26, 2011, at 7:31 PM, vilobh meshram wrote:
Hi Oleg,

Thanks for the reply. What is the equivalent of the NFS file handle in Lustre? Can you give me examples of a few of the things that could be called a file handle?

Thanks,
Vilobh

On Thu, May 26, 2011 at 4:30 PM, Oleg Drokin gr...@whamcloud.com wrote:
Hello!

On May 26, 2011, at 3:13 AM, vilobh meshram wrote:
I was wondering in which part/structure of the code the file handle information is present in Lustre. Can somebody please point me to it? I tried looking it up in the code but couldn't exactly make it out. Is the file handle part of an EA, i.e. the extended attributes?

There are multiple things that could be called a file handle, so it would be great if you could explain a little bit about what it is you are actually looking for.

Bye,
Oleg

--
Oleg Drokin
Senior Software Engineer
Whamcloud, Inc.
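For anyone who wants to grep the tree: the 2.x FID is a small fixed-size structure, and 1.8 identifies files by inode number plus generation. The sketch below is from memory; the authoritative definitions live in lustre/include/lustre/lustre_idl.h in the source tree, so treat the field names here as approximate and verify them against your own version.

/* Rough sketch, from memory, of the identifiers mentioned above.
 * Verify against lustre/include/lustre/lustre_idl.h in your tree; the
 * typedefs below are stand-ins so the snippet compiles on its own. */
typedef unsigned long long __u64;
typedef unsigned int       __u32;

/* Lustre 2.x: the FID, the closest analogue to an NFS file handle. */
struct lu_fid {
        __u64 f_seq;    /* sequence number allocated by the server */
        __u32 f_oid;    /* object id within that sequence */
        __u32 f_ver;    /* version, zero for most files */
};

/* Lustre 1.8: files are addressed by inode number + generation. */
struct ll_fid {
        __u64 id;           /* inode number */
        __u32 generation;   /* inode generation */
        __u32 f_type;
};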