[Lustre-discuss] RAID Stripe alignment
Hello,

I read this thread http://www.mail-archive.com/lustre-discuss@lists.lustre.org/msg07791.html and section 10.1 "Considerations for Backend Storage" of the Lustre Operations Manual in order to determine the best-performing setup for our OSS hardware:

- HP DL180 G6
- CentOS 5.5 and Lustre 1.8.4
- HP Smart Array P410 controller (512 MB cache, 25% Read / 75% Write)
- 600 GB SAS drives

The stripe sizes available on the P410 array controllers are 8, 16, 32, 64, 128 and 256 KB (in HP P410 terminology, the "stripe size" is the amount of data read/written to each disk, i.e. the per-disk chunk).

Two of the scenarios that we tested are:

1) 1 x 12-disk RAID6 LUN
   - 1024 KB RPC / 10 data disks = 102.4 KB per disk, so use stripesize=64
   - not optimally aligned, but maximum space usage
   - set up on oss[2-4]
   - sgpdd_survey results: http://www.sharcnet.ca/~kaizaad/orcafs/unaligned.html

2) 1 x 10-disk RAID6 LUN
   - 1024 KB RPC / 8 data disks = 128 KB per disk, so use stripesize=128
   - optimally aligned, but at the sacrifice of 2 disks of space
   - set up on oss[8-10]
   - sgpdd_survey results: http://www.sharcnet.ca/~kaizaad/orcafs/aligned.html

In our case, the graphs seem to indicate that the underlying RAID alignment setup doesn't matter much, which is completely counterintuitive given the recommendations from the Lustre list and the manual. Is there something we are missing here? Maybe I misunderstood the recommendations? Or are we just bottlenecking on some other component in the setup, so that proper RAID alignment doesn't show up as beneficial?

Any insight is appreciated.

thanks
-k
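(For reference, a minimal sketch of how the aligned case above, scenario 2 with a 128 KB per-disk chunk and 8 data disks, could be carried through to the ldiskfs layer when formatting the OST. The device name, fsname and MGS NID are placeholders, not the poster's actual values.)

    # Sketch only: format an OST so ldiskfs is aligned to the RAID6 geometry of
    # scenario 2 (128 KB per-disk chunk, 8 data disks => 1 MB full stripe).
    #
    #   stride       = per-disk chunk / 4 KB block   = 128 / 4 = 32
    #   stripe-width = stride * number of data disks = 32 * 8  = 256
    mkfs.lustre --ost --fsname=orcafs --mgsnode=mgs@tcp0 \
        --mkfsoptions="-E stride=32,stripe-width=256" /dev/sdb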
[Lustre-discuss] manual OST failover for maintenance work?
Hi,

We have pairs of OSS nodes hooked up to shared storage arrays containing OSTs, but we have not enabled any failover settings yet. Now we need to perform maintenance work on an OSS and we would like to minimize Lustre downtime.

Can I use tunefs.lustre to specify the OSS failover NID for an existing OST? I assume I'll have to take the OST offline to make this change. Will clients that have Lustre mounted pick up this change, or will all clients have to remount?

I should mention that we are running Lustre 1.8.2.

---
Yemi
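(For reference, a minimal sketch of the kind of change being asked about; the device, mount point and failover NID below are placeholders. Whether running clients pick the change up or need to remount is exactly the open question in this message.)

    # Sketch only: add a failover NID to an existing OST.
    # The OST must be unmounted while its configuration is changed.
    umount /mnt/ost0000
    tunefs.lustre --failnode=192.168.1.12@tcp0 /dev/mapper/ost0000
    mount -t lustre /dev/mapper/ost0000 /mnt/ost0000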
Re: [Lustre-discuss] RAID Stripe alignment
Hello Andreas,

Thanks for your reply.

On Mon, 6 Dec 2010, Andreas Dilger wrote:

> Is your write cache persistent?

Yes. It is 512 MB, battery backed.

> One major factor in having Lustre read and write alignment is that any
> misaligned write will cause read-modify-write, and misaligned reads will
> cause 2x reads if the RAID layer is doing parity verification. If your RAID
> layer is hiding this overhead via cache, you need to be absolutely sure that
> it is safe in case of crashes and failover of either or both the OSS and
> RAID controller.

The HP Smart Array P410 controller also has a setting called "Accelerator Ratio", which determines the amount of cache devoted to reads versus writes. Currently it is at the default:

    Accelerator Ratio: 25% Read / 75% Write

We can try setting it to one extreme and then the other to see what difference it makes. This Lustre system is going to be used as /scratch for a broad range of HPC codes with diverse requirements (large files, small files, many files, mostly reading, mostly writing), so I don't know how much we can tune this cache setting to help specific access patterns at the detriment of others; we are just looking for an appropriate middle ground here. But for thread completeness, I'll post the sgpdd_survey results if there are any large differences in performance.

> Cheers, Andreas

thanks a bunch
-k
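(For reference, a sketch of how the Accelerator Ratio might be changed for such an experiment, assuming the standard HP hpacucli tool is used to manage the P410. The slot number is a placeholder, and the exact keyword syntax should be checked against the hpacucli documentation for the installed version.)

    # Sketch only: inspect and change the read/write cache split on the P410.
    hpacucli ctrl slot=0 show detail            # confirm current Accelerator Ratio
    hpacucli ctrl slot=0 modify cacheratio=50/50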
[Lustre-discuss] OST errors caused by residual client info?
Greetings,

Is it possible that the error below could be caused by a client that has not been rebooted, or has not had its Lustre kernel modules reloaded, since a time when a few test filesystems were built and mounted?

    LustreError: 12967:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing error (-19) r...@81032dd2d000 x1348952525350751/t0 o8-?@?:0/0 lens 368/0 e 0 to 0 dl 1291669076 ref 1 fl Interpret:/0/0 rc -19/0
    LustreError: 12967:0:(ldlm_lib.c:1914:target_send_reply_msg()) Skipped 55 previous similar messages
    LustreError: 137-5: UUID 'fs-OST0058_UUID' is not available for connect (no target)

Normally this would be a back-end storage issue. In this case, however, the OSS where this error is logged doesn't have an OST0058; it has an OST006d. Regardless of the OST name, the back-end RAID is healthy with no hardware errors, and no other hardware errors are present on the OSS node (e.g. MCE, panic, IB/Ethernet failures, etc.).

Previous test incarnations of this filesystem were built where the OST name was not assigned (e.g. just "OST") and was assigned upon first mount and connection to the MDS. Is it possible that some clients have residual pointers or config data about the previously built filesystems?

Thanks!

--Jeff
Re: [Lustre-discuss] OST errors caused by residual client info?
Hello!

On Dec 6, 2010, at 6:50 PM, Jeff Johnson wrote:

> Previous test incarnations of this filesystem were built where the OST name
> was not assigned (e.g. just "OST") and was assigned upon first mount and
> connection to the MDS. Is it possible that some clients have residual
> pointers or config data about the previously built filesystems?

If you did not unmount clients from the previous incarnation of the filesystem, those clients would still continue to try to contact the servers they know about, even after the servers themselves go away and are repurposed (since there is no way for the client to know about this).

Bye,
Oleg
Re: [Lustre-discuss] OST errors caused by residual client info?
On 12/6/10 3:55 PM, Oleg Drokin wrote:

> On Dec 6, 2010, at 6:50 PM, Jeff Johnson wrote:
>
>> Previous test incarnations of this filesystem were built where the OST name
>> was not assigned (e.g. just "OST") and was assigned upon first mount and
>> connection to the MDS. Is it possible that some clients have residual
>> pointers or config data about the previously built filesystems?
>
> If you did not unmount clients from the previous incarnation of the
> filesystem, those clients would still continue to try to contact the servers
> they know about, even after the servers themselves go away and are
> repurposed (since there is no way for the client to know about this).

All clients were unmounted, but the Lustre kernel modules were never removed/reloaded, nor were the clients rebooted.

Is it odd that this error would occur naming an OST that is not present on that OSS? Should an OSS only report this error about its own OST devices? As I said, this particular OSS where the error came from only has an OST006c and an OST006d. It does not have an OST0058, although it may have had one back when the filesystem was built from a simple test CSV that did not explicitly give index numbers as part of the mkfs.lustre process; the indices were assigned later, effectively at random, when the OSTs were first mounted and connected to the MDS.

Do you think it is possible for a client to retain this information even though a umount/mount of the filesystem took place?

--Jeff
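(For reference, a minimal sketch of how one could confirm on the OSS which OST index each block device actually carries; the device name is a placeholder.)

    # Sketch only: check the target name/index recorded on an OST device.
    e2label /dev/sdc                  # ldiskfs label, e.g. "fs-OST006d"
    tunefs.lustre --dryrun /dev/sdc   # print target name and parameters without changing anything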
Re: [Lustre-discuss] OST errors caused by residual client info?
Hello!

On Dec 6, 2010, at 7:05 PM, Jeff Johnson wrote:

>>> Previous test incarnations of this filesystem were built where the OST
>>> name was not assigned (e.g. just "OST") and was assigned upon first mount
>>> and connection to the MDS. Is it possible that some clients have residual
>>> pointers or config data about the previously built filesystems?
>>
>> If you did not unmount clients from the previous incarnation of the
>> filesystem, those clients would still continue to try to contact the
>> servers they know about, even after the servers themselves go away and are
>> repurposed (since there is no way for the client to know about this).
>
> All clients were unmounted, but the Lustre kernel modules were never
> removed/reloaded, nor were the clients rebooted.

If the clients were unmounted, then there is no information left in the kernel about those now-vanished mountpoints.

> Is it odd that this error would occur naming an OST that is not present on
> that OSS? Should an OSS only report this error about its own OST devices?

An OSS would report such an error if a client contacted it trying to access an OST that is not present on that OSS. This could be because a client holds some stale information about services (because it was not unmounted from a previous incarnation of the filesystem), or it could be because there is a failover pair setup that names this OSS as a possible NID for a failover target.

> Do you think it is possible for a client to retain this information even
> though a umount/mount of the filesystem took place?

If the clients unmounted cleanly, I don't think there is anywhere such info could be stored. You could go back to the clients sending these requests (identify them by the error messages in their logs; they'd complain about error -19 connecting to OSTs) and see what's wrong with them, what they have mounted, and so on.

Bye,
Oleg
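(Following that suggestion, a minimal sketch of checks one might run on a suspect client; the mount point is a placeholder.)

    # Sketch only: inspect a client suspected of holding stale target information.
    dmesg | grep -- '-19'     # look for "error -19" connection complaints
    mount -t lustre           # which Lustre filesystems are actually mounted
    lctl dl                   # list the Lustre devices/OSC connections the client has set up
    lfs df /mnt/lustre        # which OSTs the client thinks belong to the filesystem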
Re: [Lustre-discuss] Lustre Quotas
On 12/7/10 4:35 AM, Mark Nelson wrote:

> Hi Guys,
>
> Several years ago there was a thread discussing some of the problems with
> Lustre quotas and what kinds of things might be done to move forward. I was
> wondering if/how things have improved since then? Does anyone have any
> thoughts/experiences they would be willing to share?
>
> Here's the thread from 2008:
> http://lists.lustre.org/pipermail/lustre-devel/2008-May/002451.html
>
> Thanks,
> Mark

As far as I know, the progress is as follows:

* Changes required to quotas because of architecture changes *

#1: Supporting quotas on HEAD (no CMD)
    Done, and released in Lustre 2.0.
#2: Supporting quotas with CMD
    Design only; not implemented yet.
#3: Supporting quotas with DMU
    Appears to be in progress.

* Shortcomings of the current quota implementation *

Unfortunately, these known quota issues in Lustre have not been overcome yet.

Cheers,
--
Nasf
Re: [Lustre-discuss] Lustre Quotas
Mark Nelson wrote:

> Hi Guys,
>
> Several years ago there was a thread discussing some of the problems with
> Lustre quotas and what kinds of things might be done to move forward. I was
> wondering if/how things have improved since then? Does anyone have any
> thoughts/experiences they would be willing to share?
>
> Here's the thread from 2008:
> http://lists.lustre.org/pipermail/lustre-devel/2008-May/002451.html
>
> Thanks,
> Mark

As everyone knows, HEAD (2.0) already supports quotas; that work was done by Fanyong. Currently, a redesign and reimplementation to port quotas to the kDMU is under way. Its main tasks are:

1. support the new OSD API
2. build separate quota connections between the quota master and the quota slaves, instead of using the LDLM reverse import

I will certainly also do some optimization of the quota code at the same time. Other issues will be handled in the future.

landen
[Lustre-discuss] Announce: Lustre 1.8.5 is available!
Hi all,

Lustre 1.8.5 is available on the Oracle Download Center site:
http://www.oracle.com/technetwork/indexes/downloads/sun-az-index-095901.html#L

The change log can be read here:
http://wiki.lustre.org/index.php/Use:Change_Log_1.8

Here are some items in this release that may interest you:

* Changes to the support matrix
  - Kernel update: SLES11 SP1 - 2.6.32.19-0.2.1
    https://bugzilla.lustre.org/show_bug.cgi?id=21610
  - Kernel update: SLES10 SP3 - 2.6.16.60-0.69.1
    https://bugzilla.lustre.org/show_bug.cgi?id=20744

* Significant bugs
  - Fixed a problem where atime was not always updated
    https://bugzilla.lustre.org/show_bug.cgi?id=23766
  - Fixed an issue where file size could be inconsistent between client nodes
    https://bugzilla.lustre.org/show_bug.cgi?id=23174

As always, you can report issues via Bugzilla: https://bugzilla.lustre.org/

Our next release is Lustre 2.1.0, expected in the next couple of months. The next 1.8.x release will be 1.8.6; its schedule is to be determined.

To access earlier releases of Lustre, check the box "See previous products (P)", then click "L" or scroll down to Lustre; the current and all previous releases (1.8.0 - 1.8.5) will be displayed.

Happy downloading!

-- The Lustre Team --