[Lustre-discuss] RAID Stripe alignment

2010-12-06 Thread Kaizaad Bilimorya
Hello,

I read this thread:

http://www.mail-archive.com/lustre-discuss@lists.lustre.org/msg07791.html

and section 10.1, "Considerations for Backend Storage", of the Lustre
Operations Manual, in order to determine the best-performing setup for our
OSS hardware.

HP DL180 G6
- CentOS 5.5 and Lustre 1.8.4
- HP Smart Array P410 controller (512 MB cache, 25% Read / 75% Write)
- 600 GB SAS drives

The stripe sizes available on the P410 array controller (in HP P410 
terminology, the stripe size is the amount of data read/written to each 
disk) are 8, 16, 32, 64, 128, and 256 KB.

Two of the scenarios that we tested are listed below, with the alignment 
arithmetic sketched right after the list:

1) 1 x 12-disk RAID6 LUN
chunk size = 1024 KB / 10 data disks = 102.4 KB, so use stripesize=64
- not optimally aligned, but maximum space usage
- set up on oss[2-4]
- sgpdd_survey results:
   http://www.sharcnet.ca/~kaizaad/orcafs/unaligned.html

2) 1 x 10-disk RAID6 LUN
chunk size = 1024 KB / 8 data disks = 128 KB, so use stripesize=128
- optimally aligned, but at the cost of 2 disks of space
- set up on oss[8-10]
- sgpdd_survey results:
   http://www.sharcnet.ca/~kaizaad/orcafs/aligned.html
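
To spell out the arithmetic (a sketch only, assuming the usual 1 MB Lustre 
RPC and two parity disks per RAID6 set):

   # Divide a 1 MB Lustre RPC evenly across the data disks of each RAID6
   # set (total disks minus 2 parity disks); the result should match a
   # stripe size the controller actually offers.
   rpc_kb=1024
   for total_disks in 12 10; do
       data_disks=$((total_disks - 2))
       per_disk=$(echo "scale=1; $rpc_kb / $data_disks" | bc)
       echo "${total_disks}-disk RAID6: ${rpc_kb} KB / ${data_disks} data disks = ${per_disk} KB"
   done
   # 12-disk RAID6 -> 102.4 KB: not a power of two, nearest controller option is 64
   # 10-disk RAID6 -> 128.0 KB: matches the controller's 128 KB stripe size exactly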


In our tests, the graphs seem to indicate that the underlying RAID 
alignment doesn't matter much, which is completely counterintuitive given 
the recommendations from the Lustre list and the manual.

Is there something we are missing here? Maybe I misunderstood the 
recommendations? Or are we simply bottlenecked on some other component of 
the setup, so that proper RAID alignment doesn't show any benefit? Any 
insight is appreciated.

thanks
-k
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] manual OST failover for maintenance work?

2010-12-06 Thread Adeyemi Adesanya

Hi.

We have pairs of OSS nodes hooked up to shared storage arrays  
containing OSTs but we have not enabled any failover settings yet. Now  
we need to perform maintenance work on an OSS and we would like to  
minimize Lustre downtime. Can I use tunefs.lustre to specify the OSS 
failover NID for an existing OST? I assume I'll have to take the OST 
offline to make this change. Will clients that have Lustre mounted 
pick up this change or will all clients have to remount? I should  
mention that we are running Lustre 1.8.2.
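
Roughly what I have in mind, to make sure I'm asking about the right 
procedure (the device path, mount point and NID below are just 
placeholders, not our real configuration):

   # On the OSS currently serving the OST:
   umount /mnt/ost0                               # take the OST offline
   tunefs.lustre --failnode=oss2@tcp0 /dev/sdb    # record the failover partner's NID
   mount -t lustre /dev/sdb /mnt/ost0             # bring the OST back online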

---
Yemi
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] RAID Stripe alignment

2010-12-06 Thread Kaizaad Bilimorya
Hello Andreas,

Thanks for your reply.

On Mon, 6 Dec 2010, Andreas Dilger wrote:
 Is your write cache persistent?

Yes. It is 512 MB battery backed.

 One major factor in having Lustre read and write alignment is that any 
 misaligned write will cause read-modify-write, and misaligned reads will 
 cause 2x reads if the RAID layer is doing parity verification.

 If your RAID layer is hiding this overhead via cache, you need to be 
 absolutely sure that it is safe in case of crashes and failover of 
 either or both the OSS and RAID controller.
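
If I understand the read-modify-write point correctly, the penalty on our 
12-disk layout would come from partial-stripe writes, e.g.:

   full stripe  = 10 data disks x 64 KB = 640 KB
   one 1 MB RPC = 1024 KB / 640 KB      = 1.6 full stripes

so every RPC leaves at least one stripe partially updated, and the 
controller has to read back the old data and parity before it can write, 
unless the write cache is hiding that.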

The HP Smart Array P410 controller also has this setting called 
Accelerator Ratio which determines the amount of cache devoted to either 
reads or writes. Currently it is set (default) as follows:

  Accelerator Ratio: 25% Read / 75% Write

We can try setting it to one extreme and then the other to see what 
difference it makes. This Lustre system is going to be used as /scratch for 
a broad range of HPC codes with diverse requirements (large files, small 
files, many files, mostly reading, mostly writing), so I don't know how 
much we can tune this cache setting to favour specific access patterns to 
the detriment of others; we are just looking for an appropriate middle 
ground here. But for thread completeness, I'll post the sgpdd_survey 
results if there are any large differences in performance.
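
We would most likely adjust it with something like the following (hpacucli 
syntax from memory, and the controller slot number is a guess, so treat 
this as a sketch rather than our exact commands):

   # Check the current accelerator (cache) ratio, then try a different split.
   hpacucli ctrl slot=0 show | grep -i ratio
   hpacucli ctrl slot=0 modify cacheratio=75/25   # read-heavy
   hpacucli ctrl slot=0 modify cacheratio=25/75   # back to the write-heavy default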

 Cheers, Andreas

thanks a bunch
-k
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] OST errors caused by residual client info?

2010-12-06 Thread Jeff Johnson
Greetings..

Is it possible that the below error can be derived from a client that 
has not been rebooted or had lustre kernel mods reloaded during a time 
when a few test file systems were built and mounted?

LustreError: 12967:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing 
error (-19)  r...@81032dd2d000 x1348952525350751/t0 o8-?@?:0/0 lens 
368/0 e 0 to 0 dl 1291669076 ref 1 fl Interpret:/0/0 rc -19/0
LustreError: 12967:0:(ldlm_lib.c:1914:target_send_reply_msg()) Skipped 55 
previous similar messages
LustreError: 137-5: UUID 'fs-OST0058_UUID' is not available  for connect (no 
target)


Normally this would be a backend storage issue. In this case, the OSS where 
this error is logged doesn't have an OST0058; it has an OST006d. Regardless 
of the OST name, the backend RAID is healthy with no hardware errors, and 
no other h/w errors are present on the OSS node (e.g. MCE, panic, 
IB/Ethernet failures, etc.).

Previous test incarnations of this filesystem were built where the OST name 
was not assigned (e.g.: OST) and was assigned upon first mount and 
connection to the MDS. Is it possible that some clients have residual 
pointers or config data about the previously built file systems?

Thanks!

--Jeff

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] OST errors caused by residual client info?

2010-12-06 Thread Oleg Drokin
Hello!

On Dec 6, 2010, at 6:50 PM, Jeff Johnson wrote:
 Previous test incarnations of this filesystem were built where ost name 
 was not assigned (e.g.: OST) and was assigned upon first mount and 
 connection to the mds. Is it possible that some clients have residual 
 pointers or config data about the previously built file systems?

If you did not unmount clients from the previous incarnation of the
filesystem, those clients would still continue to try to contact the servers
they know about, even after the servers themselves go away and are
repurposed (since there is no way for the client to know about this).

Bye,
Oleg
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] OST errors caused by residual client info?

2010-12-06 Thread Jeff Johnson
On 12/6/10 3:55 PM, Oleg Drokin wrote:
 Hello!

 On Dec 6, 2010, at 6:50 PM, Jeff Johnson wrote:
 Previous test incarnations of this filesystem were built where ost name
 was not assigned (e.g.: OST) and was assigned upon first mount and
 connection to the mds. Is it possible that some clients have residual
 pointers or config data about the previously built file systems?
 If you did not unmount clients from the previous incarnation of the
 filesystem, those clients would still continue to try to contact the
 servers they know about, even after the servers themselves go away and are
 repurposed (since there is no way for the client to know about this).
All clients were unmounted but the lustre kernel mods were never 
removed/reloaded nor were the clients rebooted.

Is it odd that this error would name an OST that is not present on that 
OSS? Should an OSS only report this error about its own OST devices? As I 
said, the particular OSS where the error came from only has an OST006c and 
an OST006d. It does not have an OST0058, although it may have had one back 
when the filesystem was built from a simple test csv that did not 
specifically give index numbers as part of the mkfs.lustre process. The 
OSTs were named later, effectively at random, when they were first mounted 
and connected to the MDS.
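
For what it's worth, what we did not do back then was pin the indices at 
format time, i.e. something along these lines (the fsname, NID, index and 
device below are examples only, not our real values):

   # Format an OST with an explicit index so it cannot be renamed at
   # first mount (109 decimal == 0x6d).
   mkfs.lustre --fsname=fs --ost --index=109 --mgsnode=mds1@tcp0 /dev/sdc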

Do you think it is possible for a client to retain this information even 
though a umount/mount of the filesystem took place?

--Jeff
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] OST errors caused by residual client info?

2010-12-06 Thread Oleg Drokin
Hello!

On Dec 6, 2010, at 7:05 PM, Jeff Johnson wrote:
 Previous test incarnations of this filesystem were built where ost name
 was not assigned (e.g.: OST) and was assigned upon first mount and
 connection to the mds. Is it possible that some clients have residual
 pointers or config data about the previously built file systems?
 If you did not unmount clients from the previous incarnation of the
 filesystem, those clients would still continue to try to contact the
 servers they know about, even after the servers themselves go away and are
 repurposed (since there is no way for the client to know about this).
 All clients were unmounted but the lustre kernel mods were never 
 removed/reloaded nor were the clients rebooted.

If the clients were unmounted, then there is no information left in the kernel 
about those now vanished mountpoints.

 Is it odd that this error would occur naming an ost that is not present on 
 that oss? Should an oss only report this error about its own ost devices? As 
 I said, 

An OSS would report such an error if a client contacted it trying to access 
an OST not present on that OSS. This could be because a client still holds 
some stale information about services (because it was not unmounted from a 
previous incarnation of the filesystem), or it could be because there is a 
failover pair setup that names this OSS as a possible NID for a failover 
target.

 Do you think it is possible for a client to retain this information even 
 though a umount/mount of the filesystem took place?

If the clients unmounted cleanly, I don't think there is anywhere such info 
could be stored.

You could go back to the clients sending these requests (identify them by 
the error messages in their logs; they'd complain about error -19 when 
connecting to OSTs) and see what's wrong with them: what they have mounted 
and so on.
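
A quick way to check on a suspect client would be something like this (the 
mount point is just an example, and the output details vary by version):

   # What does this client actually have mounted, and which OSCs is it holding?
   mount -t lustre        # Lustre mounts still present on the client
   lctl dl                # configured Lustre devices, including any stale OSCs
   lfs df /mnt/lustre     # which OSTs the client thinks belong to the filesystem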

Bye,
Oleg
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre Quotas

2010-12-06 Thread Fan Yong
  On 12/7/10 4:35 AM, Mark Nelson wrote:
 Hi Guys,

 Several years ago there was a thread discussing some of the problems
 with Lustre Quotas and what kinds of things might be done to move
 forward.  I was wondering if/how things have improved since then?  Any
 one have any thoughts/experiences they would be willing to share?

 Here's the thread from 2008:
 http://lists.lustre.org/pipermail/lustre-devel/2008-May/002451.html

As far as I know, the progress is as follows:

* Changes required to quotas because of architecture changes *
#1: Supporting quotas on HEAD (no CMD)
This has been done and released in lustre-2.0.

#2: Supporting quotas with CMD
Design work only so far; not implemented yet.

#3: Supporting quotas with DMU
This seems to be in progress.

* Shortcomings of the current quota implementation *
Unfortunately, these known quota issues in Lustre have not been overcome yet.


Cheers,
--
Nasf
 Thanks,
 Mark

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre Quotas

2010-12-06 Thread Landen Tian
Mark Nelson wrote:
 Hi Guys,

 Several years ago there was a thread discussing some of the problems 
 with Lustre Quotas and what kinds of things might be done to move 
 forward.  I was wondering if/how things have improved since then?  Any 
 one have any thoughts/experiences they would be willing to share?

 Here's the thread from 2008:
 http://lists.lustre.org/pipermail/lustre-devel/2008-May/002451.html

 Thanks,
 Mark
   
As everyone knows, HEAD (2.0) already supports quotas; that work was done 
by Fanyong. Currently, a redesign and reimplementation to port quotas to 
the kDMU is underway. Its main tasks include:
1. supporting the new OSD API
2. building separate quota connections between the quota master and the 
quota slaves, instead of using the ldlm reverse import.
I will also do some optimization of the quota code at the same time. Other 
issues will be handled in the future.

landen
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Announce: Lustre 1.8.5 is available!

2010-12-06 Thread Terry Rutledge
Hi all,

Lustre 1.8.5 is available on the Oracle Download Center Site.

http://www.oracle.com/technetwork/indexes/downloads/sun-az-index-095901.html#L

The change log can be read here:

http://wiki.lustre.org/index.php/Use:Change_Log_1.8

Here are some items that may interest you in this release:

* Changes to support matrix
 - Kernel update: SLES11 SP1 - 2.6.32.19-0.2.1
https://bugzilla.lustre.org/show_bug.cgi?id=21610
 - Kernel update: SLES10 SP3 - 2.6.16.60-0.69.1
https://bugzilla.lustre.org/show_bug.cgi?id=20744

* Significant Bugs
 - Fix a problem with atime which is not always updated
https://bugzilla.lustre.org/show_bug.cgi?id=23766
 - Fix an issue with file size which can be inconsistent
   between client nodes
https://bugzilla.lustre.org/show_bug.cgi?id=23174

As always, you can report issues via Bugzilla:
https://bugzilla.lustre.org/

Our next release is Lustre 2.1.0, expected in the next couple
of months.  The next 1.8.x release will be 1.8.6 and the
schedule for this is to be determined.

To access earlier releases of Lustre, please check the box
"See previous products (P)", then click "L" or scroll down to
"Lustre"; the current and all previous releases (1.8.0 - 1.8.5)
will be displayed.

Happy downloading!

-- The Lustre Team --


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss