Re: [Lustre-discuss] SAN, shared storage, iscsi using lustre?

2008-08-13 Thread Brian J. Murrell
On Wed, 2008-08-13 at 11:55 +0300, Alex wrote:
 Hello Brian,

Hi.

 Thanks for your prompt reply... See my comments inline..

NP.

 I have a cluster with 3 servers (let's say they are web servers 
 for simplicity). All web servers are serving the same content from a shared 
 storage volume mounted as the document root on all of them.

Ahhh.  Your 3 servers would in fact then be Lustre clients.  Given that
you have identified 3 Lustre clients and 8 disks, you now need some
servers to be your Lustre servers.

 What we have in addition to the above:
 - other N=8 computers (or more). N will be what it needs to be and can be 
 increased as needed.

Well, given that those are simply disks, you can/need to increase that
count only insofar as your bandwidth and capacity needs demand.

As an aside, it seems rather wasteful to dedicate a whole computer to
being nothing more than an iscsi disk exporter, so it's entirely
possible that I'm misunderstanding this aspect of it.  In any case, if
you do indeed have 1 disk in each of these N=8 computers, each exporting
its disk via iscsi, then so be it and each machine represents a disk.

 Nothing imposed. In my example, I said that all N 
 computers are exporting their block devices via iscsi (one block device per 
 computer), so on ALL our web servers we have all 8 iscsi disks visible and 
 available to build a shared storage volume (like a SAN).

Right.  You need to unravel this.  If you want to use Lustre, you need to
make those disks/that SAN available to Lustre servers, not your web
servers (which will be Lustre clients).

  That doesn't mean it is a must that all of 
 them export disks. Some of them can take on other functions, like MDS, or 
 perform other roles if needed.

Not if you want to have redundancy.  If you want to use RAID to get
redundancy out of those iscsi disks, then the machines exporting those
disks need to be dedicated to simply exporting the disks, and you need to
introduce additional machines to take those exported block devices, make
RAID volumes out of them, and then incorporate those RAID volumes into a
Lustre filesystem.  You can see why I think it seems wasteful to be
exporting these disks, 1 per machine, as iscsi targets.
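
As a rough sketch of what I mean (the hostnames, device names and mount
point below are just placeholders, and the exact commands depend on your
iscsi initiator and Lustre version), one OSS would do something like:

  # discover and log in to the iscsi targets exported by the disk machines
  iscsiadm -m discovery -t sendtargets -p disk-node-1
  iscsiadm -m node -l

  # build a software RAID volume out of the imported block devices
  mdadm --create /dev/md0 --level=6 --raid-devices=4 \
      /dev/sdb /dev/sdc /dev/sdd /dev/sde

  # format the RAID volume as an OST and mount it on this OSS
  mkfs.lustre --fsname=testfs --ost --mgsnode=mds1@tcp /dev/md0
  mkdir -p /mnt/ost0
  mount -t lustre /dev/md0 /mnt/ost0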

 Also, we can add more computers to the above 
 scheme, as you suggest.

Well, you will need to add an MDS or two and 2 or more OSSes to achieve
redundancy.

 These 3 servers should definitely be our www servers. I don't know if they 
 can be considered part of Lustre...

Only Lustre clients then.

 Reading the Lustre FAQ, it is 
 still unclear to me which machines are the Lustre clients.

Your 3 web servers would be the Lustre clients.

 - how many more machines I need

Well, I would say 3 minimum as per my previous plan.

 - how to group them and assign their roles (which will be the MDS, which will 
 be the OSSes, which will be clients, etc.)

Again, see my previous plan.  You could simplify a bit and use 4
machines, two acting as active/passive MDSes and two as active/active
OSSes.
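
Roughly (the node names mds1/mds2/oss1/oss2, the NIDs and the device
paths below are just placeholders, and the exact options depend on your
Lustre version), that 4-machine layout would look like:

  # combined MGS/MDT on the active MDS, naming the passive MDS as failover
  mkfs.lustre --fsname=testfs --mgs --mdt --failnode=mds2@tcp /dev/md10
  mount -t lustre /dev/md10 /mnt/mdt

  # one OST on each OSS, each naming the other OSS as its failover partner
  mkfs.lustre --fsname=testfs --ost --mgsnode=mds1@tcp --mgsnode=mds2@tcp \
      --failnode=oss2@tcp /dev/md0       # run on oss1
  mkfs.lustre --fsname=testfs --ost --mgsnode=mds1@tcp --mgsnode=mds2@tcp \
      --failnode=oss1@tcp /dev/md1       # run on oss2

  # the web servers then mount the filesystem as Lustre clients
  mount -t lustre mds1@tcp:mds2@tcp:/testfs /var/www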

 - what I have to do to unify all the iscsi disks in order to have no SPOF

RAID them on the MDSes and OSSes.

 - will it be OK to group our 8 iscsi disks into two 4-disk software RAID 
 (RAID6) arrays (md0, md1),

No.  Please see my previous e-mail about what you could do with 8 disks.

 form another RAID1 on top (let's say md2), and on top of md2 
 use LVM?

You certainly could layer LVM between the RAID devices and Lustre, but
it doesn't seem necessary.

b.





[Lustre-discuss] OST disconnect messages on OSS

2008-08-13 Thread Alex Lee
I have a system that's been spitting out OST disconnect messages under 
heavy load. I'm guessing the OST eventually reconnects.
I want to say this happens when the OSS is extremely overloaded, but I 
did notice this happening even under light load. Only the OSS seems to 
spit out any error messages. I don't see anything on the client side.

Should I be concerned? Or does this typically happen at other sites too?

-Alex

clip from one of the OSSes:

Aug 13 17:26:48 lustre-oss-0-1 kernel: LustreError: 137-5: UUID 
'lfs-OST0004_UUID' is not available  for connect (no target)
Aug 13 17:26:48 lustre-oss-0-1 kernel: LustreError: 
11094:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error 
(-19)  [EMAIL PROTECTED]
fff8101f4570600 x54/t0 o8-?@?:0/0 lens 240/0 e 0 to 0 dl 1218616308 
ref 1 fl Interpret:/0/0 rc -19/0
Aug 13 17:26:48 lustre-oss-0-1 kernel: LustreError: 
11094:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 3 previous 
similar messages
Aug 13 17:26:48 lustre-oss-0-1 kernel: LustreError: Skipped 3 previous 
similar messages
Aug 13 17:48:56 lustre-oss-0-1 kernel: LustreError: 137-5: UUID 
'lfs-OST0004_UUID' is not available  for connect (no target)
Aug 13 17:48:56 lustre-oss-0-1 kernel: LustreError: 
10984:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error 
(-19)  [EMAIL PROTECTED]
fff81010fc86600 x50/t0 o8-?@?:0/0 lens 240/0 e 0 to 0 dl 1218617636 
ref 1 fl Interpret:/0/0 rc -19/0
Aug 13 17:48:56 lustre-oss-0-1 kernel: LustreError: 
10984:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 1 previous 
similar message
Aug 13 17:48:56 lustre-oss-0-1 kernel: LustreError: Skipped 1 previous 
similar message
Aug 13 18:47:39 lustre-oss-0-1 kernel: LustreError: 137-5: UUID 
'lfs-OST0005_UUID' is not available  for connect (no target)
Aug 13 18:47:39 lustre-oss-0-1 kernel: LustreError: 
11070:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error 
(-19)  [EMAIL PROTECTED]
fff81022861b400 x49/t0 o8-?@?:0/0 lens 240/0 e 0 to 0 dl 1218621159 
ref 1 fl Interpret:/0/0 rc -19/0
Aug 13 18:47:39 lustre-oss-0-1 kernel: LustreError: 
11070:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 1 previous 
similar message

Different OSS:
Aug 12 20:13:49 lustre-oss-6-0 kernel: LustreError: 137-5: UUID 
'lfs-OST0050_UUID' is not available  for connect (no target)
Aug 12 20:13:49 lustre-oss-6-0 kernel: LustreError: 
13527:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error 
(-19)  [EMAIL PROTECTED]
fff8103d3b79a00 x124/t0 o8-?@?:0/0 lens 240/0 e 0 to 0 dl 
1218539929 ref 1 fl Interpret:/0/0 rc -19/0
Aug 12 20:13:49 lustre-oss-6-0 kernel: LustreError: 
13527:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 1 previous 
similar message
Aug 12 20:13:49 lustre-oss-6-0 kernel: LustreError: Skipped 1 previous 
similar message
Aug 12 20:13:55 lustre-oss-6-0 kernel: LustreError: 137-5: UUID 
'lfs-OST004f_UUID' is not available  for connect (no target)
Aug 12 20:13:55 lustre-oss-6-0 kernel: LustreError: 
13521:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error 
(-19)  [EMAIL PROTECTED]
fff8103d3e92a00 x125/t0 o8-?@?:0/0 lens 240/0 e 0 to 0 dl 
1218539935 ref 1 fl Interpret:/0/0 rc -19/0
Aug 12 20:13:55 lustre-oss-6-0 kernel: LustreError: 
13521:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 1 previous 
similar message
Aug 12 20:13:55 lustre-oss-6-0 kernel: LustreError: Skipped 1 previous 
similar message
Aug 12 20:13:58 lustre-oss-6-0 kernel: LustreError: 137-5: UUID 
'lfs-OST004f_UUID' is not available  for connect (no target)
Aug 12 20:13:58 lustre-oss-6-0 kernel: LustreError: 
28121:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error 
(-19)  [EMAIL PROTECTED]
fff8103d3983c00 x125/t0 o8-?@?:0/0 lens 240/0 e 0 to 0 dl 
1218539938 ref 1 fl Interpret:/0/0 rc -19/0
Aug 12 20:13:58 lustre-oss-6-0 kernel: LustreError: 
28121:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 5 previous 
similar messages
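
A quick way to check whether the OST named in those messages is actually
set up on the OSS at the time (assuming a 1.6-style setup; adjust the
paths to your version) is to list the local Lustre devices on that OSS:

  # list the Lustre targets/devices this OSS currently has configured
  lctl dl

  # same information via /proc
  cat /proc/fs/lustre/devices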



[Lustre-discuss] mv_sata patch

2008-08-13 Thread Brock Palen
Is the cache patch for mv_sata noted in the Sun paper on the x4500  
available?  Or has it been rolled into the source distributed by Sun?

Trying to avoid data loss.


Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734)936-1985





Re: [Lustre-discuss] mv_sata patch

2008-08-13 Thread Frank Leers

On Aug 13, 2008, at 12:38 PM, Brock Palen wrote:


Is the cache patch for mv_sata noted in the sun paper on the x4500
available?  Or has it been rolled into the source distributed by sun?



What source are you referring to?
It can be had here: http://www.sun.com/servers/x64/x4500/downloads.jsp


Trying to avoid data loss.


Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734)936-1985





Re: [Lustre-discuss] mv_sata patch

2008-08-13 Thread Frank Leers
I just realized that I may not have answered your question, and I'm  
not sure if the patch is in the source posted at sun.com or not.


If not, it is in the bug as an attachment -
https://bugzilla.lustre.org/show_bug.cgi?id=14040

-frank

On Aug 13, 2008, at 1:07 PM, Frank Leers wrote:


On Aug 13, 2008, at 12:38 PM, Brock Palen wrote:


Is the cache patch for mv_sata noted in the sun paper on the x4500
available?  Or has it been rolled into the source distributed by sun?



What source are you referring to?
It can be had here http://www.sun.com/servers/x64/x4500/downloads.jsp


Trying to avoid data loss.


Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734)936-1985


