Re: [Lustre-discuss] SAN, shared storage, iscsi using lustre?
On Wed, 2008-08-13 at 11:55 +0300, Alex wrote:

> Hello Brian,

Hi.

> Thanks for your prompt reply... See my comments inline..

NP.

> i have a cluster with 3 servers (let's say they are web servers for
> simplicity). All web servers are serving the same content from a shared
> storage volume mounted as document root on all.

Ahhh. Your 3 servers would in fact then be Lustre clients. Given that you have identified 3 Lustre clients and 8 disks, you now need some machines to be your Lustre servers.

> What we have in addition to the above:
> - other N=8 computers (or more). N will be what it needs to be and can
>   be increased as needed.

Well, given that those are simply disks, you can/need to increase that count only insofar as your bandwidth and capacity needs demand. As an aside, it seems rather wasteful to dedicate a whole computer to being nothing more than an iSCSI disk exporter, so it's entirely possible that I'm misunderstanding this aspect of it. In any case, if you do indeed have 1 disk in each of these N=8 computers, exported with iSCSI, then so be it, and each machine represents a disk.

> Nothing imposed. In my example, i said that all N computers are
> exporting their block devices via iscsi (one block device per computer),
> so on ALL our webservers we have all 8 iscsi disks visible and available
> to build a shared storage volume (like a SAN).

Right. You need to unravel this. If you want to use Lustre, you need to make those disks/that SAN available to Lustre servers, not to your web servers (which will be Lustre clients).

> That doesn't mean it is a must that all of them export disks. Some of
> them can serve other functions, like MDS, if needed.

Not if you want to have redundancy. If you want to use RAID to get redundancy out of those iSCSI disks, then the machines exporting those disks need to be dedicated to simply exporting the disks, and you need to introduce additional machines that take those exported block devices, make RAID volumes out of them, and then incorporate those RAID volumes into a Lustre filesystem. You can see why I think it seems wasteful to be exporting these disks, 1 per machine, as iSCSI targets.

> Also, we can add more computers to the above schema, as you suggest.

Well, you will need to add an MDS or two and 2 or more OSSes to achieve redundancy.

> These 3 servers should definitely be our www servers. I don't know if
> they can be considered part of lustre...

Only Lustre clients then.

> Reading the lustre faq, it is still unclear to me who the lustre
> clients are.

Your 3 web servers would be the Lustre clients.

> - how many more machines do i need?

Well, I would say 3 minimum, as per my previous plan.

> - how to group them and assign their roles (which will be MDS, which
>   will be OSSes, which will be clients, etc)?

Again, see my previous plan. You could simplify a bit and use 4 machines, two acting as active/passive MDSes and two as active/active OSSes.

> - what do i have to do to unify all iscsi disks in order to have no
>   SPOF?

RAID them on the MDSes and OSSes.

> - would it be ok to group our 8 iscsi disks into two 4-disk software
>   raid (raid6) arrays (md0, md1),

No. Please see my previous e-mail about what you could do with 8 disks.

> form another raid1 on top (let's say md2), and use lvm on top of md2?

You certainly could layer LVM between the RAID devices and Lustre, but it doesn't seem necessary.

b.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
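The division of labor described above (dedicated iSCSI exporters, RAID assembled on the Lustre servers, web servers as plain Lustre clients) could be sketched roughly as follows. All hostnames, IP addresses, filesystem names, and device names are made up for illustration, and the RAID level and option choices are just one plausible configuration, not the specific layout recommended in the earlier mail.

```shell
## On an OSS: import the exported iSCSI disks, RAID them, make an OST.
## (IPs, fsname and device names below are placeholders.)

# Log in to four of the iSCSI exporter machines.
for ip in 10.0.0.11 10.0.0.12 10.0.0.13 10.0.0.14; do
    iscsiadm -m discovery -t sendtargets -p "$ip"
    iscsiadm -m node -p "$ip" --login
done

# Build one software RAID6 volume out of the four imported block
# devices, so the loss of an exporter machine is survivable.
mdadm --create /dev/md0 --level=6 --raid-devices=4 /dev/sd[b-e]

# Format the RAID volume as an OST and mount it to bring it online.
mkfs.lustre --fsname=lfs --ost --mgsnode=mds1@tcp /dev/md0
mount -t lustre /dev/md0 /mnt/ost0

## On each web server (a plain Lustre client): mount the filesystem
## as the document root.
mount -t lustre mds1@tcp:/lfs /var/www/html
```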
[Lustre-discuss] OST disconnect messages on OSS
I have a system that's been spitting out OST disconnect messages under heavy load. I'm guessing the OST eventually reconnects. I want to say this happens when the OSS is extremely overloaded, but I did notice it happening even under light load. Only the OSS seems to spit out any error messages; I don't see anything on the client side. Should I be concerned? Or does this typically happen at other sites too?

-Alex

Clip off one of the OSSes:

Aug 13 17:26:48 lustre-oss-0-1 kernel: LustreError: 137-5: UUID 'lfs-OST0004_UUID' is not available for connect (no target)
Aug 13 17:26:48 lustre-oss-0-1 kernel: LustreError: 11094:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-19) [EMAIL PROTECTED] fff8101f4570600 x54/t0 o8-?@?:0/0 lens 240/0 e 0 to 0 dl 1218616308 ref 1 fl Interpret:/0/0 rc -19/0
Aug 13 17:26:48 lustre-oss-0-1 kernel: LustreError: 11094:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 3 previous similar messages
Aug 13 17:26:48 lustre-oss-0-1 kernel: LustreError: Skipped 3 previous similar messages
Aug 13 17:48:56 lustre-oss-0-1 kernel: LustreError: 137-5: UUID 'lfs-OST0004_UUID' is not available for connect (no target)
Aug 13 17:48:56 lustre-oss-0-1 kernel: LustreError: 10984:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-19) [EMAIL PROTECTED] fff81010fc86600 x50/t0 o8-?@?:0/0 lens 240/0 e 0 to 0 dl 1218617636 ref 1 fl Interpret:/0/0 rc -19/0
Aug 13 17:48:56 lustre-oss-0-1 kernel: LustreError: 10984:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 1 previous similar message
Aug 13 17:48:56 lustre-oss-0-1 kernel: LustreError: Skipped 1 previous similar message
Aug 13 18:47:39 lustre-oss-0-1 kernel: LustreError: 137-5: UUID 'lfs-OST0005_UUID' is not available for connect (no target)
Aug 13 18:47:39 lustre-oss-0-1 kernel: LustreError: 11070:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-19) [EMAIL PROTECTED] fff81022861b400 x49/t0 o8-?@?:0/0 lens 240/0 e 0 to 0 dl 1218621159 ref 1 fl Interpret:/0/0 rc -19/0
Aug 13 18:47:39 lustre-oss-0-1 kernel: LustreError: 11070:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 1 previous similar message

Different OSS:

Aug 12 20:13:49 lustre-oss-6-0 kernel: LustreError: 137-5: UUID 'lfs-OST0050_UUID' is not available for connect (no target)
Aug 12 20:13:49 lustre-oss-6-0 kernel: LustreError: 13527:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-19) [EMAIL PROTECTED] fff8103d3b79a00 x124/t0 o8-?@?:0/0 lens 240/0 e 0 to 0 dl 1218539929 ref 1 fl Interpret:/0/0 rc -19/0
Aug 12 20:13:49 lustre-oss-6-0 kernel: LustreError: 13527:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 1 previous similar message
Aug 12 20:13:49 lustre-oss-6-0 kernel: LustreError: Skipped 1 previous similar message
Aug 12 20:13:55 lustre-oss-6-0 kernel: LustreError: 137-5: UUID 'lfs-OST004f_UUID' is not available for connect (no target)
Aug 12 20:13:55 lustre-oss-6-0 kernel: LustreError: 13521:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-19) [EMAIL PROTECTED] fff8103d3e92a00 x125/t0 o8-?@?:0/0 lens 240/0 e 0 to 0 dl 1218539935 ref 1 fl Interpret:/0/0 rc -19/0
Aug 12 20:13:55 lustre-oss-6-0 kernel: LustreError: 13521:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 1 previous similar message
Aug 12 20:13:55 lustre-oss-6-0 kernel: LustreError: Skipped 1 previous similar message
Aug 12 20:13:58 lustre-oss-6-0 kernel: LustreError: 137-5: UUID 'lfs-OST004f_UUID' is not available for connect (no target)
Aug 12 20:13:58 lustre-oss-6-0 kernel: LustreError: 28121:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-19) [EMAIL PROTECTED] fff8103d3983c00 x125/t0 o8-?@?:0/0 lens 240/0 e 0 to 0 dl 1218539938 ref 1 fl Interpret:/0/0 rc -19/0
Aug 12 20:13:58 lustre-oss-6-0 kernel: LustreError: 28121:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 5 previous similar messages
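When triaging messages like these, it can help to see which OSTs are refusing connections and how often. A small sketch, assuming syslog lands in /var/log/messages on the OSSes (the UUID pattern is taken from the log lines above):

```shell
# Extract the OST UUID from each "not available for connect" line and
# tally refusals per OST, most frequent first.
grep "is not available for connect" /var/log/messages \
  | sed -n "s/.*UUID '\(lfs-OST[0-9a-f]*\)_UUID'.*/\1/p" \
  | sort | uniq -c | sort -rn
```

If the refusals cluster on one or two OSTs, that points at those specific targets (for example, clients trying a failover address where the target isn't mounted) rather than general OSS overload.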
[Lustre-discuss] mv_sata patch
Is the cache patch for mv_sata noted in the Sun paper on the x4500 available? Or has it been rolled into the source distributed by Sun? Trying to avoid data loss.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734) 936-1985
Re: [Lustre-discuss] mv_sata patch
On Aug 13, 2008, at 12:38 PM, Brock Palen wrote:

> Is the cache patch for mv_sata noted in the Sun paper on the x4500
> available? Or has it been rolled into the source distributed by Sun?

What source are you referring to? It can be had here:
http://www.sun.com/servers/x64/x4500/downloads.jsp

> Trying to avoid data loss.
Re: [Lustre-discuss] mv_sata patch
I just realized that I may not have answered your question: I'm not sure whether the patch is in the source posted at sun.com or not. If not, it is attached to the bug:
https://bugzilla.lustre.org/show_bug.cgi?id=14040

-frank

On Aug 13, 2008, at 1:07 PM, Frank Leers wrote:

> On Aug 13, 2008, at 12:38 PM, Brock Palen wrote:
>
>> Is the cache patch for mv_sata noted in the Sun paper on the x4500
>> available? Or has it been rolled into the source distributed by Sun?
>
> What source are you referring to? It can be had here:
> http://www.sun.com/servers/x64/x4500/downloads.jsp
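For context on why the patch matters: the concern is writes that sit in the on-drive write cache and are lost on power failure before the driver flushes them. While waiting for a patched driver, one conservative (and slower) workaround sometimes used is to disable the on-drive write cache on the OST disks. This is only a sketch of that workaround, not a description of what the Sun patch itself does, and the device name is a placeholder:

```shell
# Query the current write-cache setting on one drive (placeholder name).
hdparm -W /dev/sdb

# Turn the on-drive write cache off: throughput drops, but writes the
# drive acknowledges have actually reached the platter.
hdparm -W 0 /dev/sdb
```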