Re: [Lustre-discuss] lustre-discuss@lists.lustre.org
Hi Aurélien,

Do you have a specification for the JUnit test results you produce, or an example of one of your test result sets? We would be more than willing to pick up and go with something that can be used with a wider set of tools, with the obvious caveat that it provides everything needed to completely capture the test results for Lustre, today and in the future. If you have some example result sets that you can forward, please mail them to chris@whamcloud.com.

Thanks,
Chris

I see that PerfPublisher uses XML, although this seems to be the only specification.

On 17/12/2010 20:11, Aurélien wrote:
> Robert Read wrote:
>> We don't plan to use Hudson to manage our testing results, as I don't think it would scale very well for all the testing we might do for each build. We're currently building a more custom results server that's similar (in spirit at least) to the kinds of tools we had at Oracle. We'll make it available once it's in presentable form.
>>
>> Actually, our first step was to replace the acceptance-small.sh driver script with one that has a more sensible user interface for running the standard tests. Since test-framework.sh on master has already been changed to produce test results in YAML format, the new script collects these along with the logs and can submit them to the test results server. Currently this is being run manually, though; automating the test execution and connecting all the pieces will be the next step.
>
> Ok. I will be very interested in seeing the final result. But I think it is a good idea to stick to standard formats and tools as much as possible. It would be a pity if all your new work were usable only by your tool. JUnit is quite standard; PerfPublisher has its own format because of JUnit's limitations, and there are others. It would be really good if you did not create yet another one. And indeed, acc-sm is a bit limited, and improving it could be really interesting.
>
> Aurélien
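For anyone unfamiliar with the format being discussed, here is a rough sketch of what a JUnit-style XML result file for a Lustre test suite run could look like. The suite/test names, timings, and failure text are made up for illustration; they are not taken from test-framework.sh output:

    <?xml version="1.0" encoding="UTF-8"?>
    <testsuite name="sanity" tests="3" failures="1" errors="0" time="412.7">
      <testcase classname="sanity" name="test_1" time="2.3"/>
      <testcase classname="sanity" name="test_2" time="5.1"/>
      <testcase classname="sanity" name="test_3" time="8.4">
        <failure message="test_3 failed with rc=1">relevant output from the test script</failure>
      </testcase>
    </testsuite>

The appeal of sticking to this is that Hudson and most CI dashboards can parse it out of the box; the drawback, as noted above, is that it has no natural place for performance metrics, which is presumably one reason a tool like PerfPublisher defines its own format.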
[Lustre-discuss] how to reuse OST indices (EADDRINUSE)
Hello list,

We recently evacuated several OSTs on a single OSS, replaced RAID controllers, re-initialized RAIDs for new OSTs, and made new Lustre filesystems for them, using the same OST indices as we had before. The filesystem and all its clients have been up and running the whole time. We disabled the OSTs we were working on on all clients and our MGS/MDS (lctl dl shows them as IN everywhere).

Now we want to bring the newly-formatted OSTs back online. When we try to mount the new OSTs, we get the following in the syslog of the OSS that has been under maintenance, for each OST:

Lustre: mgc10.13.28@o2ib: Reactivating import
LustreError: 11-0: an error occurred while communicating with 10.13.28@o2ib. The mgs_target_reg operation failed with -98
LustreError: 6065:0:(obd_mount.c:1097:server_start_targets()) Required registration failed for cms-OST0006: -98
LustreError: 6065:0:(obd_mount.c:1655:server_fill_super()) Unable to start targets: -98
LustreError: 6065:0:(obd_mount.c:1438:server_put_super()) no obd cms-OST0006
LustreError: 6065:0:(obd_mount.c:147:server_deregister_mount()) cms-OST0006 not registered

What do we need to do to get these OSTs back into the filesystem? We really want to reuse the original indices. This is Lustre 1.8.4, btw.

Thanks,
Craig Prescott
UF HPC Center
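For reference, the -98 here is EADDRINUSE ("Address already in use"): the MGS still has those OST indices registered. A quick, hedged sketch of how to confirm the errno mapping and how the disabled OSTs would show up and have been deactivated (header path and device number below are examples, not details from this site):

    # map the Lustre return code to an errno name (header path may vary by distro)
    grep -w 98 /usr/include/asm-generic/errno.h
    #   #define EADDRINUSE  98  /* Address already in use */

    # list configured devices on a client or the MDS; 'IN' in the state
    # column means the device has been marked inactive
    lctl dl

    # a device would have been disabled by hand with something like
    # (the device number 7 is only an example):
    lctl --device 7 deactivate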
Re: [Lustre-discuss] how to reuse OST indices (EADDRINUSE)
Hello,

Did you back up the old magic files (last_rcvd, LAST_ID, CONFIG/*) from the original OSTs and put them back before trying to mount the new ones? It sounds like you did not. When you remount an OST with an existing index, the MGS will refuse to add it unless it is told to do a writeconf, hence the -EADDRINUSE. The proper ways to replace an OST are described in bug 24128.

On 2010-12-21, at 8:33 AM, Craig Prescott wrote:
> Hello list,
> We recently evacuated several OSTs on a single OSS, replaced RAID controllers, re-initialized RAIDs for new OSTs, and made new Lustre filesystems for them, using the same OST indices as we had before. [...]
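As a rough sketch of the backup/restore being described — purely illustrative, to be verified against bug 24128 and your own OSTs before relying on it; the device name, mount points, filesystem name/index, and the exact on-disk locations (CONFIGS/, O/0/LAST_ID) are assumptions, not details taken from this thread:

    # --- before reformatting: mount the old OST as plain ldiskfs and save the files ---
    mount -t ldiskfs /dev/sdb /mnt/ost            # /dev/sdb is an example device
    mkdir -p /root/ost0006-backup
    cp -a /mnt/ost/last_rcvd   /root/ost0006-backup/
    cp -a /mnt/ost/CONFIGS     /root/ost0006-backup/   # the "CONFIG/*" files mentioned above
    cp -a /mnt/ost/O/0/LAST_ID /root/ost0006-backup/   # LAST_ID usually lives under O/0/
    umount /mnt/ost

    # --- reformat with the same index, restore the files before the first Lustre mount ---
    mkfs.lustre --ost --fsname=cms --index=6 --mgsnode=<mgs_nid> /dev/sdb
    mount -t ldiskfs /dev/sdb /mnt/ost
    cp -a /root/ost0006-backup/last_rcvd /mnt/ost/
    cp -a /root/ost0006-backup/CONFIGS/. /mnt/ost/CONFIGS/
    cp -a /root/ost0006-backup/LAST_ID   /mnt/ost/O/0/
    umount /mnt/ost
    mount -t lustre /dev/sdb /mnt/lustre/ost0006

Since the RAIDs in this case were already re-initialized, there is nothing left to restore, which is why the reply points at bug 24128 (and the writeconf route it hints at) for the proper replacement procedure.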
Re: [Lustre-discuss] Fwd: Reg /// OSS rebooted automatically
Daniel,

It looks like your OST backend storage device may be having an issue. I would check the health and stability of the backend storage device or RAID you are using for the OST device. That alone wouldn't likely cause your OSS system to reboot, though; there may be additional hardware and/or OS problems causing the reboots, on top of Lustre complaining that it can't find the OST storage device. Others here on the list will likely give you a more detailed answer, but the storage device is the place I would look first.

--Jeff

--
Jeff Johnson
Manager
Aeon Computing
jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101  f: 858-412-3845  m: 619-204-9061
4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117

On Mon, Dec 20, 2010 at 9:43 PM, Daniel Raj <danielraj2...@gmail.com> wrote:
> Hi Genius,
>
> Good Day !!
>
> I am Daniel. My OSS keeps getting rebooted automatically, again and again. Kindly help me. It is showing the error messages below:
>
> kernel: LustreError: 23351:0:(ldlm_lib.c:1892:target_send_reply_msg()) @@@ processing error (-19) r...@810400e24400 x1353488904620274/t0 o8-?@?:0/0 lens 368/0 e 0 to 0 dl 1292738958 ref 1 fl Interpret:/0/0 rc -19/0
> kernel: LustreError: 137-5: UUID 'south-ost7_UUID' is not available for connect (no target)
> kernel: LustreError: 23284:0:(ldlm_lib.c:1892:target_send_reply_msg()) @@@ processing error (-19) r...@8101124c7c00 x1353488904620359/t0 o8-?@?:0/0 lens 368/0 e 0 to 0 dl 1292739025 ref 1 fl Interpret:/0/0 rc -19/0
>
> Regards,
> Daniel A
Re: [Lustre-discuss] Fwd: Reg /// OSS rebooted automatically
Hi Jeff,

Thanks for your reply.

Storage information:
  DL380 G5  == OSS + 16 GB RAM
  OS        == SFS G3.2-2 + CentOS 5.3 + Lustre 1.8.3
  MSA60 box == OST, RAID 6

Regards,
Daniel A

On Tue, Dec 21, 2010 at 11:45 AM, Jeff Johnson <jeff.john...@aeoncomputing.com> wrote:
> Daniel,
> It looks like your OST backend storage device may be having an issue. I would check the health and stability of the backend storage device or RAID you are using for the OST device. [...]
Re: [Lustre-discuss] Fwd: Reg /// OSS rebooted automatically
Daniel,

Check the health and stability of your RAID-6 volume. Make sure the RAID is healthy and online. Use whatever monitoring utility came with your RAID card, or check /proc/mdstat if it's Linux mdraid. Also check /var/log/messages for error messages from your RAID or other hardware.

--Jeff

---mobile signature---
Jeff Johnson - Aeon Computing
jeff.john...@aeoncomputing.com

On Dec 20, 2010, at 22:27, Daniel Raj <danielraj2...@gmail.com> wrote:
> Hi Jeff,
> Thanks for your reply.
> Storage information:
>   DL380 G5  == OSS + 16 GB RAM
>   OS        == SFS G3.2-2 + CentOS 5.3 + Lustre 1.8.3
>   MSA60 box == OST, RAID 6
> [...]
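To make that checklist concrete, here is a hedged sketch of the commands involved. Note that the -19 in Daniel's log is ENODEV ("no such device"), which matches the "no target" complaint. The hpacucli step assumes the MSA60 sits behind an HP Smart Array controller; adjust for whatever controller and monitoring tool is actually in use:

    # software RAID status, if the OST is on Linux mdraid
    cat /proc/mdstat

    # recent hardware / driver / filesystem errors
    grep -iE 'raid|scsi|i/o error|ldiskfs|ext3' /var/log/messages | tail -50
    dmesg | tail -50

    # Lustre's own health indicator on the OSS
    cat /proc/fs/lustre/health_check

    # if this is an HP Smart Array controller (assumption), show logical/physical drive state
    hpacucli ctrl all show config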