Re: [Lustre-discuss] Alternative to DRBD
On Jul 20, 2009 23:45 -0400, Brian J. Murrell wrote:
> On Mon, 2009-07-20 at 23:41 -0400, Mag Gam wrote:
> > Other than DRBD and Hot standby, are there any other alternatives? We want
> > to have a redundant copy of our data and were wondering if rsync is the
> > only way to accomplish this.

Until the replication feature is available, rsync (or a suitable replacement)
is about the only way to have an independent, redundant copy of your data.

In the 2.0 release there will be an efficient mechanism (Changelogs), along
with an rsync replacement, to incrementally sync a Lustre filesystem to
another filesystem.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
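For reference, this is roughly how the changelog-driven sync ended up looking
in the 2.x tools; a sketch only, not something available in 1.6/1.8, and the
MDT name, mount points and log path below are placeholders:

--
# Register a changelog consumer on the MDT (prints an id such as cl1):
lctl --device lustre-MDT0000 changelog_register

# Dump the records accumulated for that MDT:
lfs changelog lustre-MDT0000

# The rsync replacement replays those records against a plain target fs:
lustre_rsync --source=/mnt/lustre --target=/mnt/backup \
             --mdt=lustre-MDT0000 --user=cl1 --statuslog=/var/tmp/sync.log
--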
Re: [Lustre-discuss] Alternative to DRBD
----- Andreas Dilger adil...@sun.com wrote:
> Until the replication feature is available, rsync (or a suitable
> replacement) is about the only way to have an independent, redundant copy
> of your data. In the 2.0 release there will be an efficient mechanism
> (Changelogs), along with an rsync replacement, to incrementally sync a
> Lustre filesystem to another filesystem.

We are currently using e2scan and distributed rsyncs across the compute farm
to do the same thing with Lustre v1.6/1.8, mirroring our main filesystem to a
backup every night.

Daire
Re: [Lustre-discuss] Selection of kernel options for the distributed Lustre kernels
Hi Brian,

> In addition to providing an updated IB stack from OFED, we also provide the
> iSCSI stack from OFED, as it's generally newer than what the vendor
> provides. In order to do so and minimize confusion with the vendor-supplied
> kernel, we disable the vendor's iSCSI (as well as InfiniBand) stacks. You
> will find the iSCSI stack in the kernel-ib package we distribute with our
> release. Let me know if that doesn't pan out.
>
> iSCSI is not something we routinely test due to lack of hardware but, more
> importantly, simply lack of demand in our customer base. iSCSI is just not
> a common deployment method within the Lustre customer base.

We plan to use iSCSI arrays over TCP/IP as the shared storage for the MDT, so
for me it would of course be much more convenient to have the corresponding
support in the pre-built rpms, as it saves me the hassle of re-compiling (so
much for customer demand ;-).

Also, as you can turn a disk server into an iSCSI target, I would think that
this will more and more become an option to create relatively cheap shared
storage for the MDT.

Any chance that at least the TCP/IP iSCSI initiator options are set back to
the RHEL defaults?

--
CONFIG_SCSI_ISCSI_ATTRS=m
CONFIG_ISCSI_TCP=m
CONFIG_SCSI_QLA_ISCSI=m
--

Thanks,
Arne
Re: [Lustre-discuss] Selection of kernel options for the distributed Lustre kernels
On Tue, 2009-07-21 at 10:59 +0200, Arne Wiebalck wrote:
> Hi Brian,

Hi Arne,

> We plan to use iSCSI arrays over TCP/IP as the shared storage for the MDT,
> so for me it would of course be much more convenient to have the
> corresponding support in the pre-built rpms, as it saves me the hassle of
> re-compiling (so much for customer demand ;-).

It is. As I said in my last e-mail, the iSCSI support is in the kernel-ib
RPM, as we build the iSCSI stack from the OFED release.

> Also, as you can turn a disk server into an iSCSI target, I would think
> that this will more and more become an option to create relatively cheap
> shared storage for the MDT.

Perhaps. Perhaps not. I would think that TCP/IP to the storage adds latency,
and latency would be quite bad for the MDT.

> Any chance that at least the TCP/IP iSCSI initiator options are set back to
> the RHEL defaults?

Is the iSCSI stack in the kernel-ib RPM not working for you? Please describe
the problem if that is the case.

b.
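A quick way to check whether the OFED iSCSI bits are actually present in that
package is something like the following (a sketch, assuming the package is
installed and named kernel-ib as in this thread):

--
# List any iSCSI-related files shipped in the installed kernel-ib package.
rpm -ql kernel-ib | grep -i iscsi
--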
Re: [Lustre-discuss] Selection of kernel options for the distributed Lustre kernels
Hi Brian,

> It is. As I said in my last e-mail, the iSCSI support is in the kernel-ib
> RPM, as we build the iSCSI stack from the OFED release.

Oops, sorry, it seems I missed that point.

> > Also, as you can turn a disk server into an iSCSI target, I would think
> > that this will more and more become an option to create relatively cheap
> > shared storage for the MDT.
>
> Perhaps. Perhaps not. I would think that TCP/IP to the storage adds
> latency, and latency would be quite bad for the MDT.

Yes ... perhaps :) Let's see.

> > Any chance that at least the TCP/IP iSCSI initiator options are set back
> > to the RHEL defaults?
>
> Is the iSCSI stack in the kernel-ib RPM not working for you? Please
> describe the problem if that is the case.

I will try it out and let you know.

Thanks for your help,
Arne
Re: [Lustre-discuss] Alternative to DRBD
On Tue, Jul 21, 2009 at 4:02 AM, Daire Byrne daire.by...@framestore.com wrote:
> ----- Andreas Dilger adil...@sun.com wrote:
> > Until the replication feature is available, rsync (or a suitable
> > replacement) is about the only way to have an independent, redundant copy
> > of your data. In the 2.0 release there will be an efficient mechanism
> > (Changelogs), along with an rsync replacement, to incrementally sync a
> > Lustre filesystem to another filesystem.
>
> We are currently using e2scan and distributed rsyncs across the compute
> farm to do the same thing with Lustre v1.6/1.8, mirroring our main
> filesystem to a backup every night.

Is this a script you'd be willing to share?
Re: [Lustre-discuss] Selection of kernel options for the distributed Lustre kernels
Hi Brian,

> > Is the iSCSI stack in the kernel-ib RPM not working for you? Please
> > describe the problem if that is the case.
>
> I will try it out and let you know.

I reinstalled the pre-built kernel from the rpm available on the Lustre
website and installed the kernel-ib package as well:

--
[r...@host ~]# rpm -qa | grep kernel-lustre
kernel-lustre-smp-2.6.18-128.1.6.el5_lustre.1.8.0.1
[r...@host ~]# rpm -qa | grep kernel-ib
kernel-ib-1.4.1-2.6.18_128.1.6.el5
[r...@host ~]# uname -a
Linux host 2.6.18-128.1.6.el5_lustre.1.8.0.1smp #1 SMP Thu Jun 18 17:06:38 MDT 2009 x86_64 x86_64 x86_64 GNU/Linux
--

It seems that the iSCSI modules cannot be loaded:

--
[r...@host ~]# modprobe scsi_transport_iscsi
FATAL: Module scsi_transport_iscsi not found.
--

That's probably because modprobe looks in
/lib/modules/2.6.18-128.1.6.el5_lustre.1.8.0.1smp, not in
/lib/modules/2.6.18-128.1.6.el5 where kernel-ib installs its modules.

So, insmod maybe?

--
[r...@host ~]# insmod /lib/modules/2.6.18-128.1.6.el5/updates/kernel/drivers/scsi/scsi_transport_iscsi.ko
insmod: error inserting '/lib/modules/2.6.18-128.1.6.el5/updates/kernel/drivers/scsi/scsi_transport_iscsi.ko': -1 Invalid module format
--

Am I missing anything obvious here?

TIA,
Arne
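"Invalid module format" usually means the module was built against a different
kernel than the one running. One way to confirm that (a sketch, using the
paths from the output above) is to compare the module's vermagic with the
running kernel:

--
# Kernel the module was built for vs. the kernel actually running:
modinfo -F vermagic \
    /lib/modules/2.6.18-128.1.6.el5/updates/kernel/drivers/scsi/scsi_transport_iscsi.ko
uname -r

# With a kernel-ib build that matches the Lustre kernel, the modules should
# land under the running kernel's /lib/modules directory; then refresh the
# dependency map and modprobe should find them:
depmod -a
modprobe scsi_transport_iscsi
--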
Re: [Lustre-discuss] Alternative to DRBD
Michael,

----- Michael Di Domenico mdidomeni...@gmail.com wrote:
> > We are currently using e2scan and distributed rsyncs across the compute
> > farm to do the same thing with Lustre v1.6/1.8, mirroring our main
> > filesystem to a backup every night.
>
> Is this a script you'd be willing to share?

Sure, but it is fairly specific to our setup and (to be honest) is pretty
hacktastic! The basic idea is that an e2scan is run on the production
filesystem every night and new files (from the last 72 hours) are copied over
(incremental). Then, once a week, we include deletes: the e2scan file lists
from both the production and backup filesystems are compared to work out what
needs to be deleted from the backup to synchronise the two filesystems. We
never really have to do a full backup. We also use LVM snapshots on the
backup to give us a couple of weeks of retention.

It works for us but your mileage may vary.

Daire

Attachment: flbackup (binary data)
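For anyone who wants the gist without reading the attached script, the nightly
incremental pass boils down to something like the sketch below. It is not the
flbackup script itself: the file list here is built with plain find on a
client for illustration (the real setup scans the MDT with e2scan, which is
much faster), and all hosts, paths and chunk sizes are invented:

--
#!/bin/bash
SRC=/mnt/lustre             # production filesystem (assumed mount point)
DST=backuphost:/mnt/backup  # backup target (assumed)

# 1. List regular files changed in the last 3 days (the "last 72 hours" rule).
find "$SRC" -type f -mtime -3 > /tmp/changed.list

# 2. Split the list into chunks so the copies can be farmed out; here they
#    simply run in parallel locally, the real setup dispatches one chunk per
#    compute node.
split -l 10000 /tmp/changed.list /tmp/chunk.

# 3. Copy each chunk; --files-from keeps the directory layout relative to /.
for chunk in /tmp/chunk.*; do
    rsync -a --files-from="$chunk" / "$DST" &
done
wait
--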
Re: [Lustre-discuss] 1.8.1(-ish) client vs. 1.6.7.2 server
I added this to bugzilla: https://bugzilla.lustre.org/show_bug.cgi?id=20227

cheers, robin

On Wed, Jul 15, 2009 at 01:09:33PM -0400, Robin Humble wrote:
> On Wed, Jul 15, 2009 at 08:46:12AM -0400, Robin Humble wrote:
> > I get a ferocious set of error messages when I mount a 1.6.7.2 filesystem
> > on a b_release_1_8_1 client. Is this expected?
>
> Just to annotate the below a bit in case it's not clear... sorry - should
> have done that in the first email :-/
>
> 10.8.30.244 is the MGS and one MDS, 10.8.30.245 is the other MDS in the
> failover pair. 10.8.30.201 - 208 are OSSes (one OST per OSS), and the fs is
> mounted in the usual failover way, e.g.
>
>   mount -t lustre 10.8.30@o2ib:10.8.30@o2ib:/system /system
>
> From the below (and other similar logs) it kinda looks like the client
> fails and then renegotiates with all the servers.
>
> cheers, robin
> --
> Dr Robin Humble, HPC Systems Analyst, NCI National Facility
>
> Lustre: 13800:0:(o2iblnd_cb.c:459:kiblnd_rx_complete()) Rx from 10.8.30@o2ib failed: 5
> Lustre: 13799:0:(o2iblnd_cb.c:459:kiblnd_rx_complete()) Rx from 10.8.30@o2ib failed: 5
> Lustre: 615:0:(o2iblnd_cb.c:2384:kiblnd_reconnect()) 10.8.30@o2ib: retrying (version negotiation), 12, 11, queue_dep: 8, max_frag: 256, msg_size: 4096
> Lustre: mgc10.8.30@o2ib: Reactivating import
> Lustre: 13797:0:(o2iblnd_cb.c:459:kiblnd_rx_complete()) Rx from 10.8.30@o2ib failed: 5
> Lustre: 13798:0:(o2iblnd_cb.c:459:kiblnd_rx_complete()) Rx from 10.8.30@o2ib failed: 5
> Lustre: 615:0:(o2iblnd_cb.c:2384:kiblnd_reconnect()) 10.8.30@o2ib: retrying (version negotiation), 12, 11, queue_dep: 8, max_frag: 256, msg_size: 4096
> Lustre: Client system-client has started
> Lustre: 13798:0:(o2iblnd_cb.c:459:kiblnd_rx_complete()) Rx from 10.8.30@o2ib failed: 5
> ... last message repeated 17 times ...
> Lustre: 615:0:(o2iblnd_cb.c:2384:kiblnd_reconnect()) 10.8.30@o2ib: retrying (version negotiation), 12, 11, queue_dep: 8, max_frag: 256, msg_size: 4096
> Lustre: 615:0:(o2iblnd_cb.c:2384:kiblnd_reconnect()) 10.8.30@o2ib: retrying (version negotiation), 12, 11, queue_dep: 8, max_frag: 256, msg_size: 4096
> Lustre: 13798:0:(o2iblnd_cb.c:459:kiblnd_rx_complete()) Rx from 10.8.30@o2ib failed: 5
> Lustre: 615:0:(o2iblnd_cb.c:2384:kiblnd_reconnect()) 10.8.30@o2ib: retrying (version negotiation), 12, 11, queue_dep: 8, max_frag: 256, msg_size: 4096
> Lustre: 615:0:(o2iblnd_cb.c:2384:kiblnd_reconnect()) 10.8.30@o2ib: retrying (version negotiation), 12, 11, queue_dep: 8, max_frag: 256, msg_size: 4096
> Lustre: 13797:0:(o2iblnd_cb.c:459:kiblnd_rx_complete()) Rx from 10.8.30@o2ib failed: 5
> Lustre: 615:0:(o2iblnd_cb.c:2384:kiblnd_reconnect()) 10.8.30@o2ib: retrying (version negotiation), 12, 11, queue_dep: 8, max_frag: 256, msg_size: 4096
> Lustre: 615:0:(o2iblnd_cb.c:2384:kiblnd_reconnect()) 10.8.30@o2ib: retrying (version negotiation), 12, 11, queue_dep: 8, max_frag: 256, msg_size: 4096
> Lustre: 615:0:(o2iblnd_cb.c:2384:kiblnd_reconnect()) 10.8.30@o2ib: retrying (version negotiation), 12, 11, queue_dep: 8, max_frag: 256, msg_size: 4096
> Lustre: 615:0:(o2iblnd_cb.c:2384:kiblnd_reconnect()) 10.8.30@o2ib: retrying (version negotiation), 12, 11, queue_dep: 8, max_frag: 256, msg_size: 4096
> Lustre: 13800:0:(o2iblnd_cb.c:459:kiblnd_rx_complete()) Rx from 10.8.30@o2ib failed: 5
>
> It looks like it succeeds in the end, but only after a struggle. I don't
> have any problems with 1.8.1 - 1.8.1 or 1.6.7.2 - 1.6.7.2.
>
> Servers are rhel5 x86_64 2.6.18-92.1.26.el5 1.6.7.2 + bz18793 (group quota
> fix). Client is rhel5 x86_64 patched 2.6.18-128.1.16.el5-b_release_1_8_1
> from cvs 20090712131220 + bz18793 again.
>
> BTW, should I be using cvs tag v1_8_1_RC1 instead of b_release_1_8_1? I'm
> confused about which is closest to the final 1.8.1 :-/
>
> cheers, robin
> --
> Dr Robin Humble, HPC Systems Analyst, NCI National Facility
Re: [Lustre-discuss] mds adjust qunit failed
On Tue, Jul 21, 2009 at 01:50:43PM +0800, Lu Wang wrote:
> Dear list,
>
> I have gotten over 19000 quota-related errors on one MDS since 18:00
> yesterday, like:
>
> Jul 20 18:24:04 * kernel: LustreError: 10999:0:(quota_master.c:507:mds_quota_adjust()) mds adjust qunit failed! (opc:4 rc:-122)

If you look through the Linux errno header files, you'll find -122 is

  EDQUOT  /* Quota exceeded */

so someone or some group is over quota - either inodes or disk space.

It would be really good if this message said which uid/gid was over quota,
from which client, and on which filesystem. As you have found, the current
message is not very informative and overly verbose. I was looking at the
quota code around this message a few days ago, and it looks like it'd be
really easy to add some extra info to the message, but I have yet to test a
toy patch I wrote...

cheers, robin
--
Dr Robin Humble, HPC Systems Analyst, NCI National Facility

> Jul 20 18:29:27 * kernel: LustreError: 11007:0:(quota_master.c:507:mds_quota_adjust()) mds adjust qunit failed! (opc:4 rc:-122)
> Jul 21 13:44:27 * kernel: LustreError: 10999:0:(quota_master.c:507:mds_quota_adjust()) mds adjust qunit failed! (opc:4 rc:-122)
>
> # grep master /var/log/messages | wc
>   19628  255058 2665136
>
> Does anyone know what this means? The MDS is running
> 2.6.9-67.0.22.EL_lustre.1.6.6smp.
>
> Best Regards
> Lu Wang
> --
> Computing Center, IHEP        Office: Computing Center, 123 19B Yuquan Road
> P.O. Box 918-7                Tel: (+86) 10 88236012-607
> Beijing 100049, China         Fax: (+86) 10 8823 6839
>                               Email: lu.w...@ihep.ac.cn
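For anyone chasing a similar message, a couple of quick ways to translate the
errno and to find the offending user or group from a client; a sketch only,
the header path, username, group name and mount point are placeholders:

--
# Look up errno 122 in the kernel headers (exact path varies by distro):
grep -rw 122 /usr/include/asm*/errno*.h

# Or via Python's errno table:
python -c 'import errno, os; print errno.errorcode[122], os.strerror(122)'

# Compare usage against limits for a suspect user or group on the filesystem:
lfs quota -u someuser /mnt/lustre
lfs quota -g somegroup /mnt/lustre
--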
[Lustre-discuss] client missing inodes
Hi all,

recently, many of our clients have been reporting errors of the type:

Jul 21 14:40:41 kernel: LustreError: 26871:0:(file.c:3024:ll_inode_revalidate_fini()) failure -2 inode 72692692

-2 is "no such file or directory". Am I right that this means some inode info
got lost on the MDT?

Also, people have reported problems finding their own files directly after
writing them. However, later on / on other clients, these files could be
accessed. I'm not sure about the files' content, though.

Regards,
Thomas
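In case it helps narrow this down, the inode number from the message can be
looked up directly on the MDT; a sketch, run on the MDS, where /dev/mdtdev
stands in for the real MDT device:

--
# Ask debugfs (read-only, catastrophic mode) which pathname, if any, maps to
# inode 72692692 on the MDT.  No output usually means the inode is gone there.
debugfs -c -R 'ncheck 72692692' /dev/mdtdev
--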
Re: [Lustre-discuss] 1.8.1(-ish) client vs. 1.6.7.2 server
Robin,

These messages should be harmless. 1.8.1 uses a new o2iblnd message protocol,
so there is a version negotiation if the client's o2iblnd version is older.
Are there any other o2ib error messages, like "Deleting messages for
xxx.xxx.xxx@o2ib: connection failed", when you see I/O failures?

Anyway, if you get more complaints from o2ib apart from these informational
messages, could you please post them on the bug you filed?

Thanks
Liang

Robin Humble wrote:
> I added this to bugzilla: https://bugzilla.lustre.org/show_bug.cgi?id=20227
>
> cheers, robin
>
> On Wed, Jul 15, 2009 at 01:09:33PM -0400, Robin Humble wrote:
> > On Wed, Jul 15, 2009 at 08:46:12AM -0400, Robin Humble wrote:
> > > I get a ferocious set of error messages when I mount a 1.6.7.2
> > > filesystem on a b_release_1_8_1 client. Is this expected?
> >
> > [... full annotated logs quoted in Robin's message above ...]
[Lustre-discuss] quota type - 3?
Hi,

The manual says there are two types of quotas available, v1 and v2. I checked
the quota_type parameter on an MDT as:

[r...@localhost ~]# cat /proc/fs/lustre/mds/atlantic-MDT/quota_type
3

Does this mean quota type 3, or that no quota is enabled? It returns 'ug3'
after doing lfs quotacheck.

Also, where can I find the corresponding proc entry for an OST? Am I missing
something?

Thanks,
Shantanu Pavgi.
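For what it's worth, on 1.6/1.8 the OST-side entry normally lives under
obdfilter on the OSS node; a sketch, with the OST name below assumed from the
filesystem name above:

--
# Per-OST quota_type on the OSS (exact target name will differ):
cat /proc/fs/lustre/obdfilter/atlantic-OST0000/quota_type

# Or, without guessing the exact target name:
cat /proc/fs/lustre/obdfilter/*/quota_type
--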
Re: [Lustre-discuss] detecting problematic files via mds/oss syslog messages
Andreas and Brian,

Thank you for the invaluable information you gave.

Regards,
Ender GULER

On Tue, Jul 21, 2009 at 4:51 PM, Brian J. Murrell brian.murr...@sun.com wrote:
> On Mon, 2009-07-20 at 16:06 -0600, Andreas Dilger wrote:
> > On Jul 20, 2009 13:20 +0300, Ender Güler wrote:
> > > Are there any ways of detecting the problematic file names from the
> > > mds/oss syslog messages? Or, to be more specific, are there any ways to
> > > find a map of file name to inode number, or file name to object id, or
> > > inode number to object id? I'm trying to understand the internals of
> > > Lustre, and sometimes I wonder which files are affected by a Lustre
> > > error. For example, when I see an object id in log messages, how can I
> > > find which file it belongs to? Is there any lookup table or map for
> > > these purposes?
> >
> > If there are particularly bad error messages, you could file a bug with
> > details. It is also possible to manually map an OST object ID to an MDS
> > filename. For example, objid 620032 on my filesystem: [...]
>
> This exact procedure is being added to the manual FWIW. Details are in bug
> 19753.
>
> b.
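As a complement to the server-side procedure Andreas and Brian refer to, the
same objid-to-filename lookup can be done from a client by scanning striping
information; a sketch, using the objid from Andreas' example, with the mount
point as a placeholder (slow on a large filesystem):

--
# Dump striping info for the whole tree, then look for the object id.  The
# owning file name appears a few lines above the matching objid; also check
# that the obdidx column matches the OST named in the log message.
lfs getstripe -r /mnt/lustre > /tmp/stripes.txt
grep -B5 -w 620032 /tmp/stripes.txt
--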
Re: [Lustre-discuss] Alternative to DRBD
Can't wait until 2.0 and SNS. I think that's the only die-hard feature Lustre
is really missing.

On Tue, Jul 21, 2009 at 10:19 AM, daire.by...@framestore.com wrote:
> Michael,
>
> ----- Michael Di Domenico mdidomeni...@gmail.com wrote:
> > > We are currently using e2scan and distributed rsyncs across the compute
> > > farm to do the same thing with Lustre v1.6/1.8, mirroring our main
> > > filesystem to a backup every night.
> >
> > Is this a script you'd be willing to share?
>
> Sure, but it is fairly specific to our setup and (to be honest) is pretty
> hacktastic! The basic idea is that an e2scan is run on the production
> filesystem every night and new files (from the last 72 hours) are copied
> over (incremental). Then, once a week, we include deletes: the e2scan file
> lists from both the production and backup filesystems are compared to work
> out what needs to be deleted from the backup to synchronise the two
> filesystems. We never really have to do a full backup. We also use LVM
> snapshots on the backup to give us a couple of weeks of retention.
>
> It works for us but your mileage may vary.
>
> Daire