[Lustre-discuss] lustre startup sequence Re: OSTs not activating following MGS/MDS move
Hi Colin. This is not what the manual says. Should it be corrected, then? Or should a description of the startup sequence in different situations (first start, restart) be added? The manual (and the online information) also does not describe a graceful shutdown sequence for a separate MGS/MDT configuration; it would be nice to add that too.

Alex.

E.g. http://wiki.lustre.org/manual/LustreManual20_HTML/LustreOperations.html#50438194_24122 and similarly http://build.whamcloud.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.xhtml#dbdoclet.50438194_24122

13.2 Starting Lustre
The startup order of Lustre components depends on whether you have a combined MGS/MDT or these components are separate. If you have a combined MGS/MDT, the recommended startup order is OSTs, then the MGS/MDT, and then clients. If the MGS and MDT are separate, the recommended startup order is: MGS, then OSTs, then the MDT, and then clients.

On Mar 7, 2013, at 9:51 AM, Colin Faber wrote:

Hi Christopher,

In general this can happen when your initial remount of the various services is in the wrong order, such as MGS - OST - MDT - Client, or MGS - MDT - Clients - OST, etc. During initial mount and registration it is critical that your mounts be in the correct order: MGS - MDT - OST(s) - Client(s).

CATALOG corruption, or an out-of-order sequence, is rarer on an active file system, but it is possible. The simple fix here, as described below, is to just truncate it and all should be well again.

-cf

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
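In shell terms, the first-mount/registration sequence Colin describes might look like the following sketch. The device paths, node roles, MGS NID, and file system name below are all hypothetical examples, not values from this thread:

```shell
# First-format mount and registration order: MGS -> MDT -> OST(s) -> Client(s).
# All device paths, mount points, and the NID "mgsnode@tcp" are example values.

# On the MGS node:
mount -t lustre /dev/sdb /mnt/mgs

# On the MDS node (the MDT registers with the MGS):
mount -t lustre /dev/sdc /mnt/mdt

# On each OSS node (each OST registers with the MGS):
mount -t lustre /dev/sdd /mnt/ost0

# On each client ("fsname" is the file system name given at format time):
mount -t lustre mgsnode@tcp:/fsname /mnt/lustre
```

On restart (after registration has already happened) the manual text quoted above applies instead, which is the distinction this thread is about.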
[Lustre-discuss] Document Database Re: [wc-discuss] Seeking contributors for Lustre User Manual
On Nov 13, 2012, at 3:26 PM, Ned Bass wrote:

On Tue, Nov 13, 2012 at 11:48:35AM -0800, Nathan Rutman wrote:

Would it be easier to move the manual back to a wiki? The low hassle factor of wikis has always been a draw for contributions. The openSFS site is up and running with MediaWiki now (wiki.opensfs.org).

Easier? Yes, probably. Better? I personally don't think so. Wikis are great collaboration tools for informally sharing information, but I don't think the paradigm scales well for documents of this size and complexity. And a wiki isn't the right tool for producing a formal, professional-quality document, which is what I think the Lustre manual should strive to be. True, we would lower the bar for contributions, but for that we would sacrifice the following features, which I consider essential:

- Ability to export to multiple formats (pdf, html, epub) from one source (http://www.docbook.org)
- Consistency of formatting and navigation elements
- A review process for proposed changes that assures a high standard of quality
- Ability to track changes between document versions, to incrementally update 'higher level' documents

However, there are some short articles that probably do belong in the wiki and could be poached from the manual, e.g. installation and configuration procedures.

Right. And also the other way around: detailed articles written on the wiki by developers can later be 'harvested' by a professional writer into manual chapters, with references back to the wiki for details. Lowering the entry bar is vital to encourage developers to write or update documentation.

DB: In addition to the wiki and the manual, it would be nice to have a Document Database, where conference reports, RFCs, RFPs, HLDs, DLDs, ... can be committed, updated, and later searched. Something like DocDB (http://sourceforge.net/projects/docdb-v/). The document format can be anything. DocDB was created to keep track of documentation in a large collaboration (the BTeV experiment) and has since been used by several others.
DocDB also has the ability to manage access rights to certain documents. I think we need all three - wiki, DocDB, and manual - as they serve different purposes.

KB: Right now, Lustre support tips and hints live on the lustre-discuss list. It is tedious to search emails (no tags, no links), and when an answer is found, there is no guarantee it is still relevant. It would be useful to accumulate tips and best practices in a Knowledge Base and have mechanisms to update it, e.g. instead of answering directly on the list, create an entry in the KB and post the reference to the list.

Alex.

Ned
Re: [Lustre-discuss] Tar backup of MDT runs extremely slow, tar pauses on pointers to very large files
Is this the same issue as in the "backup MDT" question (and follow-up) at http://lists.lustre.org/pipermail/lustre-discuss/2009-April/010151.html, due to sparse files on the MDT? Does tar take a lot of CPU?

Alex.

On May 30, 2012, at 5:02 PM, Andreas Dilger wrote:

The tar backup of the MDT is taking a very long time. So far it has backed up 1.6GB of the 5.0GB used in nine hours. Watching the tar process, pointers to small or average-size files are backed up quickly and at a consistent pace. When tar encounters a pointer/inode belonging to a very large file (100GB+), the tar process stalls on that file for a very long time, as if it were trying to archive the real file-size amount of data rather than the pointer/inode.
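If sparse files on the MDT are indeed the cause, GNU tar's --sparse option is the usual remedy for this symptom. A small self-contained demonstration with a throwaway directory (not the real MDT paths): a file with a 100 MB hole archives at roughly its full apparent size without --sparse, but at only a few KB with it, because tar then skips the holes instead of reading them as zeroes.

```shell
# Create a sparse test file and archive it both ways to compare sizes.
workdir=$(mktemp -d)
truncate -s 100M "$workdir/sparse_file"        # apparent size 100M, no data blocks

tar -cf  "$workdir/plain.tar"  -C "$workdir" sparse_file   # reads the full 100M
tar -Scf "$workdir/sparse.tar" -C "$workdir" sparse_file   # -S / --sparse: skips holes

ls -l "$workdir/plain.tar" "$workdir/sparse.tar"
```

On an MDT backup, the inodes carry only striping metadata, so the analogous gain is in time as much as in space: tar no longer walks the full apparent length of each 100GB+ file.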
[Lustre-discuss] 2.1.1? Re: 1.8.x server for el6
On Jan 5, 2012, at 12:12 PM, Peter Jones wrote:

On 12-01-05 9:21 AM, Andreas Dilger wrote:

... For new deployments the recommended version is 2.1.0 with RHEL6.1. We are starting work on a 2.1.1 maintenance release for the spring.

While it is not often that I would disagree with Andreas, I would say that the answer on this point depends upon your timing. Right now, if stability is your primary driver (and it sounds like it is), then I would recommend 1.8.7-wc1. The early feedback on 2.1 is very encouraging, but I think that we need a little more production feedback before we could confidently assert that 2.1.x is the default option.

Peter, if for some reason (features) we need to stick with 2.1.x and rebuild 2.1.0 with 2.1.1 patches to get more stability, is there
- a separate branch for it (2.1.1)?
- a JIRA tracker for bug fixes for the 2.1.1 release?

Alex.
Re: [Lustre-discuss] Lustre 2.0 client cache size
We used POSIX_FADV_DONTNEED in a distributed iozone test to clear the cache on the slave client after the initial write and before the client read the same data back. It helped us see real data rates instead of an unrealistically high read rate due to caching. Perhaps it was a non-Lustre (NFS) file server. If you do scripting, a small executable like the one at http://www.citi.umich.edu/projects/asci/benchmarks.html (scroll down to Clearcache) can be called after cp or dd.

Alex.

On Mar 19, 2011, at 12:47 AM, Jay wrote:

After checking the 2.6.35 kernel source code, POSIX_FADV_NOREUSE actually doesn't do anything. So I don't know how it helps. Probably we should do POSIX_FADV_DONTNEED after reading?

Jay

On Mar 18, 2011, at 1:07 AM, DEGREMONT Aurelien wrote:

... snip...

Hmm... I do not want to patch 'cp' or 'dd' :)
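Without patching cp or dd, GNU dd itself can issue the advice: its nocache flag (coreutils 8.11 and later) calls posix_fadvise(POSIX_FADV_DONTNEED) on the file's pages. A minimal sketch on an ordinary local file (the file name is just an example; whether Lustre 2.0 clients honor the advice would need to be verified separately):

```shell
# Drop a file's cached pages between reads so the second read measures real
# I/O rather than page-cache speed. iflag=nocache issues POSIX_FADV_DONTNEED.
dd if=/dev/zero of=testfile bs=1M count=16      # create a 16 MB example file
dd if=testfile of=/dev/null bs=1M               # first read populates the cache
dd if=testfile iflag=nocache count=0            # advise kernel: drop cached pages
dd if=testfile of=/dev/null bs=1M               # this read goes to storage again
```

Unlike 'echo 3 > /proc/sys/vm/drop_caches', this targets only the one file, so it can be dropped into a benchmark script between the write and read phases.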