Re: [Lustre-discuss] [zfs-discuss] Problems getting Lustre started with ZFS

2013-10-23 Thread Ned Bass
On Wed, Oct 23, 2013 at 05:46:41PM +0100, Andrew Holway wrote:
 Hello,
 
 I have hit a wall trying to get lustre started. I have followed this
 to some extent:
 
 http://zfsonlinux.org/lustre-configure-single.html
 
 If someone could give me some guidance how to get these services
 started it would be much appreciated.
 
 I running on Centos 6.4 and am getting my packages from:
 http://archive.zfsonlinux.org/epel/6/SRPMS/
 
 Thanks,
 
 Andrew
 
 
 [root@lustre1 ~]# zfs get lustre:svname
 NAME              PROPERTY       VALUE       SOURCE
 lustre-mdt0       lustre:svname  -           -
 lustre-mdt0/mdt0  lustre:svname  lustre:MDT  local
 lustre-mgs        lustre:svname  -           -
 lustre-mgs/mgs    lustre:svname  MGS         local
 [root@lustre1 ~]# /etc/init.d/lustre
 anaconda-ks.cfg   .bash_profile .cshrc
 install.log.syslog.ssh/ .viminfo
 .bash_logout  .bashrc   install.log
 ks-post-anaconda.log  .tcshrc
 [root@lustre1 ~]# /etc/init.d/lustre start
 [root@lustre1 ~]# /etc/init.d/lustre start lustre-MDT
 lustre-MDT is not a valid lustre label on this node
 [root@lustre1 ~]# /etc/init.d/lustre start MGS
 MGS is not a valid lustre label on this node

You need to configure an /etc/ldev.conf file.  See the ldev.conf(5)
man page.  Make sure the first field matches the output of `uname -n`.
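
As a sketch (the hostname and dataset names are inferred from your
transcript, and the MDT label here is a guess based on Lustre's usual
fsname-MDT0000 convention, so verify both against `zfs get lustre:svname`
on your node), each ldev.conf line has the form
`local-host foreign-host label device`:

```
# /etc/ldev.conf on lustre1 -- illustrative only
# local    foreign  label           device
lustre1    -        MGS             zfs:lustre-mgs/mgs
lustre1    -        lustre-MDT0000  zfs:lustre-mdt0/mdt0
```

With that in place, `service lustre start <label>` should recognize the
labels it currently rejects.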

 
 I have configured three OSS's with a single OST:
 
 Andrews-MacBook-Air:~ andrew$ for i in {201..204}; do ssh
 root@192.168.0.$i hostname; zfs get lustre:svname; done
 lustre1.calthrop.com
 NAME              PROPERTY       VALUE       SOURCE
 lustre-mdt0       lustre:svname  -           -
 lustre-mdt0/mdt0  lustre:svname  lustre:MDT  local
 lustre-mgs        lustre:svname  -           -
 lustre-mgs/mgs    lustre:svname  MGS         local
 lustre2.calthrop.com
 NAME  PROPERTY   VALUE   SOURCE
 lustre-ost0   lustre:svname  -   -
 lustre-ost0/ost0  lustre:svname  lustre:OST  local
 lustre3.calthrop.com
 NAME  PROPERTY   VALUE   SOURCE
 lustre-ost0   lustre:svname  -   -
 lustre-ost0/ost0  lustre:svname  lustre:OST  local

You need to use unique index numbers for each OST, e.g. OST0000,
OST0001, etc.
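
For example (a sketch only: the fsname, mgsnode, and trailing vdev
arguments are placeholders, not taken from your setup):

```
# on the first OSS
mkfs.lustre --fsname=lustre --ost --backfstype=zfs --index=0 \
    --mgsnode=lustre1@tcp lustre-ost0/ost0 ...
# on the second OSS: a different --index yields a distinct OST0001 label
mkfs.lustre --fsname=lustre --ost --backfstype=zfs --index=1 \
    --mgsnode=lustre1@tcp lustre-ost0/ost0 ...
```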

Ned
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] ldiskfs for MDT and zfs for OSTs?

2013-10-08 Thread Ned Bass
On Tue, Oct 08, 2013 at 11:40:30AM -0400, Anjana Kar wrote:
 The git checkout was on Sep. 20. Was the patch before or after?

The bug was introduced on Sep. 10 and reverted on Sep. 24, so you hit
the lucky window.  :)

 The zpool create command successfully creates a raidz2 pool, and mkfs.lustre
 does not complain, but

The pool you created with zpool create was just for testing.  I would
recommend destroying that pool, rebuilding your lustre packages from the
latest master (or better yet, a stable tag such as v2_4_1_0), and
starting over with your original mkfs.lustre command.  This would ensure
that your pool is properly configured for use with lustre.

If you'd prefer to keep this pool, you should set canmount=off on the
root dataset, as mkfs.lustre would have done:

  zfs set canmount=off lustre-ost0

 
 [root@cajal kar]# zpool list
 NAME  SIZE  ALLOC   FREECAP  DEDUP  HEALTH  ALTROOT
 lustre-ost0  36.2T  2.24M  36.2T 0%  1.00x  ONLINE  -
 
 [root@cajal kar]# /usr/sbin/mkfs.lustre --fsname=cajalfs --ost
 --backfstype=zfs --index=0 --mgsnode=10.10.101.171@o2ib lustre-ost0

This command seems to be missing the dataset name, i.e. lustre-ost0/ost0
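
That is, keeping everything else from your command, it would presumably
read:

```
mkfs.lustre --fsname=cajalfs --ost --backfstype=zfs --index=0 \
    --mgsnode=10.10.101.171@o2ib lustre-ost0/ost0
```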

 
 [root@cajal kar]# /sbin/service lustre start lustre-ost0
 lustre-ost0 is not a valid lustre label on this node

As mentioned elsewhere, this looks like an ldev.conf configuration
error.

Ned


Re: [Lustre-discuss] ldiskfs for MDT and zfs for OSTs?

2013-10-07 Thread Ned Bass
On Mon, Oct 07, 2013 at 02:23:32PM -0400, Anjana Kar wrote:
 Here is the exact command used to create a raidz2 pool with 8+2 drives,
 followed by the error messages:
 
 mkfs.lustre --fsname=cajalfs --reformat --ost --backfstype=zfs
 --index=0 --mgsnode=10.10.101.171@o2ib lustre-ost0/ost0 raidz2
 /dev/sda /dev/sdc /dev/sde /dev/sdg /dev/sdi /dev/sdk /dev/sdm
 /dev/sdo /dev/sdq /dev/sds
 
 mkfs.lustre FATAL: Invalid filesystem name /dev/sds

It seems that either the version of mkfs.lustre you are using has a
parsing bug, or there was some sort of syntax error in the actual
command entered.  If you are certain your command line is free from
errors, please post the version of lustre you are using, or report the
bug in the Lustre issue tracker.

Thanks,
Ned


Re: [Lustre-discuss] ldiskfs for MDT and zfs for OSTs?

2013-10-07 Thread Ned Bass
I'm guessing your git checkout doesn't include this commit:

* 010a78e Revert LU-3682 tunefs: prevent tunefs running on a mounted device

It looks like the LU-3682 patch introduced a bug that could cause your issue,
so it was reverted in the latest master.

Ned

On Mon, Oct 07, 2013 at 04:54:13PM -0400, Anjana Kar wrote:
 On 10/07/2013 04:27 PM, Ned Bass wrote:
 On Mon, Oct 07, 2013 at 02:23:32PM -0400, Anjana Kar wrote:
 Here is the exact command used to create a raidz2 pool with 8+2 drives,
 followed by the error messages:
 
 mkfs.lustre --fsname=cajalfs --reformat --ost --backfstype=zfs
 --index=0 --mgsnode=10.10.101.171@o2ib lustre-ost0/ost0 raidz2
 /dev/sda /dev/sdc /dev/sde /dev/sdg /dev/sdi /dev/sdk /dev/sdm
 /dev/sdo /dev/sdq /dev/sds
 
 mkfs.lustre FATAL: Invalid filesystem name /dev/sds
 It seems that either the version of mkfs.lustre you are using has a
 parsing bug, or there was some sort of syntax error in the actual
 command entered.  If you are certain your command line is free from
 errors, please post the version of lustre you are using, or report the
 bug in the Lustre issue tracker.
 
 Thanks,
 Ned
 
 For building this server, I followed steps from the walk-thru-build*
 for Centos 6.4,
 and added --with-spl and --with-zfs when configuring lustre.
 *https://wiki.hpdd.intel.com/pages/viewpage.action?pageId=8126821
 
 spl and zfs modules were installed from source for the lustre 2.4 kernel
 2.6.32.358.18.1.el6_lustre2.4
 
 Device sds appears to be valid, but I will try issuing the command
 using by-path
 names..
 
 -Anjana


Re: [Lustre-discuss] Can't install lustre-tests

2013-07-22 Thread Ned Bass
On Mon, Jul 22, 2013 at 09:20:30AM -0700, Prakash Surya wrote:
 Interesting.. Any chance somebody can provide an example URL?

If you browse to Status - Build Artifacts for a Jenkins build, there is
a link to download all files as a zip file.  There is a repodata
directory there, so I suspect the unzipped archive could be used as a
local yum repo.  However, I get a permission denied error, or sometimes
an authentication popup, when I click the zip file link.  I opened a
Jira issue to report the permission problem.

https://jira.hpdd.intel.com/browse/LU-3615
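
If the download issue gets sorted out, something like the following repo
definition should let yum consume the unzipped artifacts (the path and
repo id here are made up for illustration):

```
# /etc/yum.repos.d/lustre-jenkins.repo -- hypothetical local repo
[lustre-jenkins]
name=Lustre Jenkins build artifacts (local)
baseurl=file:///path/to/unzipped/artifacts
enabled=1
gpgcheck=0
```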

Ned


Re: [Lustre-discuss] /etc/init.d/lnet

2013-07-01 Thread Ned Bass
Hi Brian,

On Sun, Jun 30, 2013 at 05:37:42PM +, Andrus, Brian Contractor wrote:
 All,
 
 I am finding that on reboot, my client systems hang requiring a hard reset 
 because of the lnet service.
 
 It doesn't unload all the modules in the proper order for me.
 I get errors like "module osc has non-zero count"
 
 I do some hunting and see that lov needs to be unloaded before osc.
 I also find that
 fid and fld need to be unloaded before ptlrpc, and
 ofd needs to be unloaded before ost.
 
 Sometimes others as well
 
 Is there an updated lnet script available that is more complete? For the 
 time, I have been modifying the stock one to include the 'missing' modules 
 for processing.

There is a patch to improve the module unloading behavior here, but it
is not yet landed:

http://review.whamcloud.com/5478/

It would be valuable to know if the proposed changes fix your issue.
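
As a stop-gap until that lands, the modules you listed could be removed
explicitly in dependency order before the stock script runs.  A sketch,
based only on the ordering you observed (your system may need more):

```
rmmod lov            # must go before osc
rmmod osc
rmmod fid fld        # must go before ptlrpc
rmmod ofd            # must go before ost
```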

Ned


Re: [Lustre-discuss] [wc-discuss] Seeking contributors for Lustre User Manual

2012-11-14 Thread Ned Bass
On Thu, Nov 15, 2012 at 12:12:40AM +, Dilger, Andreas wrote:

 I would prefer to see a fix immediately rather than someone filing a
 ticket to describe the fix, since the documentation fix should be
 self-describing.  However, if there is a problem that isn't immediately
 resolved then a Jira ticket should be submitted in order to track the
 defect and allow assigning the work to someone.

LUDOC-11 seems to be a catch-all issue for submitting fixes to minor
problems like typos.  However it sounds like you're saying we can bypass
Jira altogether for such patches.  That would be nice; linking to a
ticket with no useful content doesn't serve any purpose that I can see.

The "Making changes to the Lustre Manual source" article currently
instructs the reader to file an LUDOC bug for change tracking in Jira
as the first step.  To avoid discouraging submission of minor fixes,
perhaps a more lightweight process for that case should be covered
first.  In particular, say either that minor fixes should reference
LUDOC-11 in the summary, or just omit the Jira reference altogether,
whichever is appropriate.

Ned


Re: [Lustre-discuss] [wc-discuss] Seeking contributors for Lustre User Manual

2012-11-13 Thread Ned Bass
On Sat, Nov 10, 2012 at 12:09:09AM +, Dilger, Andreas wrote:
 The manual source is hosted in a Git/Gerrit repository in Docbook XML
 format and can be downloaded at:
 
 git clone http://git.whamcloud.com/doc/manual lustre-manual

That doesn't work for me:

  % git clone http://review.whamcloud.com/doc/manual  lustre-manual
  Cloning into lustre-manual...
  fatal: http://review.whamcloud.com/doc/manual/info/refs not found: did
you run git update-server-info on the server?

I've only been able to check out the manual source directly from gerrit:

  git clone http://review.whamcloud.com/p/doc/manual lustre-manual

or,

  git clone ssh://nedb...@review.whamcloud.com:29418/doc/manual lustre-manual


Ned


Re: [Lustre-discuss] [wc-discuss] Seeking contributors for Lustre User Manual

2012-11-13 Thread Ned Bass
On Tue, Nov 13, 2012 at 11:48:35AM -0800, Nathan Rutman wrote:
 Would it be easier to move the manual back to a Wiki?  The low hassle
 factor of wikis has always been a draw for contribution.  The openSFS
 site is up and running with MediaWiki now (wiki.opensfs.org).

Easier? Yes, probably. Better? I personally don't think so.  Wikis are
great collaboration tools for informally sharing information, but I
don't think the paradigm scales well for documents of this size and
complexity. And a wiki isn't the right tool for producing a formal
professional-quality document, which is what I think the Lustre manual
should strive to be.

True, we would lower the bar for contributions, but for that we would
sacrifice the following features that I consider essential.

- Ability to export to multiple formats (pdf, html, epub) from one source
- Consistency of formatting and navigation elements
- A review process for proposed changes that assures a high standard of quality

However, there are some short articles that probably do belong in the
wiki and could be poached from the manual, e.g. installation and
configuration procedures.

Ned