Re: [Lustre-discuss] ldiskfs for MDT and zfs for OSTs?

2013-10-09 Thread Thomas Stibor
Hello Anjana,

I can confirm that this setup works (ZFS-MGS/MDT or LDISKFS-MGS/MDT and
ZFS-OSS/OST).

I used a CentOS 6.4
build:
2.4.0-RC2-gd3f91c4-PRISTINE-2.6.32-358.6.2.el6_lustre.g230b174.x86_64
and the Lustre Packages from
http://downloads.whamcloud.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64/

ZFS was downloaded from ZOL (ZFS on Linux) and compiled/installed.

SPL: Loaded module v0.6.2-1
SPL: using hostid 0x
ZFS: Loaded module v0.6.2-1, ZFS pool version 5000, ZFS filesystem version 5

At first I ran into the same problem:

mkfs.lustre --fsname=lustrefs --reformat --ost --backfstype=zfs .
mkfs.lustre FATAL: unable to prepare backend (22)
mkfs.lustre: exiting with 22 (Invalid argument)

and saw that the ZFS libraries in /usr/local/lib were not known to CentOS 6.4.

A quick:

echo /usr/local/lib > /etc/ld.so.conf.d/zfs.conf
echo /usr/local/lib64 >> /etc/ld.so.conf.d/zfs.conf
ldconfig

solved the problem.
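
The fix above can also be written so that it is safe to run more than once. This is only a sketch: the helper name add_lib_dir is made up for illustration, and it assumes the ZFS libraries really were installed under /usr/local/lib and /usr/local/lib64 as described above.

```shell
#!/bin/sh
# Hypothetical helper (not part of ZFS or Lustre): append a library
# directory to an ld.so.conf.d file unless that exact line is already there.
add_lib_dir() {
    conf=$1
    dir=$2
    touch "$conf" || return 1
    # -x: match the whole line, -F: treat the directory as a fixed string
    grep -qxF "$dir" "$conf" || printf '%s\n' "$dir" >> "$conf"
}
```

As root one would call "add_lib_dir /etc/ld.so.conf.d/zfs.conf /usr/local/lib" (and again for /usr/local/lib64), run ldconfig, and then check with "ldconfig -p | grep libzfs" that the library is found before retrying mkfs.lustre.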

(LDISKFS)
mkfs.lustre --reformat --mgs /dev/sda16
mkfs.lustre --reformat --fsname=zlust --mgsnode=10.16.0.104@o2ib0 --mdt \
    --index=0 /dev/sda5

(ZFS)
mkfs.lustre --reformat --mgs --backfstype=zfs mgs/mgs /dev/sda16
mkfs.lustre --reformat --fsname=zlust --mgsnode=10.16.0.104@o2ib0 --mdt \
    --index=0 --backfstype=zfs mdt0/mdt0 /dev/sda5

Both variants work fine.
The OSS/OST is a Debian Wheezy box with a 70-disk JBOD, kernel
3.6.11-lustre-tstibor-build with patch series 3.x-fc18.series,
and SPL/ZFS v0.6.2-1.

Best,
 Thomas

On 10/08/2013 05:40 PM, Anjana Kar wrote:
 The git checkout was on Sep. 20. Was the patch before or after?

 The zpool create command successfully creates a raidz2 pool, and mkfs.lustre
 does not complain, but

 [root@cajal kar]# zpool list
 NAME         SIZE   ALLOC  FREE   CAP  DEDUP  HEALTH  ALTROOT
 lustre-ost0  36.2T  2.24M  36.2T  0%   1.00x  ONLINE  -

 [root@cajal kar]# /usr/sbin/mkfs.lustre --fsname=cajalfs --ost 
 --backfstype=zfs --index=0 --mgsnode=10.10.101.171@o2ib lustre-ost0

 [root@cajal kar]# /sbin/service lustre start lustre-ost0
 lustre-ost0 is not a valid lustre label on this node

 I think we'll be splitting up the MDS and OSTs on 2 nodes as some of you 
 said
 there could be other issues down the road, but thanks for all the good 
 suggestions.

 -Anjana

 On 10/07/2013 07:24 PM, Ned Bass wrote:
 I'm guessing your git checkout doesn't include this commit:

 * 010a78e Revert LU-3682 tunefs: prevent tunefs running on a mounted device

 It looks like the LU-3682 patch introduced a bug that could cause your issue,
 so it's reverted in the latest master.

 Ned

 On Mon, Oct 07, 2013 at 04:54:13PM -0400, Anjana Kar wrote:
 On 10/07/2013 04:27 PM, Ned Bass wrote:
 On Mon, Oct 07, 2013 at 02:23:32PM -0400, Anjana Kar wrote:
 Here is the exact command used to create a raidz2 pool with 8+2 drives,
 followed by the error messages:

 mkfs.lustre --fsname=cajalfs --reformat --ost --backfstype=zfs
 --index=0 --mgsnode=10.10.101.171@o2ib lustre-ost0/ost0 raidz2
 /dev/sda /dev/sdc /dev/sde /dev/sdg /dev/sdi /dev/sdk /dev/sdm
 /dev/sdo /dev/sdq /dev/sds

 mkfs.lustre FATAL: Invalid filesystem name /dev/sds
 It seems that either the version of mkfs.lustre you are using has a
 parsing bug, or there was some sort of syntax error in the actual
 command entered.  If you are certain your command line is free from
 errors, please post the version of lustre you are using, or report the
 bug in the Lustre issue tracker.

 Thanks,
 Ned
 For building this server, I followed steps from the walk-thru-build*
 for Centos 6.4,
 and added --with-spl and --with-zfs when configuring lustre..
 *https://wiki.hpdd.intel.com/pages/viewpage.action?pageId=8126821

 spl and zfs modules were installed from source for the lustre 2.4 kernel
 2.6.32.358.18.1.el6_lustre2.4

 Device sds appears to be valid, but I will try issuing the command
 using by-path
 names..

 -Anjana
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss






Re: [Lustre-discuss] lustre on debian

2013-11-25 Thread Thomas Stibor
Hello Eli,

there are no official Debian packages for Lustre 2.3/2.4/2.5.
The instructions on http://wiki.lustre.org/index.php/Debian_Install
still work for 2.3/2.4/2.5 with some tiny tricks. You can either
switch to the supported RH kernel and use it in Debian, so that you can
apply the proper patch series, or, with Lustre 2.5 and the configure
settings --with-zfs --with-spl --disable-ldiskfs,
use the 3.6.11 vanilla kernel and ZFS in Debian Wheezy.

Regarding backward compatibility, there is a post from Andreas Dilger:
http://lists.lustre.org/pipermail/lustre-discuss/2013-January/017075.html

Cheers
 Thomas

On Mon, Nov 25, 2013 at 05:48:06PM +0200, E.S. Rosenberg wrote:
 Since in Linux we are mostly a debian shop we'd like to stick with
 debian for our calculation nodes if possible.
 So I wanted to ask the lustre 2.2 instructions for Debian are they
 more or less relevant to lustre 2.4/2.5 or am I going headlong into a
 tall brick wall.
 
 Also are newer clients backwards compatible with older server
 software? I am currently just setting up a demo environment and don't
 know what version of lustre the vendor will install on the full
 fledged  version yet (though I hope they'll go with 2.4/2.5).
 
 Thanks,
 Eli


Re: [Lustre-discuss] lustre on debian

2013-11-25 Thread Thomas Stibor
I forgot to mention that I have built Debian Wheezy packages, which
are available at:

http://web-docs.gsi.de/~tstibor/lustre/lustre-builds/

On Mon, Nov 25, 2013 at 05:48:06PM +0200, E.S. Rosenberg wrote:
 Since in Linux we are mostly a debian shop we'd like to stick with
 debian for our calculation nodes if possible.
 So I wanted to ask the lustre 2.2 instructions for Debian are they
 more or less relevant to lustre 2.4/2.5 or am I going headlong into a
 tall brick wall.
 
 Also are newer clients backwards compatible with older server
 software? I am currently just setting up a demo environment and don't
 know what version of lustre the vendor will install on the full
 fledged  version yet (though I hope they'll go with 2.4/2.5).
 
 Thanks,
 Eli


Re: [Lustre-discuss] Lustre Build - Ubuntu 14.04 LTS

2014-05-03 Thread Thomas Stibor
Hi Steven,

the current kernel version in Ubuntu 14.04 LTS is 3.13.0-24-generic
#46-Ubuntu, and
there are still open issues for 3.12 to be solved
(https://jira.hpdd.intel.com/browse/LU-4416) before support can be merged
into master. If you check out master from git.whamcloud.com and
try to compile
(./configure --disable-server --disable-client && make) on Ubuntu 14.04,
you will run into:

/home/thomas/tmp/lustre-release/libcfs/include/libcfs/linux/linux-mem.h:
In function 'set_shrinker':
/home/thomas/tmp/lustre-release/libcfs/include/libcfs/linux/linux-mem.h:140:10:
error: 'struct shrinker' has no member named 'shrink'
 s->shrink = func;
  ^
cc1: all warnings being treated as errors
make[6]: ***
[/home/thomas/tmp/lustre-release/libcfs/libcfs/linux/linux-tracefile.o]
Error 1
make[5]: *** [/home/thomas/tmp/lustre-release/libcfs/libcfs] Error 2
make[4]: *** [/home/thomas/tmp/lustre-release/libcfs] Error 2
make[3]: *** [_module_/home/thomas/tmp/lustre-release] Error 2
make[3]: Leaving directory `/usr/src/linux-headers-3.13.0-24-generic'
make[2]: *** [modules] Error 2
make[2]: Leaving directory `/home/thomas/tmp/lustre-release'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/thomas/tmp/lustre-release'
make: *** [all] Error 2

For a client-only Lustre build you can do the following (though not for
3.13/3.12). I tested this on Debian Wheezy:

1.) Install the kernel package in Debian and unpack it in /usr/src

2.) Check out Lustre and change the configure line in debian/rules for a
client-only build to:

./configure --disable-server --disable-ldiskfs --with-o2ib \
    --enable-quota --enable-snmp --with-linux=/usr/src/linux-3.2.51

3.) Run the following script:

#!/bin/bash
unset DEBEMAIL
unset EMAIL
unset DEBFULLNAME
unset NAME

export DEBFULLNAME="Niemand Nobody"
export EMAIL="npcompl...@example.com"

# Extract lustre version, replace "_" by "." and remove leading letter "v".
LUSTRE_VERSION=$(git describe | sed -e "s/_/\./g" | cut -c2-)

# Add entry into debian/changelog such that packages have proper version names.
dch --newversion "$LUSTRE_VERSION" --distribution unstable --nomultimaint \
    -t "Build from official master upstream."

# Generate the configure script.
sh ./autogen.sh

# Build debian packages.
dpkg-buildpackage

# Build modules.
export MODULE_LOC=${PWD}
cd /usr/src/linux
make-kpkg modules_image --append-to-version "-lustre-my-build" \
    --revision "$(date +%Y%m%d)"

The built DEBs can be found, e.g., here:
http://web-docs.gsi.de/~tstibor/lustre/lustre-builds/wheezy/debian-3.2.0-4-amd64/

If you want to build Lustre with server support, you have to make sure
that your actual kernel version matches
one of those listed in the directory lustre/kernel_patches/series:
-rw-rw-r-- 1 thomas thomas 239 May  3 17:21 2.6-rhel6.series
-rw-rw-r-- 1 thomas thomas 163 May  3 17:21 2.6-sles11.series
-rw-rw-r-- 1 thomas thomas 175 May  3 17:21 3.0-sles11.series
-rw-rw-r-- 1 thomas thomas 178 May  3 17:21 3.0-sles11sp3.series
-rw-rw-r-- 1 thomas thomas 106 May  3 17:21 3.x-fc18.series

The full howto is available, e.g., here:
https://wiki.hpdd.intel.com/display/PUB/Building+Lustre+from+Source

There is currently another patch in review
(http://review.whamcloud.com/#/c/6427/).
However, it fixes warnings such as:
...
dh_installdeb: This package will soon FTBFS; time to fix it!
dh_fixperms: No compatibility level specified in debian/compat
...

and issues with unused linked libraries.

Cheers
 Thomas


On 05/02/2014 09:35 PM, Steven Lokie wrote:
 So I'm building off the 2.5 branch and when I'm trying to build out
 the debian packages I get a general failure 


 rm -f autoMakefile
 make[4]: Leaving directory
 `/home/imemadmin/Desktop/lustre-release/debian/lustre-source/usr/src/modules/lustre/ldiskfs'
 Making distclean in .
 make[4]: Entering directory
 `/home/imemadmin/Desktop/lustre-release/debian/lustre-source/usr/src/modules/lustre'
 test -z .*.cmd .*.flags *.o *.ko *.mod.c .depend .*.1.*
 Modules.symvers Module.symvers || rm -f .*.cmd .*.flags *.o *.ko
 *.mod.c .depend .*.1.* Modules.symvers Module.symvers
 test -z Makefile Rules lustre.spec
 lustre/kernel_patches/targets/2.6-rhel6.target
 lustre/kernel_patches/targets/2.6-rhel5.target
 lustre/kernel_patches/targets/2.6-sles11.target
 lustre/kernel_patches/targets/3.0-sles11.target
 lustre/kernel_patches/targets/3.0-sles11sp3.target
 lustre/kernel_patches/targets/2.6-fc11.target
 lustre/kernel_patches/targets/2.6-fc12.target
 lustre/kernel_patches/targets/2.6-fc15.target
 lustre/kernel_patches/targets/3.x-fc18.target || rm -f Makefile Rules
 lustre.spec lustre/kernel_patches/targets/2.6-rhel6.target
 lustre/kernel_patches/targets/2.6-rhel5.target
 lustre/kernel_patches/targets/2.6-sles11.target
 lustre/kernel_patches/targets/3.0-sles11.target
 lustre/kernel_patches/targets/3.0-sles11sp3.target
 lustre/kernel_patches/targets/2.6-fc11.target
 lustre/kernel_patches/targets/2.6-fc12.target
 lustre/kernel_patches/targets/2.6-fc15.target
 lustre/kernel_patches/targets/3.x-fc18.target
 rm -f 

Re: [lustre-discuss] lshowmount equivalent?

2015-12-15 Thread Thomas Stibor

I have pushed an updated version of lshowmount in which the warnings are
fixed (mostly strcat -> strncat and sprintf -> snprintf), as well as other
issues.

This is a very cool and useful tool that I was not aware of before.
I tested combinations of the parameters "-l -v -e" on MDT/MGS and OSS,
and it works so far.

Cheers
 Thomas


I did some testing with the "old" lshowmount tool and found it very usefu

On 12/15/2015 01:26 AM, Dilger, Andreas wrote:

I've pushed patch http://review.whamcloud.com/17593 to restore this tool
to the tree, but I'm not even sure if it builds yet.  If someone with a
vested interest in using this tool could take over that patch, then it can
land in a finite time, as I've never used it myself and have lots of other
things to work on.

That means someone who knows how this tool is supposed to work needs to
fix any compile problems, test it a bit manually, and make a short test in
conf-sanity.sh that verifies it continues to work as expected in the
future.

I don't mind to carry this in the Lustre tree, so that it can be updated
as things change (e.g. /proc to /sys conversion and such), but it needs at
minimum a new test so that it doesn't silently break in the future.

Cheers, Andreas

On 2015/12/14, 09:08, "lustre-discuss on behalf of Scott Nolin"
 wrote:


On 12/14/2015 12:43 AM, Dilger, Andreas wrote:
...

Is this a tool that you are using?  IIRC, there wasn't a particular
reason
that it was removed, except that when we asked LLNL (the authors) they
said they were no longer using it, and we couldn't find anyone that was
using it so it was removed in commit b5a7260ae8f along with a bunch of
other old tools.

Thanks for the reply, indeed we were using it. We don't use it daily,
but when doing some things it is really convenient.


If there is a demand for lshowmount I don't think it would be hard to
reinstate.


If it makes more sense for it to be a separate tool outside the lustre
code base, that'd be fine too I think.

Thanks,
Scott





Cheers, Andreas


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Compiling from sources with Debian 8

2015-12-01 Thread Thomas Stibor

It looks like the staging Lustre modules are still loaded first,
and then the remaining newly compiled modules. To make sure that ONLY
the Lustre kernel modules from the "extra" directory are loaded, one can do the 
following:


DEPMOD_DIR='/etc/depmod.d'
mkdir -p ${DEPMOD_DIR}
echo "search extra built-in" > ${DEPMOD_DIR}/lustre.conf
depmod -a

This puts the directory "extra" first in the search order, so your 
compiled modules in /lib/modules/3.16.0-4-amd64/extra are loaded (not the
staging ones).
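
The same steps can be wrapped in a small function, which makes it easy to try them in a scratch directory before touching /etc. This is just a sketch; the function name write_search_order is hypothetical, and the file name lustre.conf is simply the one used above.

```shell
#!/bin/sh
# Hypothetical helper: write a depmod search-order override.
# $1: depmod.d directory (e.g. /etc/depmod.d)
# $2: module subdirectory that should be searched first (e.g. extra)
write_search_order() {
    dir=$1
    subdir=$2
    mkdir -p "$dir" || return 1
    # "built-in" stands for the kernel's regular module directories.
    printf 'search %s built-in\n' "$subdir" > "$dir/lustre.conf"
}
```

On a real system one would run "write_search_order /etc/depmod.d extra" as root, followed by "depmod -a". Afterwards "modinfo -n lnet" prints the file that would be loaded for lnet, which should then point into the "extra" directory.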

Cheers
 Thomas

On 11/30/2015 06:18 PM, Jérôme BECOT wrote:

So,

I could move on with building the modules. They are successfully built 
and installed, but I had to change the module destination directory in 
config/lustre-build-linux.m4 to target the /lib/modules/`uname -r`/extra 
directory (otherwise they go into the "kernel" subdirectory and never get 
loaded).


Then they won't load anyway. I'm getting "[440444.832446] lnet: no symbol 
version for module_layout" in dmesg.
If I copy the Module.symvers generated in the lustre-release folder 
to /usr/src/linux, then I get



[438311.953707] lnet: disagrees about version of symbol 
libcfs_deregister_ioctl

[438311.953710] lnet: Unknown symbol libcfs_deregister_ioctl (err -22)
[438311.953725] lnet: Unknown symbol cfs_str2num_check (err 0)
[438311.953760] lnet: Unknown symbol cfs_gettok (err 0)
[438311.953782] lnet: Unknown symbol lprocfs_call_handler (err 0)

(this is a sample of the various symbol errors)

Any clue? I'm close to making it work now (I guess).


Le 28/11/2015 18:07, Dilger, Andreas a écrit :
The 2.3.64 version means you are using the in-kernel Lustre client 
(confirmed by the warning messages about "staging"), and not the 2.7.x 
version from the Lustre master branch.


It looks like Ubuntu is building the in-kernel client, and your 
modules are not being loaded.


Cheers, Andreas

On Nov 28, 2015, at 03:26, Jérôme BECOT 
<jerome.be...@inserm.fr> wrote:


Hi there,

We run lustre 2.6/2.7 on our Centos 6.6 (servers) and 7 (clients) 
cluster. We have a few webservers running Debian that need to access 
the storage. I followed the procedure given by Thomas Stibor about 
Ubuntu 14 last year.


I could successfully compile the binaries and modules after some 
digging. He also left an already compiled lustre 2.7.63 and modules 
for kernel 3.16.0-4 online.


If I install his binaries, it works well. If I install the ones 
generated by the procedure, the modules don't load and a weird thing 
happens. Running dmesg warns me about one surprising thing:


  > With his packages
[212417.535369] LNet: HW CPU cores: 1, npartitions: 1
[212417.538430] alg: No test for adler32 (adler32-zlib)
[212417.538456] alg: No test for crc32 (crc32-table)
[212425.548907] Lustre: Lustre: Build Version: 
v2_7_60_0-ge686e57-CHANGED-3.16.0-4-amd64

[212425.565330] LNet: Added LNI 172.27.7.118@tcp1 [8/256/0/180]
[212425.565354] LNet: Accept secure, port 988
[212425.595531] Lustre: Mounted lustre-client


With mine

[209942.090874] LNet: HW CPU cores: 1, npartitions: 1
[209942.092902] alg: No test for adler32 (adler32-zlib)
[209950.092501] lnet: module is from the staging directory, the 
quality is unknown, you have been warned.
[209950.093589] lvfs: module is from the staging directory, the 
quality is unknown, you have been warned.
[209950.094634] obdclass: module is from the staging directory, the 
quality is unknown, you have been warned.
[209950.098595] Lustre: Lustre: Build Version: 
v2_3_64_0-g6e62c21-CHANGED-3.9.0
[209950.09] ptlrpc: module is from the staging directory, the 
quality is unknown, you have been warned.
[209950.104615] ksocklnd: module is from the staging directory, the 
quality is unknown, you have been warned.
[209950.105237] LNetError: 
845:0:(linux-tcpip.c:82:libcfs_ipif_query()) Can't get flags for 
interface eth0
[209950.105862] LNetError: 845:0:(socklnd.c:2824:ksocknal_startup()) 
Can't get interface eth0 info: -515

[209951.104194] LNetError: 105-4: Error -100 starting up LNI tcp
[209951.104852] LustreError: 
845:0:(events.c:566:ptlrpc_init_portals()) network initialisation failed
[209990.787541] ptlrpc: module is from the staging directory, the 
quality is unknown, you have been warned.


I pulled the master git branch, and could obtain
linux-patch-lustre_2.7.63.0-16-g8524994_all.deb
lustre-client-modules-3.16.7-ckt11-lustre-my-build_2.7.63.0-16-g8524994_amd64.deb
lustre-tests_2.7.63.0-16-g8524994_amd64.deb
lustre_2.7.63.0-16-g8524994_amd64.changes
lustre-dev_2.7.63.0-16-g8524994_amd64.deb
lustre-utils_2.7.63.0-16-g8524994_amd64.deb
lustre_2.7.63.0-16-g8524994.dsc
lustre-release
lustre_2.7.63.0-16-g8524994.tar.gz
lustre-source_2.7.63.0-16-g8524994_all.deb


I just don't get it. Why is the shown module version 2.3?
I tried to compile from the 2.7 branch, but the 2.7.0 version doesn't 
compile with kernel 3.16, as suggested in LU-7042.


I probably miss someth

Re: [lustre-discuss] Distributing locally....

2016-11-25 Thread Thomas Stibor
In debian/lustre-dev.install, remove the line
debian/tmp/usr/lib/*.so.*  usr/lib
(marked with "-" in the diff below) and it will work.

@@ -1,6 +1,5 @@
 lustre/contrib/README  usr/share/doc/lustre-dev/contrib
 lustre/contrib/mpich-1.2.6-lustre.patch usr/share/doc/lustre-dev/contrib
 debian/tmp/usr/include/lustre/* usr/include/lustre
-debian/tmp/usr/lib/*.so.*  usr/lib
 debian/tmp/usr/lib/*.so usr/lib
 debian/tmp/usr/lib/*.a usr/lib

Also make sure to update
debian/changelog,
e.g. with the commands

export DEBFULLNAME="My Name"
export EMAIL="myn...@mydomain.cz"

# Extract lustre version, replace "_" by "." and remove leading letter "v".
LUSTRE_VERSION=$(echo `git describe` | sed -e "s/_/\./g" | cut -c2-)
LUSTRE_DEBIAN_REV='1'

# Add entry into debian/changelog such that packages have proper version names.
dch --newversion ${LUSTRE_VERSION}-${LUSTRE_DEBIAN_REV} --distribution unstable \
    --nomultimaint -t "Build from official master upstream."

Otherwise you get package version names taken from the top entry in 
debian/changelog,
which usually does not match the Git version you are compiling.
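
To see what the version conversion does without needing a Git checkout, the sed/cut pipeline can be fed a describe string directly. This is a sketch only: the function name to_deb_version and the sample input are made up for illustration, and the input merely follows the "v2_8_60-..." tag pattern.

```shell
#!/bin/sh
# Hypothetical helper: turn a `git describe` string into a Debian version.
# $1: describe string (e.g. v2_8_60-24-g075f98e), $2: Debian revision
to_deb_version() {
    # Replace "_" by ".", drop the leading "v", then append the revision.
    upstream=$(printf '%s\n' "$1" | sed -e 's/_/./g' | cut -c2-)
    printf '%s-%s\n' "$upstream" "$2"
}

# Sample input, not taken from a real checkout.
to_deb_version v2_8_60-24-g075f98e 1
```

In a checkout one would pass "$(git describe)" as the first argument and hand the result to dch --newversion.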

Cheers
 Thomas

On Fri, Nov 25, 2016 at 10:04:06AM +, Phill Harvey-Smith wrote:
> On 02/11/2016 17:54, Dilger, Andreas wrote:
> >There is a "make debs" target, but I don't know how often this is
> >tested.  That would be the best thing to use for Ubuntu, and if it isn't
> >working then please feel free to report to the list and/or Jira.
> 
> Just got back to this,
> 
> make debs gets further but still seems to crash out
> 
> Steps :
> 
> Get source from git.
> Select 2.8.0 with : git checkout 2.8.0
> sh ./autogen.sh
> ./configure --disable-server --with-o2ib=no
> make
> 
> The make completes correctly, without errors, I have done a make install
> on this node in the past with this version which is up and running
> correctly.
> 
> make debs
> 
> bombs out, log below :
> 
> I've uploaded the log to :
> 
> http://penguin.stats.warwick.ac.uk/~stsxab/Lustre/lustre_make_deb_error.txt
> 
> As the list refused to accept it as it was too big :(
> 
> Cheers.
> 
> Phill.
> 


Re: [lustre-discuss] Distributing locally....

2016-11-29 Thread Thomas Stibor
Hi Andreas,

I created JIRA ticket https://jira.hpdd.intel.com/browse/LU-8869
for this problem.

Regarding the changelog update, I was actually wrong.
The command "make debs" already checks and updates debian/changelog.
I just checked, and it currently updates debian/changelog to

lustre (2.8.60-24-g075f98e-1) unstable; urgency=low

  * Automated changelog entry update

 -- Brian J. Murrell <br...@interlinx.bc.ca>  Tue, 29 Nov 2016 10:36:40 +0100

Regarding the other simple problem (-debian/tmp/usr/lib/*.so.*) I will
submit a patch.

Cheers
 Thomas


On Fri, Nov 25, 2016 at 08:50:03PM +, Dilger, Andreas wrote:
> On Nov 25, 2016, at 04:27, Thomas Stibor <t.sti...@gsi.de> wrote:
> > 
> > Remove in debian/lustre-dev.install the line
> > -debian/tmp/usr/lib/*.so.*  usr/lib
> > and it will work.
> > 
> > @@ -1,6 +1,5 @@
> > lustre/contrib/README   usr/share/doc/lustre-dev/contrib
> > lustre/contrib/mpich-1.2.6-lustre.patch usr/share/doc/lustre-dev/contrib
> > debian/tmp/usr/include/lustre/* usr/include/lustre
> > -debian/tmp/usr/lib/*.so.*  usr/lib
> > debian/tmp/usr/lib/*.so usr/lib
> > debian/tmp/usr/lib/*.a  usr/lib
> 
> Thomas or Phill,
> could you please submit a patch to Gerrit with this change.
> 
> > Note, also make sure to update
> > debian/changelog
> > e.g. with cmd
> > 
> > export DEBFULLNAME="My Name"
> > export EMAIL="myn...@mydomain.cz"
> > 
> > # Extract lustre version, replace "_" by "." and remove leading letter "v".
> > LUSTRE_VERSION=$(echo `git describe` | sed -e "s/_/\./g" | cut -c2-)
> > LUSTRE_DEBIAN_REV='1'
> > 
> > # Add entry into debian/changelog such that packages have proper version 
> > names.
> > dch --newversion ${LUSTRE_VERSION}-${LUSTRE_DEBIAN_REV} --distribution 
> > unstable --nomultimaint -t "Build from official master upstream."
> > 
> > otherwise you get package version names according to top entry in 
> > debian/changelog
> > which does not usually match with the GIT version you are compiling.
> 
> It would be nice to add this as part of the "make debs" target so that the 
> build is
> done with the right version.  Bonus points if it checks the top changelog 
> entry to
> see there is already an entry for the current version and doesn't add a new 
> entry.
> 
> Cheers, Andreas
> 
> > Cheers
> > Thomas
> > 
> > On Fri, Nov 25, 2016 at 10:04:06AM +, Phill Harvey-Smith wrote:
> >> On 02/11/2016 17:54, Dilger, Andreas wrote:
> >>> There is a "make debs" target, but I don't know how often this is
> >>> tested.  That would be the best thing to use for Ubuntu, and if it isn't
> >>> working then please feel free to report to the list and/or Jira.
> >> 
> >> Just got back to this,
> >> 
> >> make debs gets further but still seems to crash out
> >> 
> >> Steps :
> >> 
> >> Get source from git.
> >> Select 2.8.0 with : git checkout 2.8.0
> >> sh ./autogen.sh
> >> ./configure --disable-server --with-o2ib=no
> >> make
> >> 
> >> The make completes correctly, without errors, I have done a make install
> >> on this node in the past with this version which is up and running
> >> correctly.
> >> 
> >> make debs
> >> 
> >> bombs out, log below :
> >> 
> >> I've uploaded the log to :
> >> 
> >> http://penguin.stats.warwick.ac.uk/~stsxab/Lustre/lustre_make_deb_error.txt
> >> 
> >> As the list refused to accept it as it was too big :(
> >> 
> >> Cheers.
> >> 
> >> Phill.
> >> 


Re: [lustre-discuss] lustre won't build anymore on RHEL 7.3

2016-11-30 Thread Thomas Stibor
Hi there,

on DEB distros there is a similar problem, due to conflicts between the (old)
staging Lustre modules and the newly installed modules. The result is that
the staging modules are loaded first, and then the loader tries
to load the remaining/missing new modules and fails. On DEB distros
the problem can be overcome by building the packages with, e.g.,
--with-kmp-moddir=updates and telling the module loader to use the
search order "search updates built-in". It then looks first into
/lib/modules/`uname -r`/updates and then in the remaining directories.
The search order string "search updates built-in"
must be placed in, e.g., /etc/depmod.d/lustre.conf, and depmod -a must be called.
This could probably also work on RPM distros.
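
A quick way to check which copy of a module would win is to classify the path that modinfo reports. The following is only a sketch: the function name module_origin is hypothetical, and it assumes the in-kernel client is installed under kernel/drivers/staging/, which is where the staging tree normally ends up.

```shell
#!/bin/sh
# Hypothetical helper: classify a kernel module path by where it lives.
module_origin() {
    case "$1" in
        */kernel/drivers/staging/*) echo staging ;;
        */updates/*)                echo updates ;;
        */extra/*)                  echo extra ;;
        *)                          echo other ;;
    esac
}
```

On a node with the modules installed, module_origin "$(modinfo -n lnet)" should print "updates" once the search order is in place and depmod -a has been run; "staging" means the old in-kernel module is still preferred.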

Cheers
 Thomas
 

On Wed, Nov 30, 2016 at 08:55:16AM -0800, Riccardo Veraldi wrote:
> On 11/29/16 10:40 PM, Jeff Johnson wrote:
> > I did some updating to this kernel as well using the cr repo. I ran
> > into some similar differences. There is a new version of kmod and it
> > appeared some file locations differed from 7.2.
> yes indeed they are in different locations from the usual kernel/fs/lustre/
> 
> 
> >
> > --Jeff
> >
> > On Tue, Nov 29, 2016 at 10:27 PM, Riccardo Veraldi wrote:
> >
> > I fixed it building Lustre 2.8.60 and it works.
> > Anyway the kernel modules osd_zfs.ko and so on are placed in
> > /lib/modules/3.10.0-514.el7.x86_64/fs/
> > instead of /lib/modules/3.10.0-514.el7.x86_64/kernel/fs/lustre and
> > /lib/modules/3.10.0-514.el7.x86_64/kernel/fs/extra
> > so I had to modify the src.rpm accordingly to rebuild it properly.
> > Any hint about this,  on how to restore the standard path of the
> > lustre,
> > lnet, osd_zfs  kernel modules ?
> >
> > thank you
> >
> > Riccardo
> >
> >
> > On 11/29/16 2:25 PM, Riccardo Veraldi wrote:
> > > Hello.
> > >
> > > Today I rebuilt Lustre for the new kernel which is inside RHEL
> > > 7.3/CentOS 7.3 3.10.0-514.el7.x86_64
> > > I do not know what changed in the distribution but it is not
> > compiling
> > > anymore.
> > > What changed in my environment was a yum update which brought
> > the system
> > > from RHEL 7.2 kernel 3.10.0-327.36.3.el7.x86_64
> > > to RHEL7.3 kernel 3.10.0-514.el7.x86_64
> > > Anyone has the same issue ?
> > >
> > > thank you
> > >
> > > CC:gcc
> > > LD:/usr/bin/ld -m elf_x86_64
> > > CPPFLAGS:  -include /root/rpmbuild/BUILD/lustre-2.8.0/undef.h
> > > -include /root/rpmbuild/BUILD/lustre-2.8.0/config.h
> > > -I/root/rpmbuild/BUILD/lustre-2.8.0/libcfs/include
> > > -I/root/rpmbuild/BUILD/lustre-2.8.0/lnet/include
> > > -I/root/rpmbuild/BUILD/lustre-2.8.0/lustre/include
> > > CFLAGS:-g -O2 -Werror -Wall -Werror
> > > EXTRA_KCFLAGS: -include /root/rpmbuild/BUILD/lustre-2.8.0/undef.h
> > > -include /root/rpmbuild/BUILD/lustre-2.8.0/config.h  -g
> > > -I/root/rpmbuild/BUILD/lustre-2.8.0/libcfs/include
> > > -I/root/rpmbuild/BUILD/lustre-2.8.0/lnet/include
> > > -I/root/rpmbuild/BUILD/lustre-2.8.0/lustre/include
> > >
> > > Type 'make' to build Lustre.
> > > + make -j2 -s
> > > Making all in .
> > > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c: In
> > > function 'kiblnd_hdev_get_attr':
> > >
> > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c:2153:9:
> > > error: implicit declaration of function 'ib_query_device'
> > > [-Werror=implicit-function-declaration]
> > >  rc = ib_query_device(hdev->ibh_ibdev, attr);
> > >  ^
> > > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c: In
> > > function 'kiblnd_dev_need_failover':
> > >
> > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c:2251:9:
> > > error: passing argument 1 of 'rdma_create_id' from incompatible
> > pointer
> > > type [-Werror]
> > >  cmid = kiblnd_rdma_create_id(kiblnd_dummy_callback, dev,
> > > RDMA_PS_TCP,
> > >  ^
> > > In file included from
> > > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.h:74:0,
> > >  from
> > > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c:42:
> > >
> > /usr/src/kernels/3.10.0-514.el7.x86_64/include/rdma/rdma_cm.h:172:20:
> > > note: expected 'struct net *' but argument is of type 'int
> > (*)(struct
> > > rdma_cm_id *, struct rdma_cm_event *)'
> > >  struct rdma_cm_id *rdma_create_id(struct net *net,
> > > ^
> > >
> > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c:2251:9:
> > > error: passing argument 2 of 'rdma_create_id' from incompatible
> > pointer
> > > type [-Werror]
> > >  cmid = 

Re: [lustre-discuss] Clients looses IB connection to OSS.

2017-05-01 Thread Thomas Stibor
Hi,

see JIRA: https://jira.hpdd.intel.com/browse/LU-5718

What seems to work as a quick fix (for older versions) is to set the
value of parameter max_pages_per_rpc=64
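
For a feeling of what that value means, the bulk RPC size it implies can be worked out quickly. This is a sketch only: the function name rpc_size_kib is made up, and the 4 KiB default page size is an assumption (check with "getconf PAGESIZE").

```shell
#!/bin/sh
# Hypothetical helper: size of one bulk RPC in KiB.
# $1: max_pages_per_rpc, $2: page size in bytes (assumed 4096 if omitted)
rpc_size_kib() {
    pages=$1
    page_size=${2:-4096}
    echo $(( pages * page_size / 1024 ))
}

rpc_size_kib 64
```

On a client the parameter itself can be set at runtime with "lctl set_param osc.*.max_pages_per_rpc=64". As far as I know a plain set_param is not persistent, so it has to be re-applied after a remount.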

As written in https://jira.hpdd.intel.com/browse/LU-5718,
the issue is resolved, but only for the upcoming version 2.10.0.

Cheers
 Thomas

On Mon, May 01, 2017 at 04:47:32PM +0200, Hans Henrik Happe wrote:
> Hi,
> 
> We have experienced problems with losing connection to OSS. It starts with:
> 
> May  1 03:35:46 node872 kernel: LNetError:
> 5545:0:(o2iblnd_cb.c:1094:kiblnd_init_rdma()) RDMA has too many
> fragments for peer 10.21.10.116@o2ib (256), src idx/frags: 128/236 dst
> idx/frags: 128/236
> May  1 03:35:46 node872 kernel: LNetError:
> 5545:0:(o2iblnd_cb.c:1689:kiblnd_reply()) Can't setup rdma for GET from
> 10.21.10.116@o2ib: -90
> 
> The rest of the log is attached.
> 
> After this Lustre access is very slow. I.e. a 'df' can take minutes.
> Also 'lctl ping' to the OSS give I/O errors. Doing 'lnet net del/add'
> makes ping work again until file I/O starts. Then I/O errors again.
> 
> We use both IB and TCP on servers, so no routers.
> 
> In the attached log astro-OST0001 has been moved to the other server in
> the HA pair. This is because 'lctl dl -t' showed strange output when on
> the right server:
> 
> # lctl dl -t
>   0 UP mgc MGC10.21.10.102@o2ib 0b0bbbce-63b6-bf47-403c-28f0c53e8307 5
>   1 UP lov astro-clilov-88107412e800
> 53add9a3-e719-26d9-afb4-3fe9b0fa03bd 4
>   2 UP lmv astro-clilmv-88107412e800
> 53add9a3-e719-26d9-afb4-3fe9b0fa03bd 4
>   3 UP mdc astro-MDT-mdc-88107412e800
> 53add9a3-e719-26d9-afb4-3fe9b0fa03bd 5 10.21.10.102@o2ib
>   4 UP osc astro-OST0002-osc-88107412e800
> 53add9a3-e719-26d9-afb4-3fe9b0fa03bd 5 10.21.10.116@o2ib
>   5 UP osc astro-OST0001-osc-88107412e800
> 53add9a3-e719-26d9-afb4-3fe9b0fa03bd 5 172.20.10.115@tcp1
>   6 UP osc astro-OST0003-osc-88107412e800
> 53add9a3-e719-26d9-afb4-3fe9b0fa03bd 5 10.21.10.117@o2ib
>   7 UP osc astro-OST-osc-88107412e800
> 53add9a3-e719-26d9-afb4-3fe9b0fa03bd 5 10.21.10.114@o2ib
> 
> So astro-OST0001 seems to be connected through 172.20.10.115@tcp1, even
> though it uses 10.21.10.115@o2ib (verified by performance test and
> disabling tcp1 on IB nodes).
> 
> Please ask for more details if needed.
> 
> Cheers,
> Hans Henrik
> 

> May  1 03:35:46 node872 kernel: LNetError: 
> 5545:0:(o2iblnd_cb.c:1094:kiblnd_init_rdma()) RDMA has too many fragments for 
> peer 10.21.10.116@o2ib (256), src idx/frags: 128/236 dst idx/frags: 128/236
> May  1 03:35:46 node872 kernel: LNetError: 
> 5545:0:(o2iblnd_cb.c:1689:kiblnd_reply()) Can't setup rdma for GET from 
> 10.21.10.116@o2ib: -90
> May  1 03:35:46 node872 kernel: LustreError: 
> 5545:0:(events.c:201:client_bulk_callback()) event type 1, status -5, desc 
> 88103dd63000
> May  1 03:35:46 node872 kernel: Lustre: 
> 5606:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has 
> failed due to network error: [sent 1493602541/real 1493602541]  
> req@880e99cea080 x1565604440535580/t0(0) 
> o4->astro-OST0002-osc-881070c95c00@10.21.10.116@o2ib:6/4 lens 608/448 e 0 
> to 1 dl 1493602585 ref 2 fl Rpc:X/0/ rc 0/-1
> May  1 03:35:46 node872 kernel: Lustre: astro-OST0002-osc-881070c95c00: 
> Connection to astro-OST0002 (at 10.21.10.116@o2ib) was lost; in progress 
> operations using this service will wait for recovery to complete
> May  1 03:35:46 node872 kernel: Lustre: astro-OST0002-osc-881070c95c00: 
> Connection restored to 10.21.10.116@o2ib (at 10.21.10.116@o2ib)
> May  1 03:35:46 node872 kernel: LustreError: 
> 5545:0:(events.c:201:client_bulk_callback()) event type 1, status -5, desc 
> 88103dd63000
> May  1 03:35:46 node872 kernel: LustreError: 
> 5545:0:(events.c:201:client_bulk_callback()) event type 1, status -5, desc 
> 88103dd63000
> May  1 03:35:46 node872 kernel: LustreError: 
> 5545:0:(events.c:201:client_bulk_callback()) event type 1, status -5, desc 
> 88103dd63000
> May  1 03:35:46 node872 kernel: LustreError: 
> 5545:0:(events.c:201:client_bulk_callback()) event type 1, status -5, desc 
> 88103dd63000
> May  1 03:35:46 node872 kernel: LustreError: 
> 5545:0:(events.c:201:client_bulk_callback()) event type 1, status -5, desc 
> 88103dd63000
> May  1 03:35:46 node872 kernel: LustreError: 
> 5545:0:(events.c:201:client_bulk_callback()) event type 1, status -5, desc 
> 88103dd63000
> May  1 03:35:46 node872 kernel: LustreError: 
> 5545:0:(events.c:201:client_bulk_callback()) event type 1, status -5, desc 
> 88103dd63000
> May  1 03:35:52 node872 kernel: Lustre: 
> 5579:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed 
> out for slow reply: [sent 1493602546/real 1493602546]  req@88103e0f10c0 
> x1565604440535684/t0(0) 
> o8->astro-OST0002-osc-881070c95c00@10.21.10.116@o2ib:28/4 lens 520/544 e 
> 0 to 1 dl 1493602552 ref 1 fl Rpc:XN/0/ rc 0/-1
> May  1 03:35:52 node872 kernel: Lustre: 
>