Re: [Lustre-discuss] ldiskfs for MDT and zfs for OSTs?
Hello Anjana,

I can confirm that this setup works (ZFS-MGS/MDT or LDISKFS-MGS/MDT, and ZFS-OSS/OST). I used a CentOS 6.4 build:

  2.4.0-RC2-gd3f91c4-PRISTINE-2.6.32-358.6.2.el6_lustre.g230b174.x86_64

and the Lustre packages from http://downloads.whamcloud.com/public/lustre/latest-feature-release/el6/server/RPMS/x86_64/

ZFS was downloaded from ZOL and compiled/installed:

  SPL: Loaded module v0.6.2-1
  SPL: using hostid 0x
  ZFS: Loaded module v0.6.2-1, ZFS pool version 5000, ZFS filesystem version 5

At first I ran into the same problem:

  mkfs.lustre --fsname=lustrefs --reformat --ost --backfstype=zfs .
  mkfs.lustre FATAL: unable to prepare backend (22)
  mkfs.lustre: exiting with 22 (Invalid argument)

and saw that the ZFS libraries in /usr/local/lib were not known to CentOS 6.4. A quick

  echo /usr/local/lib > /etc/ld.so.conf.d/zfs.conf
  echo /usr/local/lib64 >> /etc/ld.so.conf.d/zfs.conf
  ldconfig

solved the problem.

(LDISKFS)
  mkfs.lustre --reformat --mgs /dev/sda16
  mkfs.lustre --reformat --fsname=zlust --mgsnode=10.16.0.104@o2ib0 --mdt --index=0 /dev/sda5

(ZFS)
  mkfs.lustre --reformat --mgs --backfstype=zfs mgs/mgs /dev/sda16
  mkfs.lustre --reformat --fsname=zlust --mgsnode=10.16.0.104@o2ib0 --mdt --index=0 --backfstype=zfs mdt0/mdt0 /dev/sda5

Both variants work fine.

The OSS/OST is a Debian Wheezy box with a 70-disk JBOD, kernel 3.6.11-lustre-tstibor-build with patch series 3.x-fc18.series, and SPL/ZFS v0.6.2-1.

Best,
Thomas

On 10/08/2013 05:40 PM, Anjana Kar wrote:
The git checkout was on Sep. 20. Was the patch before or after?
The zpool create command successfully creates a raidz2 pool, and mkfs.lustre does not complain, but

[root@cajal kar]# zpool list
NAME          SIZE   ALLOC  FREE   CAP  DEDUP  HEALTH  ALTROOT
lustre-ost0   36.2T  2.24M  36.2T  0%   1.00x  ONLINE  -
[root@cajal kar]# /usr/sbin/mkfs.lustre --fsname=cajalfs --ost --backfstype=zfs --index=0 --mgsnode=10.10.101.171@o2ib lustre-ost0
[root@cajal kar]# /sbin/service lustre start lustre-ost0
lustre-ost0 is not a valid lustre label on this node

I think we'll be splitting up the MDS and OSTs on 2 nodes, as some of you said there could be other issues down the road, but thanks for all the good suggestions.

-Anjana

On 10/07/2013 07:24 PM, Ned Bass wrote:
I'm guessing your git checkout doesn't include this commit:

* 010a78e Revert LU-3682 tunefs: prevent tunefs running on a mounted device

It looks like the LU-3682 patch introduced a bug that could cause your issue, so it's reverted in the latest master.

Ned

On Mon, Oct 07, 2013 at 04:54:13PM -0400, Anjana Kar wrote:
On 10/07/2013 04:27 PM, Ned Bass wrote:
On Mon, Oct 07, 2013 at 02:23:32PM -0400, Anjana Kar wrote:
Here is the exact command used to create a raidz2 pool with 8+2 drives, followed by the error messages:

mkfs.lustre --fsname=cajalfs --reformat --ost --backfstype=zfs --index=0 --mgsnode=10.10.101.171@o2ib lustre-ost0/ost0 raidz2 /dev/sda /dev/sdc /dev/sde /dev/sdg /dev/sdi /dev/sdk /dev/sdm /dev/sdo /dev/sdq /dev/sds
mkfs.lustre FATAL: Invalid filesystem name /dev/sds

It seems that either the version of mkfs.lustre you are using has a parsing bug, or there was some sort of syntax error in the actual command entered. If you are certain your command line is free from errors, please post the version of lustre you are using, or report the bug in the Lustre issue tracker.

Thanks, Ned

For building this server, I followed steps from the walk-thru-build* for CentOS 6.4, and added --with-spl and --with-zfs when configuring lustre..
*https://wiki.hpdd.intel.com/pages/viewpage.action?pageId=8126821

spl and zfs modules were installed from source for the lustre 2.4 kernel 2.6.32.358.18.1.el6_lustre2.4

Device sds appears to be valid, but I will try issuing the command using by-path names..

-Anjana

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
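The library-path fix from Thomas's reply at the top of this thread can be written so that it is safe to run more than once. The following is only a sketch: the function name register_lib_dirs is hypothetical, and the demonstration writes to a scratch file rather than to /etc/ld.so.conf.d/zfs.conf.

```shell
#!/bin/bash
# Hypothetical helper (not part of Lustre or ZFS): append each library
# directory to a linker configuration file, but only if it is not listed yet.
register_lib_dirs() {
    local conf="$1"; shift
    local dir
    touch "$conf"
    for dir in "$@"; do
        # -x matches whole lines only, -F treats the path as a literal string.
        grep -qxF -- "$dir" "$conf" || printf '%s\n' "$dir" >> "$conf"
    done
}

# Demonstration against a scratch file. On a real system the target would be
# /etc/ld.so.conf.d/zfs.conf (as root), followed by a call to ldconfig.
demo_conf="$(mktemp)"
register_lib_dirs "$demo_conf" /usr/local/lib /usr/local/lib64
register_lib_dirs "$demo_conf" /usr/local/lib   # second call adds nothing
cat "$demo_conf"
```

Compared with two plain echo redirections, this form neither overwrites entries that are already in the file nor adds duplicates when the setup is repeated.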
Re: [Lustre-discuss] lustre on debian
Hello Eli,

there are no official Debian packages for Lustre 2.3/2.4/2.5. The instructions at http://wiki.lustre.org/index.php/Debian_Install still work for 2.3/2.4/2.5 with a few small tricks. You can either switch to the supported RH kernel and use it in Debian, so that you can apply the proper patch series, or, with Lustre 2.5 and the configure settings --with-zfs --with-spl --disable-ldiskfs, use the 3.6.11 vanilla kernel and ZFS in Debian Wheezy.

Regarding backward compatibility, there is a post from Andreas Dilger: http://lists.lustre.org/pipermail/lustre-discuss/2013-January/017075.html

Cheers
Thomas

On Mon, Nov 25, 2013 at 05:48:06PM +0200, E.S. Rosenberg wrote:
Since in Linux we are mostly a debian shop we'd like to stick with debian for our calculation nodes if possible. So I wanted to ask: are the lustre 2.2 instructions for Debian more or less relevant to lustre 2.4/2.5, or am I going headlong into a tall brick wall? Also, are newer clients backwards compatible with older server software? I am currently just setting up a demo environment and don't know what version of lustre the vendor will install on the full fledged version yet (though I hope they'll go with 2.4/2.5).

Thanks, Eli
Re: [Lustre-discuss] lustre on debian
I forgot to mention that I have built Debian Wheezy packages, which are available at: http://web-docs.gsi.de/~tstibor/lustre/lustre-builds/

On Mon, Nov 25, 2013 at 05:48:06PM +0200, E.S. Rosenberg wrote:
Since in Linux we are mostly a debian shop we'd like to stick with debian for our calculation nodes if possible. So I wanted to ask: are the lustre 2.2 instructions for Debian more or less relevant to lustre 2.4/2.5, or am I going headlong into a tall brick wall? Also, are newer clients backwards compatible with older server software? I am currently just setting up a demo environment and don't know what version of lustre the vendor will install on the full fledged version yet (though I hope they'll go with 2.4/2.5).

Thanks, Eli
Re: [Lustre-discuss] Lustre Build - Ubuntu 14.04 LTS
Hi Steven,

the current kernel version in Ubuntu 14.04 LTS is 3.13.0-24-generic #46-Ubuntu, and there are still open issues for 3.12 to be solved (https://jira.hpdd.intel.com/browse/LU-4416) before support for it can be merged into master. If you check out master from git.whamcloud.com and try to compile (./configure --disable-server --disable-client followed by make) on Ubuntu 14.04, you will run into:

/home/thomas/tmp/lustre-release/libcfs/include/libcfs/linux/linux-mem.h: In function 'set_shrinker':
/home/thomas/tmp/lustre-release/libcfs/include/libcfs/linux/linux-mem.h:140:10: error: 'struct shrinker' has no member named 'shrink'
  s->shrink = func;
          ^
cc1: all warnings being treated as errors
make[6]: *** [/home/thomas/tmp/lustre-release/libcfs/libcfs/linux/linux-tracefile.o] Error 1
make[5]: *** [/home/thomas/tmp/lustre-release/libcfs/libcfs] Error 2
make[4]: *** [/home/thomas/tmp/lustre-release/libcfs] Error 2
make[3]: *** [_module_/home/thomas/tmp/lustre-release] Error 2
make[3]: Leaving directory `/usr/src/linux-headers-3.13.0-24-generic'
make[2]: *** [modules] Error 2
make[2]: Leaving directory `/home/thomas/tmp/lustre-release'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/thomas/tmp/lustre-release'
make: *** [all] Error 2

For a Lustre client only, you can do the following (however, not for 3.13/3.12). I tested this on Debian Wheezy:

1.) Install the kernel package in Debian and unpack it in /usr/src.

2.) Check out Lustre and change the file debian/rules for building the client only to:

  ./configure --disable-server --disable-ldiskfs --with-o2ib --enable-quota --enable-snmp --with-linux=/usr/src/linux-3.2.51

3.) Run the following script:

#!/bin/bash
unset DEBEMAIL
unset EMAIL
unset DEBFULLNAME
unset NAME
export DEBFULLNAME="Niemand Nobody"
export EMAIL="npcompl...@example.com"

# Extract lustre version, replace "_" by "." and remove leading letter "v".
LUSTRE_VERSION=$(echo `git describe` | sed -e "s/_/\./g" | cut -c2-)

# Add entry into debian/changelog such that packages have proper version names.
dch --newversion $LUSTRE_VERSION --distribution unstable --nomultimaint -t "Build from official master upstream."

# sh ./autogen.sh
# Build debian packages.
dpkg-buildpackage

# Build modules.
export MODULE_LOC=${PWD}
cd /usr/src/linux
make-kpkg modules_image --append-to-version -lustre-my-build --revision `date +%Y%m%d`

The built DEBs can be found e.g. here: http://web-docs.gsi.de/~tstibor/lustre/lustre-builds/wheezy/debian-3.2.0-4-amd64/

If you want to build Lustre with server support, you have to make sure that your actual kernel version matches one of those listed in the directory lustre/kernel_patches/series:

-rw-rw-r-- 1 thomas thomas 239 May 3 17:21 2.6-rhel6.series
-rw-rw-r-- 1 thomas thomas 163 May 3 17:21 2.6-sles11.series
-rw-rw-r-- 1 thomas thomas 175 May 3 17:21 3.0-sles11.series
-rw-rw-r-- 1 thomas thomas 178 May 3 17:21 3.0-sles11sp3.series
-rw-rw-r-- 1 thomas thomas 106 May 3 17:21 3.x-fc18.series

The full howto is e.g. here: https://wiki.hpdd.intel.com/display/PUB/Building+Lustre+from+Source

There is currently another patch in review (http://review.whamcloud.com/#/c/6427/). However, it fixes the warnings, e.g.

...
dh_installdeb: This package will soon FTBFS; time to fix it!
dh_fixperms: No compatibility level specified in debian/compat
...

and issues with unused linked libs.

Cheers
Thomas

On 05/02/2014 09:35 PM, Steven Lokie wrote:
So I'm building off the 2.5 branch and when I'm trying to build out the debian packages I get a general failure

rm -f autoMakefile
make[4]: Leaving directory `/home/imemadmin/Desktop/lustre-release/debian/lustre-source/usr/src/modules/lustre/ldiskfs'
Making distclean in .
make[4]: Entering directory `/home/imemadmin/Desktop/lustre-release/debian/lustre-source/usr/src/modules/lustre' test -z .*.cmd .*.flags *.o *.ko *.mod.c .depend .*.1.* Modules.symvers Module.symvers || rm -f .*.cmd .*.flags *.o *.ko *.mod.c .depend .*.1.* Modules.symvers Module.symvers test -z Makefile Rules lustre.spec lustre/kernel_patches/targets/2.6-rhel6.target lustre/kernel_patches/targets/2.6-rhel5.target lustre/kernel_patches/targets/2.6-sles11.target lustre/kernel_patches/targets/3.0-sles11.target lustre/kernel_patches/targets/3.0-sles11sp3.target lustre/kernel_patches/targets/2.6-fc11.target lustre/kernel_patches/targets/2.6-fc12.target lustre/kernel_patches/targets/2.6-fc15.target lustre/kernel_patches/targets/3.x-fc18.target || rm -f Makefile Rules lustre.spec lustre/kernel_patches/targets/2.6-rhel6.target lustre/kernel_patches/targets/2.6-rhel5.target lustre/kernel_patches/targets/2.6-sles11.target lustre/kernel_patches/targets/3.0-sles11.target lustre/kernel_patches/targets/3.0-sles11sp3.target lustre/kernel_patches/targets/2.6-fc11.target lustre/kernel_patches/targets/2.6-fc12.target lustre/kernel_patches/targets/2.6-fc15.target lustre/kernel_patches/targets/3.x-fc18.target rm -f
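The version-string step in the build script earlier in this thread (git describe piped through sed and cut) can be checked in isolation. This is a minimal sketch: the function name lustre_version_from_describe is hypothetical, and the sample tag is an assumption modelled on version strings that appear elsewhere in this archive, not the output of a real checkout.

```shell
#!/bin/bash
# Hypothetical helper: turn a `git describe` string such as
# "v2_7_63_0-16-g8524994" into a Debian-friendly version string.
lustre_version_from_describe() {
    # Replace every "_" with "." and drop the leading "v",
    # mirroring the sed/cut pipeline of the build script.
    printf '%s\n' "$1" | sed -e 's/_/./g' | cut -c2-
}

# Sample input is an assumed tag, used for illustration only.
lustre_version_from_describe "v2_7_63_0-16-g8524994"
# prints: 2.7.63.0-16-g8524994
```

Inside a Lustre checkout one would pass "$(git describe)" instead of the literal string.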
Re: [lustre-discuss] lshowmount equivalent?
I have pushed an updated version of lshowmount in which the warnings are fixed (mostly strcat -> strncat and sprintf -> snprintf), as well as some other issues. This is a very cool and useful tool which I was not aware of before. I tested combinations of the parameters "-l -v -e" on MDT/MGS and OSS, and it works so far.

Cheers
Thomas

I did some testing with the "old" lshowmount tool and found it very useful.

On 12/15/2015 01:26 AM, Dilger, Andreas wrote:
I've pushed patch http://review.whamcloud.com/17593 to restore this tool to the tree, but I'm not even sure if it builds yet. If someone with a vested interest in using this tool could take over that patch, then it can land in a finite time, as I've never used it myself and have lots of other things to work on.

That means someone who knows how this tool is supposed to work needs to fix any compile problems, test it a bit manually, and make a short test in conf-sanity.sh that verifies it continues to work as expected in the future.

I don't mind carrying this in the Lustre tree, so that it can be updated as things change (e.g. /proc to /sys conversion and such), but it needs at minimum a new test so that it doesn't silently break in the future.

Cheers, Andreas

On 2015/12/14, 09:08, "lustre-discuss on behalf of Scott Nolin" wrote:
On 12/14/2015 12:43 AM, Dilger, Andreas wrote:
... Is this a tool that you are using? IIRC, there wasn't a particular reason that it was removed, except that when we asked LLNL (the authors) they said they were no longer using it, and we couldn't find anyone that was using it so it was removed in commit b5a7260ae8f along with a bunch of other old tools.

Thanks for the reply, indeed we were using it. We don't use it daily, but when doing some things it is really convenient.

If there is a demand for lshowmount I don't think it would be hard to reinstate.

If it makes more sense for it to be a separate tool outside the lustre code base, that'd be fine too I think.
Thanks, Scott Cheers, Andreas ___ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] Compiling from sources with Debian 8
It looks like the staging Lustre modules are still loaded first, and then the remaining, newly compiled modules. To make sure that ONLY the Lustre kernel modules from the "extra" directory are loaded, one can do the following:

DEPMOD_DIR='/etc/depmod.d'
mkdir -p ${DEPMOD_DIR}
echo "search extra built-in" > ${DEPMOD_DIR}/lustre.conf
depmod -a

This puts the directory "extra" first in the search order, and thus your compiled modules in /lib/modules/3.16.0-4-amd64/extra are loaded (not the staging ones).

Cheers
Thomas

On 11/30/2015 06:18 PM, Jérôme BECOT wrote:
So, I could move on with building the modules. They are successfully built and installed, but I had to change the module destination directory in config/lustre-build-linux.m4 to target the /lib/modules/`uname -r`/extra directory (else it goes in the "kernel" subdirectory and they never get loaded).

Then it won't load anyway. I'm getting

[440444.832446] lnet: no symbol version for module_layout

in dmesg. If I copy the Module.symvers generated into the lustre-release folder to /usr/src/linux, then I get

[438311.953707] lnet: disagrees about version of symbol libcfs_deregister_ioctl
[438311.953710] lnet: Unknown symbol libcfs_deregister_ioctl (err -22)
[438311.953725] lnet: Unknown symbol cfs_str2num_check (err 0)
[438311.953760] lnet: Unknown symbol cfs_gettok (err 0)
[438311.953782] lnet: Unknown symbol lprocfs_call_handler (err 0)

(this is a sample of various symbol errors)

Any clue? I'm close to making it now (I guess)

On 28/11/2015 18:07, Dilger, Andreas wrote:
The 2.3.64 version means you are using the in-kernel Lustre client (confirmed by the warning messages about "staging"), and not the 2.7.x version from the Lustre master branch. It looks like Ubuntu is building the in-kernel client, and your modules are not being loaded.

Cheers, Andreas

On Nov 28, 2015, at 03:26, Jérôme BECOT <jerome.be...@inserm.fr> wrote:
Hi there,

We run lustre 2.6/2.7 on our Centos 6.6 (servers) and 7 (clients) cluster.
We have a few webservers running Debian that need to access the storage. I followed the procedure given by Thomas Stibor about Ubuntu 14 last year. I could successfully compile the binaries and modules after some digging. He also left an already compiled lustre 2.7.63 and modules for kernel 3.16.0-4 online. If I install his binaries, it works well. If I install the ones generated by the procedure, the modules don't load and a weird thing happens. Running dmesg warns me about one surprising thing:

With his packages

[212417.535369] LNet: HW CPU cores: 1, npartitions: 1
[212417.538430] alg: No test for adler32 (adler32-zlib)
[212417.538456] alg: No test for crc32 (crc32-table)
[212425.548907] Lustre: Lustre: Build Version: v2_7_60_0-ge686e57-CHANGED-3.16.0-4-amd64
[212425.565330] LNet: Added LNI 172.27.7.118@tcp1 [8/256/0/180]
[212425.565354] LNet: Accept secure, port 988
[212425.595531] Lustre: Mounted lustre-client

With mine

[209942.090874] LNet: HW CPU cores: 1, npartitions: 1
[209942.092902] alg: No test for adler32 (adler32-zlib)
[209950.092501] lnet: module is from the staging directory, the quality is unknown, you have been warned.
[209950.093589] lvfs: module is from the staging directory, the quality is unknown, you have been warned.
[209950.094634] obdclass: module is from the staging directory, the quality is unknown, you have been warned.
[209950.098595] Lustre: Lustre: Build Version: v2_3_64_0-g6e62c21-CHANGED-3.9.0
[209950.09] ptlrpc: module is from the staging directory, the quality is unknown, you have been warned.
[209950.104615] ksocklnd: module is from the staging directory, the quality is unknown, you have been warned.
[209950.105237] LNetError: 845:0:(linux-tcpip.c:82:libcfs_ipif_query()) Can't get flags for interface eth0
[209950.105862] LNetError: 845:0:(socklnd.c:2824:ksocknal_startup()) Can't get interface eth0 info: -515
[209951.104194] LNetError: 105-4: Error -100 starting up LNI tcp
[209951.104852] LustreError: 845:0:(events.c:566:ptlrpc_init_portals()) network initialisation failed
[209990.787541] ptlrpc: module is from the staging directory, the quality is unknown, you have been warned.

I pulled the master git branch, and could obtain

linux-patch-lustre_2.7.63.0-16-g8524994_all.deb
lustre-client-modules-3.16.7-ckt11-lustre-my-build_2.7.63.0-16-g8524994_amd64.deb
lustre-tests_2.7.63.0-16-g8524994_amd64.deb
lustre_2.7.63.0-16-g8524994_amd64.changes
lustre-dev_2.7.63.0-16-g8524994_amd64.deb
lustre-utils_2.7.63.0-16-g8524994_amd64.deb
lustre_2.7.63.0-16-g8524994.dsc
lustre-release
lustre_2.7.63.0-16-g8524994.tar.gz
lustre-source_2.7.63.0-16-g8524994_all.deb

I just don't get it. Why is the shown version of the module 2.3? I tried to compile from the 2.7 branch but the 2.7.0 version doesn't compile with kernel 3.16, as suggested in LU-7042

I probably miss someth
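The depmod search-order fix from Thomas's reply at the top of this thread can be wrapped in a small function so that the target directory and the preferred module directory are explicit. This is a sketch only: write_depmod_override is a hypothetical name, and the demonstration uses a scratch directory instead of /etc/depmod.d.

```shell
#!/bin/bash
# Hypothetical helper: write a depmod search-order override so that modules
# in the given directory (e.g. "extra") are preferred over built-in ones.
write_depmod_override() {
    local depmod_dir="$1" first_dir="$2"
    mkdir -p "$depmod_dir"
    echo "search ${first_dir} built-in" > "${depmod_dir}/lustre.conf"
}

# Demonstration in a scratch directory. On a real system the first argument
# would be /etc/depmod.d (as root), followed by `depmod -a`.
scratch="$(mktemp -d)"
write_depmod_override "$scratch" extra
cat "$scratch/lustre.conf"
# prints: search extra built-in
```

After running depmod -a on the real system, modinfo -n lnet should print the path of the lnet module that would be loaded, which is one way to confirm that the copy under "extra" now wins over the staging one.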
Re: [lustre-discuss] Distributing locally....
Remove in debian/lustre-dev.install the line
-debian/tmp/usr/lib/*.so.* usr/lib
and it will work.

@@ -1,6 +1,5 @@
 lustre/contrib/README usr/share/doc/lustre-dev/contrib
 lustre/contrib/mpich-1.2.6-lustre.patch usr/share/doc/lustre-dev/contrib
 debian/tmp/usr/include/lustre/* usr/include/lustre
-debian/tmp/usr/lib/*.so.* usr/lib
 debian/tmp/usr/lib/*.so usr/lib
 debian/tmp/usr/lib/*.a usr/lib

Note, also make sure to update debian/changelog, e.g. with the commands

export DEBFULLNAME="My Name"
export EMAIL="myn...@mydomain.cz"

# Extract lustre version, replace "_" by "." and remove leading letter "v".
LUSTRE_VERSION=$(echo `git describe` | sed -e "s/_/\./g" | cut -c2-)
LUSTRE_DEBIAN_REV='1'

# Add entry into debian/changelog such that packages have proper version names.
dch --newversion ${LUSTRE_VERSION}-${LUSTRE_DEBIAN_REV} --distribution unstable --nomultimaint -t "Build from official master upstream."

otherwise you get package version names according to the top entry in debian/changelog, which usually does not match the GIT version you are compiling.

Cheers
Thomas

On Fri, Nov 25, 2016 at 10:04:06AM +, Phill Harvey-Smith wrote:
> On 02/11/2016 17:54, Dilger, Andreas wrote:
> >There is a "make debs" target, but I don't know how often this is
> >tested. That would be the best thing to use for Ubuntu, and if it isn't
> >working then please feel free to report to the list and/or Jira.
>
> Just got back to this,
>
> make debs gets further but still seems to crash out
>
> Steps :
>
> Get source from git.
> Select 2.8.0 with : git checkout 2.8.0
> sh ./autogen.sh
> ./configure --disable-server --with-o2ib=no
> make
>
> The make completes correctly, without errors, I have done a make install
> on this node in the past with this version which is up and running
> correctly.
>
> make debs
>
> bombs out, log below :
>
> I've uploaded the log to :
>
> http://penguin.stats.warwick.ac.uk/~stsxab/Lustre/lustre_make_deb_error.txt
>
> As the list refused to accept it as it was too big :(
>
> Cheers.
>
> Phill.
Re: [lustre-discuss] Distributing locally....
Hi Andreas,

I created JIRA ticket https://jira.hpdd.intel.com/browse/LU-8869 for this problem.

Regarding the changelog update, I was actually wrong. The command "make debs" checks and updates debian/changelog. I just checked it, and it currently updates debian/changelog to:

lustre (2.8.60-24-g075f98e-1) unstable; urgency=low

  * Automated changelog entry update

 -- Brian J. Murrell <br...@interlinx.bc.ca>  Tue, 29 Nov 2016 10:36:40 +0100

Regarding the other simple problem (-debian/tmp/usr/lib/*.so.*), I will submit a patch.

Cheers
Thomas

On Fri, Nov 25, 2016 at 08:50:03PM +, Dilger, Andreas wrote:
> On Nov 25, 2016, at 04:27, Thomas Stibor <t.sti...@gsi.de> wrote:
> >
> > Remove in debian/lustre-dev.install the line
> > -debian/tmp/usr/lib/*.so.* usr/lib
> > and it will work.
> >
> > @@ -1,6 +1,5 @@
> > lustre/contrib/README usr/share/doc/lustre-dev/contrib
> > lustre/contrib/mpich-1.2.6-lustre.patch usr/share/doc/lustre-dev/contrib
> > debian/tmp/usr/include/lustre/* usr/include/lustre
> > -debian/tmp/usr/lib/*.so.* usr/lib
> > debian/tmp/usr/lib/*.so usr/lib
> > debian/tmp/usr/lib/*.a usr/lib
>
> Thomas or Phill,
> could you please submit a patch to Gerrit with this change.
>
> > Note, also make sure to update
> > debian/changelog
> > e.g. with cmd
> >
> > export DEBFULLNAME="My Name"
> > export EMAIL="myn...@mydomain.cz"
> >
> > # Extract lustre version, replace "_" by "." and remove leading letter "v".
> > LUSTRE_VERSION=$(echo `git describe` | sed -e "s/_/\./g" | cut -c2-)
> > LUSTRE_DEBIAN_REV='1'
> >
> > # Add entry into debian/changelog such that packages have proper version names.
> > dch --newversion ${LUSTRE_VERSION}-${LUSTRE_DEBIAN_REV} --distribution unstable --nomultimaint -t "Build from official master upstream."
> >
> > otherwise you get package version names according to top entry in debian/changelog
> > which does not usually match with the GIT version you are compiling.
>
> It would be nice to add this as part of the "make debs" target so that the build is
> done with the right version. Bonus points if it checks the top changelog entry to
> see there is already an entry for the current version and doesn't add a new entry.
>
> Cheers, Andreas
>
> > Cheers
> > Thomas
> >
> > On Fri, Nov 25, 2016 at 10:04:06AM +, Phill Harvey-Smith wrote:
> >> On 02/11/2016 17:54, Dilger, Andreas wrote:
> >>> There is a "make debs" target, but I don't know how often this is
> >>> tested. That would be the best thing to use for Ubuntu, and if it isn't
> >>> working then please feel free to report to the list and/or Jira.
> >>
> >> Just got back to this,
> >>
> >> make debs gets further but still seems to crash out
> >>
> >> Steps :
> >>
> >> Get source from git.
> >> Select 2.8.0 with : git checkout 2.8.0
> >> sh ./autogen.sh
> >> ./configure --disable-server --with-o2ib=no
> >> make
> >>
> >> The make completes correctly, without errors, I have done a make install
> >> on this node in the past with this version which is up and running
> >> correctly.
> >>
> >> make debs
> >>
> >> bombs out, log below :
> >>
> >> I've uploaded the log to :
> >>
> >> http://penguin.stats.warwick.ac.uk/~stsxab/Lustre/lustre_make_deb_error.txt
> >>
> >> As the list refused to accept it as it was too big :(
> >>
> >> Cheers.
> >>
> >> Phill.
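The check Andreas asks for (only add a changelog entry when the top entry does not already carry the current version) could look roughly like the following. This is a sketch under assumptions, not what "make debs" actually does: both function names are hypothetical, and the parsing relies on the first changelog line having the usual "package (version) distribution; urgency=..." shape.

```shell
#!/bin/bash
# Hypothetical helper: print the version of the top entry of a Debian
# changelog, whose first line looks like
#   lustre (2.8.60-24-g075f98e-1) unstable; urgency=low
changelog_top_version() {
    sed -n '1s/^[^(]*(\([^)]*\)).*$/\1/p' "$1"
}

# Hypothetical helper: succeeds (exit status 0) when the top entry does not
# match the wanted version, i.e. when a new entry would have to be added.
needs_changelog_entry() {
    local changelog="$1" wanted="$2"
    [ "$(changelog_top_version "$changelog")" != "$wanted" ]
}

# Demonstration on a scratch changelog instead of debian/changelog.
demo_changelog="$(mktemp)"
printf '%s\n' 'lustre (2.8.60-24-g075f98e-1) unstable; urgency=low' > "$demo_changelog"

if needs_changelog_entry "$demo_changelog" "2.8.60-24-g075f98e-1"; then
    echo "would run dch --newversion ..."
else
    echo "changelog already current"
fi
```

On a system with dpkg-dev installed, dpkg-parsechangelog can report the top version as well and is probably the more robust choice than the sed expression used here.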
Re: [lustre-discuss] lustre won't build anymore on RHEL 7.3
Hi there,

on DEB distros there is a similar problem, caused by conflicts between the (old) staging Lustre modules and the newly installed modules. The result is that the staging modules are loaded first, and then the loader tries to load the remaining/missing new modules and fails.

On DEB distros the problem can be overcome by building the packages with, e.g., --with-kmp-moddir=updates and telling the module loader to use the search order "search updates built-in". It first looks into /lib/modules/`uname -r`/updates and then in the remaining directories. The search order string "search updates built-in" must be placed in e.g. /etc/depmod.d/lustre.conf, and depmod -a must be called.

This could probably also work on RPM distros.

Cheers
Thomas

On Wed, Nov 30, 2016 at 08:55:16AM -0800, Riccardo Veraldi wrote:
> On 11/29/16 10:40 PM, Jeff Johnson wrote:
> > I did some updating to this kernel as well using the cr repo. I ran
> > into some similar differences. There is a new version of kmod and it
> > appeared some file locations differed from 7.2.
> yes indeed they are in different locations from the usual kernel/fs/lustre/
> >
> > --Jeff
> >
> > On Tue, Nov 29, 2016 at 10:27 PM, Riccardo Veraldi wrote:
> >
> > I fixed it building Lustre 2.8.60 and it works.
> > Anyway the kernel modules osd_zfs.ko and so on are placed in
> > /lib/modules/3.10.0-514.el7.x86_64/fs/
> > instead of /lib/modules/3.10.0-514.el7.x86_64/kernel/fs/lustre and
> > /lib/modules/3.10.0-514.el7.x86_64/kernel/fs/extra
> > so I had to modify the src.rpm accordingly to rebuild it properly.
> > Any hint about this, on how to restore the standard path of the
> > lustre, lnet, osd_zfs kernel modules ?
> >
> > thank you
> >
> > Riccardo
> >
> > On 11/29/16 2:25 PM, Riccardo Veraldi wrote:
> > > Hello.
> > > > > > Today I rebuilt Lustre for the new kernel which is inside RHEL > > > 7.3/CentOS 7.3 3.10.0-514 .el7.x86_64 > > > I do not know what changed in the distribution but it is not > > compiling > > > anymore. > > > What changed in my environment was a yum update which brought > > the system > > > from RHEL 7.2 kernel 3.10.0-327.36.3.el7.x86_64 > > > to RHEL7.3 kernel 3.10.0-514.el7.x86_64 > > > Anyone has the same issue ? > > > > > > thank you > > > > > > CC:gcc > > > LD:/usr/bin/ld -m elf_x86_64 > > > CPPFLAGS: -include /root/rpmbuild/BUILD/lustre-2.8.0/undef.h > > > -include /root/rpmbuild/BUILD/lustre-2.8.0/config.h > > > -I/root/rpmbuild/BUILD/lustre-2.8.0/libcfs/include > > > -I/root/rpmbuild/BUILD/lustre-2.8.0/lnet/include > > > -I/root/rpmbuild/BUILD/lustre-2.8.0/lustre/include > > > CFLAGS:-g -O2 -Werror -Wall -Werror > > > EXTRA_KCFLAGS: -include /root/rpmbuild/BUILD/lustre-2.8.0/undef.h > > > -include /root/rpmbuild/BUILD/lustre-2.8.0/config.h -g > > > -I/root/rpmbuild/BUILD/lustre-2.8.0/libcfs/include > > > -I/root/rpmbuild/BUILD/lustre-2.8.0/lnet/include > > > -I/root/rpmbuild/BUILD/lustre-2.8.0/lustre/include > > > > > > Type 'make' to build Lustre. > > > + make -j2 -s > > > Making all in . 
> > > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c: In > > > function 'kiblnd_hdev_get_attr': > > > > > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c:2153:9: > > > error: implicit declaration of function 'ib_query_device' > > > [-Werror=implicit-function-declaration] > > > rc = ib_query_device(hdev->ibh_ibdev, attr); > > > ^ > > > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c: In > > > function 'kiblnd_dev_need_failover': > > > > > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c:2251:9: > > > error: passing argument 1 of 'rdma_create_id' from incompatible > > pointer > > > type [-Werror] > > > cmid = kiblnd_rdma_create_id(kiblnd_dummy_callback, dev, > > > RDMA_PS_TCP, > > > ^ > > > In file included from > > > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.h:74:0, > > > from > > > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c:42: > > > > > /usr/src/kernels/3.10.0-514.el7.x86_64/include/rdma/rdma_cm.h:172:20: > > > note: expected 'struct net *' but argument is of type 'int > > (*)(struct > > > rdma_cm_id *, struct rdma_cm_event *)' > > > struct rdma_cm_id *rdma_create_id(struct net *net, > > > ^ > > > > > /root/rpmbuild/BUILD/lustre-2.8.0/lnet/klnds/o2iblnd/o2iblnd.c:2251:9: > > > error: passing argument 2 of 'rdma_create_id' from incompatible > > pointer > > > type [-Werror] > > > cmid =
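To see whether the staging conflict Thomas describes at the top of this thread applies to a given machine, one can list module file names that exist both in the override directory and in the staging tree. This is a sketch only: find_shadowed_modules is a hypothetical name, and it assumes that the in-kernel staging modules are installed under kernel/drivers/staging and that module files are uncompressed (*.ko).

```shell
#!/bin/bash
# Hypothetical helper: print the file names of modules that are present both
# under <moddir>/<override> and under <moddir>/kernel/drivers/staging.
# Assumptions: staging path as above, modules stored as plain *.ko files.
find_shadowed_modules() {
    local moddir="$1" override="$2"
    local path name
    find "${moddir}/${override}" -name '*.ko' 2>/dev/null | sort |
    while read -r path; do
        name="$(basename "$path")"
        if find "${moddir}/kernel/drivers/staging" -name "$name" 2>/dev/null | grep -q .; then
            echo "$name"
        fi
    done
}

# Demonstration on a scratch tree that mimics /lib/modules/<version>.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/updates/net" "$demo_root/kernel/drivers/staging/lustre"
touch "$demo_root/updates/net/lnet.ko" "$demo_root/updates/net/ko2iblnd.ko"
touch "$demo_root/kernel/drivers/staging/lustre/lnet.ko"
find_shadowed_modules "$demo_root" updates
# prints: lnet.ko
```

On a real system the call would be find_shadowed_modules "/lib/modules/$(uname -r)" updates; any name it prints is a module for which the search order in /etc/depmod.d decides which copy gets loaded.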
Re: [lustre-discuss] Clients looses IB connection to OSS.
Hi,

see JIRA: https://jira.hpdd.intel.com/browse/LU-5718

What seems to work as a quick fix (for older versions) is to set the parameter max_pages_per_rpc=64.

As written in https://jira.hpdd.intel.com/browse/LU-5718, the issue is resolved, but only for the upcoming version 2.10.0.

Cheers
Thomas

On Mon, May 01, 2017 at 04:47:32PM +0200, Hans Henrik Happe wrote:
> Hi,
>
> We have experienced problems with losing connection to OSS. It starts with:
>
> May 1 03:35:46 node872 kernel: LNetError: 5545:0:(o2iblnd_cb.c:1094:kiblnd_init_rdma()) RDMA has too many fragments for peer 10.21.10.116@o2ib (256), src idx/frags: 128/236 dst idx/frags: 128/236
> May 1 03:35:46 node872 kernel: LNetError: 5545:0:(o2iblnd_cb.c:1689:kiblnd_reply()) Can't setup rdma for GET from 10.21.10.116@o2ib: -90
>
> The rest of the log is attached.
>
> After this Lustre access is very slow. I.e. a 'df' can take minutes.
> Also 'lctl ping' to the OSS gives I/O errors. Doing 'lnet net del/add'
> makes ping work again until file I/O starts. Then I/O errors again.
>
> We use both IB and TCP on servers, so no routers.
>
> In the attached log astro-OST0001 has been moved to the other server in
> the HA pair.
This is because 'lctl dl -t' showed strange output when on > the right server: > > # lctl dl -t > 0 UP mgc MGC10.21.10.102@o2ib 0b0bbbce-63b6-bf47-403c-28f0c53e8307 5 > 1 UP lov astro-clilov-88107412e800 > 53add9a3-e719-26d9-afb4-3fe9b0fa03bd 4 > 2 UP lmv astro-clilmv-88107412e800 > 53add9a3-e719-26d9-afb4-3fe9b0fa03bd 4 > 3 UP mdc astro-MDT-mdc-88107412e800 > 53add9a3-e719-26d9-afb4-3fe9b0fa03bd 5 10.21.10.102@o2ib > 4 UP osc astro-OST0002-osc-88107412e800 > 53add9a3-e719-26d9-afb4-3fe9b0fa03bd 5 10.21.10.116@o2ib > 5 UP osc astro-OST0001-osc-88107412e800 > 53add9a3-e719-26d9-afb4-3fe9b0fa03bd 5 172.20.10.115@tcp1 > 6 UP osc astro-OST0003-osc-88107412e800 > 53add9a3-e719-26d9-afb4-3fe9b0fa03bd 5 10.21.10.117@o2ib > 7 UP osc astro-OST-osc-88107412e800 > 53add9a3-e719-26d9-afb4-3fe9b0fa03bd 5 10.21.10.114@o2ib > > So astro-OST0001 seems to be connected through 172.20.10.115@tcp1, even > though it uses 10.21.10.115@o2ib (verified by performance test and > disabling tcp1 on IB nodes). > > Please ask for more details if needed. 
> > Cheers, > Hans Henrik > > May 1 03:35:46 node872 kernel: LNetError: > 5545:0:(o2iblnd_cb.c:1094:kiblnd_init_rdma()) RDMA has too many fragments for > peer 10.21.10.116@o2ib (256), src idx/frags: 128/236 dst idx/frags: 128/236 > May 1 03:35:46 node872 kernel: LNetError: > 5545:0:(o2iblnd_cb.c:1689:kiblnd_reply()) Can't setup rdma for GET from > 10.21.10.116@o2ib: -90 > May 1 03:35:46 node872 kernel: LustreError: > 5545:0:(events.c:201:client_bulk_callback()) event type 1, status -5, desc > 88103dd63000 > May 1 03:35:46 node872 kernel: Lustre: > 5606:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has > failed due to network error: [sent 1493602541/real 1493602541] > req@880e99cea080 x1565604440535580/t0(0) > o4->astro-OST0002-osc-881070c95c00@10.21.10.116@o2ib:6/4 lens 608/448 e 0 > to 1 dl 1493602585 ref 2 fl Rpc:X/0/ rc 0/-1 > May 1 03:35:46 node872 kernel: Lustre: astro-OST0002-osc-881070c95c00: > Connection to astro-OST0002 (at 10.21.10.116@o2ib) was lost; in progress > operations using this service will wait for recovery to complete > May 1 03:35:46 node872 kernel: Lustre: astro-OST0002-osc-881070c95c00: > Connection restored to 10.21.10.116@o2ib (at 10.21.10.116@o2ib) > May 1 03:35:46 node872 kernel: LustreError: > 5545:0:(events.c:201:client_bulk_callback()) event type 1, status -5, desc > 88103dd63000 > May 1 03:35:46 node872 kernel: LustreError: > 5545:0:(events.c:201:client_bulk_callback()) event type 1, status -5, desc > 88103dd63000 > May 1 03:35:46 node872 kernel: LustreError: > 5545:0:(events.c:201:client_bulk_callback()) event type 1, status -5, desc > 88103dd63000 > May 1 03:35:46 node872 kernel: LustreError: > 5545:0:(events.c:201:client_bulk_callback()) event type 1, status -5, desc > 88103dd63000 > May 1 03:35:46 node872 kernel: LustreError: > 5545:0:(events.c:201:client_bulk_callback()) event type 1, status -5, desc > 88103dd63000 > May 1 03:35:46 node872 kernel: LustreError: > 5545:0:(events.c:201:client_bulk_callback()) event 
type 1, status -5, desc > 88103dd63000 > May 1 03:35:46 node872 kernel: LustreError: > 5545:0:(events.c:201:client_bulk_callback()) event type 1, status -5, desc > 88103dd63000 > May 1 03:35:52 node872 kernel: Lustre: > 5579:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed > out for slow reply: [sent 1493602546/real 1493602546] req@88103e0f10c0 > x1565604440535684/t0(0) > o8->astro-OST0002-osc-881070c95c00@10.21.10.116@o2ib:28/4 lens 520/544 e > 0 to 1 dl 1493602552 ref 1 fl Rpc:XN/0/ rc 0/-1 > May 1 03:35:52 node872 kernel: Lustre: >
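The quick fix from Thomas's reply at the top of this thread is a client-side tunable, which is normally changed with lctl set_param. The wrapper below is a cautious sketch: set_max_pages_per_rpc and the DRY_RUN switch are hypothetical, and by default the function only prints the command instead of executing it.

```shell
#!/bin/bash
# Hypothetical wrapper: apply max_pages_per_rpc to all OSC devices of a
# client. By default it only prints the command; set DRY_RUN=0 on a Lustre
# client (as root) to execute it.
set_max_pages_per_rpc() {
    local pages="$1"
    if [ "${DRY_RUN:-1}" = "0" ] && command -v lctl >/dev/null 2>&1; then
        lctl set_param "osc.*.max_pages_per_rpc=${pages}"
    else
        echo "lctl set_param osc.*.max_pages_per_rpc=${pages}"
    fi
}

set_max_pages_per_rpc 64
# prints: lctl set_param osc.*.max_pages_per_rpc=64
```

As far as I know, a value set this way does not survive a remount of the client, so it would have to be reapplied or made persistent by other means; lctl get_param osc.*.max_pages_per_rpc shows the values currently in effect.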