Re: [lustre-discuss] dkms-2.8.6 breaks installation of lustre-zfs-dkms-2.12.7-1.el7.noarch
Yes, same problem for me. I ran into this a few weeks ago and I believe I reported it to the mailing list. This is my patch to make things work and build the lustre-dkms RPM:

diff -ru lustre-2.12.7/lustre-dkms_pre-build.sh lustre-2.12.7-dkms-pcds/lustre-dkms_pre-build.sh
--- lustre-2.12.7/lustre-dkms_pre-build.sh	2021-07-14 22:06:05.0 -0700
+++ lustre-2.12.7-dkms-pcds/lustre-dkms_pre-build.sh	2021-09-26 08:30:54.09600 -0700
@@ -20,18 +20,16 @@
 	fi
 	# ZFS and SPL are version locked
-	ZFS_VERSION=$(dkms status -m zfs -k $3 -a $5 | awk -F', ' '{print $2; exit 0}' | grep -v ': added$')
+	ZFS_VERSION=$(dkms status -m zfs | awk ' { print $1 } ' | sed -e 's/zfs\///' -e 's/,//')
+
 	if [ -z $ZFS_VERSION ] ; then
 		echo "zfs-dkms package must already be installed and built under DKMS control"
 		exit 1
 	fi
 	SERVER="--enable-server $LDISKFS \
-		--with-linux=$4 --with-linux-obj=$4 \
-		--with-spl=$6/spl-${ZFS_VERSION} \
-		--with-spl-obj=$7/spl/${ZFS_VERSION}/$3/$5 \
-		--with-zfs=$6/zfs-${ZFS_VERSION} \
-		--with-zfs-obj=$7/zfs/${ZFS_VERSION}/$3/$5"
+		--with-zfs=/usr/src/zfs-${ZFS_VERSION} \
+		--with-zfs-obj=/var/lib/dkms/zfs/${ZFS_VERSION}/$(uname -r)/x86_64"
 	KERNEL_STUFF="--with-linux=$4 --with-linux-obj=$4"
 	;;

On 10/13/21 2:30 PM, Fredrik Nyström via lustre-discuss wrote:
> dkms was recently updated to version 2.8.6 in epel/7. After this update,
> installation of lustre-zfs-dkms-2.12.7-1.el7.noarch fails with the
> following error: [...]
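For what it's worth, the idea behind the new pipeline: dkms 2.8.6 joins the module name and version into a single "zfs/<version>" field, so the patch peels the version out of the first field instead of splitting on ", ". A quick sanity check on a host with zfs-dkms already built (output is illustrative; the ZFS version and kernel string will differ on your system):

    $ dkms status -m zfs
    zfs/0.7.13, 3.10.0-1160.el7.x86_64, x86_64: installed
    $ dkms status -m zfs | awk ' { print $1 } ' | sed -e 's/zfs\///' -e 's/,//'
    0.7.13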
[lustre-discuss] dkms-2.8.6 breaks installation of lustre-zfs-dkms-2.12.7-1.el7.noarch
dkms was recently updated to version 2.8.6 in epel/7. After this update, installation of lustre-zfs-dkms-2.12.7-1.el7.noarch fails with the following error:

    ./configure: line 33341: test: zfs: integer expression expected
    configure: error:

Breakage seems to be caused by the following dkms commit:
https://github.com/dell/dkms/commit/f83b758b6fb8ca67b1ab65df9e3d2a1e994eb483

configure line 33341:

    if test x$enable_modules = xyes && test $ZFS_MAJOR -eq 0 && test $ZFS_MINOR -lt 8; then :

Not sure exactly how, but it ends up with ZFS_MAJOR=zfs and ZFS_MINOR=zfs instead of ZFS_MAJOR=0 and ZFS_MINOR=7.

Downgrading to an older dkms, or manually reverting the commit mentioned above, solved this problem for me.

Regards / Fredrik N.
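The commit above changed the output format of "dkms status", which is what the version parsing trips over. A before/after sketch (the ZFS version and kernel strings are illustrative):

    # dkms <= 2.8.5, the format Lustre's scripts expect:
    $ dkms status -m zfs
    zfs, 0.7.13, 3.10.0-1160.el7.x86_64, x86_64: installed

    # dkms 2.8.6, after commit f83b758b:
    $ dkms status -m zfs
    zfs/0.7.13, 3.10.0-1160.el7.x86_64, x86_64: installed

With the old format, splitting on ", " and taking the second field yields the bare ZFS version; with the new format it no longer does, so the ZFS_MAJOR/ZFS_MINOR values that configure derives from it end up non-numeric and the integer tests fail.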
Re: [lustre-discuss] [EXTERNAL] Re: No read throughput shown for the sequential read write Filebench workload
Even with clearing the locks (clearing all the LDLM locks on the client after the sync and drop_caches commands), the stats look just the same as before. For extents_stats, the stats look like this:

llite.hasanfs-882fdc929800.extents_stats=
snapshot_time:          1634138089.122870 (secs.usecs)
                           read           |          write
   extents          calls   %  cum%       |  calls   %  cum%
   0K -    4K :       450 100   100       |      0   0     0
   4K -    8K :         0   0   100       |      0   0     0
   8K -   16K :         0   0   100       |      0   0     0
  16K -   32K :         0   0   100       |      0   0     0
  32K -   64K :         0   0   100       |      0   0     0
  64K -  128K :         0   0   100       |      0   0     0
 128K -  256K :         0   0   100       |      0   0     0
 256K -  512K :         0   0   100       |      0   0     0
 512K - 1024K :         0   0   100       |      0   0     0
   1M -    2M :         0   0   100       |   1600  46    46
   2M -    4M :         0   0   100       |      0   0    46
   4M -    8M :         0   0   100       |      0   0    46
   8M -   16M :         0   0   100       |      0   0    46
  16M -   32M :         0   0   100       |   1826  53   100

If we take a look at the client-side rpc_stats, the following is observed:

osc.hasanfs-OST-osc-882fdc929800.rpc_stats=
snapshot_time:          1634137979.38657 (secs.usecs)
read RPCs in flight:    0
write RPCs in flight:   0
pending write pages:    0
pending read pages:     0

                        read                    write
pages per rpc       rpcs   %  cum % |  rpcs   %  cum %
1:                     0   0      0 |     0   0      0
2:                     0   0      0 |     0   0      0
4:                     0   0      0 |     0   0      0
8:                     0   0      0 |     0   0      0
16:                    0   0      0 |     0   0      0
32:                    0   0      0 |     0   0      0
64:                    0   0      0 |     0   0      0
128:                   0   0      0 |     0   0      0
256:                   0   0      0 |  3648 100    100

                        read                    write
rpcs in flight      rpcs   %  cum % |  rpcs   %  cum %
0:                     0   0      0 |     0   0      0
1:                     0   0      0 |    47   1      1
2:                     0   0      0 |    47   1      2
3:                     0   0      0 |    48   1      3
4:                     0   0      0 |    49   1      5
5:                     0   0      0 |    50   1      6
6:                     0   0      0 |    59   1      8
7:                     0   0      0 |    55   1      9
8:                     0   0      0 |    68   1     11
9:                     0   0      0 |  2209  60     72
10:                    0   0      0 |  1015  27     99
11:                    0   0      0 |     1   0    100

                        read                    write
offset              rpcs   %  cum % |  rpcs   %  cum %
0:                     0   0      0 |    66   1      1
1:                     0   0      0 |     0   0      1
2:                     0   0      0 |     0   0      1
4:                     0   0      0 |     0   0      1
8:                     0   0      0 |     0   0      1
16:                    0   0      0 |     0   0      1
32:                    0   0      0 |     0   0      1
64:                    0   0      0 |     0   0      1
128:                   0   0      0 |     0   0      1
256:                   0   0      0 |    66   1      3
512:                   0   0      0 |   132   3      7
1024:                  0   0      0 |   264   7     14
2048:                  0   0      0 |   528  14     28
4096:                  0   0      0 |   864  23     52
8192:                  0   0      0 |  1728  47    100

osc.hasanfs-OST0001-osc-882fdc929800.rpc_stats=
snapshot_time:          1634137979.39942 (secs.usecs)
read RPCs in flight:    0
write RPCs in flight:   0
pending write pages:    0
pending read pages:     0

                        read                    write
pages per rpc       rpcs   %  cum % |  rpcs   %  cum %
1:                    61  93     93 |     0   0      0
2:                     1   1     95 |     0   0      0
4:                     0   0     95 |     0   0      0
8:                     0   0     95 |     0   0      0
16:                    0   0     95 |     0   0      0
32:                    0   0     95 |     0   0      0
64:                    0   0     95 |     0   0      0
128:                   1   1     96 |     1   0      0
256:                   2   3    100 |  3792  99    100
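For completeness, "clearing the locks" here means the usual client-side sequence; a sketch (lru_size=clear is the documented way to flush the client's LDLM lock LRU):

    sync
    echo 3 > /proc/sys/vm/drop_caches
    # drop all LDLM locks cached by this client, across all namespaces
    lctl set_param ldlm.namespaces.*.lru_size=clear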
[lustre-discuss] No read throughput shown for the sequential read write Filebench workload
Hello everyone,

I am running the Filebench workload provided below:

define fileset name="testF",entries=100,filesize=16m,prealloc,path="/mnt/hasanfs/tmp1"

define process name="readerP",instances=2 {
  thread name="readerT",instances=4 {
    flowop openfile name="openOP",filesetname="testF"
    flowop writewholefile name="writeOP",iters=4,filesetname="testF"
    flowop readwholefile name="readOP",iters=1,filesetname="testF"
    flowop closefile name="closeOP"
  }
}

create files
system "sync"
system "echo 3 > /proc/sys/vm/drop_caches"
run 60

I am running the workload on a Lustre cluster. When I check the stats from the server side, they show the following (I have provided the stats of one OSS):

obdfilter.hasanfs-OST.stats=
snapshot_time             1633361244.519122 secs.usecs
read_bytes                1 samples [bytes] 4096 4096 4096
write_bytes               8479 samples [bytes] 1048576 1048576 8890875904
destroy                   13 samples [reqs]
statfs                    76 samples [reqs]
preprw                    8480 samples [reqs]
commitrw                  8480 samples [reqs]
ping                      57 samples [reqs]

obdfilter.hasanfs-OST.brw_stats=
snapshot_time:            1633361244.519588 (secs.usecs)

                           read      |      write
pages per bulk r/w     rpcs  % cum % | rpcs  % cum %
256:                      0  0     0 | 8479 100  100

                           read      |      write
discontiguous pages    rpcs  % cum % | rpcs  % cum %
0:                        0  0     0 | 8479 100  100

                           read      |      write
discontiguous blocks   rpcs  % cum % | rpcs  % cum %
0:                        0  0     0 | 8479 100  100

                           read      |      write
disk fragmented I/Os    ios  % cum % |  ios  % cum %
1:                        0  0     0 | 7499  88   88
2:                        0  0     0 |  980  11  100

                           read      |      write
disk I/Os in flight     ios  % cum % |  ios  % cum %
1:                        0  0     0 | 7921  83   83
2:                        0  0     0 | 1407  14   98
3:                        0  0     0 |  116   1   99
4:                        0  0     0 |   15   0  100

                           read      |      write
I/O time (1/1000s)      ios  % cum % |  ios  % cum %
2:                        0  0     0 | 1963  23   23
4:                        0  0     0 | 5627  66   89
8:                        0  0     0 |  837   9   99
16:                       0  0     0 |   25   0   99
32:                       0  0     0 |   27   0  100

                           read      |      write
disk I/O size           ios  % cum % |  ios  % cum %
4K:                       0  0     0 |   62   0    0
8K:                       0  0     0 |  127   1    1
16K:                      0  0     0 |  113   1    3
32K:                      0  0     0 |    0   0    3
64K:                      0  0     0 |    0   0    3
128K:                     0  0     0 |   65   0    3
256K:                     0  0     0 |    0   0    3
512K:                     0  0     0 | 1127  11   15
1M:                       0  0     0 | 7965  84  100

*Can anyone please explain to me why I am not seeing any read operations in the stats?*

Thanks,
Md. Hasanur Rashid
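For reference, a sketch of how these counters can be sampled around a run (the wildcards stand in for the actual OST device names; writing to these stat files resets them, though the exact reset syntax may vary by Lustre version):

    # on each OSS, reset the counters before the run:
    lctl set_param obdfilter.*.stats=clear
    # ... run the Filebench workload on the client ...
    # then collect:
    lctl get_param obdfilter.*.stats obdfilter.*.brw_stats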
[lustre-discuss] /home remounted and running for 6 hours
Well, my saga with /home locking up was partially resolved for about 6 hours today. I rebooted the MDS and remounted the MGS, the lustre MDT, and the home MDT, and after a while it all came good. I then rebooted each compute node and we were operational for about 6 hours until it all locked up again: /lustre worked fine, but /home locked solid. I suspect corruption, but I don't know how to fix it.

I have found that once I restart the MDS I can remount /home, all the D-state processes come good, and we are up and running again.

Is there a tool that can specifically check an individual MDT / OST, etc.?

Sid Young
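For reference, two per-target checks exist: a read-only fsck of the unmounted backing device (for ldiskfs targets) and Lustre's own LFSCK. A sketch, where the device path and target name are hypothetical and must be adapted to the actual setup:

    # read-only check of an unmounted ldiskfs MDT (device path is hypothetical):
    e2fsck -fn /dev/mapper/home-mdt

    # Lustre-level consistency check via LFSCK, run on the MDS
    # (the target name "home-MDT0000" is hypothetical):
    lctl lfsck_start -M home-MDT0000
    lctl get_param mdd.home-MDT0000.lfsck_layout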