Re: [lustre-discuss] dkms-2.8.6 breaks installation of lustre-zfs-dkms-2.12.7-1.el7.noarch

2021-10-13 Thread Riccardo Veraldi
Yes, same problem for me. I addressed this a few weeks ago and I think I
reported it to the mailing list.


This is my patch to make things work and build the lustre-dkms RPM:


diff -ru lustre-2.12.7/lustre-dkms_pre-build.sh lustre-2.12.7-dkms-pcds/lustre-dkms_pre-build.sh
--- lustre-2.12.7/lustre-dkms_pre-build.sh    2021-07-14 22:06:05.0 -0700
+++ lustre-2.12.7-dkms-pcds/lustre-dkms_pre-build.sh    2021-09-26 08:30:54.09600 -0700
@@ -20,18 +20,16 @@
 fi

 # ZFS and SPL are version locked
-    ZFS_VERSION=$(dkms status -m zfs -k $3 -a $5 | awk -F', ' '{print $2; exit 0}' | grep -v ': added$')
+    ZFS_VERSION=$(dkms status -m zfs | awk ' { print $1 } ' | sed -e 's/zfs\///' -e 's/,//')
+
 if [ -z $ZFS_VERSION ] ; then
     echo "zfs-dkms package must already be installed and built under DKMS control"
     exit 1
 fi

 SERVER="--enable-server $LDISKFS \
-        --with-linux=$4 --with-linux-obj=$4 \
-        --with-spl=$6/spl-${ZFS_VERSION} \
-        --with-spl-obj=$7/spl/${ZFS_VERSION}/$3/$5 \
-        --with-zfs=$6/zfs-${ZFS_VERSION} \
-        --with-zfs-obj=$7/zfs/${ZFS_VERSION}/$3/$5"
+        --with-zfs=/usr/src/zfs-${ZFS_VERSION} \
+        --with-zfs-obj=/var/lib/dkms/zfs/${ZFS_VERSION}/$(uname -r)/x86_64"

 KERNEL_STUFF="--with-linux=$4 --with-linux-obj=$4"
 ;;
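
For context, here is a rough sketch of what the new extraction does. The sample
"dkms status" output below is an assumption based on the newer "module/version"
format introduced by the dkms commit referenced further down, not something
captured from this system:

    # assumed output of "dkms status -m zfs" with a recent dkms:
    #   zfs/0.7.13, 3.10.0-1160.el7.x86_64, x86_64: installed
    # keep only the first field, strip the "zfs/" prefix and the trailing
    # comma, so that just the version string remains
    ZFS_VERSION=$(dkms status -m zfs | awk '{ print $1 }' | sed -e 's/zfs\///' -e 's/,//')
    echo "ZFS_VERSION=${ZFS_VERSION}"    # prints: ZFS_VERSION=0.7.13

With the older output format ("zfs, 0.7.13, ..."), the same pipeline would
return just "zfs", so this workaround assumes the updated dkms is already
installed.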




On 10/13/21 2:30 PM, Fredrik Nyström via lustre-discuss wrote:

dkms was recently updated to version 2.8.6 in epel/7.

After this update, installation of lustre-zfs-dkms-2.12.7-1.el7.noarch
fails with the following error:

./configure: line 33341: test: zfs: integer expression expected
configure: error:


Breakage seems to be caused by the following dkms commit:
https://github.com/dell/dkms/commit/f83b758b6fb8ca67b1ab65df9e3d2a1e994eb483


configure line 33341:
if test x$enable_modules = xyes && test $ZFS_MAJOR -eq 0 && test $ZFS_MINOR -lt 8; then :

Not sure exactly how, but it ends up with ZFS_MAJOR=zfs and ZFS_MINOR=zfs
instead of ZFS_MAJOR=0 and ZFS_MINOR=7.


Downgrading to an older dkms or manually reverting the commit mentioned
above solved this problem for me.


Regards / Fredrik N.

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] dkms-2.8.6 breaks installation of lustre-zfs-dkms-2.12.7-1.el7.noarch

2021-10-13 Thread Fredrik Nyström via lustre-discuss
dkms was recently updated to version 2.8.6 in epel/7.

After this update, installation of lustre-zfs-dkms-2.12.7-1.el7.noarch
fails with the following error:

./configure: line 33341: test: zfs: integer expression expected
configure: error: 


Breakage seems to be caused by the following dkms commit:
https://github.com/dell/dkms/commit/f83b758b6fb8ca67b1ab65df9e3d2a1e994eb483


configure line 33341:
if test x$enable_modules = xyes && test $ZFS_MAJOR -eq 0 && test $ZFS_MINOR -lt 8; then :

Not sure exactly how, but it ends up with ZFS_MAJOR=zfs and ZFS_MINOR=zfs
instead of ZFS_MAJOR=0 and ZFS_MINOR=7.
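
As a rough illustration of the mechanism (the version-splitting commands here
are a guess, not lifted from the configure script): if the dkms parsing hands
configure the literal module name instead of a version string, there is no dot
to split on, both halves end up as "zfs", and the numeric test then fails
exactly as above:

    ZFS_VERSION=zfs                               # instead of e.g. 0.7.13
    ZFS_MAJOR=$(echo $ZFS_VERSION | cut -d. -f1)  # -> "zfs"
    ZFS_MINOR=$(echo $ZFS_VERSION | cut -d. -f2)  # -> "zfs" (cut returns the whole
                                                  #    line when no "." is found)
    test "$ZFS_MAJOR" -eq 0                       # -> test: zfs: integer expression expected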


Downgrading to an older dkms or manually reverting the commit mentioned
above solved this problem for me.


Regards / Fredrik N.
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] [EXTERNAL] Re: No read throughput shown for the sequential read write Filebench workload

2021-10-13 Thread Md Hasanur Rashid via lustre-discuss
Even after clearing the locks (clearing all the LDLM locks on the client after
the sync and drop_caches commands), the stats look just the same as before.
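
For reference, the sequence looks roughly like the following (the exact command
used to clear the locks is not shown here; lru_size=clear is one standard way
to drop a client's LDLM locks, and the wildcards stand in for this client's
llite/osc instance names):

    sync
    echo 3 > /proc/sys/vm/drop_caches
    # drop all client-side LDLM locks by clearing the lock LRUs
    lctl set_param ldlm.namespaces.*.lru_size=clear
    # then re-read the client-side I/O statistics
    lctl get_param llite.*.extents_stats
    lctl get_param osc.*.rpc_stats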

For extents_stats, the stats look like this:

llite.hasanfs-882fdc929800.extents_stats=
snapshot_time: 1634138089.122870 (secs.usecs)
                              read          |            write
      extents          calls    %   cum%    |   calls    %   cum%
   0K -    4K :          450  100    100    |       0    0      0
   4K -    8K :            0    0    100    |       0    0      0
   8K -   16K :            0    0    100    |       0    0      0
  16K -   32K :            0    0    100    |       0    0      0
  32K -   64K :            0    0    100    |       0    0      0
  64K -  128K :            0    0    100    |       0    0      0
 128K -  256K :            0    0    100    |       0    0      0
 256K -  512K :            0    0    100    |       0    0      0
 512K - 1024K :            0    0    100    |       0    0      0
   1M -    2M :            0    0    100    |    1600   46     46
   2M -    4M :            0    0    100    |       0    0     46
   4M -    8M :            0    0    100    |       0    0     46
   8M -   16M :            0    0    100    |       0    0     46
  16M -   32M :            0    0    100    |    1826   53    100

If we take a look at the client-side rpc_stats, the following is observed:

osc.hasanfs-OST-osc-882fdc929800.rpc_stats=
snapshot_time: 1634137979.38657 (secs.usecs)
read RPCs in flight:  0
write RPCs in flight: 0
pending write pages:  0
pending read pages:   0

                           read                      write
pages per rpc         rpcs   % cum % |       rpcs   % cum %
1:   0   0   0   |  0   0   0
2:   0   0   0   |  0   0   0
4:   0   0   0   |  0   0   0
8:   0   0   0   |  0   0   0
16:  0   0   0   |  0   0   0
32:  0   0   0   |  0   0   0
64:  0   0   0   |  0   0   0
128: 0   0   0   |  0   0   0
256: 0   0   0   |   3648 100 100

                           read                      write
rpcs in flight        rpcs   % cum % |       rpcs   % cum %
0:   0   0   0   |  0   0   0
1:   0   0   0   | 47   1   1
2:   0   0   0   | 47   1   2
3:   0   0   0   | 48   1   3
4:   0   0   0   | 49   1   5
5:   0   0   0   | 50   1   6
6:   0   0   0   | 59   1   8
7:   0   0   0   | 55   1   9
8:   0   0   0   | 68   1  11
9:   0   0   0   |   2209  60  72
10:  0   0   0   |   1015  27  99
11:  0   0   0   |  1   0 100

                           read                      write
offset                rpcs   % cum % |       rpcs   % cum %
0:   0   0   0   | 66   1   1
1:   0   0   0   |  0   0   1
2:   0   0   0   |  0   0   1
4:   0   0   0   |  0   0   1
8:   0   0   0   |  0   0   1
16:  0   0   0   |  0   0   1
32:  0   0   0   |  0   0   1
64:  0   0   0   |  0   0   1
128: 0   0   0   |  0   0   1
256: 0   0   0   | 66   1   3
512: 0   0   0   |132   3   7
1024:0   0   0   |264   7  14
2048:0   0   0   |528  14  28
4096:0   0   0   |864  23  52
8192:0   0   0   |   1728  47 100
osc.hasanfs-OST0001-osc-882fdc929800.rpc_stats=
snapshot_time: 1634137979.39942 (secs.usecs)
read RPCs in flight:  0
write RPCs in flight: 0
pending write pages:  0
pending read pages:   0

                           read                      write
pages per rpc         rpcs   % cum % |       rpcs   % cum %
1:  61  93  93   |  0   0   0
2:   1   1  95   |  0   0   0
4:   0   0  95   |  0   0   0
8:   0   0  95   |  0   0   0
16:  0   0  95   |  0   0   0
32:  0   0  95   |  0   0   0
64:  0   0  95   |  0   0   0
128: 1   1  96   |  1   0   0
256: 2   3 100   |   3792  99 100

                           read                      write

[lustre-discuss] No read throughput shown for the sequential read write Filebench workload

2021-10-13 Thread Md Hasanur Rashid via lustre-discuss
Hello Everyone,

I am running a Filebench workload which is provided below:

define fileset 
name="testF",entries=100,filesize=16m,prealloc,path="/mnt/hasanfs/tmp1"

define process name="readerP",instances=2 {
  thread name="readerT",instances=4 {
  flowop openfile name="openOP",filesetname="testF"
  flowop writewholefile name="writeOP",iters=4,filesetname="testF"
  flowop readwholefile name="readOP",iters=1,filesetname="testF"
  flowop closefile name="closeOP"
  }
}

create files
system "sync"
system "echo 3 > /proc/sys/vm/drop_caches"

run 60
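
Before looking at the server-side counters further below, this is roughly how
they can be reset ahead of a run and collected afterwards on the OSS (a sketch;
on most Lustre versions writing to a stats file resets it, but treat the exact
clear syntax as an assumption):

    # on the OSS, before the run: reset the counters
    lctl set_param obdfilter.*.stats=clear
    lctl set_param obdfilter.*.brw_stats=clear
    # after the run: collect them
    lctl get_param obdfilter.*.stats
    lctl get_param obdfilter.*.brw_stats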

I am running the workload in a Lustre cluster. When I check the log from
the server side, it shows the following (I have provided the stats of one OSS):

obdfilter.hasanfs-OST.stats=
snapshot_time             1633361244.519122 secs.usecs
read_bytes                1 samples [bytes] 4096 4096 4096
write_bytes               8479 samples [bytes] 1048576 1048576 8890875904
destroy                   13 samples [reqs]
statfs                    76 samples [reqs]
preprw                    8480 samples [reqs]
commitrw                  8480 samples [reqs]
ping                      57 samples [reqs]

obdfilter.hasanfs-OST.brw_stats=
snapshot_time: 1633361244.519588 (secs.usecs)

   read  | write
pages per bulk r/w     rpcs  % cum % |  rpcs  % cum %
256: 0   0   0   | 8479 100 100

   read  | write
discontiguous pages    rpcs  % cum % |  rpcs  % cum %
0:   0   0   0   | 8479 100 100

   read  | write
discontiguous blocks   rpcs  % cum % |  rpcs  % cum %
0:   0   0   0   | 8479 100 100

   read  | write
disk fragmented I/Os   ios   % cum % |  ios % cum %
1:   0   0   0   | 7499  88  88
2:   0   0   0   |  980  11 100

   read  | write
disk I/Os in flightios   % cum % |  ios % cum %
1:   0   0   0   | 7921  83  83
2:   0   0   0   | 1407  14  98
3:   0   0   0   |  116   1  99
4:   0   0   0   |   15   0 100

   read  | write
I/O time (1/1000s) ios   % cum % |  ios % cum %
2:   0   0   0   | 1963  23  23
4:   0   0   0   | 5627  66  89
8:   0   0   0   |  837   9  99
16:  0   0   0   |   25   0  99
32:  0   0   0   |   27   0 100

   read  | write
disk I/O size  ios   % cum % |  ios % cum %
4K:  0   0   0   |   62   0   0
8K:  0   0   0   |  127   1   1
16K: 0   0   0   |  113   1   3
32K: 0   0   0   |0   0   3
64K: 0   0   0   |0   0   3
128K:0   0   0   |   65   0   3
256K:0   0   0   |0   0   3
512K:0   0   0   | 1127  11  15
1M:  0   0   0   | 7965  84 100

*Can anyone please explain to me why I am not seeing any read operations in
the stats?*

Thanks,
Md. Hasanur Rashid
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] /home remounted and running for 6 hours

2021-10-13 Thread Sid Young via lustre-discuss
Well, my saga with /home locking up was partially resolved for about 6 hours
today. I rebooted the MDS and remounted the MGS, the lustre MDT and the home
MDT, and after a while it all came good. Then I rebooted each compute node and
we were operational for about 6 hours until it all locked up again: /lustre
worked fine but /home just locked solid. I'm suspecting corruption but I
don't know how to fix it...

I have found that once I restart the MDS I can remount /home, all the
D-state processes come good, and we are up and running.

Is there a tool that can specifically check an individual MDT / OST etc?



Sid Young
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org