On Apr 5, 2018, at 19:44, Faaland, Olaf P. <faala...@llnl.gov> wrote:
>
> Hi,
>
> I have a couple of questions about these stats. If these are documented
> somewhere, by all means point me to them. What I found in the operations
> manual and on the web did not answer my questions.
>
> What do
>
> read_bytes      25673 samples [bytes] 1 3366225 145121869
> write_bytes     13641 samples [bytes] 1 3366225 468230469
>
> mean in more detail? I understand that the last three values are
> MIN/MAX/SUM, and that their units are bytes, and that they reflect activity
> since the file system was mounted or since the stats were last cleared. But
> more specifically:
>
> samples: Is this the number of requests issued to servers, e.g. RPC issued
> with opcode OST_READ?
No, these stats in the llite.*.stats file are "llite level" stats (i.e. they
relate to the VFS operations). If you want RPC-level stats you need to look
at osc.*.stats.

> So if the user called read() 200 times on the same 1K file, which didn't ever
> change and remained cached by the lustre client, and all the data was fetched
> in a single RPC in the first place, then samples would be 1?
>
> And in that case, would the sum be 1K rather than 200K?

Simple testing shows that the read_bytes line has the number of read()
syscalls and the total number of bytes read by those syscalls (not the data
read from the OST), even though both reads are from cache:

# lctl set_param llite.*.stats=clear
llite.testfs-ffff880007524000.stats=clear
# dd if=/dev/zero of=/mnt/testfs/ff bs=1M count=1
1048576 bytes (1.0 MB) copied, 0.00220207 s, 476 MB/s
# dd of=/dev/null if=/mnt/testfs/ff bs=1k count=1k
1048576 bytes (1.0 MB) copied, 0.00197065 s, 532 MB/s
# dd of=/dev/null if=/mnt/testfs/ff bs=1k count=1k
1048576 bytes (1.0 MB) copied, 0.00188529 s, 556 MB/s
# lctl get_param llite.*.stats
llite.testfs-ffff880007524000.stats=
snapshot_time             1523008010.817348638 secs.nsecs
read_bytes                2048 samples [bytes] 1024 1024 2097152
write_bytes               1 samples [bytes] 1048576 1048576 1048576
open                      3 samples [regs]
close                     3 samples [regs]
seek                      2 samples [regs]
truncate                  1 samples [regs]
getxattr                  1 samples [regs]
removexattr               1 samples [regs]
inode_permission          7 samples [regs]

Checking the OSC-level stats shows that there was a single write RPC of 1MB,
and no read RPC at all, since the data remains in the client cache.
# lfs getstripe -i /mnt/testfs/ff
2
# lctl get_param osc.testfs-OST0002*.stats
osc.testfs-OST0002-osc-ffff880007524000.stats=
snapshot_time             1523008200.913698356 secs.nsecs
req_waittime              83 samples [usec] 119 2461 51353 41125171
req_active                83 samples [reqs] 1 1 83 83
ldlm_extent_enqueue       1 samples [reqs] 1 1 1 1
write_bytes               1 samples [bytes] 1048576 1048576 1048576 1099511627776
ost_write                 1 samples [usec] 2461 2461 2461 6056521
ost_connect               1 samples [usec] 280 280 280 78400
ost_punch                 1 samples [usec] 291 291 291 84681
ost_statfs                1 samples [usec] 119 119 119 14161
obd_ping                  78 samples [usec] 164 1352 46717 29485783

Similarly, the ost.OSS.ost_io.stats file on the OSS will show the RPC stats
handled by the whole server, while obdfilter.testfs-OST0002.stats will show
the RPCs handled by this target, and osd-*.testfs-OST0002.brw_stats will show
how the write was sent to disk (it will not show any read). If a read is
processed from the OSS read cache, it will appear at the ost_io and obdfilter
level, but not at the osd-* level, since there was not actually any IO to
disk.

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation

_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
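As an addendum for anyone scripting against these stats files: every counter
line above follows the same whitespace-separated layout (name, sample count,
the literal word "samples", a bracketed unit, then optionally min/max/sum and
sometimes a sum-of-squares). Below is a minimal sketch of a parser for that
layout; parse_stats_line is purely an illustrative helper written for this
note, not part of any Lustre tooling, and it assumes exactly the field order
shown in the transcripts above.

```python
def parse_stats_line(line):
    """Parse one counter line of an llite/osc .stats file into a dict.

    Illustrative helper only (not part of Lustre). Assumed layout:
        name  count  samples  [units]  [min max sum [sumsq]]
    Counters with [regs] units carry no min/max/sum fields.
    """
    fields = line.split()
    entry = {
        "name": fields[0],
        "samples": int(fields[1]),
        "units": fields[3].strip("[]"),
    }
    if len(fields) >= 7:
        # e.g. [bytes] and [usec] counters: min, max, sum
        entry["min"], entry["max"], entry["sum"] = map(int, fields[4:7])
    if len(fields) >= 8:
        # some counters (as in the osc stats above) also report sum-of-squares
        entry["sumsq"] = int(fields[7])
    return entry

# Example using a line from the llite output above:
entry = parse_stats_line("read_bytes 2048 samples [bytes] 1024 1024 2097152")
print(entry["samples"], entry["sum"])  # -> 2048 2097152
```

Lines such as snapshot_time have a different shape and would need to be
skipped or special-cased before feeding them to this helper.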