I recently installed a new host. So new, in fact, that I couldn't install the LTS release on it, so I've installed 151026.
This host is strictly for serving ZFS-based NFS and CIFS; everything else is default. Over time it has become fairly obvious to me that NFS writes are, well, abysmal.

This example copies a 36GB directory of mixed-size, mixed-type files. The first copy is strictly within a filesystem on the new server. The second reads from the new server and writes to an existing one. The third does the same read/write activity as the first test, but on an existing server running 151022.

On the new fileserver:

    : || nomad@omics1 fs2test ; time cp -rp 004test omics1/004test-1

    real    22m27.225s
    user    0m0.188s
    sys     0m29.880s

Reading from the new fileserver, writing to an existing fileserver:

    : || nomad@omics1 hvfs2test ; time cp -rp /misc/fs2test/004test .

    real    2m9.770s
    user    0m0.180s
    sys     0m28.694s

On the existing fileserver:

    : || nomad@omics1 hvfs2test ; time cp -rp 004test omics1/004test-1

    real    2m14.158s
    user    0m0.242s
    sys     0m30.313s

While the user and system times are consistent across all three tests, the wall-clock time of the first test is 10x that of the others. I've seen wall-clock time on these tests run as long as 50 minutes. All tests were driven from the same CentOS 7 host.

Watching snoop collect packets, I see multiple-minutes-long pauses while writing to the new server. If I'm reading the heat maps right (https://drive.google.com/open?id=1zcX9ryXjrPMH0_uUbfywiTTnJDau4WW0), it seems to be spending about 81% of its time in _t_cancel, waiting on a thread to cancel. I'm not a dev and haven't looked at the code, so it's quite possible I'm misunderstanding what the map is saying. The client spends so much time stuck in diskwait that it can take several minutes to respond after a SIGINT, SIGHUP, or SIGKILL to the cp process.

Is anyone else seeing similar problems?

nomad
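In case anyone wants to reproduce the comparison themselves, here is a minimal sketch of the methodology on a small synthetic dataset. Everything in it is illustrative, not taken from the actual tests: the real runs copied a 36GB mixed directory between NFS mounts and used time(1) directly, while this sketch builds local temp directories and measures wall clock with date(1) so it is self-contained and portable.

```shell
#!/bin/sh
# Sketch: time a recursive copy of a synthetic mixed-size dataset.
# SRC/DST are local temp dirs here; in the real tests they would be
# directories on (or NFS mounts of) the fileservers under comparison.
SRC=$(mktemp -d)
DST=$(mktemp -d)

# Create 20 files of increasing size (1KB .. 20KB) as stand-ins for
# the mixed-size/type files in the original 004test directory.
i=1
while [ "$i" -le 20 ]; do
    dd if=/dev/urandom of="$SRC/file$i" bs=1024 count="$i" 2>/dev/null
    i=$((i + 1))
done

# The measurement of interest is wall-clock time: user/sys were
# consistent across the original tests, only 'real' blew up.
t0=$(date +%s)
cp -rp "$SRC/." "$DST/"
t1=$(date +%s)

echo "copied $(ls "$DST" | wc -l) files in $((t1 - t0))s wall clock"
```

On the slow server, the interesting follow-up would be running this while watching packet captures, since the pauses show up as gaps in the NFS write stream rather than as CPU time.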
_______________________________________________ OmniOS-discuss mailing list OmniOS-discuss@lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss