Public bug reported:

Problem:
--------------------------------------------------------------
I just had to remove ubuntu server 12.04 to install redhat enterprise linux 6.
The intermittent slowness was completely unacceptable for the users, who have
workstations with /home mounted with nfs4 from this server.
The mail server also accesses /home, because the /home/$USER/Maildir
directories are there.
Using nfs4, the kernel nfs threads caused enormous load.

The users had frozen desktops (greyed out windows) and mail slowed or
arrived days later as a result.

With RHEL6, all nfs4 problems are completely gone. I used the exact same
/etc/exports file, the same settings and mount options on the workstations,
and the same number of nfs threads.
Both the redhat and ubuntu systems are KVM virtual guests on a redhat 6
virtual host (one of 3 actually).
The storage backend is a very fast equallogic array, which exports iscsi
targets to the virtual hosts.

I am sorry, but I have to conclude the current nfs4 implementation of ubuntu
server 12.04 is NOT fit for use.
A complete university department suffered for weeks while I tried to solve the
problems with ubuntu, but in the end it was decided to install redhat instead,
re-using the same iscsi targets for system, home and data.
A missed chance for ubuntu...

Therefore I urge Canonical's people to classify this bug as critical.

Also I think quality assurance should have caught this bug before
shipping.

Analysis
------------------------------------------------------------
LOAD:
The nfs threads cause the kernel to use enormous amounts of 'sy' time as
measured in top.
I will attach a sample of top's output, taken at a particularly _quiet_ time
on the network. Load is 7.82.
At busier moments, the load went through the roof, to 50 and beyond. It
consumes actual CPU cycles.
Each thread consumes up to 30% of a cpu core. I enabled 128 threads.
The read and write block sizes (rsize and wsize) are 32768 on the clients.
Both server and clients used async, both on redhat and ubuntu.

SYSTEM vs IO-WAIT:
The replacement redhat system can surely be overloaded, but then it does not
consume CPU cycles doing so. Top does report high load, but the time is spent
in the 'wa' state. This indicates it is simply waiting for its backend iscsi
devices to complete writes. I tested this by simultaneously letting all
workstations write multi-gigabyte files with dd to /home.
On ubuntu, the nfs threads spend their time in 'sy', doing who-knows-what.
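
For anyone who wants to put numbers on the 'sy' versus 'wa' distinction
without watching top, here is a minimal sketch that takes two samples of the
aggregate cpu line in /proc/stat and reports the share of time per state. It
is not part of the original measurements; the function names and the 5 second
interval are assumptions made for illustration, and the field order is the
one documented in proc(5).

```python
import time

# Order of the counters on the aggregate "cpu" line, as documented in proc(5).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")

def parse_cpu_line(line):
    """Return a dict of jiffy counters from the aggregate 'cpu' line."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("not the aggregate cpu line: %r" % line)
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))

def shares(before, after):
    """Percentage of elapsed CPU time spent in each state between two samples."""
    deltas = {k: after[k] - before[k] for k in before}
    total = sum(deltas.values())
    if total <= 0:
        return {k: 0.0 for k in deltas}
    return {k: 100.0 * d / total for k, d in deltas.items()}

def read_cpu_line(path="/proc/stat"):
    """Fetch the aggregate cpu line (the one starting with 'cpu ')."""
    with open(path) as fh:
        for line in fh:
            if line.startswith("cpu "):
                return line
    raise RuntimeError("no aggregate cpu line in %s" % path)

def sample(interval=5.0):
    """Sample /proc/stat twice, 'interval' seconds apart (hypothetical default)."""
    first = parse_cpu_line(read_cpu_line())
    time.sleep(interval)
    second = parse_cpu_line(read_cpu_line())
    return shares(first, second)
```

Calling sample() on the server and comparing the "system" and "iowait"
entries of the result should show the same contrast as top: a high "system"
share means the cpu is busy in the kernel, a high "iowait" share means it is
idle waiting for the disks.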

LOGS:
Nothing at all appears in the logs. But when I set bitwise debug options in
the /proc/sys/sunrpc/*debug files, lots of log entries appear. Those seem like
normal NFS protocol messages to me though.
I also tried to discover what was happening with wireshark, but the traffic
looks like normal nfs4 traffic to me.
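
As an illustration of what "bitwise debug options" means here, the sketch
below composes a mask for /proc/sys/sunrpc/rpc_debug from named bits. The bit
values are an assumption taken from the kernel header
include/linux/sunrpc/debug.h and should be checked against the running
kernel; the other *debug files (nfs_debug, nfsd_debug, nlm_debug) use their
own bit tables, which are not shown. The helper names are hypothetical.

```python
# Bit values assumed from include/linux/sunrpc/debug.h (RPCDBG_* constants);
# verify against the headers of the kernel actually in use.
RPCDBG = {
    "xprt": 0x0001, "call": 0x0002, "debug": 0x0004, "nfs": 0x0008,
    "auth": 0x0010, "bind": 0x0020, "sched": 0x0040, "trans": 0x0080,
    "svcxprt": 0x0100, "svcdsp": 0x0200, "misc": 0x0400, "cache": 0x0800,
}

def build_mask(names, table=RPCDBG):
    """OR together the bits for the given flag names."""
    mask = 0
    for name in names:
        mask |= table[name]
    return mask

def set_debug(mask, path="/proc/sys/sunrpc/rpc_debug"):
    """Write the mask to the sysctl file (needs root); 0 turns the messages off."""
    with open(path, "w") as fh:
        fh.write("%d\n" % mask)
```

For example, set_debug(build_mask(["call", "svcdsp"])) would enable only the
call and server dispatch messages, which keeps the log volume lower than
switching everything on.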

SLOWNESS:
That is the thing. The ubuntu nfs server is actually NOT slow at all. During
my dd tests, it wrote half a gigabyte per second to its iscsi backends. Its
_throughput_ is better than that of the redhat server.
As far as I can tell, it falls down because it makes client side processes
that want to do IO wait on other writes. A simple 'ls' has to wait until a
write has been completed. And both server and client used async nfs. People's
firefoxes freeze all the time because firefox needs to read and write a lot to
its cache and other files in the .mozilla directory.
The dovecot imap server almost grinds to a halt trying to write all those
little files in people's /home/$USER/Maildir directories.
The problems go on and on. Basically, a complete network of workstations is
almost unusable because of this.
Upfront tests were done of course, but they showed only the excellent
throughput, not the appalling 'waiting' behaviour.
With redhat 6 there is no such problem.
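
A throughput test with dd alone will not show this kind of stall, so a test
that times small operations while a large write is in flight may be more
telling. The following is a minimal sketch of such a probe, not the test that
was actually run: it streams zeros into a scratch file and meanwhile times
repeated directory listings in the same directory. All names, sizes and
intervals are assumptions, and on an NFS client a listing may be answered
from the client's cache, so the numbers are only indicative.

```python
import os
import tempfile
import threading
import time

def background_write(path, total_bytes, chunk_bytes=1 << 20):
    """Write total_bytes of zeros to path in chunks, like a small 'dd'."""
    block = b"\0" * chunk_bytes
    written = 0
    with open(path, "wb") as fh:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            fh.write(block[:n])
            written += n
        fh.flush()
        os.fsync(fh.fileno())

def probe_listing(directory, samples=20, pause=0.05):
    """Time repeated directory listings; returns the durations in seconds."""
    durations = []
    for _ in range(samples):
        start = time.monotonic()
        os.listdir(directory)
        durations.append(time.monotonic() - start)
        time.sleep(pause)
    return durations

def run_probe(directory, total_bytes=64 << 20, samples=20):
    """List 'directory' repeatedly while a writer streams into a file in it.

    Returns (worst, mean) listing latency in seconds.
    """
    fd, target = tempfile.mkstemp(dir=directory, prefix="probe-")
    os.close(fd)
    writer = threading.Thread(target=background_write, args=(target, total_bytes))
    writer.start()
    try:
        durations = probe_listing(directory, samples)
    finally:
        writer.join()
        os.remove(target)
    return max(durations), sum(durations) / len(durations)
```

Run from a workstation against a directory under /home, a worst-case latency
far above the mean would correspond to the 'ls has to wait for a write'
behaviour described above.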

SITUATION:
People use their own, and each other's, linux workstations for science, doing
large calculations and writing a lot of big and small files to nfs. The nfs
server serves /export/home, and also raw data storage from /export/data, with
nfs4. The clients mount those under /home and /data/misc respectively. Also
there is a read-only software mount for certain scientific packages.

CONFIG:
client fstab lines:
#### nfs entries ###
sw.lorentz.leidenuniv.nl:/sw /sw nfs4 hard,intr,ro,tcp,rsize=32768,wsize=32768,bg,acl,async
home.lorentz.leidenuniv.nl:/home /home nfs4 hard,intr,rw,tcp,rsize=32768,wsize=32768,bg,acl,async

server exports file:
/export                         132.229.227.0/24(ro,sync,insecure,root_squash,no_subtree_check,nohide,fsid=0)\
                                132.229.216.128/26(ro,sync,insecure,no_root_squash,no_subtree_check,nohide,fsid=0)\
                                132.229.226.3(ro,sync,insecure,no_root_squash,no_subtree_check,nohide,fsid=0)\
                                132.229.226.4(ro,sync,insecure,no_root_squash,no_subtree_check,nohide,fsid=0)
/export/home                    132.229.227.0/24(rw,async,insecure,root_squash,no_subtree_check,nohide)\
                                132.229.216.128/26(rw,async,insecure,no_root_squash,no_subtree_check,nohide)\
                                132.229.226.3(rw,async,insecure,no_root_squash,no_subtree_check,nohide)\
                                132.229.226.4(rw,async,insecure,no_root_squash,no_subtree_check,nohide)\
                                132.229.214.41(rw,async,insecure,no_root_squash,no_subtree_check,nohide)
/export/data                    132.229.227.0/24(rw,async,insecure,root_squash,no_subtree_check,nohide)\
                                132.229.216.128/26(rw,async,insecure,no_root_squash,no_subtree_check,nohide)\
                                132.229.226.3(rw,async,insecure,no_root_squash,no_subtree_check,nohide)\
                                132.229.226.4(rw,async,insecure,no_root_squash,no_subtree_check,nohide)
/export/sw                      132.229.227.0/24(rw,async,insecure,root_squash,no_subtree_check,nohide)\
                                132.229.216.128/26(rw,async,insecure,no_root_squash,no_subtree_check,nohide)\
                                132.229.226.3(rw,async,insecure,no_root_squash,no_subtree_check,nohide)\
                                132.229.226.4(rw,async,insecure,no_root_squash,no_subtree_check,nohide)

root@gaia:~# lsb_release -rd
Description:    Ubuntu 12.04 LTS
Release:        12.04

root@gaia2:~# dpkg -l | grep -E 'nfs|linux-image'
ii  libnfsidmap2                     0.25-1ubuntu2              NFS idmapping library
ii  linux-image-3.2.0-18-generic     3.2.0-18.29                Linux kernel image for version 3.2.0 on 64 bit x86 SMP
ii  linux-image-3.2.0-19-generic     3.2.0-19.31                Linux kernel image for version 3.2.0 on 64 bit x86 SMP
ii  linux-image-3.2.0-20-generic     3.2.0-20.33                Linux kernel image for version 3.2.0 on 64 bit x86 SMP
ii  linux-image-3.2.0-21-generic     3.2.0-21.34                Linux kernel image for version 3.2.0 on 64 bit x86 SMP
ii  linux-image-3.2.0-23-generic     3.2.0-23.36                Linux kernel image for version 3.2.0 on 64 bit x86 SMP
ii  linux-image-3.2.0-24-generic     3.2.0-24.39                Linux kernel image for version 3.2.0 on 64 bit x86 SMP
ii  linux-image-server               3.2.0.24.26                Linux kernel image on Server Equipment.
ii  nfs-common                       1:1.2.5-3ubuntu3           NFS support files common to client and server
ii  nfs-kernel-server                1:1.2.5-3ubuntu3           support for NFS kernel server
ii  nfswatch                         4.99.11-1                  Program to monitor NFS traffic for the console


WHAT I EXPECTED TO HAPPEN
----------------------------------
A fast and responsive nfs service.

WHAT HAPPENED INSTEAD
-----------------------------
I got a fast, but also intermittently totally unresponsive, nfs service.

** Affects: linux-meta (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: high load nfs4

https://bugs.launchpad.net/bugs/1006446

Title:
  nfs4 causes enormous load in ubuntu-server making it unusable

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
