After quite a bit of messing around, I found a way to test kernels on
the 16.04 beta properly.  I installed my HighPoint Rocket 622A eSATA card
and plugged the Samsung HD103UJ drive into it using a long eSATA to SATA
cable.  That allowed me to boot from my 16.04 daily DVD and do an
install to the HD103UJ without any problems.  I did an apt-get upgrade
and got the latest kernel (4.4.0-16-generic), and also installed the
mainline kernels 4.4.0-040400-generic and 4.5.0-040500-generic.  Then I
checked the other drive installed on the box (Seagate ST31000528AS, used
for testing Windows 10) and found that its 200 GiB NTFS data
partition was almost empty, so I resized it and created a tiny EXT2
partition for Grub2, a 50 GiB EXT4 partition for installing to, and a
10 GiB swap partition, all at the end of that drive.  Next I rebooted
to the 16.04 beta DVD and installed to the Seagate drive.  I then
rebooted to the install on the Samsung drive and ran update-grub, to
make the install on the Seagate drive bootable from the Grub on the
Samsung drive.  Finally I booted using the Samsung drive and selected
the new 16.04 beta install on the Seagate drive.  It booted, to my surprise, as
the Seagate drive was on the motherboard nForce 430 SATA controller.  So
I then mounted the Samsung drive from the booted Seagate install, and
tried the test dd commands, and they also all worked with no errors.
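
The dd commands themselves are not repeated in this message.  For anyone
who wants a comparable check without dd, the following is a rough sketch
in Python of a write-then-read-back test.  It is not the test I ran; the
function name, the default size and the idea of passing a file path on
the mounted Samsung partition as the first argument are all assumptions
made for illustration.

```python
import hashlib
import os
import sys
import tempfile


def write_and_verify(path, size_mib=16, chunk=1024 * 1024):
    """Write size_mib MiB of random data to path, fsync, then read it back.

    Returns True if the data read back matches what was written.  An I/O
    error reported by the kernel surfaces as OSError from write() or fsync().
    """
    written = hashlib.sha256()
    with open(path, "wb") as f:
        for _ in range(size_mib):
            block = os.urandom(chunk)
            written.update(block)
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # ask the kernel to push the data to the drive

    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            read_back.update(block)
    return written.digest() == read_back.digest()


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Hypothetical usage: a file path on the mounted partition under test.
        print(write_and_verify(sys.argv[1]))
    else:
        # No path given: exercise a temporary file, which only checks the script.
        with tempfile.TemporaryDirectory() as tmp:
            print(write_and_verify(os.path.join(tmp, "probe.bin"), size_mib=1))
```

The read-back is likely to be served from the page cache, so in practice
the useful signal is whether write() or fsync() raises an error, plus
whatever appears in dmesg at the same time.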

So the first conclusion I have come to is that the bug seems to only be
triggered by the Samsung HD103UJ drive when it is on a motherboard
nForce 430 SATA port.  It does not happen when that drive is on the
Rocket 622A's Marvell SATA port.  And the bug also does not happen when
using the Seagate ST31000528AS drive on a motherboard nForce 430 SATA
port.  It seems to require that particular drive on that particular SATA
controller, and using the standard 16.04 beta kernels, for the bug to
occur.

To prevent problems with the swapper using the swap partition on the
Samsung HD103UJ drive, I edited fstab on both 16.04 installs to use the
new swap partition on the Seagate ST31000528AS drive only.
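
As a sanity check on that fstab edit, a small sketch like the one below
can list which swap entries are still active in an fstab file.  The
sample fstab text and its UUIDs are placeholders invented for the
example, not the real values from my installs.

```python
def swap_entries(fstab_text):
    """Return the device field of every active (uncommented) swap line."""
    devices = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # fstab fields: device, mount point, type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "swap":
            devices.append(fields[0])
    return devices


# Hypothetical fstab content; the UUIDs are placeholders, not real values.
EXAMPLE_FSTAB = """\
# / on the 16.04 beta install
UUID=aaaaaaaa-0000-0000-0000-000000000001 /    ext4 errors=remount-ro 0 1
# old swap on the Samsung HD103UJ, commented out
#UUID=bbbbbbbb-0000-0000-0000-000000000002 none swap sw 0 0
# new swap on the Seagate ST31000528AS
UUID=cccccccc-0000-0000-0000-000000000003 none swap sw 0 0
"""

if __name__ == "__main__":
    print(swap_entries(EXAMPLE_FSTAB))
```

If more than one device is listed, the install could still swap to the
drive under test.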

The next test was to shut down and move the Samsung HD103UJ to its
motherboard nForce 430 SATA port, then reboot using the Grub on that
drive to run the install on the Seagate ST31000528AS drive.  Again, the
boot worked, which I expected as there should be little or no writing to
the Samsung drive during that boot process.  I mounted the Samsung drive
16.04 install partition from the Seagate install, and ran the test dd
commands.  I was again surprised that they worked without errors - I
would have expected that a boot of the 16.04 beta standard kernels from
that drive would work the same as a boot of the 16.04 standard kernel
from my install DVD, and would fail when writing to the Samsung HD103UJ
drive when it is on the motherboard nForce 430 SATA port.

The next test was to reboot to the 16.04 beta partition on the Samsung
HD103UJ drive.  As expected, that boot failed badly, and I had to use
the PC's reset button to restart it, after which I rebooted to the 16.04
beta install on the Seagate ST31000528AS drive again and used that
install to run fsck to repair the 16.04 beta install partition on the
Samsung HD103UJ drive.  The fsck check showed two errors that needed
fixing, where the number of blocks and number of inodes were both wrong.
Once fsck had fixed the partition, I mounted it and looked at the
kern.log file from the bad boot.  It looked normal up to a certain
point, after which it was corrupt - I think it had a block full of
zeroes.  So it looks like as soon as the bug hits, no more successful
log writes occur, which makes it difficult to debug.
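
To pin down where the log stops being valid, something like the sketch
below could report the offset of the first long run of zero bytes and
the last readable line before it.  The function names and the 64-byte
threshold are my own choices for illustration, and the path to the
damaged kern.log is passed as an argument rather than assumed.

```python
import sys


def first_nul_run(data, min_len=64):
    """Return the offset of the first run of at least min_len NUL bytes, or -1."""
    return data.find(b"\x00" * min_len)


def last_good_line(data, offset):
    """Return the last line of text that ends before the given offset."""
    text = data[:offset].decode("utf-8", errors="replace")
    lines = text.splitlines()
    return lines[-1] if lines else ""


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: pass the path of the kern.log on the repaired partition.
    with open(sys.argv[1], "rb") as f:
        contents = f.read()
    start = first_nul_run(contents)
    if start < 0:
        print("no long run of zero bytes found")
    else:
        print("zero bytes start at offset", start)
        print("last line before them:", last_good_line(contents, start))
```

The timestamp on that last line would show roughly how far into the boot
the writes started failing.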

I do have a serial port on this motherboard, so I looked to see if I
could use that to get debug information during a bad boot, but it turned
out that I do not have the necessary serial cross-over cable to plug the
motherboard's serial port into any of my other PCs' serial ports.  Last
time I needed a cross-over cable, I must have borrowed one from work,
and unfortunately that is no longer possible.

The next test I ran was to boot the Samsung HD103UJ install on the
nForce 430 port, but using Grub to select the mainline
4.4.0-040400-generic kernel.  That also failed badly in exactly the same
manner, so I rebooted and repaired the partition again, ready for the
final test.

For the last test, I rebooted to the Samsung HD103UJ install on the
nForce 430 port using the mainline 4.5.0-040500-generic kernel, and it
booted without errors.

So it looks like whatever bug is causing this problem has already been
fixed in the upstream 4.5.0 kernels.  However, if 16.04 is going to be
released using 4.4.0 kernels, I hope the fix for this bug can be
backported before 16.04 is released.  Are there any more tests I should
do to help with this?  Is there any more information I can provide?

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1561830

Title:
  Hard disk writes fail in 16.04 daily on nForce 430

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1561830/+subscriptions

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
