Re: Unable to shutdown

2011-08-31 Thread Kevin Oberman
Jeremy,

I guess we are simply not communicating. You are arguing
points with which I agree.

Comments in line:
On Tue, Aug 30, 2011 at 4:43 PM, Jeremy Chadwick
free...@jdc.parodius.com wrote:
 On Tue, Aug 30, 2011 at 04:10:13PM -0700, Kevin Oberman wrote:
 On Tue, Aug 30, 2011 at 2:48 PM, Jeremy Chadwick
 free...@jdc.parodius.com wrote:
  On Tue, Aug 30, 2011 at 01:29:02PM -0400, David Magda wrote:
  On Tue, August 30, 2011 11:50, Kevin Oberman wrote:
  [...]
   The more I look at this, the more it seems to me that it is an issue
   with the Seagate drive and not a FreeBSD issue. Probably a bug that is
   never triggered on Windows, so is largely unnoticed. I suspect Windows
   probably issues the commands in a subtly different order.
  [...]
 
  Or not the drive per se, but the USB-to-IDE/SATA chipset.
 
  A while back on the OpenSolaris zfs-discuss list there was an issue where
  USB drives would have corrupt ZFS pools if a drive was yanked without a
  'zpool export' being run. Even though ZFS is supposed to always be
  consistent on-disk (because it's transactional), this wasn't happening.
 
  It turned out that the chipset had a list of particular SATA commands that it
  allowed through to the drive, and all others were simply answered with
  OK, regardless of what actual actions needed to be taken. One of the
  SATA commands that was NOT whitelisted was the 'cache flush'
  command--which ZFS needs to make sure that its data structures are
  written in the proper order.
 
  Turns out the drive and its firmware were fine and doing things properly,
  it's just that the necessary commands weren't getting to it because of the
  USB adaptor's chipset.
 
  I don't think that advice is applicable in this situation. Here's why:
 
  Kevin's original description indicates that when the drive (or enclosure
  translation ASIC for that matter) is in standby, when the system is shut
  down, the drive/ASIC never spins back up on I/O (flushing all I/O
  buffers to disk).
 
  If he issues ls commands or similar userland-induced I/O to the drive
  prior to shutting the system down, the drive/ASIC spins up normally.
 
  Here's Kevin's original quote:
 
  The drive is green and spins down when idle. If an attempt is made
  to shut down the system while the drive is spun down, the system goes
  through the usual shutdown including flushing all buffers out to disk,
  but at the final disk access to mark the file systems as clean, the
  drive never spins up and the system hangs until it is powered down.
  I've found no way to avoid this other than to remember to access the
  disk and cause it to spin up before shutting down.
 
  If I attempt to unmount the file systems when the drive is shut down,
  the same thing happens, but I can recover as the second file system
  is still mounted and an ls(1) to that file system will cause the disk
  to spin up and everything is fine.
 
  So the question is what's unique about flushing all I/O buffers to
  disk during shutdown compared to issuing standard I/O in userland. I
  can speculate all day as to what the cause is, but it's highly unlikely
  that the USB-to-SATA controller ASIC is causing the problem.

 You are perhaps assuming a bit too much. Since I know that a disk read
 or write WILL spin up the drive, I can only assume that the msdosfs is
 not finding anything to flush, so is not writing. I see the full
 flushing all buffers countdown and it always runs successfully to
 zero. This, without the drive spinning up. This begs at least the
 question of whether the drive is receiving any writes or whether the
 writes are simply being cached by the drive to save energy. I
 suspect that the drive only spins up when enough of its write cache is
 filled.

 If there's nothing to flush, then why is the kernel indefinitely
 looping (finally giving up, and it usually prints something when it
 encounters that condition) when trying to flush buffers when the drive
 is spun down?  What exactly is it trying to flush if there's nothing to
 flush?

I think you may be focusing on things you believe I meant when I didn't mean
or say them. I don't have any reason to believe that a cache flush is or is
not the command that is hanging. I have absolutely no doubt that a flush is
requested by the OS during the unmount process. I'm just not sure what other
commands might be issued. And, of course, they are CAM operations that the
box is probably converting to SATA, but I can't even say this for sure as the
Seagate drive in question is a SATA drive in the box. I can only say that the
drive is not a standard 9mm laptop drive. It is longer, thicker and heavier
than a laptop drive. It is the same width as a normal 2.5 in. drive.

As to the issue of nothing to flush, that was my fault as I was entering text
in a stream of consciousness and I realized that, if there was only a little
data being written, it might not spin up the drive (i.e. take it out of
standby) until more data is written or a

Re: Unable to shutdown

2011-08-31 Thread Jeremy Chadwick
On Tue, Aug 30, 2011 at 11:04:43PM -0700, Kevin Oberman wrote:
  On Tue, Aug 30, 2011 at 2:48 PM, Jeremy Chadwick
  free...@jdc.parodius.com wrote:
  instead use UFS2 and see if the problem disappears? This is in no way a
  permanent solution. If this workaround fixes the problem, then I'm
  inclined to believe msdosfs is to blame. There has been a lot of
  discussion of this driver in the kernel as of late, and the general
  opinion of it is that it's crummy.
 
 Actually, for me it is as I will shortly be re-partitioning this into a GPT
 disk without any msdosfs partitions. I will give it a try with a UFS
 partition tomorrow and see what happens.
 
 When you say that it is crummy, are you referring to the USB driver, the
 AHCI driver, or the msdosfs support? I have long been concerned about the
 latter due to occasional unstable behavior that is fixed by booting Windows.
 fsck_msdosfs seems to do some questionable things, too.

I was referring to msdosfs support in the FreeBSD kernel.  I'm still not
so sure about the USB stack (some things seem to be better now as a
result of the re-write that happened during the 7.x - 8.x days, but
other things may still be awry); I don't tend to use any USB devices on
FreeBSD.  As for AHCI, I have no complaints at all, although AHCI
shouldn't be involved when it comes to a USB-connected SATA hard disk.

  And here's another thought: what if the issue is limited, somehow, to
  just writes? Meaning, could the kernel issue a false read to the
  device (for some random LBA, even LBA 0 for all I care) and then proceed
  with its write/flushing? I wonder if that would cause the drive to spin
  up first. That would be a quirk in my opinion.
 
 Interesting idea, but I really doubt that it's an issue with the write other
 than that the drive may not leave standby unless the cache is full enough
 that it flushes.

I'm not sure what you mean by the last part of the sentence, but the
former is something I'm in agreement with.  I doubt adding a fake read
prior to issuing writes and flushes during shutdown would make any
difference.  I'm just surprised the writes being made are not causing
the drive to spin up.

  There's also the possibility the USB stack on FreeBSD is doing something
  really stupid... man, I don't even want to go down that road. Hans
  should be able to help determine if that's the case, but not using
  msdosfs as a test would be a good start.
 
 Yes. I make no claim to understand the USB layer at all, but I do understand
 that it is very tricky. Lots of evidence of that in how broken early
 Microsoft USB stacks were.

FreeBSD has gone through at least two major versions of a USB stack.
The stack in the 4.x days did not impress me -- I tried working on
Logitech USB camera support, but could not get alternative indexes to
work -- ugen(4) returned bizarre error conditions for things that
absolutely should have worked.  I did contact the stack maintainer, but
I would rather not go into the discussion that ensued as a result.

Said USB stack improved slightly from 4.x to 7.x.  An entire re-write
was performed (what was then called USB2, not to be confused with the
USB 2.0 protocol) which is what's in use (in RELENG_8) today.  There
have been at least 3 different maintainers of the FreeBSD USB stack, and
all at different times / completely segregated.

I don't want my comments to make anyone think the problem described here
is in the FreeBSD USB stack.  I'm just stating some history for those
wondering about it, especially given the comments about Microsoft's
early USB stacks (particularly during the original Windows 95 days and
some other issues during the Win98 era).  My opinion/experiences are my
own.

The problem is that I don't know how to rule the USB stack out when it
comes to diagnosing the problem you're having.  There is the USB_DEBUG
option in one's kernel config which may or may not provide some
insights, but I imagine it's quite chatty and would justify the need for
serial or firewire console given the amount of console output.

  So I'm pretty sure the kernel is iterating over whatever cache buffers
  there are for I/O (I don't know what this is called technically) and
  issuing WRITE DMA or -EXT and either waiting for a non-error response
  from the device or issuing it blindly followed by a FLUSH CACHE or -EXT
  (either once per write or at the very end).
 
 Again, I really believe that the kernel fully believes that all writes are
 complete, at least to the disk cache. At that point the FS structures can be
 removed and the FS is no longer mounted as seen from the perspective of the
 system. This MUST be done before the disk cache is flushed and the FS is
 marked clean. I suspect, but don't know for sure, that the last two
 operations performed are to mark the drive clean and then do a cache flush.
 Of possible relevance is that none of the file systems is marked clean
 during a hung shutdown. All need to be FSCKed although
 

Re: ports/sysutils/diskcheckd (Re: bad sector in gmirror HDD)

2011-08-31 Thread Chris Rees
On 25 August 2011 18:54, Chris Rees utis...@gmail.com wrote:
 On 24 August 2011 16:14,  per...@pluto.rain.com wrote:
 When the specified or calculated rate exceeds 64KB/sec, the
 required sleep interval between 64KB chunks is less than one
 second.  Since diskcheckd calculates the interval in whole seconds
 -- because it calls sleep() rather than usleep() or nanosleep()
 -- an interval of less than one second is calculated as zero ...
 I suspect the fix will be to calculate in microseconds, and call
 usleep() instead of sleep().

 I think I may have this fixed.

 Could one of you try the attached patch?  I'm especially interested
 to see if this also clears up the issues reported as connected with
 gmirror (http://www.freebsd.org/cgi/query-pr.cgi?pr=ports/143566),
 since I haven't been able to reproduce that part here.

 Summary of changes:

 * Calculate delays in microseconds, so that delays of less than
  one second between reads (needed to implement rates exceeding
  64KB/sec) do not get rounded down to zero.

 * Fix a reinitialization problem when handling SIGHUP.

 * Additional debug messages (only with -d).

 * Comment and manpage improvements.
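
The rounding problem behind the first item can be shown with a little
arithmetic. The following is only a sketch and is not taken from the
diskcheckd source: the function names and the CHUNK_BYTES constant are
assumptions made for illustration, based on the 64KB chunk size mentioned
in the thread.

```c
#include <stdint.h>

#define CHUNK_BYTES (64u * 1024u)	/* chunk size assumed from the thread */

/* Whole-second delay between chunks: truncates to 0 above 64KB/sec. */
static unsigned int
delay_seconds(uint64_t rate_bytes_per_sec)
{
	if (rate_bytes_per_sec == 0)
		return (0);
	return ((unsigned int)(CHUNK_BYTES / rate_bytes_per_sec));
}

/* Microsecond delay between chunks: sub-second intervals survive. */
static uint64_t
delay_microseconds(uint64_t rate_bytes_per_sec)
{
	if (rate_bytes_per_sec == 0)
		return (0);
	return ((uint64_t)CHUNK_BYTES * 1000000u / rate_bytes_per_sec);
}
```

At 128KB/sec the whole-second version yields no delay at all, while the
microsecond version yields half a second per chunk. Note that POSIX allows
usleep() to reject intervals of one million microseconds or more, so a
caller may need to split long delays or use nanosleep() instead.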


 Hi Perry,

 The changes look good, so if there's no response for a few days I'll
 commit the changes.

 Thanks for rescuing the port :)


Committed. Thanks!

-- 
Chris Rees          | FreeBSD Developer
cr...@freebsd.org   | http://people.freebsd.org/~crees
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Unfixable UFS2 corruption

2011-08-31 Thread Eugene Grosbein
Hi!

Long story short: my /usr/local UFS2 filesystem somehow got corrupted
and fsck -y in single user mode does not fix it.

Explanation:

# ls -al /usr/local/obj/usr/local/src/secure/lib/libssh
ls: : No such file or directory
total 8
drwxr-xr-x  2 root  wheel  4608 Aug 30 01:28 .
drwxr-xr-x  3 root  wheel   512 Aug 30 01:28 ..

# rm -rf /usr/local/obj/usr/local/src/secure/lib/libssh
rm: /usr/local/obj/usr/local/src/secure/lib/libssh: Directory not empty

As I've said, I cold booted this FreeBSD 8.2-STABLE system to single user mode
where no file systems are mounted (except root) and ran fsck -y /usr/local.
It found no errors and said it is CLEAN. The problem still persists.

I've written a small program and it told me this directory contains a third
file (besides . and .. entries) having zero file length.

I dumped the contents of the directory to a plain file with
cat /usr/local/obj/usr/local/src/secure/lib/libssh > /tmp/libssh and put it
online:
http://www.grosbein.net/crash/corruption/libssh

Please help. The program and its output follow:

#include <sys/types.h>
#include <dirent.h>
#include <err.h>
#include <stdio.h>

int main(int argc, char* argv[])
{

  DIR   *dirp;
  struct dirent *dp;
  unsigned  i;

  if (argc < 2)
    return 1;

  if ( (dirp = opendir(argv[1])) == NULL )
    err (1, "opendir");

  i = 0;
  while ((dp = readdir(dirp)) != NULL) {
    i++;
    printf("Entry %u:\n"
           "d_fileno=%u\n"
           "d_reclen=%u\n"
           "d_type=%u\n"
           "d_namlen=%u\n"
           "d_name=%s\n\n",
           i, (unsigned) dp->d_fileno, (unsigned) dp->d_reclen,
           (unsigned) dp->d_type, (unsigned) dp->d_namlen,
           (char *) dp->d_name);
  }
  return closedir(dirp);
}

# ./readdir /usr/local/obj/usr/local/src/secure/lib/libssh
Entry 1:
d_fileno=1531227
d_reclen=12
d_type=4
d_namlen=1
d_name=.

Entry 2:
d_fileno=1389650
d_reclen=500
d_type=4
d_namlen=2
d_name=..

Entry 3:
d_fileno=24
d_reclen=512
d_type=8
d_namlen=0
d_name=


Re: Unfixable UFS2 corruption

2011-08-31 Thread Eugene Grosbein
31.08.2011 21:35, Eugene Grosbein wrote:

 # ls -al /usr/local/obj/usr/local/src/secure/lib/libssh
 ls: : No such file or directory
 total 8
 drwxr-xr-x  2 root  wheel  4608 Aug 30 01:28 .
 drwxr-xr-x  3 root  wheel   512 Aug 30 01:28 ..
 
 # rm -rf /usr/local/obj/usr/local/src/secure/lib/libssh
 rm: /usr/local/obj/usr/local/src/secure/lib/libssh: Directory not empty
 
 As I've said, I cold booted this FreeBSD 8.2-STABLE system to single user mode
 where no file systems are mounted (except root) and ran fsck -y /usr/local.
 It found no errors and said it is CLEAN. The problem still persists.
 
 I've written a small program and it told me this directory contains a third
 file (besides . and .. entries) having zero file length.

Not the file but the file name length is zero. I've just found that the
dircheck() function in src/sbin/fsck_ffs/dir.c simply does not check
whether d_namlen is zero, as it should, shouldn't it?

Eugene Grosbein


Re: Unfixable UFS2 corruption

2011-08-31 Thread Eugene Grosbein
31.08.2011 23:02, Adam Vande More wrote:

 Long story short: my /usr/local UFS2 filesystem somehow got corrupted
 and fsck -y in single user mode does not fix it.
 
 Not sure if this helps or not but on rare occasion I've had to run fsck twice
 consecutively to fix a FS.

Not this time - fsck does NOT find any problems in this file system.

Now I think fsck_ffs needs a patch:

--- sbin/fsck_ffs/dir.c.orig	2011-08-31 22:54:23.0 +0700
+++ sbin/fsck_ffs/dir.c	2011-08-31 22:54:48.0 +0700
@@ -225,7 +225,7 @@
 	type = dp->d_type;
 	if (dp->d_reclen < size ||
 	    idesc->id_filesize < size ||
-	    namlen > MAXNAMLEN ||
+	    namlen == 0 || namlen > MAXNAMLEN ||
 	    type > 15)
 		goto bad;
 	for (cp = dp->d_name, size = 0; size < namlen; size++)
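
The effect of the extra test can be seen in isolation with a small stand-alone
sketch. This is not the fsck_ffs code: the struct, the function and the
SKETCH_MAXNAMLEN constant are hypothetical stand-ins, and only the name checks
for an in-use entry are modelled.

```c
#include <stddef.h>

#define SKETCH_MAXNAMLEN 255	/* assumed limit, standing in for MAXNAMLEN */

/* Hypothetical, simplified directory entry; not the real struct direct. */
struct sketch_dirent {
	unsigned int	d_namlen;	/* claimed length of the name */
	const char	*d_name;	/* name bytes */
};

/* Return 1 if the name fields look sane, 0 if the entry is corrupt. */
static int
sketch_name_ok(const struct sketch_dirent *dp)
{
	size_t i;

	/* A zero-length name is rejected, as in the proposed patch. */
	if (dp->d_namlen == 0 || dp->d_namlen > SKETCH_MAXNAMLEN)
		return (0);
	/* No NUL or '/' may appear inside the claimed name length. */
	for (i = 0; i < dp->d_namlen; i++)
		if (dp->d_name[i] == '\0' || dp->d_name[i] == '/')
			return (0);
	return (1);
}
```

Without the d_namlen == 0 test, an empty name passes trivially because the
loop body never runs, which matches the behaviour reported above.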


Comments?

Eugene Grosbein


Re: Unfixable UFS2 corruption

2011-08-31 Thread Eugene Grosbein
31.08.2011 23:13, Eugene Grosbein wrote:
 31.08.2011 23:02, Adam Vande More wrote:
 
 Long story short: my /usr/local UFS2 filesystem somehow got corrupted
 and fsck -y in single user mode does not fix it.

 Not sure if this helps or not but on rare occasion I've had to run fsck twice
 consecutively to fix a FS.
 
 Not this time - fsck does NOT find any problems in this file system.
 
 Now I think fsck_ffs needs a patch:
 
 --- sbin/fsck_ffs/dir.c.orig  2011-08-31 22:54:23.0 +0700
 +++ sbin/fsck_ffs/dir.c   2011-08-31 22:54:48.0 +0700
 @@ -225,7 +225,7 @@
   type = dp->d_type;
   if (dp->d_reclen < size ||
       idesc->id_filesize < size ||
 -     namlen > MAXNAMLEN ||
 +     namlen == 0 || namlen > MAXNAMLEN ||
       type > 15)
       goto bad;
   for (cp = dp->d_name, size = 0; size < namlen; size++)
 
 
 Comments?

With this patch applied, my FS has finally been fixed by fsck:

** Last Mounted on /usr/local
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
DIRECTORY CORRUPTED  I=1531227  OWNER=root MODE=40755
SIZE=4608 MTIME=Aug 30 01:28 2011 
DIR=/obj/usr/local/src/secure/lib/libssh

SALVAGE? [yn] 

** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
LINK COUNT FILE I=24  OWNER=root MODE=100644
SIZE=892 MTIME=Sep 17 11:10 2010  COUNT 2 SHOULD BE 1
ADJUST? [yn] 

** Phase 5 - Check Cyl groups
459580 files, 7411823 used, 7819495 free (105503 frags, 964249 blocks, 0.7% fragmentation)

* FILE SYSTEM IS CLEAN *

* FILE SYSTEM WAS MODIFIED *

Should I file a PR?

Eugene Grosbein


Re: Unfixable UFS2 corruption

2011-08-31 Thread Eugene Grosbein
31.08.2011 23:34, Adrian Chadd wrote:
 Have you created a PR for this?

http://www.freebsd.org/cgi/query-pr.cgi?pr=160339

Eugene Grosbein


mfi(4) patch to add MSI-X support, possibly address command timeouts

2011-08-31 Thread John Baldwin
I'd like some folks to test a patch to the mfi(4) driver that may help to
address issues several folks have reported.  The patch does two things.  First,
it adds some dummy reads of PCI registers when checking device status in the
interrupt handler to flush the writes to ACK interrupts.  The Linux
megaraid-sas driver uses this approach and some folks have tested a patch from
Scott Long which had a somewhat similar effect.  Second, it enables the use of
MSI-X interrupts for many newer devices.

The patch is available below and at www.freebsd.org/~jhb/patches/mfi.patch

Index: mfi_pci.c
===================================================================
--- mfi_pci.c	(revision 224613)
+++ mfi_pci.c	(working copy)
@@ -169,7 +169,7 @@
 	struct mfi_softc *sc;
 	struct mfi_ident *m;
 	uint32_t command;
-	int error;
+	int count, error;
 
 	sc = device_get_softc(dev);
 	bzero(sc, sizeof(*sc));
@@ -226,6 +226,29 @@
 		goto out;
 	}
 
+	/* Allocate IRQ resource. */
+	sc->mfi_irq_rid = 0;
+	switch (pci_get_device(sc->mfi_dev)) {
+	case 0x0060:	/* SAS1078R */
+	case 0x007c:	/* SAS1078DE */
+	case 0x0413:	/* Verde ZCR */
+		/* Do not use MSI-X for these systems. */
+		break;
+	default:
+		count = 1;
+		if (pci_alloc_msix(sc->mfi_dev, &count) == 0) {
+			device_printf(sc->mfi_dev, "Using MSI-X\n");
+			sc->mfi_irq_rid = 1;
+		}
+		break;
+	}
+	if ((sc->mfi_irq = bus_alloc_resource_any(sc->mfi_dev, SYS_RES_IRQ,
+	    &sc->mfi_irq_rid, RF_SHAREABLE | RF_ACTIVE)) == NULL) {
+		device_printf(sc->mfi_dev, "Cannot allocate interrupt\n");
+		error = EINVAL;
+		goto out;
+	}
+
 	error = mfi_attach(sc);
 out:
 	if (error) {
@@ -280,6 +303,8 @@
 		bus_release_resource(sc->mfi_dev, SYS_RES_MEMORY,
 		    sc->mfi_regs_rid, sc->mfi_regs_resource);
 	}
+	if (sc->mfi_irq_rid != 0)
+		pci_release_msi(sc->mfi_dev);
 
 	return;
 }
Index: mfi.c
===================================================================
--- mfi.c	(revision 224613)
+++ mfi.c	(working copy)
@@ -157,6 +157,9 @@
 mfi_enable_intr_xscale(struct mfi_softc *sc)
 {
 	MFI_WRITE4(sc, MFI_OMSK, 0x01);
+
+	/* Dummy read to force PCI flush. */
+	(void)MFI_READ4(sc, MFI_OMSK);
 }
 
 static void
@@ -168,6 +171,9 @@
 	} else if (sc->mfi_flags & MFI_FLAGS_GEN2) {
 		MFI_WRITE4(sc, MFI_OMSK, ~MFI_GEN2_EIM);
 	}
+
+	/* Dummy read to force PCI flush. */
+	(void)MFI_READ4(sc, MFI_OMSK);
 }
 
 static int32_t
@@ -192,6 +198,9 @@
 		return 1;
 
 	MFI_WRITE4(sc, MFI_OSTS, status);
+
+	/* Dummy read to force PCI flush. */
+	(void)MFI_READ4(sc, MFI_OSTS);
 	return 0;
 }
 
@@ -212,6 +221,9 @@
 	}
 
 	MFI_WRITE4(sc, MFI_ODCR0, status);
+
+	/* Dummy read to force PCI flush. */
+	(void)MFI_READ4(sc, MFI_OSTS);
 	return 0;
 }
 
@@ -484,15 +496,8 @@
 	mtx_unlock(&sc->mfi_io_lock);
 
 	/*
-	 * Set up the interrupt handler.  XXX This should happen in
-	 * mfi_pci.c
+	 * Set up the interrupt handler.
 	 */
-	sc->mfi_irq_rid = 0;
-	if ((sc->mfi_irq = bus_alloc_resource_any(sc->mfi_dev, SYS_RES_IRQ,
-	    &sc->mfi_irq_rid, RF_SHAREABLE | RF_ACTIVE)) == NULL) {
-		device_printf(sc->mfi_dev, "Cannot allocate interrupt\n");
-		return (EINVAL);
-	}
 	if (bus_setup_intr(sc->mfi_dev, sc->mfi_irq, INTR_MPSAFE|INTR_TYPE_BIO,
 	    NULL, mfi_intr, sc, &sc->mfi_intr)) {
 		device_printf(sc->mfi_dev, "Cannot set up interrupt\n");
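
Why a dummy read helps can be illustrated with a toy model of posted writes.
This is a sketch under stated assumptions, not driver code: struct toy_bridge,
toy_write4() and toy_read4() are hypothetical names, and real bridges queue
many writes rather than one. The only property modelled is that a write may
sit in a bridge until a later read from the same device forces it through.

```c
#include <stdint.h>

/* Toy model of a device register behind a bridge that posts writes. */
struct toy_bridge {
	uint32_t device_reg;	/* value the device actually sees */
	uint32_t posted_val;	/* write still queued in the bridge */
	int	 pending;	/* non-zero while a write is queued */
};

/* A write returns immediately; the device has not seen it yet. */
static void
toy_write4(struct toy_bridge *b, uint32_t val)
{
	b->posted_val = val;
	b->pending = 1;
}

/* A read is ordered after earlier writes, so it drains the queue first. */
static uint32_t
toy_read4(struct toy_bridge *b)
{
	if (b->pending) {
		b->device_reg = b->posted_val;
		b->pending = 0;
	}
	return (b->device_reg);
}
```

In this model, calling toy_write4() alone leaves device_reg unchanged, and a
following toy_read4() whose result is discarded is what makes the write
visible to the device, which is the role of the (void)MFI_READ4() calls in
the patch.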

-- 
John Baldwin


Re: mfi(4) patch to add MSI-X support, possibly address command timeouts

2011-08-31 Thread Sergey Kandaurov
On 31 August 2011 21:34, John Baldwin j...@freebsd.org wrote:
 I'd like some folks to test a patch to the mfi(4) driver that may help to
 address issues several folks have reported.  The patch does two things, first
 it adds some dummy reads of PCI registers when checking device status in the
 interrupt handler to flush the writes to ACK interrupts.  The Linux
 megaraid-sas driver uses this approach and some folks have tested a patch from
 Scott Long which had a somewhat similar effect.  Second, it enables the use of
 MSI-X interrupts for many newer devices.

 The patch is available below and at www.freebsd.org/~jhb/patches/mfi.patch

mfi0: <LSI MegaSAS Gen2> port 0x3000-0x30ff mem 0x9dd4-0x9dd43fff,0x9dd0-0x9dd3 irq 26 at device 0.0 on pci26
mfi0: Using MSI-X
mfi0: Megaraid SAS driver Ver 3.00

However, booting never finishes, ending up with:
mfi0: COMMAND 0xff8000b3a550 TIMEOUT AFTER 58 SECONDS
mfi0: COMMAND 0xff8000b3a550 TIMEOUT AFTER 88 SECONDS
mfi0: COMMAND 0xff8000b3a550 TIMEOUT AFTER 118 SECONDS
mfi0: COMMAND 0xff8000b3a550 TIMEOUT AFTER 148 SECONDS
mfi0: COMMAND 0xff8000b3a550 TIMEOUT AFTER 179 SECONDS
mfi0: COMMAND 0xff8000b3a550 TIMEOUT AFTER 209 SECONDS

Patch applied and tested on RELENG_8_2.

mfi0@pci0:26:0:0:   class=0x010400 card=0x03b21014 chip=0x00791000 rev=0x03 hdr=0x00
    vendor     = 'LSI Logic (Was: Symbios Logic, NCR)'
    class      = mass storage
    subclass   = RAID
    bar   [10] = type I/O Port, range 32, base 0x3000, size 256, enabled
    bar   [14] = type Memory, range 64, base 0x9dd4, size 16384, enabled
    bar   [1c] = type Memory, range 64, base 0x9dd0, size 262144, enabled
    cap 01[50] = powerspec 3  supports D0 D1 D2 D3  current D0
    cap 10[68] = PCI-Express 2 endpoint max data 256(4096) link x8(x8)
    cap 03[d0] = VPD
    cap 05[a8] = MSI supports 1 message, 64 bit
    cap 11[c0] = MSI-X supports 15 messages in map 0x14
    ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
    ecap 0004[138] = unknown 1

-- 
wbr,
pluknet


Re: mfi(4) patch to add MSI-X support, possibly address command timeouts

2011-08-31 Thread John Baldwin
On Wednesday, August 31, 2011 3:24:12 pm Sergey Kandaurov wrote:
 On 31 August 2011 21:34, John Baldwin j...@freebsd.org wrote:
  I'd like some folks to test a patch to the mfi(4) driver that may help to
  address issues several folks have reported.  The patch does two things, first
  it adds some dummy reads of PCI registers when checking device status in the
  interrupt handler to flush the writes to ACK interrupts.  The Linux
  megaraid-sas driver uses this approach and some folks have tested a patch from
  Scott Long which had a somewhat similar effect.  Second, it enables the use of
  MSI-X interrupts for many newer devices.
 
  The patch is available below and at www.freebsd.org/~jhb/patches/mfi.patch
 
 mfi0: <LSI MegaSAS Gen2> port 0x3000-0x30ff mem 0x9dd4-0x9dd43fff,0x9dd0-0x9dd3 irq 26 at device 0.0 on pci26
 mfi0: Using MSI-X
 mfi0: Megaraid SAS driver Ver 3.00
 
 However, booting never finishes ending up with:
 mfi0: COMMAND 0xff8000b3a550 TIMEOUT AFTER 58 SECONDS
 mfi0: COMMAND 0xff8000b3a550 TIMEOUT AFTER 88 SECONDS
 mfi0: COMMAND 0xff8000b3a550 TIMEOUT AFTER 118 SECONDS
 mfi0: COMMAND 0xff8000b3a550 TIMEOUT AFTER 148 SECONDS
 mfi0: COMMAND 0xff8000b3a550 TIMEOUT AFTER 179 SECONDS
 mfi0: COMMAND 0xff8000b3a550 TIMEOUT AFTER 209 SECONDS

Did this work fine without the patch?

Also, does it work fine if you disable MSI-X via 'hw.pci.enable_msix=0'
in the loader?

-- 
John Baldwin


Re: mfi(4) patch to add MSI-X support, possibly address command timeouts

2011-08-31 Thread Sergey Kandaurov
On 1 September 2011 01:17, John Baldwin j...@freebsd.org wrote:
 On Wednesday, August 31, 2011 3:24:12 pm Sergey Kandaurov wrote:
 On 31 August 2011 21:34, John Baldwin j...@freebsd.org wrote:
  I'd like some folks to test a patch to the mfi(4) driver that may help to
  address issues several folks have reported.  The patch does two things, first
  it adds some dummy reads of PCI registers when checking device status in the
  interrupt handler to flush the writes to ACK interrupts.  The Linux
  megaraid-sas driver uses this approach and some folks have tested a patch from
  Scott Long which had a somewhat similar effect.  Second, it enables the use of
  MSI-X interrupts for many newer devices.
 
  The patch is available below and at www.freebsd.org/~jhb/patches/mfi.patch

 mfi0: <LSI MegaSAS Gen2> port 0x3000-0x30ff mem 0x9dd4-0x9dd43fff,0x9dd0-0x9dd3 irq 26 at device 0.0 on pci26
 mfi0: Using MSI-X
 mfi0: Megaraid SAS driver Ver 3.00

 However, booting never finishes ending up with:
 mfi0: COMMAND 0xff8000b3a550 TIMEOUT AFTER 58 SECONDS
 mfi0: COMMAND 0xff8000b3a550 TIMEOUT AFTER 88 SECONDS
 mfi0: COMMAND 0xff8000b3a550 TIMEOUT AFTER 118 SECONDS
 mfi0: COMMAND 0xff8000b3a550 TIMEOUT AFTER 148 SECONDS
 mfi0: COMMAND 0xff8000b3a550 TIMEOUT AFTER 179 SECONDS
 mfi0: COMMAND 0xff8000b3a550 TIMEOUT AFTER 209 SECONDS

 Did this work fine without the patch?

Yes, like a charm.


 Also, does it work fine if you disable MSI-X via 'hw.pci.enable_msix=0'
 in the loader?

I will try this tomorrow.
Thanks.

-- 
wbr,
pluknet


Re: Unable to shutdown

2011-08-31 Thread perryh
Jeremy Chadwick free...@jdc.parodius.com wrote:
 On Tue, Aug 30, 2011 at 11:04:43PM -0700, Kevin Oberman wrote:
  ... the standard does not specify EXACTLY what triggers a
  transition from standby to ready (PM2 to PM0). Only that it is
  something that requires media access. A write does not
  necessarily require media access if you define media as the
  disk platter.

 You're correct -- media access could mean, literally, accessing
 the platter OR it could mean LBA read/write I/O.  Then comes
 into question whether or not the drive returning something from
 its on-board cache would count as media access or not.

 T13 should probably clarify on this point, and this is one I do
 not have an answer for myself.  I strongly believe media access
 means LBA read/write I/O and regardless if it's data that's in
 the on-board cache on the disk or not.  I wonder if this behaviour
 varies per drive model.

Given a standard which is, shall we say, open to interpretation,
I think the likelihood approaches 100% that it has been interpreted
differently by different manufacturers -- or even by different
firmware authors within a single manufacturer.  I would be amazed
if the behaviour did _not_ vary among drive models.
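
The two readings of "media access" discussed above can be made concrete with a
toy model. Everything here is hypothetical and for illustration only: no
claim is made about how any real firmware behaves, and the names toy_drive,
toy_policy and toy_write() are invented for this sketch. Only the PM0 and PM2
state names come from the thread.

```c
/* Power states named as in the thread: PM0 is active, PM2 is standby. */
enum toy_power { TOY_PM0_ACTIVE, TOY_PM2_STANDBY };

/* Two hypothetical firmware readings of "media access". */
enum toy_policy {
	TOY_WAKE_ON_ANY_IO,	/* any LBA read/write counts as media access */
	TOY_WAKE_ON_CACHE_FULL	/* only touching the platter counts */
};

struct toy_drive {
	enum toy_power	power;
	enum toy_policy	policy;
	unsigned int	cached_blocks;	/* writes held in on-board cache */
	unsigned int	cache_capacity;	/* blocks the cache can hold */
};

/* Accept a write; spin up only if the policy says the media is needed. */
static void
toy_write(struct toy_drive *d, unsigned int blocks)
{
	d->cached_blocks += blocks;
	if (d->policy == TOY_WAKE_ON_ANY_IO ||
	    d->cached_blocks >= d->cache_capacity) {
		d->power = TOY_PM0_ACTIVE;
		d->cached_blocks = 0;	/* cache drained to the platter */
	}
}
```

Under the first policy a single small write wakes the drive. Under the second
the same write is absorbed by the cache and the drive stays in standby, which
is the behaviour suspected earlier in this thread.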


Re: Unable to shutdown

2011-08-31 Thread Kevin Oberman
On Thu, Sep 1, 2011 at 2:01 AM,  per...@pluto.rain.com wrote:
 Jeremy Chadwick free...@jdc.parodius.com wrote:
 On Tue, Aug 30, 2011 at 11:04:43PM -0700, Kevin Oberman wrote:
  ... the standard does not specify EXACTLY what triggers a
  transition from standby to ready (PM2 to PM0). Only that it is
  something that requires media access. A write does not
  necessarily require media access if you define media as the
  disk platter.

 You're correct -- media access could mean, literally, accessing
 the platter OR it could mean LBA read/write I/O.  Then comes
 into question whether or not the drive returning something from
 its on-board cache would count as media access or not.

 T13 should probably clarify on this point, and this is one I do
 not have an answer for myself.  I strongly believe media access
 means LBA read/write I/O and regardless if it's data that's in
 the on-board cache on the disk or not.  I wonder if this behaviour
 varies per drive model.

 Given a standard which is, shall we say, open to interpretation,
 I think the likelihood approaches 100% that it has been interpreted
 differently by different manufacturers -- or even by different
 firmware authors within a single manufacturer.  I would be amazed
 if the behaviour did _not_ vary among drive models.

And, if you tell your firmware writers that they should look for any technique
that reduces power consumption, I don't doubt that keeping the disk in standby
until there was a reason to move data from write cache to disk would look good.
I would hope that they would not make a cache flush lie, but that used to be
common on old ATA drives.
-- 
R. Kevin Oberman, Network Engineer - Retired
E-mail: kob6...@gmail.com