Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Florian Attenberger
On Mon, 28 Jan 2008 14:13:21 -0500
Gene Heskett <[EMAIL PROTECTED]> wrote:


> >> I had to reboot early this morning due to a freezeup, and I had a
> >> bunch of these in the messages log:
> >> ==
> >> Jan 27 19:42:11 coyote kernel: [42461.915961] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> >> Jan 27 19:42:11 coyote kernel: [42461.915973] ata1.00: cmd ca/00:08:b1:66:46/00:00:00:00:00/e8 tag 0 dma 4096 out
> >> Jan 27 19:42:11 coyote kernel: [42461.915974]  res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> >> Jan 27 19:42:11 coyote kernel: [42461.915978] ata1.00: status: { DRDY }
> >> Jan 27 19:42:11 coyote kernel: [42461.916005] ata1: soft resetting link
> >> Jan 27 19:42:12 coyote kernel: [42462.078216] ata1.00: configured for UDMA/100
> >> Jan 27 19:42:12 coyote kernel: [42462.078232] ata1: EH complete
> >> Jan 27 19:42:12 coyote kernel: [42462.090700] sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB)
> >> Jan 27 19:42:12 coyote kernel: [42462.114230] sd 0:0:0:0: [sda] Write Protect is off
> >> Jan 27 19:42:12 coyote kernel: [42462.115079] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> >> ===


I had this error too, or maybe only a similar one, plus another one. I no
longer have the error output for either of them lying around, so I'm posting
both fixes that I found here on lkml:
1) disabling NCQ like this:
"echo 1 > /sys/block/sda/device/queue_depth"
2) this patch: libata_drain_fifo_on_stuck_drq_hsm.patch
(applies to 2.6.24 too)

Signed-off-by: Mark Lord <[EMAIL PROTECTED]>
---

--- old/drivers/ata/libata-sff.c	2007-09-28 09:29:22.0 -0400
+++ linux/drivers/ata/libata-sff.c	2007-09-28 09:39:44.0 -0400
@@ -420,6 +420,28 @@
 	ap->ops->irq_on(ap);
 }
 
+static void ata_drain_fifo(struct ata_port *ap, struct ata_queued_cmd *qc)
+{
+	u8 stat = ata_chk_status(ap);
+	/*
+	 * Try to clear stuck DRQ if necessary,
+	 * by reading/discarding up to two sectors worth of data.
+	 */
+	if ((stat & ATA_DRQ) && (!qc || qc->dma_dir != DMA_TO_DEVICE)) {
+		unsigned int i;
+		unsigned int limit = qc ? qc->sect_size : ATA_SECT_SIZE;
+
+		printk(KERN_WARNING "Draining up to %u words from data FIFO.\n",
+				limit);
+		for (i = 0; i < limit ; ++i) {
+			ioread16(ap->ioaddr.data_addr);
+			if (!(ata_chk_status(ap) & ATA_DRQ))
+				break;
+		}
+		printk(KERN_WARNING "Drained %u/%u words.\n", i, limit);
+	}
+}
+
 /**
  * ata_bmdma_drive_eh - Perform EH with given methods for BMDMA controller
  * @ap: port to handle error for
@@ -476,7 +498,7 @@
 	}
 
 	ata_altstatus(ap);
-	ata_chk_status(ap);
+	ata_drain_fifo(ap, qc);
 	ap->ops->irq_clear(ap);
 
 	spin_unlock_irqrestore(ap->lock, flags);
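
For fix 1), here is a minimal shell sketch of the same write, with a check
before it and a read-back after it. The names set_queue_depth and SYSFS_ROOT
are hypothetical and introduced only for illustration; the only thing taken
from the post is the /sys/block/sda/device/queue_depth path. It assumes a
POSIX shell and, for the real sysfs file, root privileges.

```shell
# Sketch only: set_queue_depth is a hypothetical helper, not part of any tool.
# A queue depth of 1 allows one outstanding command at a time, which is
# how the "echo 1" in the post disables NCQ.
set_queue_depth() {
    dev=$1
    depth=$2
    # SYSFS_ROOT is an assumption added here so the helper can be tried
    # against a scratch directory; it defaults to the real /sys.
    file="${SYSFS_ROOT:-/sys}/block/$dev/device/queue_depth"
    if [ ! -w "$file" ]; then
        echo "cannot write $file" >&2
        return 1
    fi
    echo "$depth" > "$file"
    # Read the value back so the caller can see what is now in effect.
    cat "$file"
}
```

Run as root, "set_queue_depth sda 1" would apply the setting from the post.
Values written to sysfs do not survive a reboot, so it would have to be
reapplied after each boot.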

-- 
Florian Attenberger <[EMAIL PROTECTED]>




Re: any known issues with leap seconds in 2.6.10?

2007-07-03 Thread Florian Attenberger
On Tue, Jul 03, 2007 at 11:54:45AM -0400, Chris Friesen wrote:
>
> I'm just wondering if anyone knows of issues with leap second handling in 
> 2.6.10.  We just had a field incident where a couple of quad-x86 machines 
> went down at just before midnight (UTC) on June 30th...which is when leap 
> seconds would normally be applied.
>
> From our logs it looks like something went crazy while holding the xtime 
> lock.
>
> If anyone knows of issues with that code I'd love to hear about it.
>
> Thanks,
>
> Chris
Look at the thread '2.6.21.5 june 30th to july 1st date hang?'

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.21.5 june 30th to july 1st date hang?

2007-07-03 Thread Florian Attenberger
On Tue, Jul 03, 2007 at 04:20:17PM +0200, Arne Georg Gleditsch wrote:
> Florian Attenberger <[EMAIL PROTECTED]> writes:
> > there was one 'special' event at that date:
> > syslog.2.gz:Jul  1 01:59:59 master kernel: Clock: inserting leap second
> > 23:59:60 UTC
> 
> As far as I can tell, no leap second was due to be inserted at 1. of
> July this year.  Is the year set correctly for this box?
>
Yep, the clock is controlled by ntpd.
You're right: according to
ftp://hpiers.obspm.fr/iers/bul/bulc/bulletinc.33
that event shouldn't have happened.



Re: 2.6.21.5 june 30th to july 1st date hang?

2007-07-03 Thread Florian Attenberger
On Tue, Jul 03, 2007 at 08:44:00AM -0400, Fortier,Vincent [Montreal] wrote:
> Hi all,
> 
> All my servers and workstations running a 2.6.21.5 kernel hung exactly
> when the date shifted from June 30th to July 1st.
> 
> On my monitoring system every single station running a 2.6.21.5 kernel
> stopped responding exactly after midnight on the date shift from June
> 30th to July 1st.  However, stations still running 2.6.18 to 2.6.20.11
> worked flawlessly.
> 
> I first thought there had been an electricity outage but two of my
> servers (Dell PE 2950 dual quad-core) on UPS in our server room also
> hung:
> Jun 30 23:55:01 urpdev1 /USR/SBIN/CRON[31298]: (root) CMD ([ -x
> /usr/lib/sysstat/sa1 ] && { [ -r "$DEFAULT" ] && . "$DEFAULT" ; [
> "$ENABLED" = "true" ] && exec /usr/lib/sysstat/sa1; })
> Jul  3 11:54:03 urpdev1 syslogd 1.4.1#17: restart.
> 
> I could not get anything on any of the 20+ consoles...  All the systems
> hung at around the exact same time... when the date shifted from June
> 30th to July 1st in UTC ...?
> 
> Any clue, anyone?
> 
> - vin
>
There was one 'special' event on that date:
syslog.2.gz:Jul  1 01:59:59 master kernel: Clock: inserting leap second 23:59:60 UTC



[PATCH 2.6.22-rc6] sata_mv: PCI-ID for Adaptec 1430SA SATA Controller

2007-07-02 Thread Florian Attenberger
Signed-off-by: Florian Attenberger  <[EMAIL PROTECTED]>


--- 2.6.22-rc6/drivers/ata/sata_mv.c	2007-06-30 16:21:47.462020256 +0200
+++ 2.6.22-rc6.mine/drivers/ata/sata_mv.c	2007-06-30 16:25:25.999165444 +0200
@@ -582,6 +582,9 @@ static const struct pci_device_id mv_pci
 
 	{ PCI_VDEVICE(ADAPTEC2, 0x0241), chip_604x },
 
+	/* Adaptec 1430SA */
+	{ PCI_VDEVICE(ADAPTEC2, 0x0243), chip_7042 },
+
 	{ PCI_VDEVICE(TTI, 0x2310), chip_7042 },
 
 	/* add Marvell 7042 support */


[PATCH 2.6.22-rc6] add PCI-ID for Adaptec 1430SA 4-Port SATA Controller

2007-06-30 Thread Florian Attenberger
Hi,

I added this PCI ID to support my controller:
lspci:
01:00.0 RAID bus controller: Adaptec Unknown device 0243 (rev 02)
lspci -n:
01:00.0 0104: 9005:0243 (rev 02)

seems to work fine.
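
A small shell sketch of how the "lspci -n" line above maps onto the new table
entry: the third field is vendor:device, so 9005 is the vendor behind the
ADAPTEC2 name and 0243 is the device ID used in the patch.
pci_id_from_lspci is a hypothetical helper name introduced only for this
illustration; the sample line is the one from this post.

```shell
# Sketch: pci_id_from_lspci is a hypothetical helper that pulls the
# vendor:device pair out of one line of `lspci -n` output.
pci_id_from_lspci() {
    # Field 3 of "01:00.0 0104: 9005:0243 (rev 02)" is vendor:device.
    echo "$1" | awk '{ print $3 }'
}

id=$(pci_id_from_lspci "01:00.0 0104: 9005:0243 (rev 02)")
vendor=${id%%:*}   # 9005, the vendor ID behind the ADAPTEC2 name in the patch
device=${id##*:}   # 0243, the device ID the patch adds
echo "vendor=0x$vendor device=0x$device"
```

For the line from this post it prints "vendor=0x9005 device=0x0243".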

florian attenberger


--- 2.6.22-rc6/drivers/ata/sata_mv.c	2007-06-30 16:21:47.462020256 +0200
+++ 2.6.22-rc6.mine/drivers/ata/sata_mv.c	2007-06-30 16:25:25.999165444 +0200
@@ -582,6 +582,9 @@ static const struct pci_device_id mv_pci
 
 	{ PCI_VDEVICE(ADAPTEC2, 0x0241), chip_604x },
 
+	/* Adaptec 1430SA */
+	{ PCI_VDEVICE(ADAPTEC2, 0x0243), chip_7042 },
+
 	{ PCI_VDEVICE(TTI, 0x2310), chip_7042 },
 
 	/* add Marvell 7042 support */


Re: PMTU, MSS and "fragmentation needed" problem with linux?

2005-04-07 Thread Florian Attenberger
from my shorewall.conf.
-
#
# MSS CLAMPING
#
# Set this variable to "Yes" or "yes" if you want the TCP "Clamp MSS to PMTU"
# option. This option is most commonly required when your internet
# interface is some variant of PPP (PPTP or PPPoE). Your kernel must
# have CONFIG_IP_NF_TARGET_TCPMSS set.
#
# [From the kernel help:
#
#    This option adds a `TCPMSS' target, which allows you to alter the
#    MSS value of TCP SYN packets, to control the maximum size for that
#    connection (usually limiting it to your outgoing interface's MTU
#    minus 40).
#
#    This is used to overcome criminally braindead ISPs or servers which
#    block ICMP Fragmentation Needed packets.  The symptoms of this
#    problem are that everything works fine from your Linux
#    firewall/router, but machines behind it can never exchange large
#    packets:
#        1) Web browsers connect, then hang with no data received.
#        2) Small mail works fine, but large emails hang.
#        3) ssh works fine, but scp hangs after initial handshaking.
# ]
#
# If left blank, or set to "No" or "no", the option is not enabled.
#
CLAMPMSS=1412


see also:
http://en.tldp.org/HOWTO/IP-Masquerade-HOWTO/mtu-issues.html
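
The "MTU minus 40" rule from the kernel help quoted above can be sketched in
shell as follows. mss_for_mtu is a hypothetical helper name, and the 40 bytes
are assumed to be a 20-byte IPv4 header plus a 20-byte TCP header, both
without options. Under that rule CLAMPMSS=1412 would correspond to a path MTU
of 1452; the post itself does not say what the link MTU actually was.

```shell
# Sketch: mss_for_mtu is a hypothetical helper. 40 = 20-byte IPv4 header
# + 20-byte TCP header, both without options (the "MTU minus 40" above).
mss_for_mtu() {
    echo $(( $1 - 40 ))
}

mss_for_mtu 1500   # plain Ethernet MTU, prints 1460
mss_for_mtu 1492   # common PPPoE MTU, prints 1452
mss_for_mtu 1452   # the MTU that CLAMPMSS=1412 would correspond to, prints 1412
```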

