Re: [PATCH] scsi: target/sbp: remove firewire SBP target driver
On 16/06/2020 16:34, James Bottomley wrote:
> On Tue, 2020-06-16 at 14:13 +0000, Johannes Thumshirn wrote:
>> On 16/06/2020 16:09, Bart Van Assche wrote:
>>> On 2020-06-16 02:42, Finn Thain wrote:
>>>> Martin said, "I'd appreciate a patch to remove it"
>>>>
>>>> And Bart said, "do you want to keep this driver in the kernel
>>>> tree?"
>>>>
>>>> AFAICT both comments are quite ambiguous. I don't see an
>>>> actionable request, just an expression of interest from people
>>>> doing their jobs.
>>>>
>>>> Note well: there is no pay check associated with having a
>>>> MAINTAINERS file entry.
>>>
>>> Hi Finn,
>>>
>>> As far as I know the sbp driver only has had one user ever and that
>>> user is no longer using the sbp driver. So why keep it in the
>>> kernel tree? Restoring a kernel driver can be easy - the first step
>>> is a "git revert".
>>
>> Why not move the driver to drivers/staging for 2 or 3 kernel releases
>> and if no one steps up, delete it?
>
> Because that's pretty much the worst of all worlds: If the driver is
> simply going orphaned it can stay where it is to avoid confusion. If
> it's being removed, it's better to remove it from where it is because
> that makes the patch to restore it easy to find.
>
> Chris, the thing is this: if this driver has just one user on a stable
> distro who complains about its removal six months to two years from
> now, Linus will descend on us from a great height (which won't matter
> to you, since you'll be long gone). This makes everyone very wary of
> outright removal. If you're really, really sure it has no users, it
> can be deleted, but if there's the slightest chance it has just one, it
> should get orphaned.

My patch to delete the driver was based on Martin's original request:

https://lore.kernel.org/lkml/yq1img99d4k@ca-mkp.ca.oracle.com/

I don't especially want it to be gone, nor can I be sure there are no
users of what is, as far as I can tell, a working piece of code. I can
tell you that I never hear about it (other than the odd patch), whereas
I do get emails out of the blue for some of my other (much smaller)
stuff which clearly has users.

I'd be just as happy for this to be orphaned or for nothing to happen
to it.

Honestly, I am totally ambivalent as to what happens to this code.
Martin, however, clearly cares enough to have asked me to supply a
patch to remove it.

Cheers,
Chris

-- 
Chris Boot
bo...@boo.tc
Re: [PATCH] scsi: target/sbp: remove firewire SBP target driver
On 15/06/2020 00:28, Finn Thain wrote:
> On Sun, 14 Jun 2020, Chris Boot wrote:
>
>> I expect that if someone finds this useful it can stick around (but
>> that's not my call).
>
> Whose call is that? If the patch had said "From: Martin K. Petersen" and
> "This driver is being removed because it has the following defects..."
> that would be some indication of a good-faith willingness to accept users
> as developers in the spirit of the GPL, which is what you seem to be
> alluding to (?).

If you're asking me, I'd say it was Martin's call:

> SCSI TARGET SUBSYSTEM
> M: "Martin K. Petersen"
> [...]
> F: drivers/target/
> F: include/target/

>> I just don't have the time or inclination or hardware to be able to
>> maintain it anymore, so someone else would have to pick it up.
>>
>
> Which is why most drivers get orphaned, right?

Sure, but that's not what Martin asked me to do, hence this patch.

-- 
Chris Boot
bo...@boo.tc
Re: [PATCH] scsi: target/sbp: remove firewire SBP target driver
On 14/06/2020 01:03, Finn Thain wrote:
> On Sat, 13 Jun 2020, Chris Boot wrote:
>
>> I no longer have the time to maintain this subsystem nor the hardware to
>> test patches with.
>
> Then why not patch MAINTAINERS, and orphan it, as per usual practice?
>
> $ git log --oneline MAINTAINERS | grep -i orphan

My patch to remove it was in response to:

https://lore.kernel.org/lkml/yq1img99d4k@ca-mkp.ca.oracle.com/

>> It also doesn't appear to have any active users so I doubt anyone will
>> miss it.
>>
>
> It's not unusual that any Linux driver written more than 5 years ago
> "doesn't appear to have any active users".
>
> If a driver has been orphaned and broken in the past, and no-one stepped
> up to fix it within a reasonable period, removal would make sense. But
> that's not the case here.
>
> I haven't used this driver for a long time, but I still own PowerMacs with
> firewire, and I know I'm not the only one.

I expect that if someone finds this useful it can stick around (but
that's not my call). I just don't have the time or inclination or
hardware to be able to maintain it anymore, so someone else would have
to pick it up.

Cheers,
Chris

-- 
Chris Boot
bo...@boo.tc
[PATCH] scsi: target/sbp: remove SBP target driver
I no longer have the time to maintain this subsystem nor the hardware to
test patches with. It also doesn't appear to have any active users so I
doubt anyone will miss it.

Signed-off-by: Chris Boot
---
 MAINTAINERS                     |    9 -
 drivers/target/Kconfig          |    1 -
 drivers/target/Makefile         |    1 -
 drivers/target/sbp/Kconfig      |   12 -
 drivers/target/sbp/Makefile     |    2 -
 drivers/target/sbp/sbp_target.c | 2350 ---
 drivers/target/sbp/sbp_target.h |  243
 7 files changed, 2618 deletions(-)
 delete mode 100644 drivers/target/sbp/Kconfig
 delete mode 100644 drivers/target/sbp/Makefile
 delete mode 100644 drivers/target/sbp/sbp_target.c
 delete mode 100644 drivers/target/sbp/sbp_target.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 56d7d27fc114..81b7db7d68a8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6669,15 +6669,6 @@ S:	Maintained
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media.git
 F:	drivers/media/firewire/
 
-FIREWIRE SBP-2 TARGET
-M:	Chris Boot
-L:	linux-s...@vger.kernel.org
-L:	target-de...@vger.kernel.org
-L:	linux1394-de...@lists.sourceforge.net
-S:	Maintained
-T:	git git://git.kernel.org/pub/scm/linux/kernel/git/nab/lio-core-2.6.git master
-F:	drivers/target/sbp/
-
 FIREWIRE SUBSYSTEM
 M:	Stefan Richter
 L:	linux1394-de...@lists.sourceforge.net
diff --git a/drivers/target/Kconfig b/drivers/target/Kconfig
index c163b14774d7..4a5682745ada 100644
--- a/drivers/target/Kconfig
+++ b/drivers/target/Kconfig
@@ -46,6 +46,5 @@ config TCM_USER2
 source "drivers/target/loopback/Kconfig"
 source "drivers/target/tcm_fc/Kconfig"
 source "drivers/target/iscsi/Kconfig"
-source "drivers/target/sbp/Kconfig"
 
 endif
diff --git a/drivers/target/Makefile b/drivers/target/Makefile
index 45634747377e..c13da05af2e2 100644
--- a/drivers/target/Makefile
+++ b/drivers/target/Makefile
@@ -29,4 +29,3 @@ obj-$(CONFIG_TCM_USER2) += target_core_user.o
 obj-$(CONFIG_LOOPBACK_TARGET) += loopback/
 obj-$(CONFIG_TCM_FC) += tcm_fc/
 obj-$(CONFIG_ISCSI_TARGET) += iscsi/
-obj-$(CONFIG_SBP_TARGET) += sbp/
diff --git a/drivers/target/sbp/Kconfig b/drivers/target/sbp/Kconfig
deleted file mode 100644
index 53a1c75f5660..
--- a/drivers/target/sbp/Kconfig
+++ /dev/null
@@ -1,12 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0-only
-config SBP_TARGET
-	tristate "FireWire SBP-2 fabric module"
-	depends on FIREWIRE
-	help
-	  Say Y or M here to enable SCSI target functionality over FireWire.
-	  This enables you to expose SCSI devices to other nodes on the FireWire
-	  bus, for example hard disks. Similar to FireWire Target Disk mode on
-	  many Apple computers.
-
-	  To compile this driver as a module, say M here: The module will be
-	  called sbp-target.
diff --git a/drivers/target/sbp/Makefile b/drivers/target/sbp/Makefile
deleted file mode 100644
index 766f23690013..
--- a/drivers/target/sbp/Makefile
+++ /dev/null
@@ -1,2 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0-only
-obj-$(CONFIG_SBP_TARGET) += sbp_target.o
diff --git a/drivers/target/sbp/sbp_target.c b/drivers/target/sbp/sbp_target.c
deleted file mode 100644
index e4a9b9fe3dfb..
--- a/drivers/target/sbp/sbp_target.c
+++ /dev/null
@@ -1,2350 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * SBP2 target driver (SCSI over IEEE1394 in target mode)
- *
- * Copyright (C) 2011 Chris Boot
- */
-
-#define KMSG_COMPONENT "sbp_target"
-#define pr_fmt(fmt) KMSG_COMPONENT ": " fmt
-
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-
-#include "sbp_target.h"
-
-/* FireWire address region for management and command block address handlers */
-static const struct fw_address_region sbp_register_region = {
-	.start = CSR_REGISTER_BASE + 0x1,
-	.end = 0x1ULL,
-};
-
-static const u32 sbp_unit_directory_template[] = {
-	0x1200609e, /* unit_specifier_id: NCITS/T10 */
-	0x13010483, /* unit_sw_version: 1155D Rev 4 */
-	0x3800609e, /* command_set_specifier_id: NCITS/T10 */
-	0x390104d8, /* command_set: SPC-2 */
-	0x3b00, /* command_set_revision: 0 */
-	0x3c01, /* firmware_revision: 1 */
-};
-
-#define SESSION_MAINTENANCE_INTERVAL HZ
-
-static atomic_t login_id = ATOMIC_INIT(0);
-
-static void session_maintenance_work(struct work_struct *);
-static int sbp_run_transaction(struct fw_card *, int, int, int, int,
-		unsigned long long, void *, size_t);
-
-static int read_peer_guid(u64 *guid, const struct sbp_management_request *req)
-{
-	int ret;
-	__be32 high, low;
-
-	ret = sbp_run_transaction(req->card, TCODE_READ_Q
Re: [PATCH] sbp-target: add the missed kfree() in an error path
On 28/05/2020 15:53, Bart Van Assche wrote:
> On 2020-05-28 03:20, Chuhong Yuan wrote:
>> sbp_fetch_command() forgets to call kfree() in an error path.
>> Add the missed call to fix it.
>
> Hi Chris,
>
> The changelog of the code under drivers/target/sbp makes me wonder
> whether this driver has ever had any other users than its original
> author. Do you agree with this? If so, do you want to keep this driver
> in the kernel tree?

Hi Bart,

I think you might be right. I also don't have much time to maintain it
these days and the hardware I had is long dead. It probably should be
removed for everyone's sanity.

Best regards,
Chris

-- 
Chris Boot
bo...@bootc.net
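For readers who have not seen the quoted fix, the class of bug it addresses can be sketched in userspace C. This is only an illustration under stated assumptions: fetch_command(), counted_alloc() and counted_free() are hypothetical names, malloc()/free() stand in for kmalloc()/kfree(), and none of this is the actual sbp_fetch_command() code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int live_allocs;	/* crude leak counter, for this sketch only */

/* Stand-in for kmalloc(): counts every successful allocation. */
static void *counted_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_allocs++;
	return p;
}

/* Stand-in for kfree(): counts every release of a non-NULL pointer. */
static void counted_free(void *p)
{
	if (p)
		live_allocs--;
	free(p);
}

/* Hypothetical fetch step: allocates a buffer, then a later step may fail. */
static int fetch_command(int later_step_fails, void **out)
{
	void *buf = counted_alloc(64);

	if (!buf)
		return -ENOMEM;

	if (later_step_fails) {
		counted_free(buf);	/* the call that is easy to forget */
		return -EIO;
	}

	*out = buf;	/* success: ownership passes to the caller */
	return 0;
}

/* Exercises both paths and checks that nothing is left allocated. */
static void fetch_demo(void)
{
	void *cmd = NULL;

	assert(fetch_command(1, &cmd) == -EIO);
	assert(live_allocs == 0);	/* error path released the buffer */

	assert(fetch_command(0, &cmd) == 0);
	assert(live_allocs == 1);	/* caller now owns one buffer */

	counted_free(cmd);
	assert(live_allocs == 0);
}
```

Calling fetch_demo() from a main() exercises both paths. Dropping the counted_free() call in the failure branch makes the second assertion fire, which is the leak the quoted patch description refers to.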
Re: [PATCH] sbp-target: Delete an error message for a failed memory allocation in three functions
On 10/12/2017 19:10, SF Markus Elfring wrote:
> From: Markus Elfring <elfr...@users.sourceforge.net>
> Date: Sun, 10 Dec 2017 19:54:11 +0100
>
> Omit an extra message for a memory allocation failure in these functions.
>
> This issue was detected by using the Coccinelle software.
>
> Signed-off-by: Markus Elfring <elfr...@users.sourceforge.net>
[snip]

Looks good to me.

Acked-by: Chris Boot <bo...@boo.tc>

Thanks,
Chris

-- 
Chris Boot
bo...@boo.tc
BUG/panic in ctnetlink_conntrack_event in 4.8.11
0
[147966.128285] [] ? apic_timer_interrupt+0x82/0x90
[147966.134557] [] ? cpuidle_enter_state+0x126/0x2d0
[147966.141555] [] ? cpuidle_enter_state+0x113/0x2d0
[147966.147916] [] ? cpu_startup_entry+0x2a2/0x350
[147966.154103] [] ? start_secondary+0x14d/0x190
[147966.160117] ---[ end trace d5725bb00a2f3d6c ]---

Regards,
Chris

-- 
Chris Boot
bo...@bootc.net
Re: [patch] sbp-target: checking for NULL instead of IS_ERR
On 10/03/16 21:52, Chris Boot wrote:
> On 10/03/16 20:56, Chris Boot wrote:
>> On 05/03/16 09:33, Nicholas A. Bellinger wrote:
>>> On Sat, 2016-03-05 at 08:45 +0000, Chris Boot wrote:
>>>> Are these in linux-next or another branch somewhere I can easily clone
>>>> them from?
>>>
>>> The patch series is in target-pending/for-next.
>>
>> Hi Nic,
>>
>> I've just managed to resurrect a test rig for this (the hardware I had
>> for it has stopped being usable, yay!), and my initial testing shows the
>> updated code panics on the first submitted IO.
>
> So this isn't the first IO, it's exactly the 2nd IO. I'm hitting
> BUG_ON(se_cmd->se_tfo || se_cmd->se_sess) in target_submit_cmd_map_sgls().
>
> I'm assuming the se_cmd is being reused due to percpu ida allocator, and
> the code must be missing something to clean up the se_cmd sufficiently
> once we're done with it.
>
> At this point I'm out of my depth going through the target core, so I'd
> appreciate some pointers to get any further!

Replying to myself again...

Worked it out after reading the thread about the usb gadget target.
Here's the patch you want to squash into your existing series:

diff --git a/drivers/target/sbp/sbp_target.c b/drivers/target/sbp/sbp_target.c
index a04b0605f8d0..d021997cc837 100644
--- a/drivers/target/sbp/sbp_target.c
+++ b/drivers/target/sbp/sbp_target.c
@@ -933,6 +933,7 @@ static struct sbp_target_request *sbp_mgt_get_req(struct sbp_session *sess,
 		return ERR_PTR(-ENOMEM);
 
 	req = &((struct sbp_target_request *)se_sess->sess_cmd_map)[tag];
+	memset(req, 0, sizeof(*req));
 	req->se_cmd.map_tag = tag;
 	req->se_cmd.tag = next_orb;
 
@@ -1619,12 +1620,8 @@ static void sbp_mgt_agent_rw(struct fw_card *card,
 		rcode = RCODE_CONFLICT_ERROR;
 		goto out;
 	}
-	// XXX:
-#if 0
-	req = sbp_mgt_get_req(agent->login->sess, card);
-#else
+
 	req = kzalloc(sizeof(*req), GFP_ATOMIC);
-#endif
 	if (!req) {
 		rcode = RCODE_CONFLICT_ERROR;
 		goto out;

I hope Thunderbird hasn't mangled this too badly. With this applied,
please add this to the patch for sbp_target:

Acked-by: Chris Boot <bo...@bootc.net>

Thanks,
Chris

-- 
Chris Boot
bo...@bootc.net
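The effect of the added memset() can be illustrated outside the kernel. The following is a minimal userspace sketch, assuming a fixed pool of pre-allocated request slots that are handed out by tag and reused; struct pool_req, pool_get() and submit() are hypothetical stand-ins, and the owner field merely plays the role of se_cmd->se_tfo / se_cmd->se_sess. It is not the target core code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_SIZE 2

/* Hypothetical request; 'owner' must be clear when the request is submitted. */
struct pool_req {
	int tag;
	const void *owner;	/* stands in for se_cmd->se_tfo / se_sess */
};

/* Pre-allocated slots, indexed by tag and reused across I/Os. */
static struct pool_req pool[POOL_SIZE];

/* Hand out slot 'tag'; when 'scrub' is set, zero it first as the patch does. */
static struct pool_req *pool_get(int tag, int scrub)
{
	struct pool_req *req = &pool[tag];

	if (scrub)
		memset(req, 0, sizeof(*req));
	req->tag = tag;
	return req;
}

/* Mimics the submit-time sanity check: 0 if the slot is clean, -1 if stale. */
static int submit(struct pool_req *req, const void *owner)
{
	if (req->owner)
		return -1;	/* the kernel would hit BUG_ON() here */
	req->owner = owner;
	return 0;
}

/* First use works, unscrubbed reuse trips the check, scrubbed reuse works. */
static void pool_demo(void)
{
	int session = 0;

	memset(pool, 0, sizeof(pool));
	assert(submit(pool_get(0, 0), &session) == 0);	/* 1st I/O: fresh slot */
	assert(submit(pool_get(0, 0), &session) == -1);	/* 2nd I/O: stale state */
	assert(submit(pool_get(0, 1), &session) == 0);	/* with memset(): clean */
}
```

In this sketch the first submission succeeds because the static pool starts out zeroed, which matches the observation that the failure showed up on exactly the second I/O.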
Re: [patch] sbp-target: checking for NULL instead of IS_ERR
On 10/03/16 20:56, Chris Boot wrote:
> On 05/03/16 09:33, Nicholas A. Bellinger wrote:
>> On Sat, 2016-03-05 at 08:45 +0000, Chris Boot wrote:
>>> Are these in linux-next or another branch somewhere I can easily clone
>>> them from?
>>
>> The patch series is in target-pending/for-next.
>
> Hi Nic,
>
> I've just managed to resurrect a test rig for this (the hardware I had
> for it has stopped being usable, yay!), and my initial testing shows the
> updated code panics on the first submitted IO.

So this isn't the first IO, it's exactly the 2nd IO. I'm hitting
BUG_ON(se_cmd->se_tfo || se_cmd->se_sess) in target_submit_cmd_map_sgls().

I'm assuming the se_cmd is being reused due to percpu ida allocator, and
the code must be missing something to clean up the se_cmd sufficiently
once we're done with it.

At this point I'm out of my depth going through the target core, so I'd
appreciate some pointers to get any further!

Thanks,
Chris

-- 
Chris Boot
bo...@bootc.net
Re: [patch] sbp-target: checking for NULL instead of IS_ERR
On 05/03/16 09:33, Nicholas A. Bellinger wrote:
> On Sat, 2016-03-05 at 08:45 +0000, Chris Boot wrote:
>> Are these in linux-next or another branch somewhere I can easily clone
>> them from?
>
> The patch series is in target-pending/for-next.

Hi Nic,

I've just managed to resurrect a test rig for this (the hardware I had
for it has stopped being usable, yay!), and my initial testing shows the
updated code panics on the first submitted IO.

I'll go and debug it now and see what I can get from it, but I thought
I'd let you know ASAP.

Cheers,
Chris

-- 
Chris Boot
bo...@bootc.net
Re: [patch] sbp-target: checking for NULL instead of IS_ERR
On 5 Mar 2016, at 07:33, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
>
> Hi Dan + BootC,
>
> On Wed, 2016-03-02 at 13:09 +0300, Dan Carpenter wrote:
>> We changed this from kzalloc to sbp_mgt_get_req() so we need to change
>> from checking for NULL to check for error pointers.
>>
>> Fixes: c064b2a78989 ('sbp-target: Conversion to percpu_ida tag pre-allocation')
>> Signed-off-by: Dan Carpenter <dan.carpen...@oracle.com>
>>
>> diff --git a/drivers/target/sbp/sbp_target.c b/drivers/target/sbp/sbp_target.c
>> index 251d532..a04b0605f 100644
>> --- a/drivers/target/sbp/sbp_target.c
>> +++ b/drivers/target/sbp/sbp_target.c
>> @@ -951,7 +951,7 @@ static void tgt_agent_fetch_work(struct work_struct *work)
>>
>>  	while (next_orb && tgt_agent_check_active(agent)) {
>>  		req = sbp_mgt_get_req(sess, sess->card, next_orb);
>> -		if (!req) {
>> +		if (IS_ERR(req)) {
>>  			spin_lock_bh(&agent->lock);
>>  			agent->state = AGENT_STATE_DEAD;
>>  			spin_unlock_bh(&agent->lock);
>
> Fixed + folded into the original patch.
>
> Thanks Dan.
>
> Chris, would you be so kind to review the original changes here:
>
> sbp-target: Conversion to percpu_ida tag pre-allocation
> http://www.spinics.net/lists/target-devel/msg11778.html
>
> sbp-target: Convert to TARGET_SCF_ACK_KREF I/O krefs
> http://www.spinics.net/lists/target-devel/msg11780.html
>
> and verify on your local IEEE1394 target setup..?

Hi Nic, Dan,

I’m away this weekend so I can’t test these for a few days at least,
unfortunately. I must admit I only vaguely follow the changes here as I
haven’t been keeping up with the pace of change in target-devel lately,
but it generally looks OK I think.

Are these in linux-next or another branch somewhere I can easily clone
them from? How soon do you need my ACK/NAK on these?

Cheers,
Chris

-- 
Chris Boot
bo...@bootc.net
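Why the quoted one-line change matters can be shown with a small userspace sketch. The helpers below are simplified stand-ins modelled on the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(), not the real definitions from include/linux/err.h, and get_req() with its struct request is a hypothetical allocator used only for illustration. The point is that an allocator which reports failure through an error pointer never returns NULL, so a NULL check silently passes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* The top MAX_ERRNO addresses are reserved for encoded errno values. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct request {
	int tag;
};

/* Hypothetical allocator: reports failure as an error pointer, never NULL. */
static struct request *get_req(int fail)
{
	static struct request slot;

	if (fail)
		return ERR_PTR(-ENOMEM);
	slot.tag = 1;
	return &slot;
}

/* Shows that only the IS_ERR() check notices the failed allocation. */
static void err_ptr_demo(void)
{
	struct request *req = get_req(1);

	assert(req != NULL);			/* "if (!req)" would not trigger */
	assert(IS_ERR(req));			/* "if (IS_ERR(req))" does */
	assert(PTR_ERR(req) == -ENOMEM);	/* and the errno is recoverable */

	assert(!IS_ERR(get_req(0)));		/* success is an ordinary pointer */
}
```

With the old NULL check, a caller would go on to dereference the encoded error value as if it were a valid request.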
qla2xxx firmware crashes in target mode
4 Gbps). [484976.448002] qla2xxx [:05:00.0]-0121:9: Failed to enable receiving of RSCN requests: 0x2. HTH, Chris -- Chris Boot bo...@bootc.net -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Panic on 3.10.18 in nf_conntrack_sip with IPv6
] [813419ac] ? nf_hook_thresh.constprop.36+0x2e/0x33
[ 799.963582] [81344437] ? ip6_output+0x7a/0x83
[ 799.983090] [81343a10] ? ip6_forward+0x5fd/0x69e
[ 800.001437] [8134446d] ? pskb_may_pull+0x2d/0x2d
[ 800.019612] [8134446d] ? pskb_may_pull+0x2d/0x2d
[ 800.036639] [a04746ac] ? __ipv6_conntrack_in+0xc4/0x13f [nf_conntrack_ipv6]
[ 800.057257] [812f201a] ? nf_iterate+0x42/0x80
[ 800.075044] [812f20c1] ? nf_hook_slow+0x69/0x100
[ 800.092089] [8134446d] ? pskb_may_pull+0x2d/0x2d
[ 800.108860] [8134446d] ? pskb_may_pull+0x2d/0x2d
[ 800.353154] [a046bc5a] ? nf_ct_frag6_output+0x9f/0xe8 [nf_defrag_ipv6]
[ 800.371387] [8134446d] ? pskb_may_pull+0x2d/0x2d
[ 800.387677] [a046b0bc] ? ipv6_defrag+0xbb/0xcf [nf_defrag_ipv6]
[ 800.406280] [8134446d] ? pskb_may_pull+0x2d/0x2d
[ 800.424998] [812f201a] ? nf_iterate+0x42/0x80
[ 800.441318] [812f20c1] ? nf_hook_slow+0x69/0x100
[ 800.457694] [8134446d] ? pskb_may_pull+0x2d/0x2d
[ 800.474649] [813445b9] ? nf_hook_thresh.constprop.13+0x34/0x39
[ 800.495046] [81344b43] ? ipv6_rcv+0x2bb/0x30b
[ 800.511896] [812cea5d] ? __netif_receive_skb_core+0x437/0x4af
[ 800.532539] [812ceca1] ? netif_receive_skb+0x42/0x73
[ 800.551414] [812cf419] ? napi_gro_receive+0x35/0x76
[ 800.568152] [a012e20b] ? e1000_clean_rx_irq+0x249/0x2cb [e1000e]
[ 800.589151] [a0131698] ? e1000e_poll+0x65/0x203 [e1000e]
[ 800.606255] [810742f4] ? ktime_get+0x5f/0x6b
[ 800.622019] [812cf1b8] ? net_rx_action+0xa7/0x1d9
[ 800.640555] [8139238c] ? _raw_spin_unlock_irqrestore+0xc/0xd
[ 800.658116] [812730de] ? add_interrupt_randomness+0x39/0x16f
[ 800.677242] [8104244a] ? __do_softirq+0xe4/0x1f9
[ 800.696819] [81398bdc] ? call_softirq+0x1c/0x30
[ 800.713258] [8100e9ee] ? do_softirq+0x3a/0x78
[ 800.731145] [8104262a] ? irq_exit+0x3f/0x83
[ 800.747466] [8100e6ff] ? do_IRQ+0x81/0x97
[ 800.763133] [8139262d] ? common_interrupt+0x6d/0x6d
[ 800.780351] EOI
[ 800.782499] [81078ffb] ? clockevents_program_event+0x9a/0xb6
[ 800.813469] [812a8110] ? arch_local_irq_enable+0x4/0x8
[ 800.831484] [812a84db] ? cpuidle_enter_state+0x46/0xb1
[ 800.849729] [812a8615] ? cpuidle_idle_call+0xcf/0x126
[ 800.869185] [81013b3b] ? arch_cpu_idle+0x6/0x1a
[ 800.885493] [81073255] ? cpu_startup_entry+0x106/0x169
[ 800.902532] [816b5d40] ? start_kernel+0x3d7/0x3e2
[ 800.922455] [816b577f] ? repair_env_string+0x57/0x57
[ 800.939302] [816b559a] ? x86_64_start_kernel+0xf2/0xfd
[ 800.956528] Code: c3 41 57 41 56 41 55 41 54 55 53 48 89 fb 55 8b 87 dc 00 00 00 89 f5 01 f0 01 c2 85 f6 79 02 0f 0b 8b 87 f4 00 00 00 ff c8 74 02 <0f> 0b 83 c2 3f 89 c8 41 89 cd 80 cc 20 83 e2 c0 f6 87 b2 00 00
[ 801.015687] RIP [812c5b22] pskb_expand_head+0x2a/0x1e1
[ 801.034404] RSP 88043fc037c0
[ 801.049813] ---[ end trace a0ea98f51afb8cc0 ]---
[ 801.454124] Kernel panic - not syncing: Fatal exception in interrupt
[ 801.474385] Rebooting in 120 seconds..

The O taint is due to loading LinBIT's drbd module. The crash occurs
even without this, and also in a 3.7.10 kernel that I was using before.

Cheers,
Chris

-- 
Chris Boot
bo...@bootc.net
Re: PANIC at net/xfrm/xfrm_output.c:125 (3.9.4)
On 26/06/2013 23:17, David Miller wrote:
> From: Chris Boot
> Date: Thu, 20 Jun 2013 21:36:44 +0100
>
>> On 06/06/2013 09:38, Timo Teras wrote:
>>> On Thu, 06 Jun 2013 08:47:56 +0100
>>> Chris Boot wrote:
>>>
>>>> On 06/06/13 02:24, Fan Du wrote:
>>>>> Hello Chris/Jean
>>>>>
>>>>> This issue might have already been fixed by this:
>>>>> https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/net/xfrm/xfrm_output.c?id=497574c72c9922cf20c12aed15313c389f722fa0
>>>>>
>>>>>
>>>>> Hope it helps.
>>>>
>>>> Hi Fan, Jean,
>>>>
>>>> Thanks, that looks like it's the patch for exactly my problem.
>>>> Unfortunately I can't test it until next week now. :-/
>>>>
>>>> Timo/Dave: are there any plans to push this into 3.10-rc and/or
>>>> stable? I seem to be able to hit the issue pretty reliably.
>>>
>>> It is already present in 3.10-rc3 [1], and Dave has it queued for
>>> 3.9-stable [2].
>>>
>>> - Timo
>>>
>>> [1] http://lwn.net/Articles/551922/
>>> [2] http://patchwork.ozlabs.org/patch/245594/
>>
>> I'm just wondering if this patch has got lost in the cracks; I reported
>> the issue in 3.9.4 and 3.9.7 is just out without any sign of it. Have I
>> missed something?
>
> It got submitted to -stable last week.

Dave,

Thank you, I see it's in 3.9.8 that has been just released.

Cheers,
Chris

-- 
Chris Boot
bo...@bootc.net
Re: PANIC at net/xfrm/xfrm_output.c:125 (3.9.4)
On 06/06/2013 09:38, Timo Teras wrote:
> On Thu, 06 Jun 2013 08:47:56 +0100
> Chris Boot wrote:
>
>> On 06/06/13 02:24, Fan Du wrote:
>>> Hello Chris/Jean
>>>
>>> This issue might have already been fixed by this:
>>> https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/net/xfrm/xfrm_output.c?id=497574c72c9922cf20c12aed15313c389f722fa0
>>>
>>> Hope it helps.
>>
>> Hi Fan, Jean,
>>
>> Thanks, that looks like it's the patch for exactly my problem.
>> Unfortunately I can't test it until next week now. :-/
>>
>> Timo/Dave: are there any plans to push this into 3.10-rc and/or
>> stable? I seem to be able to hit the issue pretty reliably.
>
> It is already present in 3.10-rc3 [1], and Dave has it queued for
> 3.9-stable [2].
>
> - Timo
>
> [1] http://lwn.net/Articles/551922/
> [2] http://patchwork.ozlabs.org/patch/245594/

Hi folks,

I'm just wondering if this patch has got lost in the cracks; I reported
the issue in 3.9.4 and 3.9.7 is just out without any sign of it. Have I
missed something?

Thanks,
Chris

--
Chris Boot
bo...@bootc.net
Re: PANIC at net/xfrm/xfrm_output.c:125 (3.9.4)
On 06/06/13 09:38, Timo Teras wrote:
> On Thu, 06 Jun 2013 08:47:56 +0100
> Chris Boot wrote:
>
>> On 06/06/13 02:24, Fan Du wrote:
>>> Hello Chris/Jean
>>>
>>> This issue might have already been fixed by this:
>>> https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/net/xfrm/xfrm_output.c?id=497574c72c9922cf20c12aed15313c389f722fa0
>>>
>>> Hope it helps.
>>
>> Hi Fan, Jean,
>>
>> Thanks, that looks like it's the patch for exactly my problem.
>> Unfortunately I can't test it until next week now. :-/
>>
>> Timo/Dave: are there any plans to push this into 3.10-rc and/or
>> stable? I seem to be able to hit the issue pretty reliably.
>
> It is already present in 3.10-rc3 [1], and Dave has it queued for
> 3.9-stable [2].
>
> - Timo
>
> [1] http://lwn.net/Articles/551922/
> [2] http://patchwork.ozlabs.org/patch/245594/

Thank you!

Cheers,
Chris

--
Chris Boot
bo...@bootc.net
Re: PANIC at net/xfrm/xfrm_output.c:125 (3.9.4)
On 06/06/13 02:24, Fan Du wrote: > Hello Chris/Jean > > This issue might have already been fixed by this: > https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/net/xfrm/xfrm_output.c?id=497574c72c9922cf20c12aed15313c389f722fa0 > > > Hope it helps. Hi Fan, Jean, Thanks, that looks like it's the patch for exactly my problem. Unfortunately I can't test it until next week now. :-/ Timo/Dave: are there any plans to push this into 3.10-rc and/or stable? I seem to be able to hit the issue pretty reliably. Thanks, Chris > On 2013年06月06日 09:04, Jean Sacren wrote: >> From: Chris Boot >> Date: Wed, 05 Jun 2013 22:47:48 +0100 >>> >>> Hi folks, >>> >>> I have a re-purposed Watchguard Firebox running Debian GNU/Linux with a >>> self-built vanilla 3.9.4 kernel. I have an IPsec tunnel up to a remote >>> router through which I was passing a fair bit of traffic when I hit the >>> following panic: >>> >>> [486832.949560] BUG: unable to handle kernel NULL pointer dereference at >>> 0010 >>> [486832.953431] IP: [] xfrm_output_resume+0x61/0x29f >>> [486832.953431] *pde = >>> [486832.953431] Oops: [#1] >>> [486832.953431] Modules linked in: xt_realm xt_nat authenc esp4 >>> xfrm4_mode_tunnel tun ip6table_nat nf_nat_ipv6 sch_fq_codel xt_statistic >>> xt_CT xt_LOG xt_connlimit xt_recent xt_time xt_TCPMSS xt_sctp >>> ip6t_REJECT pppoe deflate zlib_deflate pppox ctr twofish_generic >>> twofish_i586 twofish_common camellia_generic serpent_sse2_i586 xts >>> serpent_generic lrw gf128mul glue_helper ablk_helper cryptd >>> blowfish_generic blowfish_common cast5_generic cast_common des_generic >>> cbc xcbc rmd160 sha512_generic sha256_generic sha1_generic hmac >>> crypto_null af_key xfrm_algo xt_comment xt_addrtype xt_policy >>> ip_set_hash_ip ipt_ULOG ipt_REJECT ipt_MASQUERADE ipt_ECN ipt_CLUSTERIP >>> ipt_ah act_police cls_basic cls_flow cls_fw cls_u32 sch_tbf sch_prio >>> sch_htb sch_hfsc sch_ingress sch_sfq xt_set ip_set nf_nat_tftp >>> nf_nat_snmp_basic nf_conntrack_snmp 
nf_nat_sip nf_nat_pptp >>> nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp >>> nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip >>> nf_conntrack_proto_udplite nf_conntrack_proto_sctp nf_conntrack_pptp >>> nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns >>> nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 >>> nf_conntrack_ftp xt_TPROXY nf_tproxy_core xt_tcpmss xt_pkttype >>> xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport >>> xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit >>> xt_DSCP xt_dscp xt_dccp xt_connmark xt_CLASSIFY xt_AUDIT xt_state >>> nfnetlink bridge 8021q garp stp mrp llc ppp_generic slhc >>> nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_mangle ip6table_raw >>> ip6table_filter ip6_tables xt_tcpudp xt_conntrack iptable_mangle >>> iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat >>> nf_conntrack iptable_raw iptable_filter ip_tables x_tables w83627hf >>> hwmon_vid loop iTCO_wdt iTCO_vendor_support evdev snd_pcm snd_page_alloc >>> snd_timer snd soundcore acpi_cpufreq mperf processor pcspkr serio_raw >>> drm_kms_helper lpc_ich i2c_i801 of_i2c drm rng_core thermal_sys >>> i2c_algo_bit ehci_pci i2c_core ext4 crc16 jbd2 mbcache dm_mod sg sd_mod >>> crc_t10dif ata_generic ata_piix uhci_hcd ehci_hcd libata microcode >>> scsi_mod skge sky2 usbcore usb_common >>> [486832.953431] Pid: 0, comm: swapper Not tainted 3.9.4-1-bootc #1 >>> [486832.953431] EIP: 0060:[] EFLAGS: 00210246 CPU: 0 >>> [486832.953431] EIP is at xfrm_output_resume+0x61/0x29f >>> [486832.953431] EAX: EBX: f3fbc100 ECX: f77f1288 EDX: f6130200 >>> [486832.953431] ESI: 0016 EDI: EBP: f70b3c00 ESP: c1407c44 >>> [486832.953431] DS: 007b ES: 007b FS: GS: 00e0 SS: 0068 >>> [486832.953431] CR0: 8005003b CR2: 0010 CR3: 37247000 CR4: 07d0 >>> [486832.953431] DR0: DR1: DR2: DR3: >>> [486832.953431] DR6: 0ff0 DR7: 0400 >>> [486832.953431] Process swapper (pid: 0, ti=c1406000 
task=c1413490 >>> task.ti=c1406000) >>> [486832.953431] Stack: >>> [486832.953431] c129d44f 8000 0002 c1457254 f3fbc100 c129d44f >>> 0008 >>> [486832.953431] c129d49e f4524000 c129d44f 8000 >>> f3fbc100 c1268b49 >>> [486832.953431] f3fbc100 f127604
Re: PANIC at net/xfrm/xfrm_output.c:125 (3.9.4)
] [486832.953431] [f8499310] ? NF_HOOK_THRESH+0x1d/0x4c [bridge] [486832.953431] [f8495353] ? br_handle_local_finish+0x4d/0x4d [bridge] [486832.953431] [f849983e] ? br_nf_pre_routing_finish+0x1c8/0x1d2 [bridge] [486832.953431] [f8495353] ? br_handle_local_finish+0x4d/0x4d [bridge] [486832.953431] [c12635d1] ? nf_hook_slow+0x52/0xed [486832.953431] [f8499676] ? nf_bridge_alloc.isra.18+0x32/0x32 [bridge] [486832.953431] [f8499676] ? nf_bridge_alloc.isra.18+0x32/0x32 [bridge] [486832.953431] [f8499310] ? NF_HOOK_THRESH+0x1d/0x4c [bridge] [486832.953431] [f8499676] ? nf_bridge_alloc.isra.18+0x32/0x32 [bridge] [486832.953431] [f849a1c0] ? br_nf_pre_routing+0x32c/0x33f [bridge] [486832.953431] [f8499676] ? nf_bridge_alloc.isra.18+0x32/0x32 [bridge] [486832.953431] [c1263552] ? nf_iterate+0x3c/0x69 [486832.953431] [f8495353] ? br_handle_local_finish+0x4d/0x4d [bridge] [486832.953431] [c12635d1] ? nf_hook_slow+0x52/0xed [486832.953431] [f8495353] ? br_handle_local_finish+0x4d/0x4d [bridge] [486832.953431] [f84952fa] ? nf_hook_thresh.constprop.10+0x36/0x42 [bridge] [486832.953431] [f8495353] ? br_handle_local_finish+0x4d/0x4d [bridge] [486832.953431] [f8495746] ? br_handle_frame+0x18f/0x1b5 [bridge] [486832.953431] [f8495353] ? br_handle_local_finish+0x4d/0x4d [bridge] [486832.953431] [f84955b7] ? br_handle_frame_finish+0x264/0x264 [bridge] [486832.953431] [c12467a8] ? __netif_receive_skb_core+0x2b5/0x406 [486832.953431] [c1051a58] ? __getnstimeofday+0x17/0x52 [486832.953431] [c1051a00] ? get_monotonic_boottime+0x73/0x92 [486832.953431] [c124704f] ? napi_gro_receive+0x2e/0x69 [486832.953431] [c10053d8] ? __stop_machine.isra.0.constprop.1+0x27/0x27 [486832.953431] [f80792d7] ? sky2_poll+0x6d8/0x8f3 [sky2] [486832.953431] [c1006058] ? native_sched_clock+0x40/0x98 [486832.953431] [c1006058] ? native_sched_clock+0x40/0x98 [486832.953431] [c1005962] ? paravirt_sched_clock+0x8/0xb [486832.953431] [c1006058] ? native_sched_clock+0x40/0x98 [486832.953431] [c1246bbf] ? 
net_rx_action+0x6e/0x180 [486832.953431] [c1005962] ? paravirt_sched_clock+0x8/0xb [486832.953431] [c102ca5a] ? __do_softirq+0xa5/0x19e [486832.953431] [c102cbfa] ? irq_exit+0x36/0x69 [486832.953431] [c100326b] ? do_IRQ+0x6e/0x81 [486832.953431] [c12e4cf3] ? common_interrupt+0x33/0x38 [486832.953431] [c101df1b] ? native_safe_halt+0x2/0x3 [486832.953431] [c1006b2f] ? default_idle+0x23/0x3e [486832.953431] [c10070cd] ? cpu_idle+0x75/0x8f [486832.953431] [c145996b] ? start_kernel+0x34e/0x353 [486832.953431] [c1459465] ? repair_env_string+0x4d/0x4d [486832.953431] Code: f9 ff 8b 43 74 c7 43 70 00 00 00 00 85 c0 74 0e ff 08 0f 94 c2 84 d2 74 05 e8 c7 6b e1 ff 8b 43 48 c7 43 74 00 00 00 00 83 e0 fe8b 50 10 89 d8 ff 52 34 83 f8 01 89 c7 0f 85 21 02 00 00 8b 53 [486832.953431] EIP: [c12a4dd0] xfrm_output_resume+0x61/0x29f SS:ESP 0068:c1407c44 [486832.953431] CR2: 0010 [486833.573872] ---[ end trace ed321ebdc197b3d7 ]--- [486833.578576] Kernel panic - not syncing: Fatal exception in interrupt [486833.582572] Rebooting in 60 seconds.. (gdb) list *xfrm_output_resume+0x61 0xc12a4dd0 is in xfrm_output_resume (net/xfrm/xfrm_output.c:125). 
120 int xfrm_output_resume(struct sk_buff *skb, int err)
121 {
122     while (likely((err = xfrm_output_one(skb, err)) == 0)) {
123         nf_reset(skb);
124
125         err = skb_dst(skb)->ops->local_out(skb);
126         if (unlikely(err != 1))
127             goto out;
128
129         if (!skb_dst(skb)->xfrm)

Try this:

diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index bcfda89..0cf003d 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -64,6 +64,7 @@ static int xfrm_output_one(struct sk_buff *skb, int err)
 	if (unlikely(x->km.state != XFRM_STATE_VALID)) {
 		XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTSTATEINVALID);
+		err = -EINVAL;
 		goto error;
 	}

--
Chris Boot
bo...@bootc.net
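A minimal user-space sketch of why the one-line change above would matter. It assumes, which the thread itself does not confirm, that the invalid-state branch used to reach the error path with err still 0 and that the packet is no longer usable afterwards. Every name here (fake_skb, fake_dst, output_one, output_resume, the set_err switch) is a hypothetical stand-in, not a kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel structures; not the real definitions. */
struct fake_dst { int (*local_out)(void); };
struct fake_skb { struct fake_dst *dst; int state_valid; };

/* Pretend transmit hook; 1 mirrors the "keep going" value tested at line 126. */
static int fake_local_out(void) { return 1; }

/*
 * Shaped like the patched branch of xfrm_output_one(). With set_err == 0 the
 * invalid-state branch leaves err at 0, as it would without the patch.
 * Assumption: the error path leaves the packet without a usable route.
 */
static int output_one(struct fake_skb *skb, int err, int set_err)
{
    if (!skb->state_valid) {
        if (set_err)
            err = -EINVAL;      /* the line the patch adds */
        goto error;
    }
    return err;
error:
    skb->dst = NULL;
    return err;
}

/* Caller shaped like xfrm_output_resume(): 0 from output_one() means "continue". */
static int output_resume(struct fake_skb *skb, int set_err)
{
    int err;

    assert(skb != NULL);
    err = output_one(skb, 0, set_err);
    if (err != 0)
        return err;             /* error reported, dst is never touched */
    if (skb->dst == NULL)
        return -EFAULT;         /* the kernel would dereference NULL here */
    return skb->dst->local_out();
}
```

With set_err at 0 the caller proceeds as if nothing went wrong and reaches the point where the real code would oops; with set_err at 1 it stops at -EINVAL before touching the route.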
PANIC at net/xfrm/xfrm_output.c:125 (3.9.4)
] [] ? br_handle_local_finish+0x4d/0x4d [bridge] [486832.953431] [] ? nf_hook_thresh.constprop.10+0x36/0x42 [bridge] [486832.953431] [] ? br_handle_local_finish+0x4d/0x4d [bridge] [486832.953431] [] ? br_handle_frame+0x18f/0x1b5 [bridge] [486832.953431] [] ? br_handle_local_finish+0x4d/0x4d [bridge] [486832.953431] [] ? br_handle_frame_finish+0x264/0x264 [bridge] [486832.953431] [] ? __netif_receive_skb_core+0x2b5/0x406 [486832.953431] [] ? __getnstimeofday+0x17/0x52 [486832.953431] [] ? get_monotonic_boottime+0x73/0x92 [486832.953431] [] ? napi_gro_receive+0x2e/0x69 [486832.953431] [] ? __stop_machine.isra.0.constprop.1+0x27/0x27 [486832.953431] [] ? sky2_poll+0x6d8/0x8f3 [sky2] [486832.953431] [] ? native_sched_clock+0x40/0x98 [486832.953431] [] ? native_sched_clock+0x40/0x98 [486832.953431] [] ? paravirt_sched_clock+0x8/0xb [486832.953431] [] ? native_sched_clock+0x40/0x98 [486832.953431] [] ? net_rx_action+0x6e/0x180 [486832.953431] [] ? paravirt_sched_clock+0x8/0xb [486832.953431] [] ? __do_softirq+0xa5/0x19e [486832.953431] [] ? irq_exit+0x36/0x69 [486832.953431] [] ? do_IRQ+0x6e/0x81 [486832.953431] [] ? common_interrupt+0x33/0x38 [486832.953431] [] ? native_safe_halt+0x2/0x3 [486832.953431] [] ? default_idle+0x23/0x3e [486832.953431] [] ? cpu_idle+0x75/0x8f [486832.953431] [] ? start_kernel+0x34e/0x353 [486832.953431] [] ? repair_env_string+0x4d/0x4d [486832.953431] Code: f9 ff 8b 43 74 c7 43 70 00 00 00 00 85 c0 74 0e ff 08 0f 94 c2 84 d2 74 05 e8 c7 6b e1 ff 8b 43 48 c7 43 74 00 00 00 00 83 e0 fe <8b> 50 10 89 d8 ff 52 34 83 f8 01 89 c7 0f 85 21 02 00 00 8b 53 [486832.953431] EIP: [] xfrm_output_resume+0x61/0x29f SS:ESP 0068:c1407c44 [486832.953431] CR2: 0010 [486833.573872] ---[ end trace ed321ebdc197b3d7 ]--- [486833.578576] Kernel panic - not syncing: Fatal exception in interrupt [486833.582572] Rebooting in 60 seconds.. (gdb) list *xfrm_output_resume+0x61 0xc12a4dd0 is in xfrm_output_resume (net/xfrm/xfrm_output.c:125). 
120 int xfrm_output_resume(struct sk_buff *skb, int err)
121 {
122     while (likely((err = xfrm_output_one(skb, err)) == 0)) {
123         nf_reset(skb);
124
125         err = skb_dst(skb)->ops->local_out(skb);
126         if (unlikely(err != 1))
127             goto out;
128
129         if (!skb_dst(skb)->xfrm)

Not knowing anything much about networking in the kernel I can't go any
further, but I'm happy to try out patches and poke around with a little
guidance.

I should add that the box doesn't reboot after 60 seconds and the
watchdog doesn't seem to kick in either, but that's clearly not a
networking issue. It reboots fine with the 'reboot' command.

Cheers,
Chris

--
Chris Boot
bo...@bootc.net
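One way to read the dump against line 125: the marked bytes in the Code: line, <8b> 50 10, decode as a load from offset 0x10 of EAX, and CR2 is reported as 0010, which is what a NULL pointer plus a field offset of 0x10 would produce. The sketch below only illustrates that arithmetic. The structure is a hypothetical 32-bit layout chosen so that ops lands at 0x10; it is not the real struct dst_entry, and the field names are placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical 32-bit layout, NOT the real struct dst_entry: placeholder
 * fields sized so that `ops` sits at offset 0x10.
 */
struct fake_dst_entry {
    uint32_t dev;          /* 0x00 */
    uint32_t rcu_head[2];  /* 0x04 */
    uint32_t child;        /* 0x0c */
    uint32_t ops;          /* 0x10 */
};

static_assert(offsetof(struct fake_dst_entry, ops) == 0x10,
              "sketch assumes ops at offset 0x10");

/* Address that a load of dst->ops would touch for a given dst pointer value. */
static uint32_t ops_fault_address(uint32_t dst)
{
    return dst + (uint32_t)offsetof(struct fake_dst_entry, ops);
}
```

Under these assumptions a dst value of 0 gives a fault address of 0x10, matching the CR2 value in the report.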
drbd: kernels 3.7 => 3.8 broken userspace compatibility
Hi all,

I upgraded from a 3.7.x kernel to a 3.8.x kernel on a test machine
running DRBD, and found myself unable to bring up my DRBD devices. I'm
using the 8.3.13 userspace tools as shipped in Debian Wheezy, which work
fine on the 3.7 kernel, but they appear to hang when using the 3.8
kernel and cannot set up the device. The 3.8 kernel appears to introduce
drbd 8.4.2 rather than the 8.3.13 available in 3.7.

The hang seems to be caused by lots of the following:

[pid 7631] socket(PF_NETLINK, SOCK_DGRAM, 11) = 8
[pid 7631] getpid() = 7631
[pid 7631] bind(8, {sa_family=AF_NETLINK, pid=7631, groups=}, 12) = 0
[pid 7631] sendto(8, "4\0\0\0\3\0\0\0\1\0\0\0\317\35\0\0\4\0\0\0\1\0\0\0\1\0\0\0\317\35\0\0"..., 52, 0, NULL, 0) = 52
[pid 7631] poll([{fd=8, events=POLLIN}], 1, 12 <unfinished ...>
[pid 7630] <... read resumed> 0x7fff6d011a30, 1024) = ? ERESTARTSYS (To be restarted)
[pid 7630] --- SIGALRM (Alarm clock) @ 0 (0) ---
[pid 7630] rt_sigreturn(0xe) = -1 EINTR (Interrupted system call)
[pid 7630] close(8) = 0
[pid 7630] wait4(7631,
Process 7630 suspended

I asked for help on the #drbd channel on FreeNode, and the only remark I
got there was that I should upgrade the userspace tools. Somehow, that
doesn't feel right to me - can a newer kernel require new userspace
tools to still be able to use a certain kernel functionality at all?
Doesn't this fall under not breaking userspace with new kernel versions?

Even if the kernel did require new userspace tools, should there not be
some better mechanism to notify the user they must upgrade them before
things will work? At the moment all I see without strace is:

# drbdadm attach r0
DRBD module version: 8.4.2
   userland version: 8.3.13
you should upgrade your drbd tools!
[hang]

There is nothing in dmesg during this time, either.

Cheers,
Chris

PS: Please ensure you CC me as I'm no longer an LKML subscriber.

--
Chris Boot
bo...@bootc.net
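As an illustration of the "better mechanism" asked for above, here is a sketch of a check that fails fast on a version mismatch instead of warning and then waiting on a reply. It is not how drbdadm is implemented. The rule that major and minor must match, and the names parse_version and tools_compatible, are assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>

struct version { int major, minor, patch; };

/* Parse "major.minor.patch"; returns 0 on success, -1 on malformed input. */
static int parse_version(const char *s, struct version *v)
{
    assert(s != NULL && v != NULL);
    return sscanf(s, "%d.%d.%d", &v->major, &v->minor, &v->patch) == 3 ? 0 : -1;
}

/*
 * Returns 1 if the pair is treated as compatible, 0 if not, -1 on parse error.
 * Assumption for illustration only: major and minor must both match.
 */
static int tools_compatible(const char *module, const char *userland)
{
    struct version m, u;

    if (parse_version(module, &m) != 0 || parse_version(userland, &u) != 0)
        return -1;
    return m.major == u.major && m.minor == u.minor;
}
```

A tool using a check like this could print the upgrade message and exit with an error when tools_compatible("8.4.2", "8.3.13") reports a mismatch, rather than going on to issue requests that may never be answered.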
Re: Panic with XFS on RHEL5 (2.6.18-8.1.8.el5)
Chris Boot wrote:
>>> I'll probably just try and recompile the kernel with 8k stacks and
>>> see how it goes. Screw the support, we're unlikely to get it anyway.
>>> :-P
>>
>> Please report how this works out.
>
> I will. This will probably be on Monday now, since the machine isn't
> accepting SysRq requests over the serial console. :-(

OK, with the recompiled kernel this appears to work just fine now. I've
been pounding the box all day with rsyncs, VMware VMs, plenty of web
serving (inc. SVN) and so far it's holding up just fine. Cheers for the
diagnosis.

Many thanks,
Chris
Re: Panic with XFS on RHEL5 (2.6.18-8.1.8.el5)
Måns Rullgård wrote:
> Chris Boot <[EMAIL PROTECTED]> writes:
>
>> Måns Rullgård wrote:
>>> Chris Boot <[EMAIL PROTECTED]> writes:
>>>
>>>> All,
>>>>
>>>> I've got a box running RHEL5 and haven't been impressed by ext3
>>>> performance on it (running off a 1.5TB HP MSA20 using the cciss
>>>> driver). I compiled XFS as a module and tried it out since I'm used
>>>> to using it on Debian, which runs much more efficiently. However,
>>>> every so often the kernel panics as below. Apologies for the tainted
>>>> kernel, but we run VMware Server on the box as well.
>>>>
>>>> Does anyone have any hints/tips for using XFS on Red Hat? What's
>>>> causing the panic below, and is there a way around this?
>>>>
>>>> BUG: unable to handle kernel paging request at virtual address b8af9d60
>>>>  printing eip:
>>>> c0415974
>>>> *pde =
>>>> Oops: [#1]
>>>> SMP
>>>> last sysfs file: /block/loop7/dev
>>>> [...]
>>>>  [] xfsbufd_wakeup+0x28/0x49 [xfs]
>>>>  [] shrink_slab+0x56/0x13c
>>>>  [] try_to_free_pages+0x162/0x23e
>>>>  [] __alloc_pages+0x18d/0x27e
>>>>  [] find_or_create_page+0x53/0x8c
>>>>  [] __getblk+0x162/0x270
>>>>  [] do_lookup+0x53/0x157
>>>>  [] ext3_getblk+0x7c/0x233 [ext3]
>>>>  [] ext3_getblk+0xeb/0x233 [ext3]
>>>>  [] mntput_no_expire+0x11/0x6a
>>>>  [] ext3_bread+0x13/0x69 [ext3]
>>>>  [] htree_dirblock_to_tree+0x22/0x113 [ext3]
>>>>  [] ext3_htree_fill_tree+0x58/0x1a0 [ext3]
>>>>  [] do_path_lookup+0x20e/0x25f
>>>>  [] get_empty_filp+0x99/0x15e
>>>>  [] ext3_permission+0x0/0xa [ext3]
>>>>  [] ext3_readdir+0x1ce/0x59b [ext3]
>>>>  [] filldir+0x0/0xb9
>>>>  [] sys_fstat64+0x1e/0x23
>>>>  [] vfs_readdir+0x63/0x8d
>>>>  [] filldir+0x0/0xb9
>>>>  [] sys_getdents+0x5f/0x9c
>>>>  [] syscall_call+0x7/0xb
>>>>  ===
>>>
>>> Your Redhat kernel is probably built with 4k stacks and XFS+loop+ext3
>>> seems to be enough to overflow it.
>>
>> Thanks, that explains a lot. However, I don't have any XFS filesystems
>> mounted over loop devices on ext3. Earlier in the day I had iso9660 on
>> loop on xfs, could that have caused the issue? It was unmounted and
>> deleted when this panic occurred.
>
> The mention of /block/loop7/dev and the presence of both XFS and ext3
> functions in the call stack suggested to me that you might have an ext3
> filesystem in a loop device on XFS. I see no other explanation for that
> call stack other than a stack overflow, but then we're still back at
> the same root cause.
>
> Are you using device-mapper and/or md? They too are known to blow 4k
> stacks when used with XFS.

I am. The situation earlier on was iso9660 on loop on xfs on lvm on
cciss. I guess that might have smashed the stack undetectably and
induced corruption encountered later on? When I experienced this panic
the machine would have probably been performing a backup, which was
simply a load of ext3/xfs filesystems on lvm on the HP cciss controller.
None of the loop devices would have been mounted.

I have a few machines now with 4k stacks and using lvm + md + xfs and
have no trouble at all, but none are Red Hat (all Debian) and none use
cciss either. Maybe it's a deadly combination.

>> I'll probably just try and recompile the kernel with 8k stacks and see
>> how it goes. Screw the support, we're unlikely to get it anyway. :-P
>
> Please report how this works out.

I will. This will probably be on Monday now, since the machine isn't
accepting SysRq requests over the serial console. :-(

Many thanks,
Chris
Re: Panic with XFS on RHEL5 (2.6.18-8.1.8.el5)
Måns Rullgård wrote:
> Chris Boot <[EMAIL PROTECTED]> writes:
>
>> All,
>>
>> I've got a box running RHEL5 and haven't been impressed by ext3
>> performance on it (running off a 1.5TB HP MSA20 using the cciss
>> driver). I compiled XFS as a module and tried it out since I'm used to
>> using it on Debian, which runs much more efficiently. However, every
>> so often the kernel panics as below. Apologies for the tainted kernel,
>> but we run VMware Server on the box as well.
>>
>> Does anyone have any hints/tips for using XFS on Red Hat? What's
>> causing the panic below, and is there a way around this?
>>
>> BUG: unable to handle kernel paging request at virtual address b8af9d60
>>  printing eip:
>> c0415974
>> *pde =
>> Oops: [#1]
>> SMP
>> last sysfs file: /block/loop7/dev
>> Modules linked in: loop nfsd exportfs lockd nfs_acl iscsi_trgt(U) autofs4 hidp nls_utf8 cifs ppdev rfcomm l2cap bluetooth vmnet(U) vmmon(U) sunrpc ipv6 xfs(U) video sbs i2c_ec button battery asus_acpi ac lp st sg floppy serio_raw intel_rng pcspkr e100 mii e7xxx_edac i2c_i801 edac_mc i2c_core e1000 r8169 ide_cd cdrom parport_pc parport dm_snapshot dm_zero dm_mirror dm_mod cciss mptspi mptscsih scsi_transport_spi sd_mod scsi_mod mptbase ext3 jbd ehci_hcd ohci_hcd uhci_hcd
>> CPU:    1
>> EIP:    0060:[]    Tainted: P      VLI
>> EFLAGS: 00010046   (2.6.18-8.1.8.el5 #1)
>> EIP is at smp_send_reschedule+0x3/0x53
>> eax: c213f000   ebx: c213f000   ecx: eef84000   edx: c213f000
>> esi: 1086   edi: f668c000   ebp: f4f2fce8   esp: f4f2fc8c
>> ds: 007b   es: 007b   ss: 0068
>> Process crond (pid: 3146, ti=f4f2f000 task=f51faaa0 task.ti=f4f2f000)
>> Stack: 66d66b89 c041dc23 a9afbb0e fea5 01904500 000f 0001 0001 c200c6e0 0100 0069 0180 018fc500 c200d240 0003 0292 f601efc0 f6027e00 0050
>> Call Trace:
>>  [] try_to_wake_up+0x351/0x37b
>>  [] xfsbufd_wakeup+0x28/0x49 [xfs]
>>  [] shrink_slab+0x56/0x13c
>>  [] try_to_free_pages+0x162/0x23e
>>  [] __alloc_pages+0x18d/0x27e
>>  [] find_or_create_page+0x53/0x8c
>>  [] __getblk+0x162/0x270
>>  [] do_lookup+0x53/0x157
>>  [] ext3_getblk+0x7c/0x233 [ext3]
>>  [] ext3_getblk+0xeb/0x233 [ext3]
>>  [] mntput_no_expire+0x11/0x6a
>>  [] ext3_bread+0x13/0x69 [ext3]
>>  [] htree_dirblock_to_tree+0x22/0x113 [ext3]
>>  [] ext3_htree_fill_tree+0x58/0x1a0 [ext3]
>>  [] do_path_lookup+0x20e/0x25f
>>  [] get_empty_filp+0x99/0x15e
>>  [] ext3_permission+0x0/0xa [ext3]
>>  [] ext3_readdir+0x1ce/0x59b [ext3]
>>  [] filldir+0x0/0xb9
>>  [] sys_fstat64+0x1e/0x23
>>  [] vfs_readdir+0x63/0x8d
>>  [] filldir+0x0/0xb9
>>  [] sys_getdents+0x5f/0x9c
>>  [] syscall_call+0x7/0xb
>>  ===
>
> Your Redhat kernel is probably built with 4k stacks and XFS+loop+ext3
> seems to be enough to overflow it.

Thanks, that explains a lot. However, I don't have any XFS filesystems
mounted over loop devices on ext3. Earlier in the day I had iso9660 on
loop on xfs, could that have caused the issue? It was unmounted and
deleted when this panic occurred.

I'll probably just try and recompile the kernel with 8k stacks and see
how it goes. Screw the support, we're unlikely to get it anyway. :-P

Many thanks,
Chris
Panic with XFS on RHEL5 (2.6.18-8.1.8.el5)
All, I've got a box running RHEL5 and haven't been impressed by ext3 performance on it (running of a 1.5TB HP MSA20 using the cciss driver). I compiled XFS as a module and tried it out since I'm used to using it on Debian, which runs much more efficiently. However, every so often the kernel panics as below. Apologies for the tainted kernel, but we run VMware Server on the box as well. Does anyone have any hits/tips for using XFS on Red Hat? What's causing the panic below, and is there a way around this? Many thanks, Chris Boot BUG: unable to handle kernel paging request at virtual address b8af9d60 printing eip: c0415974 *pde = Oops: [#1] SMP last sysfs file: /block/loop7/dev Modules linked in: loop nfsd exportfs lockd nfs_acl iscsi_trgt(U) autofs4 hidp nls_utf8 cifs ppdev rfcomm l2cap bluetooth vmnet(U) vmmon(U) sunrpc ipv6 xfs(U) video sbs i2c_ec button battery asus_acpi ac lp st sg floppy serio_raw intel_rng pcspkr e100 mii e7xxx_edac i2c_i801 edac_mc i2c_core e1000 r8169 ide_cd cdrom parport_pc parport dm_snapshot dm_zero dm_mirror dm_mod cciss mptspi mptscsih scsi_transport_spi sd_mod scsi_mod mptbase ext3 jbd ehci_hcd ohci_hcd uhci_hcd CPU:1 EIP:0060:[]Tainted: P VLI EFLAGS: 00010046 (2.6.18-8.1.8.el5 #1) EIP is at smp_send_reschedule+0x3/0x53 eax: c213f000 ebx: c213f000 ecx: eef84000 edx: c213f000 esi: 1086 edi: f668c000 ebp: f4f2fce8 esp: f4f2fc8c ds: 007b es: 007b ss: 0068 Process crond (pid: 3146, ti=f4f2f000 task=f51faaa0 task.ti=f4f2f000) Stack: 66d66b89 c041dc23 a9afbb0e fea5 01904500 000f 0001 0001 c200c6e0 0100 0069 0180 018fc500 c200d240 0003 0292 f601efc0 f6027e00 0050 Call Trace: [] try_to_wake_up+0x351/0x37b [] xfsbufd_wakeup+0x28/0x49 [xfs] [] shrink_slab+0x56/0x13c [] try_to_free_pages+0x162/0x23e [] __alloc_pages+0x18d/0x27e [] find_or_create_page+0x53/0x8c [] __getblk+0x162/0x270 [] do_lookup+0x53/0x157 [] ext3_getblk+0x7c/0x233 [ext3] [] ext3_getblk+0xeb/0x233 [ext3] [] mntput_no_expire+0x11/0x6a [] ext3_bread+0x13/0x69 [ext3] [] 
htree_dirblock_to_tree+0x22/0x113 [ext3] [] ext3_htree_fill_tree+0x58/0x1a0 [ext3] [] do_path_lookup+0x20e/0x25f [] get_empty_filp+0x99/0x15e [] ext3_permission+0x0/0xa [ext3] [] ext3_readdir+0x1ce/0x59b [ext3] [] filldir+0x0/0xb9 [] sys_fstat64+0x1e/0x23 [] vfs_readdir+0x63/0x8d [] filldir+0x0/0xb9 [] sys_getdents+0x5f/0x9c [] syscall_call+0x7/0xb === Code: 5d c3 b9 01 00 00 00 31 d2 6a 00 b8 f0 5a 41 c0 e8 2a ff ff ff fa e8 52 16 00 00 fb 58 c3 b8 54 3a 66 c0 e9 8e 6b 1e 00 53 89 c3 <0f> a3 05 60 1f 6d c0 19 c0 85 c0 75 27 e8 bf db 00 00 50 68 55 EIP: [] smp_send_reschedule+0x3/0x53 SS:ESP 0068:f4f2fc8c <0>Kernel panic - not syncing: Fatal exception - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Panic with XFS on RHEL5 (2.6.18-8.1.8.el5)
All,

I've got a box running RHEL5 and haven't been impressed by ext3 performance on it (running off a 1.5TB HP MSA20 using the cciss driver). I compiled XFS as a module and tried it out since I'm used to using it on Debian, which runs much more efficiently. However, every so often the kernel panics as below. Apologies for the tainted kernel, but we run VMware Server on the box as well.

Does anyone have any hints/tips for using XFS on Red Hat? What's causing the panic below, and is there a way around this?

Many thanks,
Chris Boot

BUG: unable to handle kernel paging request at virtual address b8af9d60
 printing eip:
c0415974
*pde =
Oops: [#1]
SMP
last sysfs file: /block/loop7/dev
Modules linked in: loop nfsd exportfs lockd nfs_acl iscsi_trgt(U) autofs4 hidp nls_utf8 cifs ppdev rfcomm l2cap bluetooth vmnet(U) vmmon(U) sunrpc ipv6 xfs(U) video sbs i2c_ec button battery asus_acpi ac lp st sg floppy serio_raw intel_rng pcspkr e100 mii e7xxx_edac i2c_i801 edac_mc i2c_core e1000 r8169 ide_cd cdrom parport_pc parport dm_snapshot dm_zero dm_mirror dm_mod cciss mptspi mptscsih scsi_transport_spi sd_mod scsi_mod mptbase ext3 jbd ehci_hcd ohci_hcd uhci_hcd
CPU:    1
EIP:    0060:[c0415974]    Tainted: P      VLI
EFLAGS: 00010046   (2.6.18-8.1.8.el5 #1)
EIP is at smp_send_reschedule+0x3/0x53
eax: c213f000   ebx: c213f000   ecx: eef84000   edx: c213f000
esi: 1086   edi: f668c000   ebp: f4f2fce8   esp: f4f2fc8c
ds: 007b   es: 007b   ss: 0068
Process crond (pid: 3146, ti=f4f2f000 task=f51faaa0 task.ti=f4f2f000)
Stack: 66d66b89 c041dc23 a9afbb0e fea5 01904500 000f 0001 0001 c200c6e0 0100 0069 0180 018fc500 c200d240 0003 0292 f601efc0 f6027e00 0050
Call Trace:
 [c041dc23] try_to_wake_up+0x351/0x37b
 [f936884e] xfsbufd_wakeup+0x28/0x49 [xfs]
 [c04572f9] shrink_slab+0x56/0x13c
 [c0457c0c] try_to_free_pages+0x162/0x23e
 [c0454064] __alloc_pages+0x18d/0x27e
 [c045214e] find_or_create_page+0x53/0x8c
 [c046c7b1] __getblk+0x162/0x270
 [c0475be0] do_lookup+0x53/0x157
 [f889138f] ext3_getblk+0x7c/0x233 [ext3]
 [f88913fe] ext3_getblk+0xeb/0x233 [ext3]
 [c048215c] mntput_no_expire+0x11/0x6a
 [f889226e] ext3_bread+0x13/0x69 [ext3]
 [f8895606] htree_dirblock_to_tree+0x22/0x113 [ext3]
 [f889574f] ext3_htree_fill_tree+0x58/0x1a0 [ext3]
 [c047828b] do_path_lookup+0x20e/0x25f
 [c046b987] get_empty_filp+0x99/0x15e
 [f889d611] ext3_permission+0x0/0xa [ext3]
 [f888eaa3] ext3_readdir+0x1ce/0x59b [ext3]
 [c047a0dd] filldir+0x0/0xb9
 [c0472973] sys_fstat64+0x1e/0x23
 [c047a1f9] vfs_readdir+0x63/0x8d
 [c047a0dd] filldir+0x0/0xb9
 [c047a447] sys_getdents+0x5f/0x9c
 [c0403eff] syscall_call+0x7/0xb
 ===
Code: 5d c3 b9 01 00 00 00 31 d2 6a 00 b8 f0 5a 41 c0 e8 2a ff ff ff fa e8 52 16 00 00 fb 58 c3 b8 54 3a 66 c0 e9 8e 6b 1e 00 53 89 c3 <0f> a3 05 60 1f 6d c0 19 c0 85 c0 75 27 e8 bf db 00 00 50 68 55
EIP: [c0415974] smp_send_reschedule+0x3/0x53 SS:ESP 0068:f4f2fc8c
 <0>Kernel panic - not syncing: Fatal exception

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Panic with XFS on RHEL5 (2.6.18-8.1.8.el5)
Måns Rullgård wrote:
> Chris Boot <[EMAIL PROTECTED]> writes:
>> All,
>>
>> I've got a box running RHEL5 and haven't been impressed by ext3 performance on it (running off a 1.5TB HP MSA20 using the cciss driver). I compiled XFS as a module and tried it out since I'm used to using it on Debian, which runs much more efficiently. However, every so often the kernel panics as below. Apologies for the tainted kernel, but we run VMware Server on the box as well.
>>
>> Does anyone have any hints/tips for using XFS on Red Hat? What's causing the panic below, and is there a way around this?
>>
>> BUG: unable to handle kernel paging request at virtual address b8af9d60
>>  printing eip:
>> c0415974
>> *pde =
>> Oops: [#1]
>> SMP
>> last sysfs file: /block/loop7/dev
>> Modules linked in: loop nfsd exportfs lockd nfs_acl iscsi_trgt(U) autofs4 hidp nls_utf8 cifs ppdev rfcomm l2cap bluetooth vmnet(U) vmmon(U) sunrpc ipv6 xfs(U) video sbs i2c_ec button battery asus_acpi ac lp st sg floppy serio_raw intel_rng pcspkr e100 mii e7xxx_edac i2c_i801 edac_mc i2c_core e1000 r8169 ide_cd cdrom parport_pc parport dm_snapshot dm_zero dm_mirror dm_mod cciss mptspi mptscsih scsi_transport_spi sd_mod scsi_mod mptbase ext3 jbd ehci_hcd ohci_hcd uhci_hcd
>> CPU:    1
>> EIP:    0060:[c0415974]    Tainted: P      VLI
>> EFLAGS: 00010046   (2.6.18-8.1.8.el5 #1)
>> EIP is at smp_send_reschedule+0x3/0x53
>> eax: c213f000   ebx: c213f000   ecx: eef84000   edx: c213f000
>> esi: 1086   edi: f668c000   ebp: f4f2fce8   esp: f4f2fc8c
>> ds: 007b   es: 007b   ss: 0068
>> Process crond (pid: 3146, ti=f4f2f000 task=f51faaa0 task.ti=f4f2f000)
>> Stack: 66d66b89 c041dc23 a9afbb0e fea5 01904500 000f 0001 0001 c200c6e0 0100 0069 0180 018fc500 c200d240 0003 0292 f601efc0 f6027e00 0050
>> Call Trace:
>>  [c041dc23] try_to_wake_up+0x351/0x37b
>>  [f936884e] xfsbufd_wakeup+0x28/0x49 [xfs]
>>  [c04572f9] shrink_slab+0x56/0x13c
>>  [c0457c0c] try_to_free_pages+0x162/0x23e
>>  [c0454064] __alloc_pages+0x18d/0x27e
>>  [c045214e] find_or_create_page+0x53/0x8c
>>  [c046c7b1] __getblk+0x162/0x270
>>  [c0475be0] do_lookup+0x53/0x157
>>  [f889138f] ext3_getblk+0x7c/0x233 [ext3]
>>  [f88913fe] ext3_getblk+0xeb/0x233 [ext3]
>>  [c048215c] mntput_no_expire+0x11/0x6a
>>  [f889226e] ext3_bread+0x13/0x69 [ext3]
>>  [f8895606] htree_dirblock_to_tree+0x22/0x113 [ext3]
>>  [f889574f] ext3_htree_fill_tree+0x58/0x1a0 [ext3]
>>  [c047828b] do_path_lookup+0x20e/0x25f
>>  [c046b987] get_empty_filp+0x99/0x15e
>>  [f889d611] ext3_permission+0x0/0xa [ext3]
>>  [f888eaa3] ext3_readdir+0x1ce/0x59b [ext3]
>>  [c047a0dd] filldir+0x0/0xb9
>>  [c0472973] sys_fstat64+0x1e/0x23
>>  [c047a1f9] vfs_readdir+0x63/0x8d
>>  [c047a0dd] filldir+0x0/0xb9
>>  [c047a447] sys_getdents+0x5f/0x9c
>>  [c0403eff] syscall_call+0x7/0xb
>>  ===
>
> Your Redhat kernel is probably built with 4k stacks and XFS+loop+ext3 seems to be enough to overflow it.

Thanks, that explains a lot. However, I don't have any XFS filesystems mounted over loop devices on ext3. Earlier in the day I had iso9660 on loop on xfs; could that have caused the issue? It was unmounted and deleted when this panic occurred.

I'll probably just try and recompile the kernel with 8k stacks and see how it goes. Screw the support, we're unlikely to get it anyway. :-P

Many thanks,
Chris
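For anyone wanting to check the 4k-stack theory before recompiling, here is a minimal sketch. It assumes the kernel build config is installed as /boot/config-<release> (common on distribution kernels, but not guaranteed) and that the option is spelled CONFIG_4KSTACKS, as it was on i386 kernels of this era. The helper name stack_size_from_config is made up for illustration, and the answer is only meaningful for 2.6-era 32-bit x86 kernels.

```python
import os
import platform


def stack_size_from_config(config_text: str) -> str:
    """Return "4k" if the config enables CONFIG_4KSTACKS, otherwise "8k".

    Only meaningful for 2.6-era i386 kernels, where 8k was the alternative.
    """
    for line in config_text.splitlines():
        if line.strip() == "CONFIG_4KSTACKS=y":
            return "4k"
    # A missing entry or "# CONFIG_4KSTACKS is not set" means the option is off.
    return "8k"


if __name__ == "__main__":
    # Assumed location of the build config; adjust if your distro differs.
    path = f"/boot/config-{platform.release()}"
    if os.path.exists(path):
        with open(path) as f:
            print(path, "->", stack_size_from_config(f.read()))
    else:
        print("no kernel config found at", path)
```

If the option turns out to be enabled, rebuilding with it disabled is the "8k stacks" recompile discussed above.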
Re: Panic with XFS on RHEL5 (2.6.18-8.1.8.el5)
Måns Rullgård wrote:
> Chris Boot <[EMAIL PROTECTED]> writes:
>> Måns Rullgård wrote:
>>> Chris Boot <[EMAIL PROTECTED]> writes:
>>>> All,
>>>>
>>>> I've got a box running RHEL5 and haven't been impressed by ext3 performance on it (running off a 1.5TB HP MSA20 using the cciss driver). I compiled XFS as a module and tried it out since I'm used to using it on Debian, which runs much more efficiently. However, every so often the kernel panics as below. Apologies for the tainted kernel, but we run VMware Server on the box as well.
>>>>
>>>> Does anyone have any hints/tips for using XFS on Red Hat? What's causing the panic below, and is there a way around this?
>>>>
>>>> BUG: unable to handle kernel paging request at virtual address b8af9d60
>>>>  printing eip:
>>>> c0415974
>>>> *pde =
>>>> Oops: [#1]
>>>> SMP
>>>> last sysfs file: /block/loop7/dev
>>>> [...]
>>>>  [f936884e] xfsbufd_wakeup+0x28/0x49 [xfs]
>>>>  [c04572f9] shrink_slab+0x56/0x13c
>>>>  [c0457c0c] try_to_free_pages+0x162/0x23e
>>>>  [c0454064] __alloc_pages+0x18d/0x27e
>>>>  [c045214e] find_or_create_page+0x53/0x8c
>>>>  [c046c7b1] __getblk+0x162/0x270
>>>>  [c0475be0] do_lookup+0x53/0x157
>>>>  [f889138f] ext3_getblk+0x7c/0x233 [ext3]
>>>>  [f88913fe] ext3_getblk+0xeb/0x233 [ext3]
>>>>  [c048215c] mntput_no_expire+0x11/0x6a
>>>>  [f889226e] ext3_bread+0x13/0x69 [ext3]
>>>>  [f8895606] htree_dirblock_to_tree+0x22/0x113 [ext3]
>>>>  [f889574f] ext3_htree_fill_tree+0x58/0x1a0 [ext3]
>>>>  [c047828b] do_path_lookup+0x20e/0x25f
>>>>  [c046b987] get_empty_filp+0x99/0x15e
>>>>  [f889d611] ext3_permission+0x0/0xa [ext3]
>>>>  [f888eaa3] ext3_readdir+0x1ce/0x59b [ext3]
>>>>  [c047a0dd] filldir+0x0/0xb9
>>>>  [c0472973] sys_fstat64+0x1e/0x23
>>>>  [c047a1f9] vfs_readdir+0x63/0x8d
>>>>  [c047a0dd] filldir+0x0/0xb9
>>>>  [c047a447] sys_getdents+0x5f/0x9c
>>>>  [c0403eff] syscall_call+0x7/0xb
>>>>  ===
>>>
>>> Your Redhat kernel is probably built with 4k stacks and XFS+loop+ext3 seems to be enough to overflow it.
>>
>> Thanks, that explains a lot. However, I don't have any XFS filesystems mounted over loop devices on ext3. Earlier in the day I had iso9660 on loop on xfs; could that have caused the issue? It was unmounted and deleted when this panic occurred.
>
> The mention of /block/loop7/dev and the presence of both XFS and ext3 functions in the call stack suggested to me that you might have an ext3 filesystem in a loop device on XFS. I see no explanation for that call stack other than a stack overflow, but then we're still back at the same root cause.
>
> Are you using device-mapper and/or md? They too are known to blow 4k stacks when used with XFS.

I am. The situation earlier on was iso9660 on loop on xfs on lvm on cciss. I guess that might have smashed the stack undetectably and induced corruption encountered later on?

When I experienced this panic the machine would probably have been performing a backup, which was simply a load of ext3/xfs filesystems on lvm on the HP cciss controller. None of the loop devices would have been mounted.

I have a few machines now with 4k stacks and using lvm + md + xfs and have no trouble at all, but none are Red Hat (all Debian) and none use cciss either. Maybe it's a deadly combination.

>> I'll probably just try and recompile the kernel with 8k stacks and see how it goes. Screw the support, we're unlikely to get it anyway. :-P
>
> Please report how this works out.

I will. This will probably be on Monday now, since the machine isn't accepting SysRq requests over the serial console. :-(

Many thanks,
Chris
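The observation that both XFS and ext3 functions appear in one call stack can be checked mechanically. The sketch below is an illustration only: it assumes trace entries in the form shown in this thread ("[address] symbol+0xoff/0xsize", optionally followed by "[module]"), and the names parse_trace and modules_in_trace are hypothetical helpers, not part of any kernel tool.

```python
import re

# One trace entry: "[c04572f9] shrink_slab+0x56/0x13c" with an optional
# trailing "[module]". The lookahead keeps the next entry's 8-digit
# address from being mistaken for a module name.
TRACE_ENTRY = re.compile(
    r"\[([0-9a-f]{8})\]\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?![0-9a-f]{8}\])(\w+)\])?"
)


def parse_trace(text):
    """Return (address, symbol, module) tuples; module is None for core kernel."""
    return [m.groups() for m in TRACE_ENTRY.finditer(text)]


def modules_in_trace(text):
    """Return the distinct module names in the trace, in order of appearance."""
    seen = []
    for _addr, _sym, module in parse_trace(text):
        if module is not None and module not in seen:
            seen.append(module)
    return seen


if __name__ == "__main__":
    sample = (
        "[c041dc23] try_to_wake_up+0x351/0x37b "
        "[f936884e] xfsbufd_wakeup+0x28/0x49 [xfs] "
        "[c04572f9] shrink_slab+0x56/0x13c "
        "[f889138f] ext3_getblk+0x7c/0x233 [ext3]"
    )
    for addr, sym, module in parse_trace(sample):
        print(addr, sym, module or "(kernel)")
    print("modules:", modules_in_trace(sample))
```

On the trace quoted above this reports both xfs and ext3, which is the pattern under discussion: an ext3 directory read that needs memory, enters page reclaim through shrink_slab, and ends up in an XFS callback on the same stack.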
Re: SiI 3112A + Seagate HDs = still no go? [SOLVED]
Tejun Heo wrote:
> Chris Boot wrote:
>> Some interesting developments! I installed a fresh copy of Windows, and all the VIA and nVidia and so on drivers. At some point during all this (a period of relatively heavy disk IO), the computer seemed to crash and I rebooted it. It then worked fine for a while, but during my perfmon testing it seemed to do the same thing. This time I left it for a while and it did eventually wake up again, so I'm guessing the controller is a bit fubared. Perfmon did indeed show several dips down to or very close to 0 during the write operation, with peaks up to 48 MB/sec, which is pretty respectable.
>>
>> So, time to replace the brand-new controller I guess. Now, do you think this is just my one particular controller card and a simple return would fix the problem, or is it more likely a problem with the whole range? It's an Innovision EIO SATA controller:
>>
>> http://www.ivmm.com/eio/products/index.htm
>>
>> Would it be a safer bet to go for the Adaptec controller of the same variety? How reliable are they?
>
> I frankly don't know. Maybe it's just one faulty controller, connector or whatever. Maybe the card manufacturer screwed up somewhere. I mean, the only course I took in electronics was introductory digital circuits, which used 74xx chips and a push-triggered clock on a breadboard. What would I know about gigahertz signaling errors. :-p
>
> Though, one thing I can say is that the majority of 311x controllers don't seem to suffer from this problem. So, take your pick.

Right, I've replaced my previous controller with an Adaptec AHA-1205SA, and I'm rebuilding 2 RAID-1 arrays at 50MB/sec without a hitch. Thanks for your help diagnosing my problem, it was much appreciated!

Many thanks,
Chris

--
Chris Boot
[EMAIL PROTECTED]
http://www.bootc.net/
Re: SiI 3112A + Seagate HDs = still no go?
On 13 Aug 2005, at 2:13, Tejun Heo wrote:
> Hello, Chris.
>
> Chris Boot wrote:
>> On 12 Aug 2005, at 15:08, Tejun Heo wrote:
>>> [adding cc to Jeff Garzik. (Hi!)]
>>>
>>> Hi again, Chris.
>>>
>>> Unfortunately, I'm as lost as you are. Can you please do the following?
>>>
>>> * Verify if read is free from the problem, i.e. does "dd if=/dev/sd? of=/dev/null" work?
>>
>> Works like a treat at 30 MB/s. I do get a few errors in the log (repeated a couple of times), but they seem mostly harmless:
>>
>> ata1: status=0x51 { DriveReady SeekComplete Error }
>> ata1: error=0x04 { DriveStatusError }
>
> This is IDE ABRT error and it indicates that something strange is going on. You're not getting this kind of error on VIA controller, right?

I most certainly am not.

>>> * Turn on ATA_DEBUG and ATA_VERBOSE_DEBUG in include/linux/libata.h (change #undef's to #define's) and make the drive hang. The log should show what was going on.
>>
>> While untarring and compiling the new kernel I got lots of:
>>
>> ata1: status=0x51 { DriveReady SeekComplete Error }
>> ata1: error=0x84 { DriveStatusError BadCRC }
>
> Wow, this is CRC error. Something is wrong w/ your controller.

Great...

>> Syslog seems to die long before I get anything useful, and setting loglevel 9 with SysRq gives:
>>
>> ata_fill_sg: PRD[126]: 0x1206A000, 0x1000)
>> ata_fill_sg: PRD[127]: 0x1206B000, 0x1000)
>> ata_dev_select: ENTER, ata1: device 0, wait 1
>> ATA: abnormal status 0xD9 on port 0xE0804087
>> ATA: abnormal status 0xD9 on port 0xE0804087
>> ata_tf_load_mmio: hob: feat 0x0 nsect 0x3, lba 0x1 0x0 0x0
>> ata_tf_load_mmio: feat 0x0 nsect 0xF8 lba 0x1A 0xEF 0x33
>> ata_tf_load_mmio: device 0xE0
>> ATA: abnormal status 0xD9 on port 0xE0804087
>> ata_exec_command_mmio: ata: cmd 0x35
>> ata_scsi_translate: EXIT
>>
>> It then hangs for exactly 30 seconds, and more stuff flies by followed by much the same messages EXCEPT:
>>
>> 1. There seems to be one less ata_fill_sg line every time, since PRD[XXX] decrements by one every time.
>> 2. The ata_tf_load_mmio lines give different nsect and lba; the device stays the same.
>
> 30 secs is SCSI command timeout and retrying w/ one less chunk is sd driver's error recovery behavior. It seems that a lot of errors occur while bits are going through your SATA connection. I don't know about Seagate drives, but my Samsung drive sometimes locks up if it gets weird packets/commands. This might also be your case. PHY-resetting usually gets the drive back online, but currently libata doesn't do any such error recovery actions.
>
> To make sure that it's because of a faulty controller, can you please try the following?
>
> * Monitor how IO goes on the drive in Windows. You can do this by
>   - Start->Run and enter perfmon.
>   - After perfmon starts, right click on (heh heh, I guess this is one of those few times you read this on linux kernel mailing list) counter list and select add. Add DiskBytes/sec counter of PhysicalDisk object.
>   - Adjust scale to 0.010. Also, change color to black to make it stand out.
>   - start dd.
>
> I think, if the errors are due to hardware error, the perfmon graph will show some stuttering when it hits command timeout. So, write to disk, as writing seems to cause timeouts. If the problem also happens on Windows, it's highly likely that you have a faulty controller.

Some interesting developments! I installed a fresh copy of Windows, and all the VIA and nVidia and so on drivers. At some point during all this (a period of relatively heavy disk IO), the computer seemed to crash and I rebooted it. It then worked fine for a while, but during my perfmon testing it seemed to do the same thing. This time I left it for a while and it did eventually wake up again, so I'm guessing the controller is a bit fubared. Perfmon did indeed show several dips down to or very close to 0 during the write operation, with peaks up to 48 MB/sec, which is pretty respectable.

So, time to replace the brand-new controller I guess. Now, do you think this is just my one particular controller card and a simple return would fix the problem, or is it more likely a problem with the whole range? It's an Innovision EIO SATA controller:

http://www.ivmm.com/eio/products/index.htm

Would it be a safer bet to go for the Adaptec controller of the same variety? How reliable are they?

Many thanks,
Chris

--
Chris Boot
[EMAIL PROTECTED]
http://www.bootc.net/
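The status= and error= values in the logs above are bit fields, and decoding them by hand shows why 0x84 reads as a CRC error while 0x04 is a plain abort. The following is a rough sketch, not libata's code: the labels for bits that appear in this thread are copied from the log lines, the remaining status labels are descriptive names for the standard ATA status bits, error bits not seen in this thread are left undecoded, and the function names are made up for illustration.

```python
# ATA status register bits, most significant first.
ATA_STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# Only the two error bits that show up in this thread are named here.
ATA_ERROR_BITS = [
    (0x04, "DriveStatusError"),  # ABRT: command aborted
    (0x80, "BadCRC"),            # interface CRC error on the link
]


def decode_status(value):
    """Name the bits set in an ATA status byte.

    While Busy is set the other bits are not valid, so only Busy is
    reported, as in the "status=0xd9 { Busy }" log line.
    """
    if value & 0x80:
        return ["Busy"]
    return [name for mask, name in ATA_STATUS_BITS if value & mask]


def decode_error(value):
    """Name the known bits in an ATA error byte; show the rest as hex."""
    names = [name for mask, name in ATA_ERROR_BITS if value & mask]
    known = sum(mask for mask, _ in ATA_ERROR_BITS)
    rest = value & ~known & 0xFF
    if rest:
        names.append(f"other bits 0x{rest:02x}")
    return names


if __name__ == "__main__":
    for status in (0x51, 0xD9):
        print(f"status=0x{status:02x}", decode_status(status))
    for error in (0x04, 0x84):
        print(f"error=0x{error:02x}", decode_error(error))
```

Read this way, 0x84 is the abort bit plus the CRC bit, which points at corruption on the link between controller and drive rather than at the disk surface.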
Re: SiI 3112A + Seagate HDs = still no go?
On 12 Aug 2005, at 15:08, Tejun Heo wrote:
> Chris Boot wrote:
>> Hi Tejun,
>>
>> On 12 Aug 2005, at 12:33, Chris Boot wrote:
>>> Hi Tejun,
>>>
>>> On 12 Aug 2005, at 12:28, Tejun Heo wrote:
>>>> Hello, Chris.
>>>>
>>>> Chris Boot wrote:
>>>>> On 12 Aug 2005, at 4:24, Tejun Heo wrote:
>>>>>> Chris Boot wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I just recently took the plunge and bought 4 250 GB Seagate drives and a 2 port Silicon Image 3112A controller card for the 2 drives my motherboard doesn't handle. No matter how hard I try, I can't get the hard drives to work: they are detected correctly and work reasonably well under _very_ light load, but anything like building a RAID array is a bit much and the whole controller seems to lock up.
>>>>>>>
>>>>>>> I've tried adding the drive to the blacklist in the sata_sil.c driver and I still have the same trouble: as you can see the messages below relate to my patched kernel with the blacklist fix. I've seen that this was discussed just yesterday, but that seemed to give nothing:
>>>>>>> http://www.ussg.iu.edu/hypermail/linux/kernel/0508.1/0310.html
>>>>>>>
>>>>>>> Ready and willing to hack my kernel to pieces; this machine is no use until I get all the drives working! Needless to say the drives connected to the on-board VIA controller work fine, as do the drives currently on the SiI controller if I swap them around.
>>>>>>>
>>>>>>> Any ideas?
>>>>>>>
>>>>>>> TIA
>>>>>>> Chris
>>>>>>
>>>>>> [added linux-ide to cc list]
>>>>>>
>>>>>> Can you please try w/ vanilla kernel (2.6.12 or 2.6.13-rc)? And w/ one drive only?
>>>>>
>>>>> I unplugged both drives from my on-board SATA controller and left just one connected to the 3112A controller. Rebooted with a fresh, vanilla 2.6.13-rc6 and ran:
>>>>
>>>> You can leave drives on on-board SATA controller. It wouldn't make any difference.
>>>>
>>>>> dd if=/dev/zero of=test.img bs=1M count=16384
>>>>>
>>>>> After about 30 seconds I got the crash and the kernel started repeating every 30 seconds (with different sector numbers):
>>>>>
>>>>> ata1: command 0x35 timeout, stat 0xd9 host_stat 0x1
>>>>> ata1: status=0xd9 { Busy }
>>>>> SCSI error : <0 0 0 0> return code = 0x8002
>>>>> sda: Current: sense key=0xb
>>>>>     ASC=0x47 ASCQ=0x0
>>>>> end_request: I/O error, dev sda, sector 14937602
>>>>> ATA: abnormal status 0xD9 on port E0802087
>>>>> ATA: abnormal status 0xD9 on port E0802087
>>>>> ATA: abnormal status 0xD9 on port E0802087
>>>>>
>>>>> dmesg:
>>>>>
>>>>> Linux version 2.6.13-rc6 ([EMAIL PROTECTED]) (gcc version 3.3.5-20050130 (Gentoo 3.3.5.20050130-r1, ssp-3.3.5.20050130-1, pie-8.7.7.1)) #1 Fri Aug 12 12:31:25 BST 2005
>>>>> ...
>>>>> libata version 1.11 loaded.
>>>>> sata_sil version 0.9
>>>>> ACPI: PCI Interrupt :00:0a.0[A] -> GSI 18 (level, low) -> IRQ 177
>>>>> ata1: SATA max UDMA/100 cmd 0xE0802080 ctl 0xE080208A bmdma 0xE0802000 irq 177
>>>>> ata2: SATA max UDMA/100 cmd 0xE08020C0 ctl 0xE08020CA bmdma 0xE0802008 irq 177
>>>>> ata1: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4023 85:3469 86:3c01 87:4023 88:207f
>>>>> ata1: dev 0 ATA, max UDMA/133, 488397168 sectors: lba48
>>>>> ata1: dev 0 configured for UDMA/100
>>>>> scsi0 : sata_sil
>>>>> ata2: no device found (phy stat )
>>>>> scsi1 : sata_sil
>>>>>   Vendor: ATA  Model: ST3250823AS  Rev: 3.03
>>>>>   Type: Direct-Access  ANSI SCSI revision: 05
>>>>> sata_via version 1.1
>>>>> ACPI: PCI Interrupt :00:0f.0[B] -> Link [ALKA] -> GSI 20 (level, low) -> IRQ 169
>>>>> PCI: Via IRQ fixup for :00:0f.0, from 11 to 9
>>>>> sata_via(:00:0f.0): routed to hard irq line 9
>>>>> ata3: SATA max UDMA/133 cmd 0xB400 ctl 0xB802 bmdma 0xC400 irq 169
>>>>> ata4: SATA max UDMA/133 cmd 0xBC00 ctl 0xC002 bmdma 0xC408 irq 169
>>>>> ata3: no device found (phy stat )
>>>>> scsi2 : sata_via
>>>>> ata4: no device found (phy stat )
>>>>> scsi3 : sata_via
>>>>> SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
>>>>> SCSI device sda: drive cache: write back
>>>>> SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
>>>>> SCSI device sda: drive cache: write back
>>>>>  sda: sda1 sda2 sda3
>>>>> Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
>>>>> Attached scsi generic sg0 at scsi0, channel 0, id 0, lun 0, type 0
>>>>>
>>>>> I forgot to mention previously but I even tried with "noapic nolapic acpi=off pci=routeirq" and got the same trouble.
>>>>
>>>> This is weird as ST3250823AS (and all Seagate .8 drives) are known to work without any problem with sii 3112/3114. I currently don't own such a drive but someone confirmed me that ST3250823AS works w/ sii 3114 without any problem (including bonnie++ results and all). So, I don't think it's the good old mod15write problem.
>>>>
>>>> I hope it's just a bad hardware, cable or something like that; otherwise, you're hitting a new bug. Can you verify if the drive works under windows?
>>>
>>> Well, what piqued my interest is that the same drives work fine on my on-board sata_via controller. All 4 drives were bought at the same time and *seem* to be from the same batch, and all work fine on the VIA controller and none work on the 3112A. I've also tried different cables, all of which are Belkin which I thought were decent qual
Re: SiI 3112A + Seagate HDs = still no go?
Hi there,

I get very different symptoms indeed. My drive isn't in the blacklist, and adding it has little effect (status 0xd9 to 0xd8, no other differences). Once the controller hangs, I can't even kill dd or log in at a different terminal; it's just a complete lockup. If I have 2 drives plugged in, running the dd on one of them also hangs the other, thus I suspect the controller. Also, reading via dd is fine; only writing has trouble.

Chris

On 12 Aug 2005, at 16:19, Roger Heflin wrote:

> With the Seagate SATA drives I worked with before, I had to actually remove them from the blacklist. This was a couple of months ago with the native SATA Seagate disks.
>
> With the drive in the blacklist the drive worked right under light conditions, but under a dd read from the boot Seagate the entire machine appeared to block on any I/O going to that disk. It did not stop (verified by vmstat), but I could never get the 55-60MiB/second expected, and was getting around 15MiB/second, with enormous amounts of interrupts. After removing it from the blacklist, I got the 55-60MiB/second rate, the interrupts were much more reasonable, and the response of the system was actually usable.
>
> When the lockup occurred, stopping the dd resulted in all things unlocking and continuing on. I duplicated this several times with the latest kernel at the time.
>
> Roger
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chris Boot
> Sent: Thursday, August 11, 2005 4:55 PM
> To: linux-kernel@vger.kernel.org
> Subject: SiI 3112A + Seagate HDs = still no go?
>
> Hi all,
>
> I just recently took the plunge and bought 4 250 GB Seagate drives and a 2 port Silicon Image 3112A controller card for the 2 drives my motherboard doesn't handle. No matter how hard I try, I can't get the hard drives to work: they are detected correctly and work reasonably well under _very_ light load, but anything like building a RAID array is a bit much and the whole controller seems to lock up.
>
> I've tried adding the drive to the blacklist in the sata_sil.c driver and I still have the same trouble: as you can see the messages below relate to my patched kernel with the blacklist fix. I've seen that this was discussed just yesterday, but that seemed to give nothing:
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/0508.1/0310.html
>
> Ready and willing to hack my kernel to pieces; this machine is no use until I get all the drives working! Needless to say the drives connected to the on-board VIA controller work fine, as do the drives currently on the SiI controller if I swap them around.
>
> Any ideas?
>
> TIA
> Chris
>
> The following messages are sent to the log when everything goes mad:
>
> ata1: command 0x35 timeout, stat 0xd8 host_stat 0x0
> ata1: status=0xd8 { Busy }
> SCSI error : <0 0 0 0> return code = 0x8002
> sda: Current: sense key=0xb ASC=0x47 ASCQ=0x0
> end_request: I/O error, dev sda, sector 2990370
> ATA: abnormal status 0xD8 on port E0802087
> ATA: abnormal status 0xD8 on port E0802087
> ATA: abnormal status 0xD8 on port E0802087
>
> [ the above is transcribed so may not be 100% accurate ]
>
> Dmesg log during boot (and detection):
>
> Aug 11 21:47:05 arcadia Linux version 2.6.12-gentoo-r6 ([EMAIL PROTECTED]) (gcc version 3.3.5-20050130 (Gentoo 3.3.5.20050130-r1, ssp-3.3.5.20050130-1, pie-8.7.7.1)) #2 Thu Aug 11 20:19:00 BST 2005
> ...
> Aug 11 17:30:12 arcadia sata_sil version 0.9
> Aug 11 17:30:12 arcadia ACPI: PCI Interrupt :00:0a.0[A] -> GSI 18 (level, low) -> IRQ 177
> Aug 11 17:30:12 arcadia ata1: SATA max UDMA/100 cmd 0xE0802080 ctl 0xE080208A bmdma 0xE0802000 irq 177
> Aug 11 17:30:12 arcadia ata2: SATA max UDMA/100 cmd 0xE08020C0 ctl 0xE08020CA bmdma 0xE0802008 irq 177
> Aug 11 17:30:12 arcadia ata1: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4023 85:3469 86:3c01 87:4023 88:207f
> Aug 11 17:30:12 arcadia ata1: dev 0 ATA, max UDMA/133, 488397168 sectors: lba48
> Aug 11 17:30:12 arcadia ata1(0): applying Seagate errata fix
> Aug 11 17:30:12 arcadia ata1: dev 0 configured for UDMA/100
> Aug 11 17:30:12 arcadia scsi0 : sata_sil
> Aug 11 17:30:12 arcadia ata2: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4023 85:3469 86:3c01 87:4023 88:207f
> Aug 11 17:30:12 arcadia ata2: dev 0 ATA, max UDMA/133, 488397168 sectors: lba48
> Aug 11 17:30:12 arcadia ata2(0): applying Seagate errata fix
> Aug 11 17:30:12 arcadia ata2: dev 0 configured for UDMA/100
> Aug 11 17:30:12 arcadia scsi1 : sata_sil
> Aug 11 17:30:12 arcadia Vendor: ATA Model: ST3250823AS Rev: 3.03
> Aug 11 17:30:12 arcadia Type: Direct-Access ANSI SCSI revision: 05
> Aug 11 17:30:12 arcadia Vendor: ATA Model: ST3250823AS Rev: 3.03
> Aug 11 17:30:12 arcadia Type: Direct-Access ANSI SCSI revision: 05
>
> lspci:
>
> :00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] Host Bridge
> :00:01.0 PCI bridge: VIA Technologies, Inc. VT8235 PCI Bridge
> :00:0a.0 Unknown mass storage controller: Silicon Image, Inc. SiI 3112 [SATALink/SATARaid] Serial ATA Controller (rev 02)
> :00:0c.0 FireWire (IEEE 1394
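For anyone trying to read the status values quoted in this thread (0xd8 with the blacklist patch, 0xd9 without), here is a minimal sketch that splits an ATA status byte into its named bits. The bit names follow the classic ATA status register layout; the function name `decode_ata_status` is made up for illustration and is not part of libata. Bear in mind that while BSY is set, the other bits are not guaranteed to be meaningful, so this is only a reading aid, not a diagnosis.

```python
# Classic ATA status register bits, highest bit first.
# Names follow the traditional ATA layout; some (CORR, IDX) are
# obsolete in later revisions of the standard.
ATA_STATUS_BITS = [
    (0x80, "BSY"),   # device busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # device seek complete
    (0x08, "DRQ"),   # data request
    (0x04, "CORR"),  # corrected data (obsolete)
    (0x02, "IDX"),   # index (obsolete)
    (0x01, "ERR"),   # error
]


def decode_ata_status(status):
    """Return the names of the bits set in an ATA status byte."""
    return [name for mask, name in ATA_STATUS_BITS if status & mask]


if __name__ == "__main__":
    for value in (0xD8, 0xD9):
        print(f"0x{value:02x}: {' '.join(decode_ata_status(value))}")
```

The only difference between the two values is the ERR bit, which matches the observation that adding the drive to the blacklist changes the status from 0xd9 to 0xd8 and nothing else. As a side note, command 0x35 in the ATA command set is WRITE DMA EXT, which fits the report that only writes hang.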
Re: SiI 3112A + Seagate HDs = still no go?
Hi Tejun,

On 12 Aug 2005, at 12:33, Chris Boot wrote:

> Hi Tejun,
>
> On 12 Aug 2005, at 12:28, Tejun Heo wrote:
>
>> Hello, Chris.
>>
>> Chris Boot wrote:
>>> On 12 Aug 2005, at 4:24, Tejun Heo wrote:
>>>> Chris Boot wrote:
>>>>> Hi all,
>>>>>
>>>>> I just recently took the plunge and bought 4 250 GB Seagate drives and a 2 port Silicon Image 3112A controller card for the 2 drives my motherboard doesn't handle. No matter how hard I try, I can't get the hard drives to work: they are detected correctly and work reasonably well under _very_ light load, but anything like building a RAID array is a bit much and the whole controller seems to lock up.
>>>>>
>>>>> I've tried adding the drive to the blacklist in the sata_sil.c driver and I still have the same trouble: as you can see the messages below relate to my patched kernel with the blacklist fix. I've seen that this was discussed just yesterday, but that seemed to give nothing:
>>>>>
>>>>> http://www.ussg.iu.edu/hypermail/linux/kernel/0508.1/0310.html
>>>>>
>>>>> Ready and willing to hack my kernel to pieces; this machine is no use until I get all the drives working! Needless to say the drives connected to the on-board VIA controller work fine, as do the drives currently on the SiI controller if I swap them around.
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> TIA
>>>>> Chris
>>>>
>>>> [added linux-ide to cc list]
>>>>
>>>> Can you please try w/ vanilla kernel (2.6.12 or 2.6.13-rc)? And w/ one drive only?
>>>
>>> I unplugged both drives from my on-board SATA controller and left just one connected to the 3112A controller. Rebooted with a fresh, vanilla 2.6.13-rc6 and ran:
>>
>> You can leave drives on on-board SATA controller. It wouldn't make any difference.
>>
>>> dd if=/dev/zero of=test.img bs=1M count=16384
>>>
>>> After about 30 seconds I got the crash and the kernel started repeating every 30 seconds (with different sector numbers):
>>>
>>> ata1: command 0x35 timeout, stat 0xd9 host_stat 0x1
>>> ata1: status=0xd9 { Busy }
>>> SCSI error : <0 0 0 0> return code = 0x8002
>>> sda: Current: sense key=0xb ASC=0x47 ASCQ=0x0
>>> end_request: I/O error, dev sda, sector 14937602
>>> ATA: abnormal status 0xD9 on port E0802087
>>> ATA: abnormal status 0xD9 on port E0802087
>>> ATA: abnormal status 0xD9 on port E0802087
>>>
>>> dmesg:
>>>
>>> Linux version 2.6.13-rc6 ([EMAIL PROTECTED]) (gcc version 3.3.5-20050130 (Gentoo 3.3.5.20050130-r1, ssp-3.3.5.20050130-1, pie-8.7.7.1)) #1 Fri Aug 12 12:31:25 BST 2005
>>> ...
>>> libata version 1.11 loaded.
>>> sata_sil version 0.9
>>> ACPI: PCI Interrupt :00:0a.0[A] -> GSI 18 (level, low) -> IRQ 177
>>> ata1: SATA max UDMA/100 cmd 0xE0802080 ctl 0xE080208A bmdma 0xE0802000 irq 177
>>> ata2: SATA max UDMA/100 cmd 0xE08020C0 ctl 0xE08020CA bmdma 0xE0802008 irq 177
>>> ata1: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4023 85:3469 86:3c01 87:4023 88:207f
>>> ata1: dev 0 ATA, max UDMA/133, 488397168 sectors: lba48
>>> ata1: dev 0 configured for UDMA/100
>>> scsi0 : sata_sil
>>> ata2: no device found (phy stat )
>>> scsi1 : sata_sil
>>> Vendor: ATA Model: ST3250823AS Rev: 3.03
>>> Type: Direct-Access ANSI SCSI revision: 05
>>> sata_via version 1.1
>>> ACPI: PCI Interrupt :00:0f.0[B] -> Link [ALKA] -> GSI 20 (level, low) -> IRQ 169
>>> PCI: Via IRQ fixup for :00:0f.0, from 11 to 9
>>> sata_via(:00:0f.0): routed to hard irq line 9
>>> ata3: SATA max UDMA/133 cmd 0xB400 ctl 0xB802 bmdma 0xC400 irq 169
>>> ata4: SATA max UDMA/133 cmd 0xBC00 ctl 0xC002 bmdma 0xC408 irq 169
>>> ata3: no device found (phy stat )
>>> scsi2 : sata_via
>>> ata4: no device found (phy stat )
>>> scsi3 : sata_via
>>> SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
>>> SCSI device sda: drive cache: write back
>>> SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
>>> SCSI device sda: drive cache: write back
>>> sda: sda1 sda2 sda3
>>> Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
>>> Attached scsi generic sg0 at scsi0, channel 0, id 0, lun 0, type 0
>>>
>>> I forgot to mention previously but I even tried with "noapic nolapic acpi=off pci=routeirq" and got the same trouble.
>>
>> This is weird as ST3250823AS (and all Seagate .8 drives) are known to work without any problem with sii 3112/3114. I currently don't own such a drive but someone confirmed to me that ST3250823AS works w/ sii 3114 without any problem (including bonnie++ results and all). So, I don't think it's the good old mod15write problem. I hope it's just bad hardware, a cable or something like that; otherwise, you're hitting a new bug.
>>
>> Can you verify if the drive works under windows?
>
> Well, what piqued my interest is that the same drives work fine on my on-board sata_via controller. All 4 drives were bought at the same time and *seem* to be from the same batch, and all work fine on the VIA controller and none work on the 3112A. I've also tried different cables, all of which are Belkin which I thought were decent quality.
>
> I'll just try installing Winblows and let you know.

I just installed Windows XP SP2 and Cygwin:

$ dd if=/dev/zero of=test.img bs=1M count=4096
4096+0
Re: SiI 3112A + Seagate HDs = still no go?
Hi Tejun,

On 12 Aug 2005, at 12:28, Tejun Heo wrote:

> Hello, Chris.
>
> Chris Boot wrote:
>> On 12 Aug 2005, at 4:24, Tejun Heo wrote:
>>> Chris Boot wrote:
>>>> Hi all,
>>>>
>>>> I just recently took the plunge and bought 4 250 GB Seagate drives and a 2 port Silicon Image 3112A controller card for the 2 drives my motherboard doesn't handle. No matter how hard I try, I can't get the hard drives to work: they are detected correctly and work reasonably well under _very_ light load, but anything like building a RAID array is a bit much and the whole controller seems to lock up.
>>>>
>>>> I've tried adding the drive to the blacklist in the sata_sil.c driver and I still have the same trouble: as you can see the messages below relate to my patched kernel with the blacklist fix. I've seen that this was discussed just yesterday, but that seemed to give nothing:
>>>>
>>>> http://www.ussg.iu.edu/hypermail/linux/kernel/0508.1/0310.html
>>>>
>>>> Ready and willing to hack my kernel to pieces; this machine is no use until I get all the drives working! Needless to say the drives connected to the on-board VIA controller work fine, as do the drives currently on the SiI controller if I swap them around.
>>>>
>>>> Any ideas?
>>>>
>>>> TIA
>>>> Chris
>>>
>>> [added linux-ide to cc list]
>>>
>>> Can you please try w/ vanilla kernel (2.6.12 or 2.6.13-rc)? And w/ one drive only?
>>
>> I unplugged both drives from my on-board SATA controller and left just one connected to the 3112A controller. Rebooted with a fresh, vanilla 2.6.13-rc6 and ran:
>
> You can leave drives on on-board SATA controller. It wouldn't make any difference.
>
>> dd if=/dev/zero of=test.img bs=1M count=16384
>>
>> After about 30 seconds I got the crash and the kernel started repeating every 30 seconds (with different sector numbers):
>>
>> ata1: command 0x35 timeout, stat 0xd9 host_stat 0x1
>> ata1: status=0xd9 { Busy }
>> SCSI error : <0 0 0 0> return code = 0x8002
>> sda: Current: sense key=0xb ASC=0x47 ASCQ=0x0
>> end_request: I/O error, dev sda, sector 14937602
>> ATA: abnormal status 0xD9 on port E0802087
>> ATA: abnormal status 0xD9 on port E0802087
>> ATA: abnormal status 0xD9 on port E0802087
>>
>> dmesg:
>>
>> Linux version 2.6.13-rc6 ([EMAIL PROTECTED]) (gcc version 3.3.5-20050130 (Gentoo 3.3.5.20050130-r1, ssp-3.3.5.20050130-1, pie-8.7.7.1)) #1 Fri Aug 12 12:31:25 BST 2005
>> ...
>> libata version 1.11 loaded.
>> sata_sil version 0.9
>> ACPI: PCI Interrupt :00:0a.0[A] -> GSI 18 (level, low) -> IRQ 177
>> ata1: SATA max UDMA/100 cmd 0xE0802080 ctl 0xE080208A bmdma 0xE0802000 irq 177
>> ata2: SATA max UDMA/100 cmd 0xE08020C0 ctl 0xE08020CA bmdma 0xE0802008 irq 177
>> ata1: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4023 85:3469 86:3c01 87:4023 88:207f
>> ata1: dev 0 ATA, max UDMA/133, 488397168 sectors: lba48
>> ata1: dev 0 configured for UDMA/100
>> scsi0 : sata_sil
>> ata2: no device found (phy stat )
>> scsi1 : sata_sil
>> Vendor: ATA Model: ST3250823AS Rev: 3.03
>> Type: Direct-Access ANSI SCSI revision: 05
>> sata_via version 1.1
>> ACPI: PCI Interrupt :00:0f.0[B] -> Link [ALKA] -> GSI 20 (level, low) -> IRQ 169
>> PCI: Via IRQ fixup for :00:0f.0, from 11 to 9
>> sata_via(:00:0f.0): routed to hard irq line 9
>> ata3: SATA max UDMA/133 cmd 0xB400 ctl 0xB802 bmdma 0xC400 irq 169
>> ata4: SATA max UDMA/133 cmd 0xBC00 ctl 0xC002 bmdma 0xC408 irq 169
>> ata3: no device found (phy stat )
>> scsi2 : sata_via
>> ata4: no device found (phy stat )
>> scsi3 : sata_via
>> SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
>> SCSI device sda: drive cache: write back
>> SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
>> SCSI device sda: drive cache: write back
>> sda: sda1 sda2 sda3
>> Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
>> Attached scsi generic sg0 at scsi0, channel 0, id 0, lun 0, type 0
>>
>> I forgot to mention previously but I even tried with "noapic nolapic acpi=off pci=routeirq" and got the same trouble.
>
> This is weird as ST3250823AS (and all Seagate .8 drives) are known to work without any problem with sii 3112/3114. I currently don't own such a drive but someone confirmed to me that ST3250823AS works w/ sii 3114 without any problem (including bonnie++ results and all). So, I don't think it's the good old mod15write problem. I hope it's just bad hardware, a cable or something like that; otherwise, you're hitting a new bug.
>
> Can you verify if the drive works under windows?

Well, what piqued my interest is that the same drives work fine on my on-board sata_via controller. All 4 drives were bought at the same time and *seem* to be from the same batch, and all work fine on the VIA controller and none work on the 3112A. I've also tried different cables, all of which are Belkin which I thought were decent quality.

I'll just try installing Winblows and let you know.

Many thanks,
Chris

--
Chris Boot
[EMAIL PROTECTED]
http://www.bootc.net/
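For anyone who wants to repeat the sequential write test from this message on a machine without dd, the following is a rough sketch of the same idea: write a given number of 1 MiB blocks of zeros to a file, force them to disk, and time it. The function name `sequential_write_test` and its parameters are invented for illustration; this is not a replacement for dd and says nothing about what sata_sil does internally.

```python
import os
import time


def sequential_write_test(path, total_mib, block_mib=1):
    """Write total_mib MiB of zeros to path in block_mib-sized blocks.

    Roughly mirrors `dd if=/dev/zero of=path bs=1M count=total_mib`.
    Returns (bytes_written, elapsed_seconds).
    """
    block = bytes(block_mib * 1024 * 1024)  # a zero-filled buffer
    written = 0
    start = time.monotonic()
    with open(path, "wb") as f:
        for _ in range(total_mib // block_mib):
            written += f.write(block)
        # Push the data out of the page cache so the timing reflects
        # the disk rather than RAM.
        f.flush()
        os.fsync(f.fileno())
    elapsed = time.monotonic() - start
    return written, elapsed


# Example (hypothetical path; 16384 MiB matches the dd run quoted above):
#   n, secs = sequential_write_test("test.img", 16384)
#   print(f"{n / secs / (1024 * 1024):.1f} MiB/s")
```

A test of this size is large enough that the page cache cannot absorb it, which is presumably why the hang in the thread shows up after about 30 seconds rather than immediately.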
Re: SiI 3112A + Seagate HDs = still no go?
On 12 Aug 2005, at 4:24, Tejun Heo wrote:

> Chris Boot wrote:
>> Hi all,
>>
>> I just recently took the plunge and bought 4 250 GB Seagate drives and a 2 port Silicon Image 3112A controller card for the 2 drives my motherboard doesn't handle. No matter how hard I try, I can't get the hard drives to work: they are detected correctly and work reasonably well under _very_ light load, but anything like building a RAID array is a bit much and the whole controller seems to lock up.
>>
>> I've tried adding the drive to the blacklist in the sata_sil.c driver and I still have the same trouble: as you can see the messages below relate to my patched kernel with the blacklist fix. I've seen that this was discussed just yesterday, but that seemed to give nothing:
>>
>> http://www.ussg.iu.edu/hypermail/linux/kernel/0508.1/0310.html
>>
>> Ready and willing to hack my kernel to pieces; this machine is no use until I get all the drives working! Needless to say the drives connected to the on-board VIA controller work fine, as do the drives currently on the SiI controller if I swap them around.
>>
>> Any ideas?
>>
>> TIA
>> Chris
>
> [added linux-ide to cc list]
>
> Can you please try w/ vanilla kernel (2.6.12 or 2.6.13-rc)? And w/ one drive only?

I unplugged both drives from my on-board SATA controller and left just one connected to the 3112A controller. Rebooted with a fresh, vanilla 2.6.13-rc6 and ran:

dd if=/dev/zero of=test.img bs=1M count=16384

After about 30 seconds I got the crash and the kernel started repeating every 30 seconds (with different sector numbers):

ata1: command 0x35 timeout, stat 0xd9 host_stat 0x1
ata1: status=0xd9 { Busy }
SCSI error : <0 0 0 0> return code = 0x8002
sda: Current: sense key=0xb ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sda, sector 14937602
ATA: abnormal status 0xD9 on port E0802087
ATA: abnormal status 0xD9 on port E0802087
ATA: abnormal status 0xD9 on port E0802087

dmesg:

Linux version 2.6.13-rc6 ([EMAIL PROTECTED]) (gcc version 3.3.5-20050130 (Gentoo 3.3.5.20050130-r1, ssp-3.3.5.20050130-1, pie-8.7.7.1)) #1 Fri Aug 12 12:31:25 BST 2005
...
libata version 1.11 loaded.
sata_sil version 0.9
ACPI: PCI Interrupt :00:0a.0[A] -> GSI 18 (level, low) -> IRQ 177
ata1: SATA max UDMA/100 cmd 0xE0802080 ctl 0xE080208A bmdma 0xE0802000 irq 177
ata2: SATA max UDMA/100 cmd 0xE08020C0 ctl 0xE08020CA bmdma 0xE0802008 irq 177
ata1: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4023 85:3469 86:3c01 87:4023 88:207f
ata1: dev 0 ATA, max UDMA/133, 488397168 sectors: lba48
ata1: dev 0 configured for UDMA/100
scsi0 : sata_sil
ata2: no device found (phy stat )
scsi1 : sata_sil
Vendor: ATA Model: ST3250823AS Rev: 3.03
Type: Direct-Access ANSI SCSI revision: 05
sata_via version 1.1
ACPI: PCI Interrupt :00:0f.0[B] -> Link [ALKA] -> GSI 20 (level, low) -> IRQ 169
PCI: Via IRQ fixup for :00:0f.0, from 11 to 9
sata_via(:00:0f.0): routed to hard irq line 9
ata3: SATA max UDMA/133 cmd 0xB400 ctl 0xB802 bmdma 0xC400 irq 169
ata4: SATA max UDMA/133 cmd 0xBC00 ctl 0xC002 bmdma 0xC408 irq 169
ata3: no device found (phy stat )
scsi2 : sata_via
ata4: no device found (phy stat )
scsi3 : sata_via
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sda: drive cache: write back
sda: sda1 sda2 sda3
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Attached scsi generic sg0 at scsi0, channel 0, id 0, lun 0, type 0

I forgot to mention previously but I even tried with "noapic nolapic acpi=off pci=routeirq" and got the same trouble.

Thanks,
Chris

--
Chris Boot
[EMAIL PROTECTED]
http://www.bootc.net/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
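As a sanity check on the detection log in this message, two of its numbers can be reproduced by hand. The sketch below recomputes the "250059 MB" figure from the sector count, and reads the Ultra DMA bits out of IDENTIFY word 88 (printed as `88:207f`). The helper names `capacity_mb` and `udma_modes` are made up for illustration. The bit layout assumed here is the standard one: bits 0-6 are the supported UDMA modes and bits 8-14 mark the currently selected mode.

```python
SECTOR_BYTES = 512  # the log reports "512-byte hdwr sectors"


def capacity_mb(sectors):
    """Capacity in decimal megabytes (10**6 bytes), truncated."""
    return sectors * SECTOR_BYTES // 1_000_000


def udma_modes(word88):
    """Split IDENTIFY word 88 into (supported, selected) UDMA mode lists."""
    supported = [m for m in range(7) if word88 & (1 << m)]
    selected = [m for m in range(7) if word88 & (1 << (8 + m))]
    return supported, selected


if __name__ == "__main__":
    print(capacity_mb(488397168), "MB")
    supported, selected = udma_modes(0x207F)
    print("supported UDMA modes:", supported)
    print("selected UDMA mode:", selected)
```

With the values from the log, the drive advertises UDMA modes 0 through 6 (mode 6 is UDMA/133) and has mode 5 (UDMA/100) selected, which lines up with "max UDMA/133" for the drive and "configured for UDMA/100" on the SiI port. So the identify data itself looks sane, and nothing here points at a misdetected drive.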
Re: SiI 3112A + Seagate HDs = still no go?
On 12 Aug 2005, at 15:08, Tejun Heo wrote:

Chris Boot wrote:

Hi Tejun,

On 12 Aug 2005, at 12:33, Chris Boot wrote:

Hi Tejun,

On 12 Aug 2005, at 12:28, Tejun Heo wrote:

Hello, Chris.

Chris Boot wrote:

On 12 Aug 2005, at 4:24, Tejun Heo wrote:

Chris Boot wrote:

Hi all,

I just recently took the plunge and bought 4 250 GB Seagate drives and a 2 port Silicon Image 3112A controller card for the 2 drives my motherboard doesn't handle. No matter how hard I try, I can't get the hard drives to work: they are detected correctly and work reasonably well under _very_ light load, but anything like building a RAID array is a bit much and the whole controller seems to lock up.

I've tried adding the drive to the blacklist in the sata_sil.c driver and I still have the same trouble: as you can see the messages below relate to my patched kernel with the blacklist fix. I've seen that this was discussed just yesterday, but that seemed to give nothing: http://www.ussg.iu.edu/hypermail/linux/kernel/0508.1/0310.html

Ready and willing to hack my kernel to pieces; this machine is no use until I get all the drives working! Needless to say the drives connected to the on-board VIA controller work fine, as do the drives currently on the SiI controller if I swap them around.

Any ideas?

TIA
Chris

[added linux-ide to cc list]

Can you please try w/ vanilla kernel (2.6.12 or 2.6.13-rc)? And w/ one drive only?

I unplugged both drives from my on-board SATA controller and left just one connected to the 3112A controller. Rebooted with a fresh, vanilla 2.6.13-rc6 and ran:

You can leave drives on on-board SATA controller. It wouldn't make any difference.

dd if=/dev/zero of=test.img bs=1M count=16384

After about 30 seconds I got the crash and the kernel started repeating every 30 seconds (with different sector numbers):

ata1: command 0x35 timeout, stat 0xd9 host_stat 0x1
ata1: status=0xd9 { Busy }
SCSI error : <0 0 0 0> return code = 0x8002
sda: Current: sense key=0xb ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sda, sector 14937602
ATA: abnormal status 0xD9 on port E0802087
ATA: abnormal status 0xD9 on port E0802087
ATA: abnormal status 0xD9 on port E0802087

dmesg:

Linux version 2.6.13-rc6 ([EMAIL PROTECTED]) (gcc version 3.3.5-20050130 (Gentoo 3.3.5.20050130-r1, ssp-3.3.5.20050130-1, pie-8.7.7.1)) #1 Fri Aug 12 12:31:25 BST 2005
...
libata version 1.11 loaded.
sata_sil version 0.9
ACPI: PCI Interrupt :00:0a.0[A] -> GSI 18 (level, low) -> IRQ 177
ata1: SATA max UDMA/100 cmd 0xE0802080 ctl 0xE080208A bmdma 0xE0802000 irq 177
ata2: SATA max UDMA/100 cmd 0xE08020C0 ctl 0xE08020CA bmdma 0xE0802008 irq 177
ata1: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4023 85:3469 86:3c01 87:4023 88:207f
ata1: dev 0 ATA, max UDMA/133, 488397168 sectors: lba48
ata1: dev 0 configured for UDMA/100
scsi0 : sata_sil
ata2: no device found (phy stat )
scsi1 : sata_sil
Vendor: ATA Model: ST3250823AS Rev: 3.03
Type: Direct-Access ANSI SCSI revision: 05
sata_via version 1.1
ACPI: PCI Interrupt :00:0f.0[B] -> Link [ALKA] -> GSI 20 (level, low) -> IRQ 169
PCI: Via IRQ fixup for :00:0f.0, from 11 to 9
sata_via(:00:0f.0): routed to hard irq line 9
ata3: SATA max UDMA/133 cmd 0xB400 ctl 0xB802 bmdma 0xC400 irq 169
ata4: SATA max UDMA/133 cmd 0xBC00 ctl 0xC002 bmdma 0xC408 irq 169
ata3: no device found (phy stat )
scsi2 : sata_via
ata4: no device found (phy stat )
scsi3 : sata_via
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sda: drive cache: write back
sda: sda1 sda2 sda3
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Attached scsi generic sg0 at scsi0, channel 0, id 0, lun 0, type 0

I forgot to mention previously but I even tried with noapic nolapic acpi=off pci=routeirq and got the same trouble.

This is weird as ST3250823AS (and all Seagate .8 drives) are known to work without any problem with sii 3112/3114. I currently don't own such a drive but someone confirmed to me that ST3250823AS works w/ sii 3114 without any problem (including bonnie++ results and all). So, I don't think it's the good old mod15write problem. I hope it's just bad hardware, a cable or something like that; otherwise, you're hitting a new bug. Can you verify if the drive works under Windows?

Well, what piqued my interest is that the same drives work fine on my on-board sata_via controller. All 4 drives were bought at the same time and *seem* to be from the same batch, and all work fine on the VIA controller and none work on the 3112A. I've also tried different cables, all of which are Belkin, which I thought were decent quality.

I'll just try installing Winblows
SiI 3112A + Seagate HDs = still no go?
Hi all,

I just recently took the plunge and bought 4 250 GB Seagate drives and a 2 port Silicon Image 3112A controller card for the 2 drives my motherboard doesn't handle. No matter how hard I try, I can't get the hard drives to work: they are detected correctly and work reasonably well under _very_ light load, but anything like building a RAID array is a bit much and the whole controller seems to lock up.

I've tried adding the drive to the blacklist in the sata_sil.c driver and I still have the same trouble: as you can see the messages below relate to my patched kernel with the blacklist fix. I've seen that this was discussed just yesterday, but that seemed to give nothing: http://www.ussg.iu.edu/hypermail/linux/kernel/0508.1/0310.html

Ready and willing to hack my kernel to pieces; this machine is no use until I get all the drives working! Needless to say the drives connected to the on-board VIA controller work fine, as do the drives currently on the SiI controller if I swap them around.

Any ideas?

TIA
Chris

The following messages are sent to the log when everything goes mad:

ata1: command 0x35 timeout, stat 0xd8 host_stat 0x0
ata1: status=0xd8 { Busy }
SCSI error : <0 0 0 0> return code = 0x8002
sda: Current: sense key=0xb ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sda, sector 2990370
ATA: abnormal status 0xD8 on port E0802087
ATA: abnormal status 0xD8 on port E0802087
ATA: abnormal status 0xD8 on port E0802087

[ the above is transcribed so may not be 100% accurate ]

Dmesg log during boot (and detection):

Aug 11 21:47:05 arcadia Linux version 2.6.12-gentoo-r6 ([EMAIL PROTECTED]) (gcc version 3.3.5-20050130 (Gentoo 3.3.5.20050130-r1, ssp-3.3.5.20050130-1, pie-8.7.7.1)) #2 Thu Aug 11 20:19:00 BST 2005
...
Aug 11 17:30:12 arcadia sata_sil version 0.9
Aug 11 17:30:12 arcadia ACPI: PCI Interrupt :00:0a.0[A] -> GSI 18 (level, low) -> IRQ 177
Aug 11 17:30:12 arcadia ata1: SATA max UDMA/100 cmd 0xE0802080 ctl 0xE080208A bmdma 0xE0802000 irq 177
Aug 11 17:30:12 arcadia ata2: SATA max UDMA/100 cmd 0xE08020C0 ctl 0xE08020CA bmdma 0xE0802008 irq 177
Aug 11 17:30:12 arcadia ata1: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4023 85:3469 86:3c01 87:4023 88:207f
Aug 11 17:30:12 arcadia ata1: dev 0 ATA, max UDMA/133, 488397168 sectors: lba48
Aug 11 17:30:12 arcadia ata1(0): applying Seagate errata fix
Aug 11 17:30:12 arcadia ata1: dev 0 configured for UDMA/100
Aug 11 17:30:12 arcadia scsi0 : sata_sil
Aug 11 17:30:12 arcadia ata2: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4023 85:3469 86:3c01 87:4023 88:207f
Aug 11 17:30:12 arcadia ata2: dev 0 ATA, max UDMA/133, 488397168 sectors: lba48
Aug 11 17:30:12 arcadia ata2(0): applying Seagate errata fix
Aug 11 17:30:12 arcadia ata2: dev 0 configured for UDMA/100
Aug 11 17:30:12 arcadia scsi1 : sata_sil
Aug 11 17:30:12 arcadia Vendor: ATA Model: ST3250823AS Rev: 3.03
Aug 11 17:30:12 arcadia Type: Direct-Access ANSI SCSI revision: 05
Aug 11 17:30:12 arcadia Vendor: ATA Model: ST3250823AS Rev: 3.03
Aug 11 17:30:12 arcadia Type: Direct-Access ANSI SCSI revision: 05

lspci:

:00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] Host Bridge
:00:01.0 PCI bridge: VIA Technologies, Inc. VT8235 PCI Bridge
:00:0a.0 Unknown mass storage controller: Silicon Image, Inc. SiI 3112 [SATALink/SATARaid] Serial ATA Controller (rev 02)
:00:0c.0 FireWire (IEEE 1394): Agere Systems (former Lucent Microelectronics) FW323 (rev 61)
:00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
:00:0f.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
:00:10.0 USB Controller: VIA Technologies, Inc. VT82x UHCI USB 1.1 Controller (rev 81)
:00:10.1 USB Controller: VIA Technologies, Inc. VT82x UHCI USB 1.1 Controller (rev 81)
:00:10.2 USB Controller: VIA Technologies, Inc. VT82x UHCI USB 1.1 Controller (rev 81)
:00:10.3 USB Controller: VIA Technologies, Inc. VT82x UHCI USB 1.1 Controller (rev 81)
:00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
:00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800/K8T890 South]
:00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 60)
:00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 78)
:01:00.0 VGA compatible controller: nVidia Corporation NV11 [GeForce2 MX/MX 400] (rev b2)

Many thanks,
Chris

--
Chris Boot
[EMAIL PROTECTED]
http://www.bootc.net/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
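For anyone trying to read the status bytes in the logs from this thread (0xd8 with the blacklist patch, 0xd9 without), here is a minimal sketch that splits an ATA status byte into named bits. It assumes the conventional ATA status register layout (BSY 0x80, DRDY 0x40, DF 0x20, DSC 0x10, DRQ 0x08, ERR 0x01); the helper name decode_ata_status and the table are made up for illustration and are not taken from libata or sata_sil.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One entry per status bit; the layout is the assumed conventional one. */
struct status_bit {
    unsigned char mask;
    const char *name;
};

static const struct status_bit ata_status_bits[] = {
    { 0x80, "BSY"  },  /* device busy */
    { 0x40, "DRDY" },  /* device ready */
    { 0x20, "DF"   },  /* device fault */
    { 0x10, "DSC"  },  /* seek complete (historical meaning of bit 4) */
    { 0x08, "DRQ"  },  /* data request */
    { 0x01, "ERR"  },  /* error */
};

/*
 * Hypothetical helper: write the names of all set bits into buf,
 * separated by spaces, and return buf. Output is truncated if buf
 * is too small.
 */
const char *decode_ata_status(unsigned char status, char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (i = 0; i < sizeof(ata_status_bits) / sizeof(ata_status_bits[0]); i++) {
        if (!(status & ata_status_bits[i].mask))
            continue;
        if (buf[0] != '\0')
            strncat(buf, " ", len - strlen(buf) - 1);
        strncat(buf, ata_status_bits[i].name, len - strlen(buf) - 1);
    }
    return buf;
}
```

With a 64-byte buffer, decode_ata_status(0xd8, buf, sizeof buf) gives "BSY DRDY DSC DRQ", and 0xd9 adds "ERR" at the end. The kernel only prints { Busy } for both, which fits the usual rule that the other bits are not meaningful while BSY is set.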
Re: Cosmetic JFFS patch.
Hi,

> Many new Linux users go through an extended period of dual-booting.

And many users also have to sleep in the same room as their computers (still live w/ parents or are in college) and the fans bother them, so they turn them off every night.

Just my 2 eurocents.

--
Chris Boot
[EMAIL PROTECTED]

"use the source, luke." (obi-wan gnuobi)
Re: temperature standard - global config option?
Hi,

> I haven't encountered any CPU with builtin temperature sensors.

Well, I've got an Apple iMac (tee hee hee) with a PowerPC G3 (or 750 for you number guys). I know for sure that all of the G3 / G4 chips have temperature sensors built onto the CPU core. Mine's showing 23 degrees Celsius at the moment.

>> This thread keeps going and going and going...
>
> and going, and going . and still going . and going, and going, and going...

--
  .-.      Chris Boot
  /v\      [EMAIL PROTECTED]
 // \\
/(   )\    L I N U X
 ^^-^^     >Phear the Penguin<
Re: temperature standard - global config option?
Hi,

> Only the truly stupid would assume accuracy from decimal places.

Well then, tell all the teachers in this world that they're stupid, and tell everyone who learnt from them as well. I'm in high school (gd. 11, junior) and my physics teacher is always screaming at us for putting too many decimal places or having them inconsistent.

There are certain situations where adding a ±1 is too cumbersome and/or clumsy, so you can specify the accuracy using just decimal places. For example, 5.00 would mean pretty much spot on 5 (anywhere from 4.995 to 5.00499), whereas 5 could mean anywhere from 4.5 to 5.499.

Please, let's quit this dumb argument. We all know that thermistors and other types of cheap temperature gauges are very inaccurate, and I don't think expensive thermocouples will make it into computer sensors very soon. Plus, who the hell could care whether their chip is at 45.4 or 45.5 degrees? Does it really matter? A difference of 0.1 will not decide whether your chip will fry.

Just my 2 eurocents.

--
Chris Boot
[EMAIL PROTECTED]

DOS Computers manufactured by companies such as IBM, Compaq, Tandy, and millions of others are by far the most popular, with about 70 million machines in use worldwide. Macintosh fans, on the other hand, may note that cockroaches are far more numerous than humans, and that numbers alone do not denote a higher life form.
New York Times, November 26, 1991
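The decimal-places convention described above can be written down in a few lines. This is only a sketch of the usual "half a unit in the last written digit" reading; the struct and function names are invented for illustration.

```c
#include <assert.h>

/* Closed lower bound, open upper bound of the values a written number may stand for. */
struct interval {
    double lo;
    double hi;
};

/*
 * Range of true values that would be written as `value` when rounded
 * to `decimals` decimal places: value plus or minus half a unit in
 * the last written digit.
 */
struct interval implied_interval(double value, int decimals)
{
    struct interval range;
    double half_step = 0.5;
    int i;

    assert(decimals >= 0);
    for (i = 0; i < decimals; i++)
        half_step /= 10.0;   /* 0.5, 0.05, 0.005, ... */

    range.lo = value - half_step;
    range.hi = value + half_step;
    return range;
}
```

implied_interval(5.00, 2) comes out at roughly 4.995 to 5.005 and implied_interval(5.0, 0) at 4.5 to 5.5, which are the ranges quoted in the message (up to floating-point rounding in the first case).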
Re: temperature standard - global config option?
Hi,

> Then you must have blown your quantum finals. Royally. ESPECIALLY
> after that statement about "temperature is nothing but the movement of
> pieces of materie". Not even close, once you get into the quant.
>
> Mathematically and quantum mechanically, negative absolute
> temperatures do exist. In quantum mechanics, temperature is expressed as
> probability populations in various quantum states.

Excuse me, but I don't think that we can get computer temperature sensors as we know them to measure temperatures of matter in quantum states. Even if, one day, we built a usable quantum computer which might need temperature measurements, I doubt that the Linux kernel would run on it without being totally rewritten.

Anyhow, I like the discussion. I love anything to do with quantum physics!

--
Chris Boot
[EMAIL PROTECTED]

#define QUESTION ((2b) || (!2b))
Re: temperature standard - global config option?
Hi,

>>> Kelvins good idea in general - it is always positive ;-)
>>>
>>> 0.01*K fits in 16 bits and gives reasonable range.
>>>
>>> but may be something like K<<6 could be an option? (to allow use of shifts
>>> instead of muls/divs). It would be much easier to extract int part.
>>>
>>> just my 2 eurocents.
>>
>> Why not make it in Celsius ? Is more easy to read it this way.
>
> It's easier for you as a user to read, but slightly harder to deal with inside the code.
> It's really a user-space issue, inside the kernel should be as standardized as possible, and
> Kelvins make the most sense there.

OK, I think by now we've all agreed on the following:

- The issue is NOT displaying temperatures to the user, but a userspace program reading them from the kernel. The userspace program itself can do temperature conversions for the user if he/she wants.
- The most preferable units would be decikelvins, as a 16-bit value gives a relatively precise as well as wide range, from absolute zero to about 6280 degrees Celsius ((65535 / 10) - 273), which comfortably covers anything a computer can operate at. It also gives us a good base for all sorts of other temperature sensing devices.

Do we all agree on those now?

--
Chris Boot
[EMAIL PROTECTED]

#define QUESTION ((2b) || (!2b))
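To put numbers on the 16-bit decikelvin proposal, here is a rough userspace sketch of how such a value could be converted and range-checked. It assumes an unsigned 16-bit field holding tenths of a kelvin and uses millidegrees Celsius on the other side so everything stays in integers; the function names and the constant are invented for illustration and are not an existing kernel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 0 degrees Celsius is 273.15 K, expressed here in millikelvin. */
#define ZERO_CELSIUS_MILLIKELVIN 273150

/* Decikelvin (0.1 K steps) to millidegrees Celsius; exact, no rounding needed. */
int32_t decikelvin_to_millicelsius(uint16_t dk)
{
    return (int32_t)dk * 100 - ZERO_CELSIUS_MILLIKELVIN;
}

/*
 * Millidegrees Celsius to decikelvin, rounded to the nearest 0.1 K.
 * Returns 0 on success, -1 if the temperature is below absolute zero
 * or does not fit in 16 bits.
 */
int millicelsius_to_decikelvin(int32_t mc, uint16_t *dk)
{
    int64_t millikelvin = (int64_t)mc + ZERO_CELSIUS_MILLIKELVIN;
    int64_t rounded;

    assert(dk != NULL);
    if (millikelvin < 0)
        return -1;

    rounded = (millikelvin + 50) / 100;   /* round half up */
    if (rounded > UINT16_MAX)
        return -1;

    *dk = (uint16_t)rounded;
    return 0;
}
```

The largest encodable value, 65535, maps to 6280350 millidegrees, i.e. about 6280 degrees Celsius, and 0 maps to -273150. For comparison, the 0.01 K proposal quoted above would top out at 655.35 K in the same 16 bits, and a K<<6 fixed-point layout at just under 1024 K.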
Re: temperature standard - global config option?
Hi,

> Please, don't.
>
> Use kelvins *0.1, and use them consistently everywhere. This is what
> ACPI does, and it is probably right.

I'm sorry, but I don't feel like adding 273 to every number I get just to find the temperature of something. What I would do is give configuration options to choose the default (Celsius/centigrade, Kelvin, or [shudder] Fahrenheit) then, when you need to print or output a temperature, send it off to a common converter function so you don't repeat code all over the place.

Just my 0.02 Eurocents (what an ugly word).

--
Chris Boot
[EMAIL PROTECTED]

"Modem error handling really su~c%dk,s.^D^D@R*cCKo#?CB,*o#?C!!b%o#? NO CARRIER
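A common converter of the kind suggested above might look roughly like this. It is a sketch only: it assumes the stored value is in tenths of a kelvin (the ACPI-style unit from the quoted text) and returns hundredths of a degree so that all three units stay exact in integer arithmetic. The enum, the function name and the hundredths convention are all made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Output unit, e.g. chosen once from a configuration option. */
enum temp_unit {
    TEMP_KELVIN,
    TEMP_CELSIUS,
    TEMP_FAHRENHEIT
};

/*
 * Convert a reading in decikelvin (0.1 K) to hundredths of a degree
 * in the requested unit.
 *
 *   centi-K = dK * 10
 *   centi-C = dK * 10 - 27315
 *   centi-F = (dK * 10 - 27315) * 9 / 5 + 3200 = dK * 18 - 45967
 */
int32_t convert_decikelvin(uint16_t dk, enum temp_unit unit)
{
    switch (unit) {
    case TEMP_KELVIN:
        return (int32_t)dk * 10;
    case TEMP_CELSIUS:
        return (int32_t)dk * 10 - 27315;
    case TEMP_FAHRENHEIT:
        return (int32_t)dk * 18 - 45967;
    }
    assert(!"unknown temperature unit");
    return 0;
}
```

Every place that prints a temperature would then call convert_decikelvin() with the configured unit instead of doing its own arithmetic. For a reading of 2732 (273.2 K) it returns 27320, 5 and 3209, i.e. 273.20 K, 0.05 degrees Celsius and 32.09 degrees Fahrenheit.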