libsas: correctly flush the LU queue on error recovery
The current sas_scsi_clear_queue_lu() is wrongly checking for commands which match the pointer to the one passed in. It should be checking for commands which are on the same logical unit as the one passed in. Fix this by checking target pointer and LUN for equality. Signed-off-by: James Bottomley <[EMAIL PROTECTED]> --- diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c index b796d99..57ba99f 100644 --- a/drivers/scsi/libsas/sas_scsi_host.c +++ b/drivers/scsi/libsas/sas_scsi_host.c @@ -282,7 +282,8 @@ static void sas_scsi_clear_queue_lu(struct list_head *error_q, struct scsi_cmnd struct scsi_cmnd *cmd, *n; list_for_each_entry_safe(cmd, n, error_q, eh_entry) { - if (cmd == my_cmd) + if (cmd->device->sdev_target == my_cmd->device->sdev_target && + cmd->device->lun == my_cmd->device->lun) sas_eh_finish_cmd(cmd); } } - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
aic94xx: fix sequencer hang on error recovery
The clear nexus I_T and clear nexus I_T_L functions in the aic94xx specify the SUSPEND_TX flag which causes the sequencer to be suspended until it receives a RESUME_TX. Unfortunately, nothing ever sends the resume, so the sequencer on the link is stopped forever, leading to eventual timeouts and I/O errors. Since clear nexus commands are only executed as part of error recovery, it's perfectly fine to keep the sequencer running on the link ... as soon as the recovery function is completed, we'll send it the commands to retry. Signed-off-by: James Bottomley <[EMAIL PROTECTED]> --- diff --git a/drivers/scsi/aic94xx/aic94xx_tmf.c b/drivers/scsi/aic94xx/aic94xx_tmf.c index b52124f..144f5ad 100644 --- a/drivers/scsi/aic94xx/aic94xx_tmf.c +++ b/drivers/scsi/aic94xx/aic94xx_tmf.c @@ -151,8 +151,6 @@ static int asd_clear_nexus_I_T(struct domain_device *dev) CLEAR_NEXUS_PRE; scb->clear_nexus.nexus = NEXUS_I_T; scb->clear_nexus.flags = SEND_Q | EXEC_Q | NOTINQ; - if (dev->tproto) - scb->clear_nexus.flags |= SUSPEND_TX; scb->clear_nexus.conn_handle = cpu_to_le16((u16)(unsigned long) dev->lldd_dev); CLEAR_NEXUS_POST; @@ -169,8 +167,6 @@ static int asd_clear_nexus_I_T_L(struct domain_device *dev, u8 *lun) CLEAR_NEXUS_PRE; scb->clear_nexus.nexus = NEXUS_I_T_L; scb->clear_nexus.flags = SEND_Q | EXEC_Q | NOTINQ; - if (dev->tproto) - scb->clear_nexus.flags |= SUSPEND_TX; memcpy(scb->clear_nexus.ssp_task.lun, lun, 8); scb->clear_nexus.conn_handle = cpu_to_le16((u16)(unsigned long) dev->lldd_dev); - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: New 2.6.24.2 SG_IO SCSI problems
> I attached a backport of the patch from Tony (added as cc) that is in > 2.6.25-rc2. Could you try it out against 2.6.24.2 just to make sure it > was this patch, then we can send it to stable. > Yes, I had wanted to send this patch to -stable, but got distracted with other bugs. So please do so, by all means. Tony - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: New 2.6.24.2 SG_IO SCSI problems
Mark Hounschell wrote: Mark Hounschell wrote: Mike Christie wrote: Mike Christie wrote: Mark Hounschell wrote: I seem to have run into some sort of regression in the SG_IO interface of 2.6.24.2. I have an application that up until 2.6.24 worked fine. The 2.6.23.16 kernel works fine. During reads I get these kernel messages. Writes and other functions _seem_ OK. Actually basic reads are working. Its with large BC reads using an io_vec list that the problem shows up. Are you doing SG_IO to the sg device (/dev/sg*) or to the block device (/dev/sdX)? If you are doing SG_IO to the sg device, then I know of one regression (well not regression exactly, but I fixed a bug but the patch got partially overwritten by another patch and that caused a new bug). Both bugs are fixed in 2.6.25-rc2. Could you try that out if you are doing SG_IO to the sg device. Yes, I'm using /dev/sg*. And yes again I'll checkout 2.6.25-rc2 ASIC. Thanks Mark - 2.6.25-rc2 does fix the problem I'm having. I don't suppose there is a patch lying around for 2.6.24.2?? I attached a backport of the patch from Tony (added as cc) that is in 2.6.25-rc2. Could you try it out against 2.6.24.2 just to make sure it was this patch, then we can send it to stable. Backport 76d78300a6eb8b7f08e47703b7e68a659ffc2053 to 2.6.24 >From Tony Battersby: When sending a SCSI command to a tape drive via the SCSI Generic (sg) driver, if the command has a data transfer length more than scatter_elem_sz (32 KB default) and not a multiple of 512, then I either hit BUG_ON(!valid_dma_direction(direction)) in dma_unmap_sg() or else the command never completes (depending on the LLDD). When constructing scatterlists, the sg driver rounds up the scatterlist element sizes to be a multiple of 512. This can result in sum(scatterlist lengths) > bufflen. In this case, scsi_req_map_sg() incorrectly sets bio->bi_size to sum(scatterlist lengths) rather than to bufflen. When the command completes, req_bio_endio() detects that bio->bi_size != 0, and so it doesn't call bio_endio(). This causes the command to be resubmitted, resulting in BUG_ON or the command never completing. This patch makes scsi_req_map_sg() set bio->bi_size to bufflen rather than to sum(scatterlist lengths), which fixes the problem. Signed-off-by: Mike Christie <[EMAIL PROTECTED]> --- linux-2.6.24.2/drivers/scsi/scsi_lib.c 2008-02-10 23:51:11.0 -0600 +++ linux-2.6.24.2.work/drivers/scsi/scsi_lib.c 2008-02-22 16:20:09.0 -0600 @@ -298,7 +298,6 @@ static int scsi_req_map_sg(struct reques page = sg_page(sg); off = sg->offset; len = sg->length; - data_len += len; while (len > 0 && data_len > 0) { /*
Re: New 2.6.24.2 SG_IO SCSI problems
Mark Hounschell wrote: > Mike Christie wrote: >> Mike Christie wrote: >>> Mark Hounschell wrote: I seem to have run into some sort of regression in the SG_IO interface of 2.6.24.2. I have an application that up until 2.6.24 worked fine. The 2.6.23.16 kernel works fine. During reads I get these kernel messages. Writes and other functions _seem_ OK. Actually basic reads are working. Its with large BC reads using an io_vec list that the problem shows up. >>> Are you doing SG_IO to the sg device (/dev/sg*) or to the block device >>> (/dev/sdX)? >> If you are doing SG_IO to the sg device, then I know of one regression >> (well not regression exactly, but I fixed a bug but the patch got >> partially overwritten by another patch and that caused a new bug). Both >> bugs are fixed in 2.6.25-rc2. Could you try that out if you are doing >> SG_IO to the sg device. >> > > Yes, I'm using /dev/sg*. And yes again I'll checkout 2.6.25-rc2 ASIC. > > Thanks > Mark > - 2.6.25-rc2 does fix the problem I'm having. I don't suppose there is a patch lying around for 2.6.24.2?? Thanks Mark - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] libsas: fix error handling
On Wed, 2008-02-20 at 10:07 -0600, James Bottomley wrote: > The libsas error handler has two fairly fatal bugs A bug in this one was picked up in testing. There's still a double completion race from the time a task is aborted to the time the eh actually starts. If the task returns in that interval it will get completed twice, so we key off the SAS_TASK_STATE_ABORTED to prevent this (or at least narrow the race window to being almost untriggerable). James --- Subject: [SCSI] libsas: fix error handling The libsas error handler has two fairly fatal bugs 1. scsi_sas_task_done calls scsi_eh_finish_cmd() too early. This happens if the task completes after it has been aborted but before the error handler starts up. Because scsi_eh_finish_cmd() decrements host_failed and adds the task to the done list, the error handler start check (host_failed == host_busy) never passes and the eh never starts. 2. The multiple task completion paths sas_scsi_clear_queue_... all simply delete the task from the error queue. This causes it to disappear into the ether, since a command must be placed on the done queue to be finished off by the error handler. This behaviour causes the HBA to hang on pending commands. Fix 1. by moving the SAS_TASK_STATE_ABORTED check to an exit clause at the top of the routine and calling ->scsi_done() unconditionally (it is a nop if the timer has fired). This keeps the task in the error handling queue until the eh starts. Fix 2. by making sure every task goes through task complete followed by scsi_eh_finish_cmd(). Tested this by firing resets across a disk running a hammer test (now it actually survives without hanging the system) Signed-off-by: James Bottomley <[EMAIL PROTECTED]> --- drivers/scsi/libsas/sas_scsi_host.c | 65 ++- 1 files changed, 41 insertions(+), 24 deletions(-) diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c index 583d249..b796d99 100644 --- a/drivers/scsi/libsas/sas_scsi_host.c +++ b/drivers/scsi/libsas/sas_scsi_host.c @@ -53,10 +53,14 @@ static void sas_scsi_task_done(struct sas_task *task) { struct task_status_struct *ts = &task->task_status; struct scsi_cmnd *sc = task->uldd_task; - struct sas_ha_struct *sas_ha = SHOST_TO_SAS_HA(sc->device->host); - unsigned ts_flags = task->task_state_flags; int hs = 0, stat = 0; + if (unlikely(task->task_state_flags & SAS_TASK_STATE_ABORTED)) { + /* Aborted tasks will be completed by the error handler */ + SAS_DPRINTK("task done but aborted\n"); + return; + } + if (unlikely(!sc)) { SAS_DPRINTK("task_done called with non existing SCSI cmnd!\n"); list_del_init(&task->list); @@ -122,11 +126,7 @@ static void sas_scsi_task_done(struct sas_task *task) sc->result = (hs << 16) | stat; list_del_init(&task->list); sas_free_task(task); - /* This is very ugly but this is how SCSI Core works. */ - if (ts_flags & SAS_TASK_STATE_ABORTED) - scsi_eh_finish_cmd(sc, &sas_ha->eh_done_q); - else - sc->scsi_done(sc); + sc->scsi_done(sc); } static enum task_attribute sas_scsi_get_task_attr(struct scsi_cmnd *cmd) @@ -257,13 +257,33 @@ out: return res; } +static void sas_eh_finish_cmd(struct scsi_cmnd *cmd) +{ + struct sas_task *task = TO_SAS_TASK(cmd); + struct sas_ha_struct *sas_ha = SHOST_TO_SAS_HA(cmd->device->host); + + /* remove the aborted task flag to allow the task to be +* completed now. At this point, we only get called following +* an actual abort of the task, so we should be guaranteed not +* to be racing with any completions from the LLD (hence we +* don't need the task state lock to clear the flag) */ + task->task_state_flags &= ~SAS_TASK_STATE_ABORTED; + /* Now call task_done. However, task will be free'd after +* this */ + task->task_done(task); + /* now finish the command and move it on to the error +* handler done list, this also takes it off the +* error handler pending list */ + scsi_eh_finish_cmd(cmd, &sas_ha->eh_done_q); +} + static void sas_scsi_clear_queue_lu(struct list_head *error_q, struct scsi_cmnd *my_cmd) { struct scsi_cmnd *cmd, *n; list_for_each_entry_safe(cmd, n, error_q, eh_entry) { if (cmd == my_cmd) - list_del_init(&cmd->eh_entry); + sas_eh_finish_cmd(cmd); } } @@ -276,7 +296,7 @@ static void sas_scsi_clear_queue_I_T(struct list_head *error_q, struct domain_device *x = cmd_to_domain_dev(cmd); if (x == dev) - list_del_init(&cmd->eh_entry); + sas_eh_finish_cmd(cmd); } } @@ -290,7 +310,7 @@ static void sas_scsi_clear
RE: [PATCH 1/2] stex: stex_direct_copy shouldn't call dma_map_sg
>-Original Message- >From: fujita [mailto:[EMAIL PROTECTED] On Behalf Of FUJITA Tomonori >Sent: Friday, February 22, 2008 6:11 AM >To: linux-scsi@vger.kernel.org >Cc: FUJITA Tomonori; Ed Lin; James Bottomley >Subject: [PATCH 1/2] stex: stex_direct_copy shouldn't call dma_map_sg > > >stex_direct_copy copies an in-kernel buffer to a sg list in order to >spoof some SCSI commands. stex_direct_copy calls dma_map_sg and then >stex_internal_copy with the value that dma_map_sg returned. It calls >scsi_kmap_atomic_sg to copy data. > >scsi_kmap_atomic_sg doesn't see sg->dma_length so if dma_map_sg merges >sg entries, stex_internal_copy gets the smaller number of sg entries >than the acutual number, which means it wrongly think that the data >length in the sg list is shorter than the actual length. > >stex_direct_copy shouldn't call dma_map_sg and it doesn't need since >this code path doesn't involve dma transfers. This patch removes >stex_direct_copy and simply calls stex_internal_copy with the actual >number of sg entries. > >Signed-off-by: FUJITA Tomonori <[EMAIL PROTECTED]> >Cc: Ed Lin <[EMAIL PROTECTED]> >Cc: James Bottomley <[EMAIL PROTECTED]> >--- > drivers/scsi/stex.c | 34 -- > 1 files changed, 12 insertions(+), 22 deletions(-) > >diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c >index 72f6d80..4b6861c 100644 >--- a/drivers/scsi/stex.c >+++ b/drivers/scsi/stex.c >@@ -461,23 +461,6 @@ static void stex_internal_copy(struct >scsi_cmnd *cmd, > } > } > >-static int stex_direct_copy(struct scsi_cmnd *cmd, >- const void *src, size_t count) >-{ >- size_t cp_len = count; >- int n_elem = 0; >- >- n_elem = scsi_dma_map(cmd); >- if (n_elem < 0) >- return 0; >- >- stex_internal_copy(cmd, src, &cp_len, n_elem, ST_TO_CMD); >- >- scsi_dma_unmap(cmd); >- >- return cp_len == count; >-} >- > static void stex_controller_info(struct st_hba *hba, struct >st_ccb *ccb) > { > struct st_frame *p; >@@ -569,8 +552,10 @@ stex_queuecommand(struct scsi_cmnd *cmd, >void (* done)(struct scsi_cmnd *)) > unsigned char page; > page = cmd->cmnd[2] & 0x3f; > if (page == 0x8 || page == 0x3f) { >- stex_direct_copy(cmd, ms10_caching_page, >- sizeof(ms10_caching_page)); >+ size_t cp_len = sizeof(ms10_caching_page); >+ stex_internal_copy(cmd, ms10_caching_page, >+ &cp_len, scsi_sg_count(cmd), >+ ST_TO_CMD); > cmd->result = DID_OK << 16 | >COMMAND_COMPLETE << 8; > done(cmd); > } else >@@ -599,8 +584,10 @@ stex_queuecommand(struct scsi_cmnd *cmd, >void (* done)(struct scsi_cmnd *)) > if (id != host->max_id - 1) > break; > if (lun == 0 && (cmd->cmnd[1] & INQUIRY_EVPD) == 0) { >- stex_direct_copy(cmd, console_inq_page, >- sizeof(console_inq_page)); >+ size_t cp_len = sizeof(console_inq_page); >+ stex_internal_copy(cmd, console_inq_page, >+ &cp_len, scsi_sg_count(cmd), >+ ST_TO_CMD); > cmd->result = DID_OK << 16 | >COMMAND_COMPLETE << 8; > done(cmd); > } else >@@ -609,6 +596,7 @@ stex_queuecommand(struct scsi_cmnd *cmd, >void (* done)(struct scsi_cmnd *)) > case PASSTHRU_CMD: > if (cmd->cmnd[1] == PASSTHRU_GET_DRVVER) { > struct st_drvver ver; >+ size_t cp_len = sizeof(ver); > ver.major = ST_VER_MAJOR; > ver.minor = ST_VER_MINOR; > ver.oem = ST_OEM; >@@ -616,7 +604,9 @@ stex_queuecommand(struct scsi_cmnd *cmd, >void (* done)(struct scsi_cmnd *)) > ver.signature[0] = PASSTHRU_SIGNATURE; > ver.console_id = host->max_id - 1; > ver.host_no = hba->host->host_no; >- cmd->result = stex_direct_copy(cmd, >&ver, sizeof(ver)) ? >+ stex_internal_copy(cmd, &ver, &cp_len, >+ scsi_sg_count(cmd), >ST_TO_CMD); >+ cmd->result = sizeof(ver) == cp_len ? > DID_OK << 16 | COMMAND_COMPLETE << 8 : > DID_ERROR << 16 | COMMAND_COMPLETE << 8; > done(cmd); >-- >1.5.3.4 > > ACK patches 1-2. I didn't think dma_map_sg could change the sg count. And clearly it's unneeded if we can get sg count by scsi_sg_count. Thanks for the patches. Ed Lin - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message t
[PATCH] SCSI st: compile fix when DEBUG set to one
Remove the now useless counting of adjacent pages from the debugging code in to make it compile when DEBUG is set non-zero. Signed-off-by: Kai Makisara <[EMAIL PROTECTED]> --- The patch does not touch any code compiled when DEBUG is zero and is suitable for 2.5.25. drivers/scsi/st.c | 11 --- drivers/scsi/st.h |1 - 2 files changed, 4 insertions(+), 8 deletions(-) Index: linux-2.6.25-rc2-q/drivers/scsi/st.c === --- linux-2.6.25-rc2-q.orig/drivers/scsi/st.c +++ linux-2.6.25-rc2-q/drivers/scsi/st.c @@ -17,7 +17,7 @@ Last modified: 18-JAN-1998 Richard Gooch <[EMAIL PROTECTED]> Devfs support */ -static const char *verstr = "20080117"; +static const char *verstr = "20080221"; #include @@ -1172,7 +1172,7 @@ static int st_open(struct inode *inode, STp->try_dio_now = STp->try_dio; STp->recover_count = 0; DEB( STp->nbr_waits = STp->nbr_finished = 0; -STp->nbr_requests = STp->nbr_dio = STp->nbr_pages = STp->nbr_combinable = 0; ) +STp->nbr_requests = STp->nbr_dio = STp->nbr_pages = 0; ) retval = check_tape(STp, filp); if (retval < 0) @@ -1226,8 +1226,8 @@ static int st_flush(struct file *filp, f } DEBC( if (STp->nbr_requests) - printk(KERN_DEBUG "%s: Number of r/w requests %d, dio used in %d, pages %d (%d).\n", - name, STp->nbr_requests, STp->nbr_dio, STp->nbr_pages, STp->nbr_combinable)); + printk(KERN_DEBUG "%s: Number of r/w requests %d, dio used in %d, pages %d.\n", + name, STp->nbr_requests, STp->nbr_dio, STp->nbr_pages)); if (STps->rw == ST_WRITING && !STp->pos_unknown) { struct st_cmdstatus *cmdstatp = &STp->buffer->cmdstat; @@ -1422,9 +1422,6 @@ static int setup_buffering(struct scsi_t if (STbp->do_dio) { STp->nbr_dio++; STp->nbr_pages += STbp->do_dio; - for (i=1; i < STbp->do_dio; i++) - if (page_to_pfn(STbp->sg[i].page) == page_to_pfn(STbp->sg[i-1].page) + 1) - STp->nbr_combinable++; } ) } else Index: linux-2.6.25-rc2-q/drivers/scsi/st.h === --- linux-2.6.25-rc2-q.orig/drivers/scsi/st.h +++ linux-2.6.25-rc2-q/drivers/scsi/st.h @@ -164,7 +164,6 @@ struct scsi_tape { int nbr_requests; int nbr_dio; int nbr_pages; - int nbr_combinable; unsigned char last_cmnd[6]; unsigned char last_sense[16]; #endif - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: New 2.6.24.2 SG_IO SCSI problems
Mike Christie wrote: > Mike Christie wrote: >> Mark Hounschell wrote: >>> I seem to have run into some sort of regression in the SG_IO >>> interface of 2.6.24.2. I have an application that up until 2.6.24 >>> worked fine. The 2.6.23.16 kernel works fine. >>> >>> During reads I get these kernel messages. Writes and other functions >>> _seem_ OK. Actually basic >>> reads are working. Its with large BC reads using an io_vec list that >>> the problem shows up. >>> >> >> Are you doing SG_IO to the sg device (/dev/sg*) or to the block device >> (/dev/sdX)? > > If you are doing SG_IO to the sg device, then I know of one regression > (well not regression exactly, but I fixed a bug but the patch got > partially overwritten by another patch and that caused a new bug). Both > bugs are fixed in 2.6.25-rc2. Could you try that out if you are doing > SG_IO to the sg device. > Yes, I'm using /dev/sg*. And yes again I'll checkout 2.6.25-rc2 ASIC. Thanks Mark - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: New 2.6.24.2 SG_IO SCSI problems
Mike Christie wrote: Mark Hounschell wrote: I seem to have run into some sort of regression in the SG_IO interface of 2.6.24.2. I have an application that up until 2.6.24 worked fine. The 2.6.23.16 kernel works fine. During reads I get these kernel messages. Writes and other functions _seem_ OK. Actually basic reads are working. Its with large BC reads using an io_vec list that the problem shows up. Are you doing SG_IO to the sg device (/dev/sg*) or to the block device (/dev/sdX)? If you are doing SG_IO to the sg device, then I know of one regression (well not regression exactly, but I fixed a bug but the patch got partially overwritten by another patch and that caused a new bug). Both bugs are fixed in 2.6.25-rc2. Could you try that out if you are doing SG_IO to the sg device. - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: New 2.6.24.2 SG_IO SCSI problems
Mark Hounschell wrote: I seem to have run into some sort of regression in the SG_IO interface of 2.6.24.2. I have an application that up until 2.6.24 worked fine. The 2.6.23.16 kernel works fine. During reads I get these kernel messages. Writes and other functions _seem_ OK. Actually basic reads are working. Its with large BC reads using an io_vec list that the problem shows up. Are you doing SG_IO to the sg device (/dev/sg*) or to the block device (/dev/sdX)? - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Marvell 6440 SAS/SATA driver
On Fri, 2008-02-22 at 11:26 -0500, Jeff Garzik wrote: > 4) [major] Your email was encoded in base64, which makes it difficult > for automated tools to handle, and difficult for some mail clients to > view and reply-to. Yes, I echo this, there's also another problem it causes: The email didn't actually get through the scsi reflector in part, I suspect, because of the base64 content (anybody who wasn't on the direct to or cc list is only seeing Jeff's reply). James - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 1/1] scsi: megaraid_sas - Fix random failure of DCDB cmds with sense info
On Fri, 2008-02-22 at 09:17 -0700, Yang, Bo wrote: > James, > > What is the status for this patch? We need to submit more patches based > on the acceptance of this patch. OK, read the thread; Matthew is right. What you propose would pretty much destroy compat ioctl handling within the driver. You need a compat handler for MEGASAS_IOC_FW. With your current patch you'd get a failure both from a 64 bit binary running on x86-64 and if someone ran the x86 binary on ia64. The driver already uses the compat infrastructure, it shouldn't be too hard to add this in the correct manner. James - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 1/1] scsi: megaraid_sas - Fix random failure of DCDB cmds with sense info
On Fri, 2008-02-22 at 09:17 -0700, Yang, Bo wrote: > James, > > What is the status for this patch? We need to submit more patches based > on the acceptance of this patch. Sorry, I didn't actually get this message because that was in the window when my email server dropped off the internet. You'll have to give me time to dig up the threads on marc.info and read them. James - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Marvell 6440 SAS/SATA driver
Ke Wei wrote: Added support for Expander. Based on version 0.1 for mvsas. Signed-off-by: Ke Wei <[EMAIL PROTECTED]> --- diff --git a/drivers/scsi/mvsas.c b/drivers/scsi/mvsas.c old mode 100644 new mode 100755 index 03638b9..3c7a154 --- a/drivers/scsi/mvsas.c +++ b/drivers/scsi/mvsas.c @@ -2,6 +2,7 @@ mvsas.c - Marvell 88SE6440 SAS/SATA support Copyright 2007 Red Hat, Inc. + Copyright 2008 Marvell. <[EMAIL PROTECTED]> This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as @@ -25,6 +26,13 @@ structures. this permits elimination of all the le32_to_cpu() and cpu_to_le32() conversions. + Changelog: + 2008-02-22 0.5 Added support for Expander. + 2008-02-05 0.4 Added support for hotplug and wide port. + 2008-01-22 0.3 Added support for SAS HD and SATA Devices. + 2008-01-09 0.2 detect SAS disk. + 2007-09-25 0.1 rough draft, Initial version. + */ #include Technical content: looks good, ACK Patch content: looks diff'd against correct version, ACK But we still have one major process problem, and a couple minor problems to fix: 1) [minor] please do not include a changelog in the source code. That's what the git repository history is for. 2) [minor] Your patch description (email body) is incorrect. It should describe all changes since version 0.1, the version you diff'd against: Convert rough draft Marvell 6440 driver to a working driver. Added support for SAS and SATA devices, hotplug, wide port, and expanders. 3) [minor] Your email subject should reflect that you are updating version 0.1, the version you diff'd against: [PATCH] mvsas: convert from rough draft to working driver 4) [major] Your email was encoded in base64, which makes it difficult for automated tools to handle, and difficult for some mail clients to view and reply-to. It will require some email configuration on your part to disable this, and send the email as a text/plain message. I've copied Saeed Bishara @ Marvell on this email. Saeed has been successfully sending patch for the sata_mv driver (5040, 6080, 6042, etc.) Maybe Saeed can advise you on his email setup? In any case, once we fix this last problem -- base64 -- we can finally apply your patch and get things moving. You are very close to having a working Linux kernel development setup, thanks for your patience! Jeff - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] scsi: megaraid_sas - Fix random failure of DCDB cmds with sense info
On Fri, Feb 22, 2008 at 09:17:33AM -0700, Yang, Bo wrote: > James, > > What is the status for this patch? We need to submit more patches based > on the acceptance of this patch. I've NAKed it. You need to use compat_ioctl. That way your code will work even if you run a 32-bit x86 application on ia64 (or a 64-bit application on x86). -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 1/1] scsi: megaraid_sas - Fix random failure of DCDB cmds with sense info
James, What is the status for this patch? We need to submit more patches based on the acceptance of this patch. Thanks. Bo Yang -Original Message- From: Yang, Bo Sent: Thursday, November 29, 2007 1:24 PM To: 'Matthew Wilcox' Cc: linux-scsi@vger.kernel.org; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; Patro, Sumant; Kolli, Neela Subject: RE: [PATCH 1/1] scsi: megaraid_sas - Fix random failure of DCDB cmds with sense info Matthew, I think my last email confused you. Here are the points for submitting this patch: 1. It is only to add the ia64 support to our driver. This is why we add the ia64 flag. 2. We did not change the implementation for x86 and x86_64. The implementation for them with u32 * is there from day one. 3. What I wanted to convey in my previous mail is that if one uses a 64 bit application (s)he may need to test for all functionality again. Regards. Bo Yang -Original Message- From: Matthew Wilcox [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 28, 2007 2:27 PM To: Yang, Bo Cc: linux-scsi@vger.kernel.org; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; Patro, Sumant; Kolli, Neela Subject: Re: [PATCH 1/1] scsi: megaraid_sas - Fix random failure of DCDB cmds with sense info On Wed, Nov 28, 2007 at 12:08:37PM -0700, Yang, Bo wrote: > Matthew, > > Yes, as I mentioned, our applications are built in 32-bit environment > (except for ia64). > We may see a different behavior if we build it in 64-bit for x86-64. Then this patch isn't acceptable. You need to find a way which works for both 32-bit and 64-bit applications. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Actually using the sg table/chain code
James Bottomley wrote: Yes, I can buy the argument for filesystem I/Os. What about tapes which currently use the block queue and have internal home grown stuff to handle larger transfers ... how are they supposed to set the larger default sector size? Just modify the bare q->max_sectors? Sorry for the late response. I have been doing userspace stuff and not keeping up with linux-scsi :( For scsi tape and passthrough (any place we use REQ_TYPE_BLOCK_PC like with st or sg or block/scsi_ioctl or bsg), the block/bio/scatterlist building code ignores q->max_sectors and uses q->max_hw_sectors. - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] ips: sg chaining support to the path to non I/O commands
On Tue, 19 Feb 2008 04:39:00 -0800 "Salyzyn, Mark" <[EMAIL PROTECTED]> wrote: > ACK Thanks! > Other RAID drivers (eg: aacraid) makes the assumption that commands > in these paths (INQUIRY, READ CAPACITY, MODE SENSE etc spoofing) are > single scatter gather elements and have yet to be bitten. I agree > with Fujita-san about the practical unlikelihood. The fix does not > incur any change in code overhead, so it is worth hardening! > > Can one create a request via /dev/sg for a buffer smaller than 256 > and manage to create a multi-element scatter gather? I think that we can do post 2.6.24 since we relaxed the default alignment requirements (from 511 to 3). So a buffer more than 8 bytes can leads to multi-element scatter gathers though it rarely happens. Shortly I will submit the helper functions to copy data between sg list and a buffer. It can replace aac_internal_transfer but it's not for scsi-rc-fixes. If you worry that aac_internal_transfer can't handle multiple sg entries, you can do something like this, I think: diff --git a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c index ae5f74f..73fa7c2 100644 --- a/drivers/scsi/aacraid/linit.c +++ b/drivers/scsi/aacraid/linit.c @@ -455,6 +455,12 @@ static int aac_slave_configure(struct scsi_device *sdev) return 0; } +static int aac_slave_alloc(struct scsi_device *sdev) +{ + blk_queue_update_dma_alignment(sdev->request_queue, (512 - 1)); + return 0; +} + /** * aac_change_queue_depth - alter queue depths * @sdev: SCSI device we are considering @@ -1015,6 +1021,7 @@ static struct scsi_host_template aac_driver_template = { .queuecommand = aac_queuecommand, .bios_param = aac_biosparm, .shost_attrs= aac_attrs, + .slave_alloc= aac_slave_alloc, .slave_configure= aac_slave_configure, .change_queue_depth = aac_change_queue_depth, .sdev_attrs = aac_dev_attrs, = Here's a malicious version of sg_inq, which tries to create multiple sg entries: = #include #include #include #include #include #include #include #include #include #define INQ_REPLY_LEN 96 #define INQ_CMD_LEN 6 int main(int argc, char **argv) { struct sg_io_hdr hdr; unsigned char scb[INQ_CMD_LEN] = {0x12, 0, 0, 0, INQ_REPLY_LEN, 0}; unsigned char sense[32]; void *buf; int offset = 4096 - 4; int fd, ret; buf = valloc(8192); memset(&hdr, 0, sizeof(hdr)); hdr.interface_id = 'S'; hdr.cmd_len = sizeof(scb); hdr.mx_sb_len = sizeof(sense); hdr.dxfer_direction = SG_DXFER_FROM_DEV; hdr.dxfer_len = INQ_REPLY_LEN; hdr.dxferp = buf + offset; hdr.cmdp = scb; hdr.sbp = sense; hdr.flags |= SG_FLAG_DIRECT_IO; fd = open(argv[1], O_RDONLY); if (fd < 0) { fprintf(stderr, "fail to open %s\n", argv[1]); return fd; } ret = ioctl(fd, SG_IO, &hdr); if (ret < 0) { fprintf(stderr, "fail to ioctl %d\n", ret); return ret; } return ret; } - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] stex: stex_direct_copy shouldn't call dma_map_sg
stex_direct_copy copies an in-kernel buffer to a sg list in order to spoof some SCSI commands. stex_direct_copy calls dma_map_sg and then stex_internal_copy with the value that dma_map_sg returned. It calls scsi_kmap_atomic_sg to copy data. scsi_kmap_atomic_sg doesn't see sg->dma_length so if dma_map_sg merges sg entries, stex_internal_copy gets the smaller number of sg entries than the acutual number, which means it wrongly think that the data length in the sg list is shorter than the actual length. stex_direct_copy shouldn't call dma_map_sg and it doesn't need since this code path doesn't involve dma transfers. This patch removes stex_direct_copy and simply calls stex_internal_copy with the actual number of sg entries. Signed-off-by: FUJITA Tomonori <[EMAIL PROTECTED]> Cc: Ed Lin <[EMAIL PROTECTED]> Cc: James Bottomley <[EMAIL PROTECTED]> --- drivers/scsi/stex.c | 34 -- 1 files changed, 12 insertions(+), 22 deletions(-) diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c index 72f6d80..4b6861c 100644 --- a/drivers/scsi/stex.c +++ b/drivers/scsi/stex.c @@ -461,23 +461,6 @@ static void stex_internal_copy(struct scsi_cmnd *cmd, } } -static int stex_direct_copy(struct scsi_cmnd *cmd, - const void *src, size_t count) -{ - size_t cp_len = count; - int n_elem = 0; - - n_elem = scsi_dma_map(cmd); - if (n_elem < 0) - return 0; - - stex_internal_copy(cmd, src, &cp_len, n_elem, ST_TO_CMD); - - scsi_dma_unmap(cmd); - - return cp_len == count; -} - static void stex_controller_info(struct st_hba *hba, struct st_ccb *ccb) { struct st_frame *p; @@ -569,8 +552,10 @@ stex_queuecommand(struct scsi_cmnd *cmd, void (* done)(struct scsi_cmnd *)) unsigned char page; page = cmd->cmnd[2] & 0x3f; if (page == 0x8 || page == 0x3f) { - stex_direct_copy(cmd, ms10_caching_page, - sizeof(ms10_caching_page)); + size_t cp_len = sizeof(ms10_caching_page); + stex_internal_copy(cmd, ms10_caching_page, + &cp_len, scsi_sg_count(cmd), + ST_TO_CMD); cmd->result = DID_OK << 16 | COMMAND_COMPLETE << 8; done(cmd); } else @@ -599,8 +584,10 @@ stex_queuecommand(struct scsi_cmnd *cmd, void (* done)(struct scsi_cmnd *)) if (id != host->max_id - 1) break; if (lun == 0 && (cmd->cmnd[1] & INQUIRY_EVPD) == 0) { - stex_direct_copy(cmd, console_inq_page, - sizeof(console_inq_page)); + size_t cp_len = sizeof(console_inq_page); + stex_internal_copy(cmd, console_inq_page, + &cp_len, scsi_sg_count(cmd), + ST_TO_CMD); cmd->result = DID_OK << 16 | COMMAND_COMPLETE << 8; done(cmd); } else @@ -609,6 +596,7 @@ stex_queuecommand(struct scsi_cmnd *cmd, void (* done)(struct scsi_cmnd *)) case PASSTHRU_CMD: if (cmd->cmnd[1] == PASSTHRU_GET_DRVVER) { struct st_drvver ver; + size_t cp_len = sizeof(ver); ver.major = ST_VER_MAJOR; ver.minor = ST_VER_MINOR; ver.oem = ST_OEM; @@ -616,7 +604,9 @@ stex_queuecommand(struct scsi_cmnd *cmd, void (* done)(struct scsi_cmnd *)) ver.signature[0] = PASSTHRU_SIGNATURE; ver.console_id = host->max_id - 1; ver.host_no = hba->host->host_no; - cmd->result = stex_direct_copy(cmd, &ver, sizeof(ver)) ? + stex_internal_copy(cmd, &ver, &cp_len, + scsi_sg_count(cmd), ST_TO_CMD); + cmd->result = sizeof(ver) == cp_len ? DID_OK << 16 | COMMAND_COMPLETE << 8 : DID_ERROR << 16 | COMMAND_COMPLETE << 8; done(cmd); -- 1.5.3.4 - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] stex: stex_internal_copy should be called with sg_count in struct st_ccb
stex_internal_copy copies an in-kernel buffer to a sg list by using scsi_kmap_atomic_sg. Some functions calls stex_internal_copy with sg_count in struct st_ccb, which is the value that dma_map_sg returned. However it might be shorter than the actual number of sg entries (if the IOMMU merged the sg entries). scsi_kmap_atomic_sg doesn't see sg->dma_length so stex_internal_copy should be called with the actual number of sg entries (i.e. scsi_sg_count), because if the sg entries were merged, stex_direct_copy wrongly think that the data length in the sg list is shorter than the actual length. Signed-off-by: FUJITA Tomonori <[EMAIL PROTECTED]> Cc: Ed Lin <[EMAIL PROTECTED]> Cc: James Bottomley <[EMAIL PROTECTED]> --- drivers/scsi/stex.c | 10 ++ 1 files changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c index 4b6861c..654430e 100644 --- a/drivers/scsi/stex.c +++ b/drivers/scsi/stex.c @@ -467,7 +467,8 @@ static void stex_controller_info(struct st_hba *hba, struct st_ccb *ccb) size_t count = sizeof(struct st_frame); p = hba->copy_buffer; - stex_internal_copy(ccb->cmd, p, &count, ccb->sg_count, ST_FROM_CMD); + stex_internal_copy(ccb->cmd, p, &count, scsi_sg_count(ccb->cmd), + ST_FROM_CMD); memset(p->base, 0, sizeof(u32)*6); *(unsigned long *)(p->base) = pci_resource_start(hba->pdev, 0); p->rom_addr = 0; @@ -485,7 +486,8 @@ static void stex_controller_info(struct st_hba *hba, struct st_ccb *ccb) p->subid = hba->pdev->subsystem_vendor << 16 | hba->pdev->subsystem_device; - stex_internal_copy(ccb->cmd, p, &count, ccb->sg_count, ST_TO_CMD); + stex_internal_copy(ccb->cmd, p, &count, scsi_sg_count(ccb->cmd), + ST_TO_CMD); } static void @@ -699,7 +701,7 @@ static void stex_copy_data(struct st_ccb *ccb, if (ccb->cmd == NULL) return; stex_internal_copy(ccb->cmd, - resp->variable, &count, ccb->sg_count, ST_TO_CMD); + resp->variable, &count, scsi_sg_count(ccb->cmd), ST_TO_CMD); } static void stex_ys_commands(struct st_hba *hba, @@ -724,7 +726,7 @@ static void stex_ys_commands(struct st_hba *hba, count = STEX_EXTRA_SIZE; stex_internal_copy(ccb->cmd, hba->copy_buffer, - &count, ccb->sg_count, ST_FROM_CMD); + &count, scsi_sg_count(ccb->cmd), ST_FROM_CMD); inq_data = (ST_INQ *)hba->copy_buffer; if (inq_data->DeviceTypeQualifier != 0) ccb->srb_status = SRB_STATUS_SELECTION_TIMEOUT; -- 1.5.3.4 - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] libsas: fix error handling
--- On Thu, 2/21/08, James Bottomley <[EMAIL PROTECTED]> wrote: > Look, Luben; I don't mind you making an arse of > yourself on the mailing > lists; that's about par for the course nowadays. No, you just did in this thread by pretending to be any kind of authority or have an expertise on how to fix this problem, while in fact to properly address it, one really needs knowledge of the protocol, how the HW works and how the sequencer works, and a protocol trace of this specific problem. And then fixing the problem after it's been fully understood is yet another feat. It's unfortunate that you feel the need to attack me and call me names. You need to self-examine yourself and figure out and fix your hostile attitude. I've mostly stopped exposing your complete and utter incompetence in SCSI infrastructure and left the weed grow, but calling me names on public lists is very unprofessional. Speaking of "arse", a recent example of one of your "pearls" is this: http://marc.info/?l=linux-ide&m=120291905515015&w=2 "However, I'm happy to be proven wrong ... anyone on this thread is welcome to come up with a userland enclosure infrastructure. Once it does everything the in-kernel one does (which is really about the minimal possible set), I'll be glad to erase the in-kernel one." You introduce unnecessary code, the vendors tell you they don't agree with you, you respond with "I'm happy to be proven wrong" in spite of everyone's opinion differing yours. You clearly decide to do what you think is right without listening to vendors or people in the industry. Also, you completely ignore sg_ses(8). There are other examples, this one is just more recent. > However, failing to report serious bugs you knew about in > your code, and > have known about for two years, is tantamount to sabotage > ... especially In 2005, you decided to take a working code out of a GIT tree, change it privately, and then submit it to GIT as new. Since of course GIT history is immutable, there is no way to sync back to the solution I had for users to inspect, evaluate and compare to your derived code, since you took it out of GIT and changed it privately before submitting it back to the kernel (I had a GIT repo with already integrated working code, synced daily to Linus' kernel). Then a lot of people started asking me: "It doesn't work" and "SATA doesn't work", etc, and I had to explain over and over again what's happened. So please do not talk about sabotage. You changed the code quite a bit, to the point where patches between what I maintain and what you decided to fork off and make it your own play-ground were different and patches wouldn't apply cleanly or at all. For this thread, I've looked at the "error handling" code of what's in the kernel for aic94xx and your version of the "libsas" and frankly, I cannot submit patches to something which completely does NOT follow the Sequencer Interface Spec for AIC94xx series of chips. For example, handling of REQ_TASK_ABORT -- its implementation is quite naive and out of spec, i.e. already broken. I cannot submit patches to something that I know is already broken, which had been changed from a per-spec code version. You know, people are still asking me to sync up to kernel x.y.z the SAS/SATL+aic94xx I maintain, because they tell me "have been running it for 2 years on production hardware [i.e. IBM] and no problems and I just need to upgrade the kernel". Will you also attack me for having a SATL as part of my SAS code which I didn't submit to the kernel? Do you know how much code is out there that vendors don't bother to try to integrate because of your attitude such as this? > when you've been reading the bug reports on the mailing > list. I have to > wonder how many more such bugs you've left in the code > for users to > find. I didn't "leave" any bugs. Bugs are just part of the development cycle. You should know this. aic94xx is an old product, most customers have upgraded. The ones who still have IBM servers with this chip, have had the need to upgrade their kernels for various other reasons and since the in-kernel solution has failed them, have requested to run my code. > > BTW, the problem you're "debugging" may > require a protocol > > trace and a sequencer update by Adaptec. > > I think you're fully aware that my test rig consists of > a couple of SAS > cards and drives donated by IBM and two expanders donated > by LSI, plus a bit of equipment scavenged from Intel. No, I'm not aware of this. I have absolutely no interest in what your "test rig" consists of. Here is a question though: why do you even have SAS HW? You don't work for a SAS HW vendor and you don't have expertise in SAS or SCSI in general. Do you have hardware for each and every LLDD that's in the kernel? > I have no access to sophisticated tools like protocol analyzers. See above. You don't need to have access to such things unless you worked for a vendor, and your job is to make your employer's
Re: New 2.6.24.2 SG_IO SCSI problems
Mark Hounschell wrote: > James Bottomley wrote: >> On Thu, 2008-02-21 at 10:15 -0500, Mark Hounschell wrote: >>> I seem to have run into some sort of regression in the SG_IO interface of >>> 2.6.24.2. >>> I have an application that up until 2.6.24 worked fine. The 2.6.23.16 >>> kernel works fine. >>> >>> During reads I get these kernel messages. Writes and other functions _seem_ >>> OK. Actually basic >>> reads are working. Its with large BC reads using an io_vec list that the >>> problem shows up. >>> >>> Feb 21 09:27:51 harley kernel: (scsi1:A:2:0): data overrun detected in >>> Data-in phase. Tag == 0x1. >>> Feb 21 09:27:51 harley kernel: (scsi1:A:2:0): Have seen Data Phase. Length >>> = 256. NumSGs = 1. >>> Feb 21 09:27:51 harley kernel: sg[0] - Addr 0x06256100 : Length 256 >> Help me a little here. What was the io_vec and command you sent in to >> produce this? The aic debugging information implies a single element sg >> list for a 256 byte read. >> >> James >> >> >> > > Well, I did no 256 byte xfer at all. My failing io_vec list has 6 elements. > The first 5 are for byte counts of 0xfffc and the last 0x6114. See below. > > This is some debug info from within my app of the commands leading up the > failure: > If you need actual values of the io_vec lists I will need to add some > additional debug > info into the app. I will do if needed. > > The disk BTW is formated at 768 byte sector size. > > The first read has a 2 element io_vec list and reports no error: > ScsiDev_thread_7e00: Read CBD = 0x08 0x00 0x00 0x00 0x01 0x00 > ScsiDev_thread_7e00: Read1(1) bc = 0x0078 addr = 0x00 Skip 0 > ScsiDev_thread_7e00: Read(2) bc = 0x0288 addr = 0xb6cea368 Skip 1 > ScsiDev_thread_7e00: SRead DC ops = 2 short_bc = 288 > > The second read has a 4 element io_vec list and reports no error: > ScsiDev_thread_7e00: Read CBD = 0x08 0x00 0x00 0x00 0x05 0x00 > ScsiDev_thread_7e00: Read1(1) bc = 0x0780 addr = 0x00 Skip 0 > ScsiDev_thread_7e00: Read1(2) bc = 0x0670 addr = 0x00 Skip 1 > ScsiDev_thread_7e00: Read1(3) bc = 0x003c addr = 0x000200 Skip 0 > ScsiDev_thread_7e00: SRead(4) bc = 0x022c addr = 0xb6cea368 Skip 1 > ScsiDev_thread_7e00: Read DC ops = 4 short_bc = 22c > > There is a seek here that reports no error: > ScsiDev_thread_7e00: Seek address 2752 BPS 768 > > > This read has a 6 element io_vec list and reports the error to the app: > ScsiDev_thread_7e00: ReadX CBD = 0x28 0x00 0x00 0x00 0x27 0x52 0x00 0x01 0xcb > 0x00 > ScsiDev_thread_7e00: Read1(1) bc = 0xfffc addr = 0x000784 Skip 0 > ScsiDev_thread_7e00: Read1(2) bc = 0xfffc addr = 0x010780 Skip 0 > ScsiDev_thread_7e00: Read1(3) bc = 0xfffc addr = 0x02077c Skip 0 > ScsiDev_thread_7e00: Read1(4) bc = 0xfffc addr = 0x030778 Skip 0 > ScsiDev_thread_7e00: Read1(5) bc = 0xfffc addr = 0x040774 Skip 0 > ScsiDev_thread_7e00: Read1(6) bc = 0x6114 addr = 0x050770 Skip 0 > ScsiDev_thread_7e00: Read DC ops = 6 short_bc = 0 > > ScsiDev_thread_7e00: scsi = 0x0 msg 0x0 host = 0x0007 driver = 0x > ScsiDev_thread_7e00: Read error: sns = 0x00 residual = 0x > ScsiDev_thread_7e00: Posting IPL status 0x0090 0x000e for Suba > to loc 0x00 > ScsiDev_thread_7e00: Sleeping!! > > > Below is the complete dump of kernel messages for the above. I assume they are > a result of the last failing read but there is an awfull lot there just for > that 6 element > io_vec list. Sorry to put all this in there but wanted to you to get the idea. > > > Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): data overrun detected in > Data-in phase. Tag == 0x1. > Feb 21 10:51:03 harley kernel: (scsi1:A:2:0): Have seen Data Phase. Length = > 256. NumSGs = 1. > Feb 21 10:51:03 harley kernel: sg[0] - Addr 0x03252e100 : Length 256 . . Snip . > Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): data overrun detected in > Data-in phase. Tag == 0x1. > Feb 21 10:51:49 harley kernel: (scsi1:A:2:0): Have seen Data Phase. Length = > 256. NumSGs = 1. > Feb 21 10:51:49 harley kernel: sg[0] - Addr 0x0f156100 : Length 256 > > Again, sorry for all that. > FWIW, on a different machine doing the same thing I get only a single kernel message and NOT the tons shown above. Regards Mark - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: LSI Logic MegaRAID SATA 150-4 / LSI Logic New Generation RAID Device Drivers (MEGARAID_NEWGEN) problems (megaraid abort: scsi cmd:14600, do now own)
(cc's added) On Mon, 18 Feb 2008 21:09:22 -0500 "David M. Strang" <[EMAIL PROTECTED]> wrote: > Greetings - > > A couple months back I purchased a LSI Logic MegaRAID ATA 150-4 > controller, as well as 3 Seagate 500GB SATA-II hard drives to use in my > system. Previously, I was using a pair of WD4000YR's in software raid, > which seemed to work well. I've just not gotten around to working on > migrating my data to these new drivers + controller, and it's giving me > some issues. As with most, I'm having some severe performance issues, > the performance is simply abysmal. Before getting into the details, here > is a quick overview of my configuration: > > System: > Tyan Tiger i7320/R (S5350) System Board > 2x Intel Xeon 3.0 GHz > 4GB RAM > > LSI Logic MegaRAID ATA 150-4 controller - Firmware Revision: 713S > 3x Seagate 7200.10 (Perpendicular Recording) ST3500630AS 500GB SATA-II > drives configured as a RAID-1 array with a HotSpare. > > Also, connected to the onboard controller is a WD4000YR, where all of my > data currently resides. > > I'm running Gentoo Hardended AMD64 MultiLib > (/usr/portage/profiles/hardened/amd64/multilib) > > My current kernel revision is 2.6.23-hardened-r7. > > Here are some (possibly) relevant snippets from dmesg during startup: > > ... > megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006) > megaraid: 2.20.5.1 (Release Date: Thu Nov 16 15:32:35 EST 2006) > megaraid: probe new device 0x1000:0x1960:0x1000:0x4523: bus 3:slot 3:func 0 > ACPI: PCI Interrupt :03:03.0[A] -> GSI 24 (level, low) -> IRQ 24 > megaraid: fw version:[713S] bios version:[G121] > scsi0 : LSI Logic MegaRAID driver > scsi[0]: scanning scsi channel 0 [Phy 0] for non-raid devices > scsi[0]: scanning scsi channel 1 [virtual] for logical drives > scsi 0:1:0:0: Direct-Access MegaRAID LD 0 RAID1 476G 713S PQ: 0 ANSI: 2 > sd 0:1:0:0: [sda] 976762880 512-byte hardware sectors (500103 MB) > sd 0:1:0:0: [sda] Write Protect is off > sd 0:1:0:0: [sda] Mode Sense: 00 00 00 00 > sd 0:1:0:0: [sda] Asking for cache data failed > sd 0:1:0:0: [sda] Assuming drive cache: write through > sd 0:1:0:0: [sda] 976762880 512-byte hardware sectors (500103 MB) > sd 0:1:0:0: [sda] Write Protect is off > sd 0:1:0:0: [sda] Mode Sense: 00 00 00 00 > sd 0:1:0:0: [sda] Asking for cache data failed > sd 0:1:0:0: [sda] Assuming drive cache: write through > sda: sda1 sda2 sda3 sda4 > sd 0:1:0:0: [sda] Attached SCSI disk > ata_piix :00:1f.2: version 2.12 > ata_piix :00:1f.2: MAP [ P0 -- P1 -- ] > ACPI: PCI Interrupt :00:1f.2[A] -> GSI 18 (level, low) -> IRQ 18 > PCI: Setting latency timer of device :00:1f.2 to 64 > scsi1 : ata_piix > scsi2 : ata_piix > ata1: SATA max UDMA/133 cmd 0x000114a0 ctl 0x0001149a > bmdma 0x00011470 irq 18 > ata2: SATA max UDMA/133 cmd 0x00011490 ctl 0x00011486 > bmdma 0x00011478 irq 18 > ata1.00: ATA-7: WDC WD4000YR-01PLB0, 01.06A01, max UDMA/133 > ata1.00: 781422768 sectors, multi 16: LBA48 NCQ (depth 0/32) > ata1.00: configured for UDMA/133 > scsi 1:0:0:0: Direct-Access ATA WDC WD4000YR-01P 01.0 PQ: 0 ANSI: 5 > sd 1:0:0:0: [sdb] 781422768 512-byte hardware sectors (400088 MB) > sd 1:0:0:0: [sdb] Write Protect is off > sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00 > sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't > support DPO or FUA > sd 1:0:0:0: [sdb] 781422768 512-byte hardware sectors (400088 MB) > sd 1:0:0:0: [sdb] Write Protect is off > sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00 > sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't > support DPO or FUA > sdb: sdb1 sdb2 sdb3 sdb4 > sd 1:0:0:0: [sdb] Attached SCSI disk > ... > > My controller is configured for Write Back Caching, Adaptive Read Ahead, > and Direct I/O (I've also tried cached I/O but it scared me...) > > The first thing I'm noticing is the horrible performance on the raid > disk, compared to the single standalone hard disk. Here is the output > from hdparm -tT on the single disk: > > -([EMAIL PROTECTED])-(~)- # hdparm -tT /dev/sdb1 > > /dev/sdb1: > Timing cached reads: 1670 MB in 2.00 seconds = 835.00 MB/sec > Timing buffered disk reads: 140 MB in 3.01 seconds = 46.45 MB/sec > > And then, the output from the raid-1 array: > > -([EMAIL PROTECTED])-(~)- # hdparm -tT /dev/sda1 > > /dev/sda1: > Timing cached reads: 1718 MB in 2.00 seconds = 859.65 MB/sec > Timing buffered disk reads: 92 MB in 3.09 seconds = 29.76 MB/sec > > I'm not sure what the deal is with the buffered disk reads being so much > WORSE than a single disk. So poor performance is a concern, but what's > more alarming are the messages showing up in DMESG. When I first tried > Cached IO - performance seemed good... except, dmesg was littered with > these errors (?): > > megaraid: aborting-14610 cmd=2a > megaraid abort: scsi cmd:14610, do now own > megaraid: aborting-14612 cmd=2a > megaraid abort: scsi cmd:14612