Re: [PATCH] enclosure: add support for enclosure services
On Tue, 2008-02-05 at 16:12 -0800, Andrew Morton wrote: > On Sun, 03 Feb 2008 18:16:51 -0600 > James Bottomley <[EMAIL PROTECTED]> wrote: > > > > > From: James Bottomley <[EMAIL PROTECTED]> > > Date: Sun, 3 Feb 2008 15:40:56 -0600 > > Subject: [SCSI] enclosure: add support for enclosure services > > > > The enclosure misc device is really just a library providing sysfs > > support for physical enclosure devices and their components. > > > > Thanks for sending it out for review. > > > +struct enclosure_device *enclosure_find(struct device *dev) > > +{ > > + struct enclosure_device *edev = NULL; > > + > > + mutex_lock(&container_list_lock); > > + list_for_each_entry(edev, &container_list, node) { > > + if (edev->cdev.dev == dev) { > > + mutex_unlock(&container_list_lock); > > + return edev; > > + } > > + } > > + mutex_unlock(&container_list_lock); > > + > > + return NULL; > > +} > > +EXPORT_SYMBOL_GPL(enclosure_find); > > This looks a little odd. We don't take a ref on the object after looking > it up, so what prevents some other thread of control from freeing or > otherwise altering the returned object while the caller is playing with it? The use case is for enclosure destruction, so the free should never happen, but I take the point; I've added a class_device_get(). > > +/** > > + * enclosure_for_each_device - calls a function for each enclosure > > + * @fn:the function to call > > + * @data: the data to pass to each call > > + * > > + * Loops over all the enclosures calling the function. > > + * > > + * Note, this function uses a mutex which will be held across calls to > > + * @fn, so it must have user context, and @fn should not sleep or > > Probably "non atomic context" would be more accurate. > > fn() actually _can_ sleep. "should" to me means you don't have to do this but ought to. I'll add a may (but should not). 
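For readers following the enclosure_find() exchange above: the point is that the reference has to be taken while the list lock is still held, so the object cannot go away between lookup and use. James mentions adding a class_device_get(); the following is only a userspace sketch of the same idea, with a plain counter and a pthread mutex standing in for the kernel primitives. All names here (struct entry, entry_add, entry_find_get, entry_put) are hypothetical and not from the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for an enclosure list entry. */
struct entry {
    const void *key;     /* plays the role of edev->cdev.dev */
    int refcount;        /* protected by list_lock in this sketch */
    struct entry *next;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct entry *list_head;

/* Add an entry; the list itself holds one reference. */
void entry_add(struct entry *e, const void *key)
{
    e->key = key;
    e->refcount = 1;
    pthread_mutex_lock(&list_lock);
    e->next = list_head;
    list_head = e;
    pthread_mutex_unlock(&list_lock);
}

/* Look up by key and take a reference before dropping the lock. */
struct entry *entry_find_get(const void *key)
{
    struct entry *e;

    pthread_mutex_lock(&list_lock);
    for (e = list_head; e; e = e->next) {
        if (e->key == key) {
            e->refcount++;   /* taken under the lock, so no window */
            break;
        }
    }
    pthread_mutex_unlock(&list_lock);
    return e;                /* NULL if nothing matched */
}

/* Drop a reference; returns 1 if the caller released the last one. */
int entry_put(struct entry *e)
{
    int last;

    pthread_mutex_lock(&list_lock);
    assert(e->refcount > 0);
    last = (--e->refcount == 0);
    pthread_mutex_unlock(&list_lock);
    return last;
}
```

A caller pairs every successful entry_find_get() with one entry_put(); whoever sees entry_put() return 1 is the one allowed to free the object.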
> > +    if (!cb) {
> > +        kfree(edev);
> > +        return ERR_PTR(-EINVAL);
> > +    }
>
> It would be less fuss if this were to test cb before doing the kzalloc().
>
> Can cb==NULL actually and legitimately happen?

Not really ... I'll make it a BUG_ON.

> > +void enclosure_unregister(struct enclosure_device *edev)
> > +{
> > +    int i;
> > +
> > +    if (!edev)
> > +        return;
>
> Is this legal?

No ... it'll oops on the null deref later ... I'll remove this.

> > +    mutex_lock(&container_list_lock);
> > +    list_del(&edev->node);
> > +    mutex_unlock(&container_list_lock);
>
> See, right now, someone who found this enclosure_device via
> enclosure_find() could still be playing with it?

Yes, fixed.

> > +    if (!edev || number >= edev->components)
> > +        return ERR_PTR(-EINVAL);
>
> Is !edev possible and legitimate?

It shouldn't be, no ... I can remove it.

> > +    snprintf(cdev->class_id, BUS_ID_SIZE, "%d", number);
>
> %u :)

Nitpicker!

> > +    return snprintf(buf, 40, "%d\n", edev->components);
> > +}
>
> "40"?

I just followed precedent ;-P  There doesn't seem to be a define for
this maximum length, so 40 is the most commonly picked constant.

> > +static char *enclosure_type [] = {
> > +    [ENCLOSURE_COMPONENT_DEVICE] = "device",
> > +    [ENCLOSURE_COMPONENT_ARRAY_DEVICE] = "array device",
> > +};
>
> One could play with const here, if sufficiently keen.

One will try to summon up the enthusiasm.

> > +static ssize_t set_component_fault(struct class_device *cdev, const char *buf,
> > +                                   size_t count)
> > +{
> > +    struct enclosure_device *edev = to_enclosure_device(cdev->parent);
> > +    struct enclosure_component *ecomp = to_enclosure_component(cdev);
> > +    int val = simple_strtoul(buf, NULL, 0);
>
> hrm, we do this conversion about 1e99 times in the kernel and we have to go
> and pass three args where only one was needed.  katoi()?

Yes ... I'll add it to the todo list.
> > +    for (i = 0; enclosure_status[i]; i++) {
> > +        if (strncmp(buf, enclosure_status[i],
> > +                    strlen(enclosure_status[i])) == 0 &&
> > +            buf[strlen(enclosure_status[i])] == '\n')
> > +            break;
> > +    }
>
> So if an application does
>
>     write(fd, "foo", 3)
>
> it won't work?  They have to do
>
>     write(fd, "foo\n", 4)
>
> ?

No ... it's designed for echo; however, I'll add a check for '\0' which
will catch the write case.

> > +#define to_enclosure_device(x) container_of((x), struct enclosure_device, cdev)
> > +#define to_enclosure_component(x) container_of((x), struct enclosure_component, cdev)
>
> These could be C functions...

OK ... I was just following precedent again, but I can make them
inlines.

> Nice looking driver.

Thanks,

James

---

Here's the incremental diff.

diff --git a/drivers/misc/enclosure.c b/drivers/misc/enclosure.c
index 42e6e43..6fcb0e9 100644
--- a/drivers/misc/enclosure.c
+++ b/drivers/misc/enclosure.c
@@ -39,7 +39,8
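As an aside on the status-matching exchange above, the fix James describes (accept the name followed by either '\n' or '\0') could look roughly like the sketch below. It assumes the buffer handed to the store routine is NUL terminated. The table contents and the names status_names and match_status are made up for illustration; the real enclosure_status[] entries are not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status table, NULL terminated like enclosure_status[]. */
static const char *const status_names[] = {
    "ok", "critical", "unknown", NULL
};

/*
 * Return the index of the status name in buf, or -1 if none matches.
 * Accepts "name\n" (what echo writes) as well as a bare "name"
 * terminated by NUL (what write(fd, "name", 4) would produce,
 * assuming the buffer is NUL terminated).
 */
int match_status(const char *buf)
{
    int i;

    assert(buf != NULL);
    for (i = 0; status_names[i]; i++) {
        size_t len = strlen(status_names[i]);

        if (strncmp(buf, status_names[i], len) == 0 &&
            (buf[len] == '\n' || buf[len] == '\0'))
            return i;
    }
    return -1;
}
```

Checking the character after the name is what keeps a longer string such as "okay" from matching "ok".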
Re: new scsi sense handling
--- On Tue, 2/5/08, FUJITA Tomonori <[EMAIL PROTECTED]> wrote: > > --- On Tue, 2/5/08, FUJITA Tomonori > <[EMAIL PROTECTED]> wrote: > > > On Mon, 4 Feb 2008 18:39:22 -0800 (PST) > > > Luben Tuikov <[EMAIL PROTECTED]> wrote: > > > > > > > --- On Mon, 2/4/08, Boaz Harrosh > > > <[EMAIL PROTECTED]> wrote: > > > > > There are 3 usages of sense handling in > drivers > > > > > > > > > > 1. sense is available in driver > internal > > > structure and is > > > > > mem-copied to upper level > > > > > 2. A CHECK_CONDITION status was > returned and the > > > driver > > > > > uses the scsi_eh_prep_cmnd() > > > > >for a REQUEST_SENSE invocation to > the target. > > > Then > > > > > returning the sense in > > > > >scsi_eh_return_cmnd(). A variation > on this is > > > when the > > > > > driver does nothing the queue > > > > >is frozen an the scsi watchdog timer > does the > > > above. > > > > > 3. The underline host adapter does the > > > REQUEST_SENSE and a > > > > > pre-allocated and DMA mapped > > > > >sense buffer receives the sense > information > > > from HW. > > > > > > > > Many years ago when "ACA" had a > constructive > > > meaning, > > > > so did "Autosense". Then about 5 > years ago, > > > "Autosense" > > > > disappeared completely since it became the > de facto > > > > implementation of the then SCSI Execute > Command > > > "RPC", > > > > now just SCSI Execute Command procedure > call. > > > > > > > > At that point in time, the SCSI mid-layer > decided > > > > to embrace this model and give the LLDD a > scsi command > > > > structure which included the sense data > buffer to > > > > a size that the SCSI mid-layer was > interested in, > > > > at the moment 96 bytes, macro defined in > > > > include/scsi/scsi_cmnd.h. 
> > > > > > > > The concept of "Autosense" was > off-loaded to > > > LLDD > > > > to emulate it if the specific target device > to > > > > which the command was issued, didn't > supply the > > > > sense data on CHECK CONDITION, and more so > > > > relevant to target devices which implemented > > > > queuing, thus the ACA. > > > > > > > > And the mid-layer would consider extracting > > > > the sense data via REQUEST SENSE command > > > > as a _special case_ if the LLDD/transport > layer > > > > didn't implement the > "autosense" model. > > > > > > Only SPI and USB? > > > > I don't understand this question. > > I meant, 'what transport protocols are categorized into > the transport > protocol that doesn't implement the > "autosense" model?' If any transport protocol conforms to SAM, it supports it. Either emulated in the transport itself or supported by the device (target) itself. But ideally, the SCSI mid-layer shouldn't have to get a CHECK CONDITION and then turn around and send REQUEST SENSE, due to the atomicity (per command) of the sense data, especially if the target supports queuing. There used to be a mechanism to support this in SAM but is now obsolete. > > > > > The most of LLDs using the transport protocol > that we care > > > about today > > > uses sense buffer in their own internal > structure. > > > > Yes. > > > > > > > > I think that the issue to solve to kill > > > scsi_cmnd:sense_buffer is how > > > to share (or export) such sense buffer with the > scsi > > > mid-layer. > > > > And therein lies the problem. Sense data is SCSI > specific, > > it should be left to SCSI, unless of course you can > > stipulate that _all_ block devices return sense data. > > Yeah, sense data is SCSI specific and it should be left to > SCSI. But > I'm not sure we need to stipulate that _all_ block > devices return > sense data. Today the block layer users (sg, bsg, etc) use > it only > when it's appropriate (or only if they want to use it). I agree. 
Luben - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
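To make usage 1 from Boaz's list quoted above a little more concrete (sense kept in a driver-internal structure and mem-copied to the upper level), here is a small userspace sketch. The 96-byte size is the figure mentioned in the thread; the names SENSE_BUFFERSIZE, struct cmd and copy_sense are this sketch's own and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Size quoted in the thread; the macro name is local to this sketch. */
#define SENSE_BUFFERSIZE 96

/* Hypothetical stand-in for a command carrying its own sense buffer. */
struct cmd {
    unsigned char sense_buffer[SENSE_BUFFERSIZE];
};

/*
 * Copy sense data from a driver-internal buffer into the command,
 * truncating to the size the upper layer is interested in.
 * Returns the number of bytes actually copied.
 */
size_t copy_sense(struct cmd *c, const unsigned char *hw_sense,
                  size_t hw_len)
{
    size_t n = hw_len < SENSE_BUFFERSIZE ? hw_len : SENSE_BUFFERSIZE;

    assert(c != NULL && hw_sense != NULL);
    memset(c->sense_buffer, 0, SENSE_BUFFERSIZE);  /* no stale bytes */
    memcpy(c->sense_buffer, hw_sense, n);
    return n;
}
```

The copy is exactly the step that sharing or exporting the driver's own buffer with the mid-layer would make unnecessary, which is the problem FUJITA raises above.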
Re: [Scst-devel] Integration of SCST in the mainstream Linux kernel
On Wed, 2008-02-06 at 10:29 +0900, FUJITA Tomonori wrote: > On Tue, 05 Feb 2008 18:09:15 +0100 > Matteo Tescione <[EMAIL PROTECTED]> wrote: > > > On 5-02-2008 14:38, "FUJITA Tomonori" <[EMAIL PROTECTED]> wrote: > > > > > On Tue, 05 Feb 2008 08:14:01 +0100 > > > Tomasz Chmielewski <[EMAIL PROTECTED]> wrote: > > > > > >> James Bottomley schrieb: > > >> > > >>> These are both features being independently worked on, are they not? > > >>> Even if they weren't, the combination of the size of SCST in kernel plus > > >>> the problem of having to find a migration path for the current STGT > > >>> users still looks to me to involve the greater amount of work. > > >> > > >> I don't want to be mean, but does anyone actually use STGT in > > >> production? Seriously? > > >> > > >> In the latest development version of STGT, it's only possible to stop > > >> the tgtd target daemon using KILL / 9 signal - which also means all > > >> iSCSI initiator connections are corrupted when tgtd target daemon is > > >> started again (kernel upgrade, target daemon upgrade, server reboot > > >> etc.). > > > > > > I don't know what "iSCSI initiator connections are corrupted" > > > mean. But if you reboot a server, how can an iSCSI target > > > implementation keep iSCSI tcp connections? > > > > > > > > >> Imagine you have to reboot all your NFS clients when you reboot your NFS > > >> server. Not only that - your data is probably corrupted, or at least the > > >> filesystem deserves checking... > > The TCP connection will drop, remember that the TCP connection state for one side has completely vanished. Depending on iSCSI/iSER ErrorRecoveryLevel that is set, this will mean: 1) Session Recovery, ERL=0 - Restarting the entire nexus and all connections across all of the possible subnets or comm-links. All outstanding un-StatSN acknowledged commands will be returned back to the SCSI subsystem with RETRY status. Once a single connection has been reestablished to start the nexus, the CDBs will be resent. 
2) Connection Recovery, ERL=2 - CDBs from the failed connection(s) will
be retried (nothing changes in the PDU) to fill the iSCSI CmdSN ordering
gap, or be explicitly retried with TMR TASK_REASSIGN for ones already
acknowledged by the ExpCmdSN that are returned to the initiator in
response packets or by way of unsolicited NopINs.

> > Don't know if matters, but in my setup (iscsi on top of drbd+heartbeat)
> > rebooting the primary server doesn't affect my iscsi traffic, SCST correctly
> > manages stop/crash, by sending unit attention to clients on reconnect.
> > Drbd+heartbeat correctly manages those things too.
> > Still from an end-user POV, i was able to reboot/survive a crash only with
> > SCST, IETD still has reconnect problems and STGT are even worst.
>
> Please tell us on stgt-devel mailing list if you see problems. We will
> try to fix them.
>

FYI, the LIO code also supports rmmoding iscsi_target_mod while at full
10 Gb/sec speed.  I think it should be a requirement to be able to
control per initiator, per portal group, per LUN, per device, per HBA in
the design without restarting any other objects.

--nab

> Thanks,
Re: Integration of SCST in the mainstream Linux kernel
On Tue, 2008-02-05 at 16:11 -0800, Nicholas A. Bellinger wrote: > On Tue, 2008-02-05 at 22:21 +0300, Vladislav Bolkhovitin wrote: > > Jeff Garzik wrote: > > >>> iSCSI is way, way too complicated. > > >> > > >> I fully agree. From one side, all that complexity is unavoidable for > > >> case of multiple connections per session, but for the regular case of > > >> one connection per session it must be a lot simpler. > > > > > > Actually, think about those multiple connections... we already had to > > > implement fast-failover (and load bal) SCSI multi-pathing at a higher > > > level. IMO that portion of the protocol is redundant: You need the > > > same capability elsewhere in the OS _anyway_, if you are to support > > > multi-pathing. > > > > I'm thinking about MC/S as about a way to improve performance using > > several physical links. There's no other way, except MC/S, to keep > > commands processing order in that case. So, it's really valuable > > property of iSCSI, although with a limited application. > > > > Vlad > > > > Greetings, > > I have always observed the case with LIO SE/iSCSI target mode (as well > as with other software initiators we can leave out of the discussion for > now, and congrats to the open/iscsi on folks recent release. :-) that > execution core hardware thread and inter-nexus per 1 Gb/sec ethernet > port performance scales up to 4x and 2x core x86_64 very well with > MC/S). I have been seeing 450 MB/sec using 2x socket 4x core x86_64 for > a number of years with MC/S. Using MC/S on 10 Gb/sec (on PCI-X v2.0 > 266mhz as well, which was the first transport that LIO Target ran on > that was able to reach handle duplex ~1200 MB/sec with 3 initiators and > MC/S. In the point to point 10 GB/sec tests on IBM p404 machines, the > initiators where able to reach ~910 MB/sec with MC/S. Open/iSCSI was > able to go a bit faster (~950 MB/sec) because it uses struct sk_buff > directly. 
>

Sorry, these were IBM p505 express machines (not p404, duh), which had a
2x socket 2x core POWER5 setup.  These (along with an IBM X-series
machine) were the only ones available for PCI-X v2.0, and this probably
is still the case. :-)

Also, these numbers were with a ~9000 MTU (I don't recall what the
hardware limit on the 10 Gb/sec switch was) doing direct struct iovec to
preallocated struct page mapping for payload on the target side.  This
is known as the RAMDISK_DR plugin in the LIO-SE.  On the initiator, LTP
disktest and O_DIRECT were used for direct to SCSI block device access.

I can dig up this paper if anyone is interested.

--nab
Re: [Scst-devel] Integration of SCST in the mainstream Linux kernel
On Tue, 05 Feb 2008 18:09:15 +0100 Matteo Tescione <[EMAIL PROTECTED]> wrote: > On 5-02-2008 14:38, "FUJITA Tomonori" <[EMAIL PROTECTED]> wrote: > > > On Tue, 05 Feb 2008 08:14:01 +0100 > > Tomasz Chmielewski <[EMAIL PROTECTED]> wrote: > > > >> James Bottomley schrieb: > >> > >>> These are both features being independently worked on, are they not? > >>> Even if they weren't, the combination of the size of SCST in kernel plus > >>> the problem of having to find a migration path for the current STGT > >>> users still looks to me to involve the greater amount of work. > >> > >> I don't want to be mean, but does anyone actually use STGT in > >> production? Seriously? > >> > >> In the latest development version of STGT, it's only possible to stop > >> the tgtd target daemon using KILL / 9 signal - which also means all > >> iSCSI initiator connections are corrupted when tgtd target daemon is > >> started again (kernel upgrade, target daemon upgrade, server reboot etc.). > > > > I don't know what "iSCSI initiator connections are corrupted" > > mean. But if you reboot a server, how can an iSCSI target > > implementation keep iSCSI tcp connections? > > > > > >> Imagine you have to reboot all your NFS clients when you reboot your NFS > >> server. Not only that - your data is probably corrupted, or at least the > >> filesystem deserves checking... > > Don't know if matters, but in my setup (iscsi on top of drbd+heartbeat) > rebooting the primary server doesn't affect my iscsi traffic, SCST correctly > manages stop/crash, by sending unit attention to clients on reconnect. > Drbd+heartbeat correctly manages those things too. > Still from an end-user POV, i was able to reboot/survive a crash only with > SCST, IETD still has reconnect problems and STGT are even worst. Please tell us on stgt-devel mailing list if you see problems. We will try to fix them. 
Thanks,
Re: new scsi sense handling
On Tue, 5 Feb 2008 11:43:58 -0800 (PST) Luben Tuikov <[EMAIL PROTECTED]> wrote: > --- On Tue, 2/5/08, FUJITA Tomonori <[EMAIL PROTECTED]> wrote: > > On Mon, 4 Feb 2008 18:39:22 -0800 (PST) > > Luben Tuikov <[EMAIL PROTECTED]> wrote: > > > > > --- On Mon, 2/4/08, Boaz Harrosh > > <[EMAIL PROTECTED]> wrote: > > > > There are 3 usages of sense handling in drivers > > > > > > > > 1. sense is available in driver internal > > structure and is > > > > mem-copied to upper level > > > > 2. A CHECK_CONDITION status was returned and the > > driver > > > > uses the scsi_eh_prep_cmnd() > > > >for a REQUEST_SENSE invocation to the target. > > Then > > > > returning the sense in > > > >scsi_eh_return_cmnd(). A variation on this is > > when the > > > > driver does nothing the queue > > > >is frozen an the scsi watchdog timer does the > > above. > > > > 3. The underline host adapter does the > > REQUEST_SENSE and a > > > > pre-allocated and DMA mapped > > > >sense buffer receives the sense information > > from HW. > > > > > > Many years ago when "ACA" had a constructive > > meaning, > > > so did "Autosense". Then about 5 years ago, > > "Autosense" > > > disappeared completely since it became the de facto > > > implementation of the then SCSI Execute Command > > "RPC", > > > now just SCSI Execute Command procedure call. > > > > > > At that point in time, the SCSI mid-layer decided > > > to embrace this model and give the LLDD a scsi command > > > structure which included the sense data buffer to > > > a size that the SCSI mid-layer was interested in, > > > at the moment 96 bytes, macro defined in > > > include/scsi/scsi_cmnd.h. > > > > > > The concept of "Autosense" was off-loaded to > > LLDD > > > to emulate it if the specific target device to > > > which the command was issued, didn't supply the > > > sense data on CHECK CONDITION, and more so > > > relevant to target devices which implemented > > > queuing, thus the ACA. 
> > > > > > And the mid-layer would consider extracting > > > the sense data via REQUEST SENSE command > > > as a _special case_ if the LLDD/transport layer > > > didn't implement the "autosense" model. > > > > Only SPI and USB? > > I don't understand this question. I meant, 'what transport protocols are categorized into the transport protocol that doesn't implement the "autosense" model?' > > The most of LLDs using the transport protocol that we care > > about today > > uses sense buffer in their own internal structure. > > Yes. > > > > > I think that the issue to solve to kill > > scsi_cmnd:sense_buffer is how > > to share (or export) such sense buffer with the scsi > > mid-layer. > > And therein lies the problem. Sense data is SCSI specific, > it should be left to SCSI, unless of course you can > stipulate that _all_ block devices return sense data. Yeah, sense data is SCSI specific and it should be left to SCSI. But I'm not sure we need to stipulate that _all_ block devices return sense data. Today the block layer users (sg, bsg, etc) use it only when it's appropriate (or only if they want to use it). > If that's not the case and you move it to the block > layer, then you get a whole bunch of other problems, > like does this device want/use it, should we allocate > it, etc. OTOH, if that _is_ the case, then you don't > have to worry about this and the model is pretty > much as the SCSI mid-layer has it, i.e. sense buffer > always present. So I guess the question is, can > you stipulate that _all_ block devices return sense data? - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Integration of SCST in the mainstream Linux kernel
On Tue, 2008-02-05 at 16:48 -0800, Nicholas A. Bellinger wrote:
> On Tue, 2008-02-05 at 22:01 +0300, Vladislav Bolkhovitin wrote:
> > Jeff Garzik wrote:
> > > Alan Cox wrote:
> > >
> > >>>better. So for example, I personally suspect that ATA-over-ethernet is way
> > >>>better than some crazy SCSI-over-TCP crap, but I'm biased for simple and
> > >>>low-level, and against those crazy SCSI people to begin with.
> > >>
> > >>Current ATAoE isn't. It can't support NCQ. A variant that did NCQ and IP
> > >>would probably trash iSCSI for latency if nothing else.
> > >
> > >
> > > AoE is truly a thing of beauty.  It has a two/three page RFC (say no more!).
> > >
> > > But quite so...  AoE is limited to MTU size, which really hurts.  Can't
> > > really do tagged queueing, etc.
> > >
> > >
> > > iSCSI is way, way too complicated.
> >
> > I fully agree. From one side, all that complexity is unavoidable for
> > case of multiple connections per session, but for the regular case of
> > one connection per session it must be a lot simpler.
> >
> > And now think about iSER, which brings iSCSI on the whole new complexity
> > level ;)
>
> Actually, the iSER protocol wire protocol itself is quite simple,
> because it builds on iSCSI and IPS fundamentals, and because traditional
> iSCSI's recovery logic for CRC failures (and hence alot of
> acknowledgement sequence PDUs that go missing, etc) and the RDMA Capable
> Protocol (RCaP).

this should be:

.. and instead the RDMA Capable Protocol (RCaP) provides the 32-bit or
greater data integrity.

--nab
Re: Integration of SCST in the mainstream Linux kernel
On Tue, 2008-02-05 at 22:01 +0300, Vladislav Bolkhovitin wrote:
> Jeff Garzik wrote:
> > Alan Cox wrote:
> >
> >>>better. So for example, I personally suspect that ATA-over-ethernet is way
> >>>better than some crazy SCSI-over-TCP crap, but I'm biased for simple and
> >>>low-level, and against those crazy SCSI people to begin with.
> >>
> >>Current ATAoE isn't. It can't support NCQ. A variant that did NCQ and IP
> >>would probably trash iSCSI for latency if nothing else.
> >
> >
> > AoE is truly a thing of beauty.  It has a two/three page RFC (say no more!).
> >
> > But quite so...  AoE is limited to MTU size, which really hurts.  Can't
> > really do tagged queueing, etc.
> >
> >
> > iSCSI is way, way too complicated.
>
> I fully agree. From one side, all that complexity is unavoidable for
> case of multiple connections per session, but for the regular case of
> one connection per session it must be a lot simpler.
>
> And now think about iSER, which brings iSCSI on the whole new complexity
> level ;)

Actually, the iSER wire protocol itself is quite simple, because it
builds on iSCSI and IPS fundamentals, and because traditional iSCSI's
recovery logic for CRC failures (and hence a lot of acknowledgement
sequence PDUs that go missing, etc) and the RDMA Capable Protocol
(RCaP).

The logic that iSER collectively disables is known as within-connection
and within-command recovery (negotiated as ErrorRecoveryLevel=1 on the
wire).  RFC 5046 requires that the iSCSI layer on which iSER is being
enabled disable CRC32C checksums and any associated timeouts for ERL=1.

Also, have a look at Appendix A in the iSER spec:

A.1. iWARP Message Format for iSER Hello Message
A.2. iWARP Message Format for iSER HelloReply Message
A.3. iWARP Message Format for SCSI Read Command PDU
A.4. iWARP Message Format for SCSI Read Data
A.5. iWARP Message Format for SCSI Write Command PDU
A.6. iWARP Message Format for RDMA Read Request
A.7. iWARP Message Format for Solicited SCSI Write Data
A.8. iWARP Message Format for SCSI Response PDU

This is about 1/2 as many as the traditional iSCSI PDUs that iSER
encapsulates.

--nab
Re: [PATCH 7/9] scsi_dh: Add support for SDEV_PASSIVE
On Tue, 2008-02-05 at 13:56 -0800, Mike Anderson wrote: > Mike Christie <[EMAIL PROTECTED]> wrote: > > When IO is sent to a path that cannot execute IO optimally, the scsi hw > > handler hook for sense processing (see rdac_check_sense in "[PATCH 8/9] > > scsi_dh: add lsi rdac device handler" and the scsi_error.c hook in in > > "scsi_dh: add skeleton for SCSI Device Handlers") will detect this and set > > the state to passive so future IO is not execute on the path > > (SG_IO/passthrough is allowed). > > > > I am not sure about alternatives. If we just exported the port access state > > in sysfs, but did not fail IO from scsi_prep_state_check, then the users > > could still check the state before sending IO. Would it be horrible to > > convert apps to do this? > > The majority of the boot up delays is caused by the kernel partition > scanning and other kernel init code (Chandra please correct if that is not Yes, this is the case. Some level of scanning happens at the rc scripts level too. That can be reduced by what Mikec is suggesting. But, as andmike is suggesting, it won't be a complete solution. > true). Sysfs attributes would not help here. One option maybe to add > handling of the newer BLKERR_ codes in the generators of IO or some > similar solution with a rollout possibly focused at the top generators of are you suggesting the partition scanners (kernel) and lvm(user space scanner) should stop sending I/Os to a passive device once they realize that the device is passive (thru BLKERR_ return codes) ? > IO. > > A number of user apps like lvm scanning that execute media access commands > already have filter capability to filter devices that one does not want to Yes, it will help. But, it will lead to additional instructions to the users which if they do not follow (due to not knowing it or some such) will lead to a delayed boot. IMO, It will be good if it works nicely out of the box. > scan. 
> Another class of device scanners just use inquiries which are not
> affected by the passive state (though some could probably use udevinfo and
> reduce the amount of repeated SCSI inquiries executed on the system).
>
> -andmike
> --
> Michael Anderson
> [EMAIL PROTECTED]

--
Chandra Seetharaman | Be careful what you choose
- [EMAIL PROTECTED] | ...you may get it.
Re: Integration of SCST in the mainstream Linux kernel
On Tue, 2008-02-05 at 14:12 -0500, Jeff Garzik wrote:
> Vladislav Bolkhovitin wrote:
> > Jeff Garzik wrote:
> >> iSCSI is way, way too complicated.
> >
> > I fully agree. From one side, all that complexity is unavoidable for
> > case of multiple connections per session, but for the regular case of
> > one connection per session it must be a lot simpler.
>
>
> Actually, think about those multiple connections...  we already had to
> implement fast-failover (and load bal) SCSI multi-pathing at a higher
> level.  IMO that portion of the protocol is redundant:   You need the
> same capability elsewhere in the OS _anyway_, if you are to support
> multi-pathing.
>
>     Jeff
>
>

Hey Jeff,

I put a whitepaper on the LIO cluster recently about this topic.  It is
from a few years ago but the datapoints are very relevant.

http://linux-iscsi.org/builds/user/nab/Inter.vs.OuterNexus.Multiplexing.pdf

The key advantage to MC/S and ERL=2 has always been that they are
completely OS independent.  They are designed to work together and
actually benefit from one another.

They are also protocol independent between Traditional iSCSI and iSER.

--nab

PS: A great thanks to my former colleague Edward Cheng for putting this
together.
Re: [PATCH] enclosure: add support for enclosure services
On Sun, 03 Feb 2008 18:16:51 -0600 James Bottomley <[EMAIL PROTECTED]> wrote: > > From: James Bottomley <[EMAIL PROTECTED]> > Date: Sun, 3 Feb 2008 15:40:56 -0600 > Subject: [SCSI] enclosure: add support for enclosure services > > The enclosure misc device is really just a library providing sysfs > support for physical enclosure devices and their components. > Thanks for sending it out for review. > +struct enclosure_device *enclosure_find(struct device *dev) > +{ > + struct enclosure_device *edev = NULL; > + > + mutex_lock(&container_list_lock); > + list_for_each_entry(edev, &container_list, node) { > + if (edev->cdev.dev == dev) { > + mutex_unlock(&container_list_lock); > + return edev; > + } > + } > + mutex_unlock(&container_list_lock); > + > + return NULL; > +} > +EXPORT_SYMBOL_GPL(enclosure_find); This looks a little odd. We don't take a ref on the object after looking it up, so what prevents some other thread of control from freeing or otherwise altering the returned object while the caller is playing with it? > +/** > + * enclosure_for_each_device - calls a function for each enclosure > + * @fn: the function to call > + * @data:the data to pass to each call > + * > + * Loops over all the enclosures calling the function. > + * > + * Note, this function uses a mutex which will be held across calls to > + * @fn, so it must have user context, and @fn should not sleep or Probably "non atomic context" would be more accurate. fn() actually _can_ sleep. 
> + * otherwise cause the mutex to be held for indefinite periods > + */ > +int enclosure_for_each_device(int (*fn)(struct enclosure_device *, void *), > + void *data) > +{ > + int error = 0; > + struct enclosure_device *edev; > + > + mutex_lock(&container_list_lock); > + list_for_each_entry(edev, &container_list, node) { > + error = fn(edev, data); > + if (error) > + break; > + } > + mutex_unlock(&container_list_lock); > + > + return error; > +} > +EXPORT_SYMBOL_GPL(enclosure_for_each_device); > + > +/** > + * enclosure_register - register device as an enclosure > + * > + * @dev: device containing the enclosure > + * @components: number of components in the enclosure > + * > + * This sets up the device for being an enclosure. Note that @dev does > + * not have to be a dedicated enclosure device. It may be some other type > + * of device that additionally responds to enclosure services > + */ > +struct enclosure_device * > +enclosure_register(struct device *dev, const char *name, int components, > +struct enclosure_component_callbacks *cb) > +{ > + struct enclosure_device *edev = > + kzalloc(sizeof(struct enclosure_device) + > + sizeof(struct enclosure_component)*components, > + GFP_KERNEL); > + int err, i; > + > + if (!edev) > + return ERR_PTR(-ENOMEM); > + > + if (!cb) { > + kfree(edev); > + return ERR_PTR(-EINVAL); > + } It would be less fuss if this were to test cb before doing the kzalloc(). Can cb==NULL actually and legitimately happen? 
> + edev->components = components; > + > + edev->cdev.class = &enclosure_class; > + edev->cdev.dev = get_device(dev); > + edev->cb = cb; > + snprintf(edev->cdev.class_id, BUS_ID_SIZE, "%s", name); > + err = class_device_register(&edev->cdev); > + if (err) > + goto err; > + > + for (i = 0; i < components; i++) > + edev->component[i].number = -1; > + > + mutex_lock(&container_list_lock); > + list_add_tail(&edev->node, &container_list); > + mutex_unlock(&container_list_lock); > + > + return edev; > + > + err: > + put_device(edev->cdev.dev); > + kfree(edev); > + return ERR_PTR(err); > +} > +EXPORT_SYMBOL_GPL(enclosure_register); > + > +static struct enclosure_component_callbacks enclosure_null_callbacks; > + > +/** > + * enclosure_unregister - remove an enclosure > + * > + * @edev:the registered enclosure to remove; > + */ > +void enclosure_unregister(struct enclosure_device *edev) > +{ > + int i; > + > + if (!edev) > + return; Is this legal? > + mutex_lock(&container_list_lock); > + list_del(&edev->node); > + mutex_unlock(&container_list_lock); See, right now, someone who found this enclosure_device via enclosure_find() could still be playing with it? > + for (i = 0; i < edev->components; i++) > + if (edev->component[i].number != -1) > + class_device_unregister(&edev->component[i].cdev); > + > + /* prevent any callbacks into service user */ > + edev->cb = &enclosure_null_callbacks; > + class_device_unregister(&edev->cdev); > +} > +EXPORT_SYMBOL_GPL(enclosure_unregister); > + > +/** > + * enclosure_component_
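A note on the enclosure_find() comment above: the usual answer to "what stops the object from going away after lookup" is to take a reference while the list lock is still held. The following is a minimal userspace sketch of that pattern, not the patch's code; `struct enc`, `enc_find_get()`, `enc_put()` and `enc_remove()` are hypothetical names, and a pthread mutex with a plain integer count stands in for the kernel mutex and the device reference count.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an enclosure object on a global list. */
struct enc {
	int id;
	int refcount;		/* protected by list_lock in this sketch */
	struct enc *next;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct enc *enc_list;

/* Add to the list; the list itself owns one reference. */
static void enc_add(struct enc *e)
{
	pthread_mutex_lock(&list_lock);
	e->refcount = 1;
	e->next = enc_list;
	enc_list = e;
	pthread_mutex_unlock(&list_lock);
}

/* Look up by id and take a reference BEFORE dropping the lock. */
static struct enc *enc_find_get(int id)
{
	struct enc *e;

	pthread_mutex_lock(&list_lock);
	for (e = enc_list; e; e = e->next) {
		if (e->id == id) {
			e->refcount++;
			break;
		}
	}
	pthread_mutex_unlock(&list_lock);
	return e;		/* NULL if not found */
}

/* Drop a reference; returns 1 when the caller dropped the last one. */
static int enc_put(struct enc *e)
{
	int last;

	pthread_mutex_lock(&list_lock);
	last = (--e->refcount == 0);
	pthread_mutex_unlock(&list_lock);
	return last;
}

/* Unlink from the list and drop the list's reference. */
static int enc_remove(struct enc *e)
{
	struct enc **pp;

	pthread_mutex_lock(&list_lock);
	for (pp = &enc_list; *pp; pp = &(*pp)->next) {
		if (*pp == e) {
			*pp = e->next;
			break;
		}
	}
	pthread_mutex_unlock(&list_lock);
	return enc_put(e);
}
```

With this shape, an unregister that races with a finder only unlinks the object; the memory may be released only by whoever sees enc_put() return 1.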
Re: Integration of SCST in the mainstream Linux kernel
On Tue, 2008-02-05 at 22:21 +0300, Vladislav Bolkhovitin wrote:
> Jeff Garzik wrote:
> >>> iSCSI is way, way too complicated.
> >>
> >> I fully agree. From one side, all that complexity is unavoidable for
> >> case of multiple connections per session, but for the regular case of
> >> one connection per session it must be a lot simpler.
> >
> > Actually, think about those multiple connections... we already had to
> > implement fast-failover (and load bal) SCSI multi-pathing at a higher
> > level. IMO that portion of the protocol is redundant: You need the
> > same capability elsewhere in the OS _anyway_, if you are to support
> > multi-pathing.
>
> I'm thinking about MC/S as about a way to improve performance using
> several physical links. There's no other way, except MC/S, to keep
> commands processing order in that case. So, it's really valuable
> property of iSCSI, although with a limited application.
>
> Vlad
>

Greetings,

I have always observed with LIO SE/iSCSI target mode (as well as with other software initiators, which we can leave out of the discussion for now; and congrats to the open/iscsi folks on their recent release :-) that execution core hardware thread and inter-nexus per 1 Gb/sec ethernet port performance scales up to 4x and 2x core x86_64 very well with MC/S. I have been seeing 450 MB/sec using 2x socket 4x core x86_64 for a number of years with MC/S. I have also run MC/S on 10 Gb/sec (on PCI-X v2.0 266 MHz as well, which was the first transport that LIO Target ran on that was able to handle duplex ~1200 MB/sec with 3 initiators and MC/S). In the point to point 10 Gb/sec tests on IBM p404 machines, the initiators were able to reach ~910 MB/sec with MC/S. Open/iSCSI was able to go a bit faster (~950 MB/sec) because it uses struct sk_buff directly.
A good rule to keep in mind while considering performance is that, because of context switching overhead and pipeline <-> bus stalling (along with other legacy OS specific storage stack limitations with BLOCK and VFS with O_DIRECT et al., which I will leave out of the discussion for iSCSI and SE engine target mode), an initiator will scale roughly 1/2 as well as a target, given comparable hardware and virsh output. The software target case also depends, to a great degree in many cases, on whether we are talking about something as simple as doing contiguous DMA memory allocations from a SINGLE kernel thread, and handling direction execution to a storage hardware DMA ring that may not have been allocated in the current kernel thread.

In MC/S mode this breaks down to:

1) Sorting logic that handles the pre-execution statemachine for transport from local RDMA memory and OS specific data buffers: TCP application data buffer, struct sk_buff, or RDMA struct page or SG. This should be generic between iSCSI and iSER.

2) Allocation of said memory buffers to OS subsystem dependent code that can be queued up to these drivers. It breaks down to what you can get drivers and OS subsystem folks to agree to implement, and can be made generic in a Transport / BLOCK / VFS layered storage stack. In the "allocate thread DMA ring and use OS supported software and vendor available hardware" case, I don't think the kernel space requirement will ever completely be able to go away.

Without diving into RFC-3720 specifics, the statemachine for the MC/S side covers memory allocation, login and logout generic to iSCSI and iSER, and ERL=2 recovery. My plan is to post the locations in the LIO code where this has been implemented, and where we can make this easier, etc. Early in the development of what eventually became the LIO Target code, ERL was broken into separate files and separate function prefixes: iscsi_target_erl0, iscsi_target_erl1 and iscsi_target_erl2.
The statemachine for ERL=0 and ERL=2 is pretty simple in RFC-3720 (those interested in the discussion should have a look):

7.1.1. State Descriptions for Initiators and Targets

The LIO target code is also pretty simple for this:

[EMAIL PROTECTED] target]# wc -l iscsi_target_erl*
  1115 iscsi_target_erl0.c
    45 iscsi_target_erl0.h
   526 iscsi_target_erl0.o
  1426 iscsi_target_erl1.c
    51 iscsi_target_erl1.h
  1253 iscsi_target_erl1.o
   605 iscsi_target_erl2.c
    45 iscsi_target_erl2.h
   447 iscsi_target_erl2.o
  5513 total

erl1.c is a bit larger than the others because it contains the MC/S statemachine functions. iscsi_target_erl1.c:iscsi_execute_cmd() and iscsi_target_util.c:iscsi_check_received_cmdsn() do most of the work for the LIO MC/S state machine. It would probably benefit from being broken out into, say, iscsi_target_mcs.c. Note that all of this code is MC/S safe, with the exception of the specific SCSI TMR functions. For the SCSI TMR pieces, I have always hoped to use SCST code for doing this...

Most of the login/logout code is done in iscsi_target.c, which could probably also benefit from getting broken out...

--nab
- To unsubscribe from this list: send the l
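To make the MC/S ordering point concrete: commands arriving on any connection of a session carry a session-wide CmdSN, and the target compares it against its expected and maximum CmdSN using 32-bit serial number arithmetic. The sketch below illustrates that check only; it is not the LIO code in iscsi_check_received_cmdsn(), and the enum and function names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical verdicts for a received CmdSN. */
enum cmdsn_verdict {
	CMDSN_EXECUTE,	/* in order: run it now */
	CMDSN_QUEUE,	/* inside the window but early: hold until the gap fills */
	CMDSN_IGNORE	/* outside [ExpCmdSN, MaxCmdSN]: drop */
};

/* a < b in 32-bit serial number arithmetic (wraps around 2^32). */
static int sn_lt(uint32_t a, uint32_t b)
{
	return a != b && (int32_t)(a - b) < 0;
}

/* Classify a command received on any connection of the session. */
static enum cmdsn_verdict check_cmdsn(uint32_t cmdsn,
				      uint32_t exp_cmdsn,
				      uint32_t max_cmdsn)
{
	if (sn_lt(cmdsn, exp_cmdsn) || sn_lt(max_cmdsn, cmdsn))
		return CMDSN_IGNORE;
	if (cmdsn == exp_cmdsn)
		return CMDSN_EXECUTE;
	return CMDSN_QUEUE;
}
```

Because the check is per session rather than per connection, two commands sent over different physical links still execute in CmdSN order, which is the property Vlad refers to.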
Re: [PATCH 7/9] scsi_dh: Add support for SDEV_PASSIVE
Mike Christie <[EMAIL PROTECTED]> wrote:
> When IO is sent to a path that cannot execute IO optimally, the scsi hw
> handler hook for sense processing (see rdac_check_sense in "[PATCH 8/9]
> scsi_dh: add lsi rdac device handler" and the scsi_error.c hook in in
> "scsi_dh: add skeleton for SCSI Device Handlers") will detect this and set
> the state to passive so future IO is not execute on the path
> (SG_IO/passthrough is allowed).
>
> I am not sure about alternatives. If we just exported the port access state
> in sysfs, but did not fail IO from scsi_prep_state_check, then the users
> could still check the state before sending IO. Would it be horrible to
> convert apps to do this?

The majority of the boot-up delays are caused by the kernel partition scanning and other kernel init code (Chandra, please correct me if that is not true). Sysfs attributes would not help here.

One option may be to add handling of the newer BLKERR_ codes in the generators of IO, or some similar solution, with a rollout possibly focused on the top generators of IO.

A number of user apps, like lvm scanning, that execute media access commands already have the capability to filter out devices that one does not want to scan. Another class of device scanners just uses inquiries, which are not affected by the passive state (though some could probably use udevinfo and reduce the number of repeated SCSI inquiries executed on the system).

-andmike
--
Michael Anderson [EMAIL PROTECTED]
- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Marvell 6440 SAS/SATA driver
--- On Tue, 2/5/08, Ke Wei <[EMAIL PROTECTED]> wrote:
> +	for_each_phy(port->wide_port_phymap, no, j, mvi->chip->n_phy) {
> +		mvs_write_port_cfg_addr(mvi, no, PHYR_WIDE_PORT);
> +		mvs_write_port_cfg_data(mvi, no , port->wide_port_phymap);
> +	} else {
> +		mvs_write_port_cfg_addr(mvi, no, PHYR_WIDE_PORT);
> +		mvs_write_port_cfg_data(mvi, no , 0);
> +	}
> +}

Don't do this. Make the "if" explicit.

Since I can see you've taken this verbatim from the SAS code, if "no" means number, then it is "j". "no" is just a temporary register which gets shifted right each iteration and is not of much use outside the macro.

Also, if "__rest" (which you added to the macro) is 0, then neither statement would execute, which is probably not what you want.

If "n_phy" means "number of phys", then its usage that you added into the macro is inconsistent. Furthermore, it shouldn't be necessary, since wide_port_phymap & ~((2^n_phy)-1) must never be true.

Luben
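For readers following the review, here is roughly what "make the if explicit" could look like. This is a hedged sketch, not the mvsas driver: `write_wide_port_cfg()`, `cfg_data[]` and `MAX_PHYS` are hypothetical stand-ins for the register writes, and the "2^n_phy" in the mail is taken to mean a power of two, written as a shift in C.

```c
#include <stdint.h>

#define MAX_PHYS 8	/* assumption for this sketch only */

/* Stand-in for the per-phy wide-port config register. */
static uint32_t cfg_data[MAX_PHYS];

static void write_wide_port_cfg(unsigned int phy, uint32_t val)
{
	cfg_data[phy] = val;
}

/*
 * Program the wide-port map into every phy, indexing by the phy number j
 * and using a plain if/else instead of hiding the test inside a macro.
 * Returns 0 on success, -1 if the arguments are inconsistent.
 */
static int program_wide_port(uint32_t phymap, unsigned int n_phy)
{
	unsigned int j;

	if (n_phy > MAX_PHYS)
		return -1;

	/* Bits at or above n_phy must never be set in the map. */
	if (phymap & ~((1u << n_phy) - 1))
		return -1;

	for (j = 0; j < n_phy; j++) {
		if (phymap & (1u << j))
			write_wide_port_cfg(j, phymap);	/* member of the wide port */
		else
			write_wide_port_cfg(j, 0);	/* not a member */
	}
	return 0;
}
```

Written this way, both branches run for every phy below n_phy, and the loop bound no longer depends on a macro argument whose zero value would silently skip everything.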
Re: aic7xxx build failure
> > index 4c54954..6aa49e7 100644
> > --- a/drivers/scsi/aic7xxx/Makefile
> > +++ b/drivers/scsi/aic7xxx/Makefile
> > @@ -44,8 +44,8 @@ clean-files += aic79xx_seq.h aic79xx_reg.h aic79xx_reg_print.c
> >
> >  # Dependencies for generated files need to be listed explicitly
> >
> > -$(addprefix $(src)/,$(aic7xxx-y:.o=.c)): $(obj)/aic7xxx_seq.h $(obj)/aic7xxx_reg.h
> > -$(addprefix $(src)/,$(aic79xx-y:.o=.c)): $(obj)/aic79xx_seq.h $(obj)/aic79xx_reg.h
> > +$(addprefix $(src)/,$(aic7xxx-y)): $(obj)/aic7xxx_seq.h $(obj)/aic7xxx_reg.h
> > +$(addprefix $(src)/,$(aic79xx-y)): $(obj)/aic79xx_seq.h $(obj)/aic79xx_reg.h
>
> OK, I think it's time for me to give up completely on understanding
> kbuild. To me this construction looks like you're adding source
> directory prefixes to objects ... which can never be satisfied can it,
> if the objectas are in the object directory?

Or maybe I'm just so damn tired that I should sleep instead of trying to fix this Makefile for the 117th time.

You are right that it should read:

-$(addprefix $(src)/,$(aic7xxx-y:.o=.c)): $(obj)/aic7xxx_seq.h $(obj)/aic7xxx_reg.h
-$(addprefix $(src)/,$(aic79xx-y:.o=.c)): $(obj)/aic79xx_seq.h $(obj)/aic79xx_reg.h
+$(addprefix $(obj)/,$(aic7xxx-y)): $(obj)/aic7xxx_seq.h $(obj)/aic7xxx_reg.h
+$(addprefix $(obj)/,$(aic79xx-y)): $(obj)/aic79xx_seq.h $(obj)/aic79xx_reg.h

But for now the distinction between src and obj is purely for documentation, as they have the same value, also when O= is used. So it should work anyway. If you use M=... (or SUBDIRS=...) I think it matters, but this is not the case for this in-tree driver in normal usage situations.

I will test some more tomorrow, and if feedback from Adrian is positive I will submit the hopefully last update to this Makefile to Linus. (I need to test if it can generate the files using the aicasm tool, for instance.)
Sam
Re: aic7xxx build failure
On Tue, Feb 05, 2008 at 09:06:23PM +0100, Sam Ravnborg wrote: > On Tue, Feb 05, 2008 at 07:47:35PM +0100, Sam Ravnborg wrote: > > On Tue, Feb 05, 2008 at 07:40:24PM +0200, Adrian Bunk wrote: > > > Commit 8891fec65ac5b5a74b50c705e31b66c92c3eddeb broke aic7xxx > > > compilation: > > > > > > <-- snip --> > > > > > > $ make O=../out/x86-full > > > ... > > > SHIPPED drivers/scsi/aic7xxx/aic79xx_seq.h > > > SHIPPED drivers/scsi/aic7xxx/aic79xx_reg.h > > > CC drivers/scsi/aic7xxx/aic79xx_core.o > > > gcc: drivers/scsi/aic7xxx/aic79xx_core.c: No such file or directory > > > gcc: no input files > > > make[4]: *** [drivers/scsi/aic7xxx/aic79xx_core.o] Error 1 > > > > > > <-- snip --> > > > > > > Next "make" run brings the same failure in > > > drivers/scsi/aic7xxx/aic7xxx_core.c. > > > > > > With the third "make" it works. > > > > > > It might compile for people with SMP systems using -j? > > > > I can reproduce it and will fix it. > Seems I was sidetracked by some wrong assumptions. > Could you please test this fix. > > Works for me but this time I will do more testing Thanks, works fine for me. > Sam > > diff --git a/drivers/scsi/aic7xxx/Makefile b/drivers/scsi/aic7xxx/Makefile > index 4c54954..6aa49e7 100644 > --- a/drivers/scsi/aic7xxx/Makefile > +++ b/drivers/scsi/aic7xxx/Makefile > @@ -44,8 +44,8 @@ clean-files += aic79xx_seq.h aic79xx_reg.h > aic79xx_reg_print.c > > # Dependencies for generated files need to be listed explicitly > > -$(addprefix $(src)/,$(aic7xxx-y:.o=.c)): $(obj)/aic7xxx_seq.h > $(obj)/aic7xxx_reg.h > -$(addprefix $(src)/,$(aic79xx-y:.o=.c)): $(obj)/aic79xx_seq.h > $(obj)/aic79xx_reg.h > +$(addprefix $(src)/,$(aic7xxx-y)): $(obj)/aic7xxx_seq.h $(obj)/aic7xxx_reg.h > +$(addprefix $(src)/,$(aic79xx-y)): $(obj)/aic79xx_seq.h $(obj)/aic79xx_reg.h > > aic7xxx-gen-$(CONFIG_AIC7XXX_BUILD_FIRMWARE) := $(obj)/aic7xxx_reg.h > aic7xxx-gen-$(CONFIG_AIC7XXX_REG_PRETTY_PRINT) += > $(obj)/aic7xxx_reg_print.c cu Adrian -- "Is there not promise of rain?" 
Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed
Re: [PATCH] enclosure: add support for enclosure services
--- On Tue, 2/5/08, James Bottomley <[EMAIL PROTECTED]> wrote:
> > > Wrong ... we don't export non-SCSI devices as SCSI
> > > (with the single and
> > > rather annoying exception of ATA via SAT).
> >
> > I didn't say you should do that. I had already
> > mentioned that vendors export such controls
> > as either enclosure or processor type devices,
> > and this is why I told you that that is what
> > needs to be exported, which incidentally is
> > a device node of that type.
> >
> > Without a common usage model already in the kernel
> > to abstract (e.g. sd for block device, since you brought
> > that up) your abstraction seems redundant and arbitrary.
>
> Exactly, so the first patch in this series (a while ago now) was a
^^^ See last paragraph.
> common usage model abstraction of enclosures, and the second was an
> implementation in terms of SES. I will do one in terms of SGPIO as
> well ... assuming I ever find a SGPIO enclosure ...

The vendor would've abstracted that away most commonly using SES.

> > Your kernel code already uses READ DIAGNOSTIC, etc,
> > and I'd rather leave that to user-space.
>
> You can do it in user space as well. It's just a bit difficult to get
> information out of a SES enclosure without using it, and getting some of
> the information is a requirement of the abstraction.

You missed my point. Your abstraction is redundant and arbitrary -- it is not based on any known, in-practice, usage model, already in place that needs a better, common way of doing XYZ, and therefore needs an abstraction.

Luben
Re: [PATCH] [SCSI] fix BUG when sum(scatterlist) > bufflen
Tony Battersby wrote: When sending a SCSI command to a tape drive via the SCSI Generic (sg) driver, if the command has a data transfer length more than scatter_elem_sz (32 KB default) and not a multiple of 512, then I either hit BUG_ON(!valid_dma_direction(direction)) in dma_unmap_sg() or else the command never completes (depending on the LLDD). When constructing scatterlists, the sg driver rounds up the scatterlist element sizes to be a multiple of 512. This can result in sum(scatterlist lengths) > bufflen. In this case, scsi_req_map_sg() incorrectly sets bio->bi_size to sum(scatterlist lengths) rather than to bufflen. When the command completes, req_bio_endio() detects that bio->bi_size != 0, and so it doesn't call bio_endio(). This causes the command to be resubmitted, resulting in BUG_ON or the command never completing. This patch makes scsi_req_map_sg() set bio->bi_size to bufflen rather than to sum(scatterlist lengths), which fixes the problem. Signed-off-by: Tony Battersby <[EMAIL PROTECTED]> --- --- linux-2.6.24-git14/drivers/scsi/scsi_lib.c.orig 2008-02-05 09:33:05.0 -0500 +++ linux-2.6.24-git14/drivers/scsi/scsi_lib.c 2008-02-05 09:33:10.0 -0500 @@ -301,7 +301,6 @@ static int scsi_req_map_sg(struct reques page = sg_page(sg); off = sg->offset; len = sg->length; - data_len += len; Thanks for finding this. I am not sure what happened. That line got deleted in this commit when we fixed this problem: http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commit;h=bd441deaf341c524b28fd72831ebf6fef88f1c41 but was added back here: http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commitdiff;h=c6132da1704be252ee6c923f47501083d835c238 Acked-by: Mike Christie <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
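The arithmetic behind the bug report can be checked with a small model. This is only an illustrative sketch under simplifying assumptions: it splits a transfer into elements of at most `elem_sz` bytes and rounds each element length up to a multiple of 512, as the description says the sg driver does; it is not the sg or scsi_lib code, and all function names are hypothetical.

```c
#include <stddef.h>

#define SECTOR 512u

/* Round a length up to the next multiple of 512. */
static unsigned int round_up_512(unsigned int len)
{
	return (len + SECTOR - 1) & ~(SECTOR - 1);
}

/* Fill lens[] with rounded element lengths covering bufflen bytes. */
static size_t build_sg(unsigned int bufflen, unsigned int elem_sz,
		       unsigned int *lens, size_t max)
{
	size_t n = 0;
	unsigned int left = bufflen;

	while (left > 0 && n < max) {
		unsigned int chunk = left < elem_sz ? left : elem_sz;

		lens[n++] = round_up_512(chunk);
		left -= chunk;
	}
	return n;
}

/* Sum of all element lengths; may exceed bufflen after rounding. */
static unsigned int sum_lens(const unsigned int *lens, size_t n)
{
	unsigned int sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += lens[i];
	return sum;
}

/* Bytes still outstanding once bytes_done have completed. */
static unsigned int residual(unsigned int bi_size, unsigned int bytes_done)
{
	return bi_size - bytes_done;
}
```

For a transfer of 32768 + 100 bytes with 32 KB elements, the element lengths are 32768 and 512, so the sum is 33280. If the size is set from that sum, completing the real 32868 bytes leaves 412 outstanding and the request never looks finished; setting it to bufflen leaves 0.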
Re: [PATCH] enclosure: add support for enclosure services
On Tue, 2008-02-05 at 11:33 -0800, Luben Tuikov wrote: > > Wrong ... we don't export non-SCSI devices as SCSI > > (with the single and > > rather annoying exception of ATA via SAT). > > I didn't say you should do that. I had already > mentioned that vendors export such controls > as either enclosure or processor type devices, > and this is why I told you that that is what > needs to be exported, which incidentally is > a device node of that type. > > Without a common usage model already in the kernel > to abstract (e.g. sd for block device, since you brought > that up) your abstraction seems redundant and arbitrary. Exactly, so the first patch in this series (a while ago now) was a common usage model abstraction of enclosures, and the second was an implementation in terms of SES. I will do one in terms of SGPIO as well ... assuming I ever find a SGPIO enclosure ... > Your kernel code already uses READ DIAGNOSTIC, etc, > and I'd rather leave that to user-space. You can do it in user space as well. It's just a bit difficult to get information out of a SES enclosure without using it, and getting some of the information is a requirement of the abstraction. James - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: aic7xxx build failure
On Tue, 2008-02-05 at 21:06 +0100, Sam Ravnborg wrote: > On Tue, Feb 05, 2008 at 07:47:35PM +0100, Sam Ravnborg wrote: > > On Tue, Feb 05, 2008 at 07:40:24PM +0200, Adrian Bunk wrote: > > > Commit 8891fec65ac5b5a74b50c705e31b66c92c3eddeb broke aic7xxx > > > compilation: > > > > > > <-- snip --> > > > > > > $ make O=../out/x86-full > > > ... > > > SHIPPED drivers/scsi/aic7xxx/aic79xx_seq.h > > > SHIPPED drivers/scsi/aic7xxx/aic79xx_reg.h > > > CC drivers/scsi/aic7xxx/aic79xx_core.o > > > gcc: drivers/scsi/aic7xxx/aic79xx_core.c: No such file or directory > > > gcc: no input files > > > make[4]: *** [drivers/scsi/aic7xxx/aic79xx_core.o] Error 1 > > > > > > <-- snip --> > > > > > > Next "make" run brings the same failure in > > > drivers/scsi/aic7xxx/aic7xxx_core.c. > > > > > > With the third "make" it works. > > > > > > It might compile for people with SMP systems using -j? > > > > I can reproduce it and will fix it. > Seems I was sidetracked by some wrong assumptions. > Could you please test this fix. > > Works for me but this time I will do more testing > > Sam > > diff --git a/drivers/scsi/aic7xxx/Makefile b/drivers/scsi/aic7xxx/Makefile > index 4c54954..6aa49e7 100644 > --- a/drivers/scsi/aic7xxx/Makefile > +++ b/drivers/scsi/aic7xxx/Makefile > @@ -44,8 +44,8 @@ clean-files += aic79xx_seq.h aic79xx_reg.h > aic79xx_reg_print.c > > # Dependencies for generated files need to be listed explicitly > > -$(addprefix $(src)/,$(aic7xxx-y:.o=.c)): $(obj)/aic7xxx_seq.h > $(obj)/aic7xxx_reg.h > -$(addprefix $(src)/,$(aic79xx-y:.o=.c)): $(obj)/aic79xx_seq.h > $(obj)/aic79xx_reg.h > +$(addprefix $(src)/,$(aic7xxx-y)): $(obj)/aic7xxx_seq.h $(obj)/aic7xxx_reg.h > +$(addprefix $(src)/,$(aic79xx-y)): $(obj)/aic79xx_seq.h $(obj)/aic79xx_reg.h OK, I think it's time for me to give up completely on understanding kbuild. To me this construction looks like you're adding source directory prefixes to objects ... 
which can never be satisfied, can it, if the objects are in the object directory? James
Re: [PATCH 7/9] scsi_dh: Add support for SDEV_PASSIVE
James Bottomley wrote:
> On Mon, 2008-02-04 at 12:15 -0800, Chandra Seetharaman wrote:
>> On Mon, 2008-02-04 at 12:58 -0600, James Bottomley wrote:
>>> On Wed, 2008-01-23 at 16:32 -0800, Chandra Seetharaman wrote:
>>>> Subject: scsi_dh: Add support for SDEV_PASSIVE
>>>> From: Chandra Seetharaman <[EMAIL PROTECTED]>
>>>>
>>>> This patch adds a new device state SDEV_PASSIVE, to correspond to the
>>>> passive side access of an active/passive multipathed device.
>>>
>>> Really, no; this isn't right. The state field of a SCSI device is for
>>> the SCSI state model. Passive might be a valid device mapper state, but
>>
>> Hi James,
>>
>> It is not the "device mapper state", it is the state of the device
>> itself. These devices have active/passive paths, the passive paths will
>> be represented by SDEV_PASSIVE device state in SCSI.
>
> Yes, it is .. you're killing commands on the basis of being in this
> state, which nothing in SCSI ever sets.

SCSI does set this. See below.

> A proper return from a passive path is the SCSI standard NOT_READY
> LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED. We expect to see
> this, not the command being killed.

I think this part of the patch is trying to implement and detect the target port asymmetric access states from spc3 section 5.8.2.4 (it does not follow it exactly, because devices like RDAC or old CLARiiONs did not implement the spec), and then use that info to fail commands before they are even sent to the device, to avoid start-up delays when programs like udev, hal, and kernel partition scanning probe the device.

For the LSI patch it works like the following:

When IO is sent to a path that cannot execute IO optimally, the scsi hw handler hook for sense processing (see rdac_check_sense in "[PATCH 8/9] scsi_dh: add lsi rdac device handler" and the scsi_error.c hook in "scsi_dh: add skeleton for SCSI Device Handlers") will detect this and set the state to passive so future IO is not executed on the path (SG_IO/passthrough is allowed).

I am not sure about alternatives. If we just exported the port access state in sysfs, but did not fail IO from scsi_prep_state_check, then the users could still check the state before sending IO. Would it be horrible to convert apps to do this?
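The flow described above (a sense hook marks the path passive, and a later state check fails normal IO but lets passthrough through) can be sketched in isolation. This is a hypothetical model, not the scsi_dh patch: the type and function names are invented, and the sense code used is the one James names (NOT READY, LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED, i.e. sense key 2h with ASC/ASCQ 04h/02h); the codes an RDAC handler actually keys on are not given in this thread.

```c
#include <stddef.h>
#include <stdint.h>

#define SENSE_KEY_NOT_READY 0x02

/* Hypothetical per-path state. */
enum path_state { PATH_RUNNING, PATH_PASSIVE };

struct path {
	enum path_state state;
};

/*
 * Return 1 if fixed-format sense data reports NOT READY with
 * ASC/ASCQ 04h/02h (initializing command required), else 0.
 */
static int sense_needs_init_cmd(const uint8_t *sense, size_t len)
{
	uint8_t code;

	if (len < 14)
		return 0;
	code = sense[0] & 0x7f;
	if (code != 0x70 && code != 0x71)	/* fixed format only */
		return 0;
	return (sense[2] & 0x0f) == SENSE_KEY_NOT_READY &&
	       sense[12] == 0x04 && sense[13] == 0x02;
}

/* Sense-processing hook: remember that this path is passive. */
static void check_sense(struct path *p, const uint8_t *sense, size_t len)
{
	if (sense_needs_init_cmd(sense, len))
		p->state = PATH_PASSIVE;
}

/*
 * State check before issuing a command: 0 means send it, -1 means fail
 * it without touching the device. Passthrough is always allowed.
 */
static int prep_state_check(const struct path *p, int is_passthrough)
{
	if (p->state == PATH_PASSIVE && !is_passthrough)
		return -1;
	return 0;
}
```

The sysfs alternative raised at the end of the mail would amount to exporting `state` read-only and dropping the -1 branch, leaving it to each application to look before it sends IO.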
Re: aic7xxx build failure
On Tue, Feb 05, 2008 at 07:47:35PM +0100, Sam Ravnborg wrote: > On Tue, Feb 05, 2008 at 07:40:24PM +0200, Adrian Bunk wrote: > > Commit 8891fec65ac5b5a74b50c705e31b66c92c3eddeb broke aic7xxx > > compilation: > > > > <-- snip --> > > > > $ make O=../out/x86-full > > ... > > SHIPPED drivers/scsi/aic7xxx/aic79xx_seq.h > > SHIPPED drivers/scsi/aic7xxx/aic79xx_reg.h > > CC drivers/scsi/aic7xxx/aic79xx_core.o > > gcc: drivers/scsi/aic7xxx/aic79xx_core.c: No such file or directory > > gcc: no input files > > make[4]: *** [drivers/scsi/aic7xxx/aic79xx_core.o] Error 1 > > > > <-- snip --> > > > > Next "make" run brings the same failure in > > drivers/scsi/aic7xxx/aic7xxx_core.c. > > > > With the third "make" it works. > > > > It might compile for people with SMP systems using -j? > > I can reproduce it and will fix it. Seems I was sidetracked by some wrong assumptions. Could you please test this fix. Works for me but this time I will do more testing Sam diff --git a/drivers/scsi/aic7xxx/Makefile b/drivers/scsi/aic7xxx/Makefile index 4c54954..6aa49e7 100644 --- a/drivers/scsi/aic7xxx/Makefile +++ b/drivers/scsi/aic7xxx/Makefile @@ -44,8 +44,8 @@ clean-files += aic79xx_seq.h aic79xx_reg.h aic79xx_reg_print.c # Dependencies for generated files need to be listed explicitly -$(addprefix $(src)/,$(aic7xxx-y:.o=.c)): $(obj)/aic7xxx_seq.h $(obj)/aic7xxx_reg.h -$(addprefix $(src)/,$(aic79xx-y:.o=.c)): $(obj)/aic79xx_seq.h $(obj)/aic79xx_reg.h +$(addprefix $(src)/,$(aic7xxx-y)): $(obj)/aic7xxx_seq.h $(obj)/aic7xxx_reg.h +$(addprefix $(src)/,$(aic79xx-y)): $(obj)/aic79xx_seq.h $(obj)/aic79xx_reg.h aic7xxx-gen-$(CONFIG_AIC7XXX_BUILD_FIRMWARE) := $(obj)/aic7xxx_reg.h aic7xxx-gen-$(CONFIG_AIC7XXX_REG_PRETTY_PRINT) += $(obj)/aic7xxx_reg_print.c - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: new scsi sense handling
--- On Tue, 2/5/08, FUJITA Tomonori <[EMAIL PROTECTED]> wrote: > On Mon, 4 Feb 2008 18:39:22 -0800 (PST) > Luben Tuikov <[EMAIL PROTECTED]> wrote: > > > --- On Mon, 2/4/08, Boaz Harrosh > <[EMAIL PROTECTED]> wrote: > > > There are 3 usages of sense handling in drivers > > > > > > 1. sense is available in driver internal > structure and is > > > mem-copied to upper level > > > 2. A CHECK_CONDITION status was returned and the > driver > > > uses the scsi_eh_prep_cmnd() > > >for a REQUEST_SENSE invocation to the target. > Then > > > returning the sense in > > >scsi_eh_return_cmnd(). A variation on this is > when the > > > driver does nothing the queue > > >is frozen an the scsi watchdog timer does the > above. > > > 3. The underline host adapter does the > REQUEST_SENSE and a > > > pre-allocated and DMA mapped > > >sense buffer receives the sense information > from HW. > > > > Many years ago when "ACA" had a constructive > meaning, > > so did "Autosense". Then about 5 years ago, > "Autosense" > > disappeared completely since it became the de facto > > implementation of the then SCSI Execute Command > "RPC", > > now just SCSI Execute Command procedure call. > > > > At that point in time, the SCSI mid-layer decided > > to embrace this model and give the LLDD a scsi command > > structure which included the sense data buffer to > > a size that the SCSI mid-layer was interested in, > > at the moment 96 bytes, macro defined in > > include/scsi/scsi_cmnd.h. > > > > The concept of "Autosense" was off-loaded to > LLDD > > to emulate it if the specific target device to > > which the command was issued, didn't supply the > > sense data on CHECK CONDITION, and more so > > relevant to target devices which implemented > > queuing, thus the ACA. > > > > And the mid-layer would consider extracting > > the sense data via REQUEST SENSE command > > as a _special case_ if the LLDD/transport layer > > didn't implement the "autosense" model. > > Only SPI and USB? 
I don't understand this question. > > The most of LLDs using the transport protocol that we care > about today > uses sense buffer in their own internal structure. Yes. > > I think that the issue to solve to kill > scsi_cmnd:sense_buffer is how > to share (or export) such sense buffer with the scsi > mid-layer. And therein lies the problem. Sense data is SCSI specific, it should be left to SCSI, unless of course you can stipulate that _all_ block devices return sense data. If that's not the case and you move it to the block layer, then you get a whole bunch of other problems, like does this device want/use it, should we allocate it, etc. OTOH, if that _is_ the case, then you don't have to worry about this and the model is pretty much as the SCSI mid-layer has it, i.e. sense buffer always present. So I guess the question is, can you stipulate that _all_ block devices return sense data? Luben - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] enclosure: add support for enclosure services
--- On Tue, 2/5/08, James Bottomley <[EMAIL PROTECTED]> wrote: > > > > I guess the same could be said for STGT and > SCST, > > > right? > > > > > > You mean both of their kernel pieces are modular? > > > > That's correct. > > > > No, you know very well what I mean. > > > > By the same logic you're preaching to include your > > solution part of the kernel, you can also apply to > > SCST. > > Ah, but it's not ... the current patch is merely > exporting an interface. > The debate in STGT vs SCST is not whether to export an > interface but > where to draw the line. "draw the line" -- I see. BTW, what is wrong with "exporting the interface"? What is wrong if both implementations are in the kernel and then let the users and distros decide which one they like best and use more? It'll not be the fist time this has happened in the kernel. Both are actively maintained. It seems highly arbitrary to say: "X is in the kernel, Y is not. If you want Y, just forget about it and fix X." Give people choice at config time. This is off topic anyway. > You could also argue in the same vein that sd is redundant > because a > filesystem could talk directly to the device via /dev/sgX > (in fact OSD > based filesystems already do this). Yes, I've mentioned this thing before on this list. Oh, maybe 3 years ago. This is why I had wanted for transport protocols to export ... (oh, let's not get this off topic). (Apparently it takes 3 years...) > The argument is true, > but misses > the bigger picture that the interfaces exported by sd are > more portable > (apply to non-SCSI block devices) and easier to use. It isn't quite the same thing. It's like comparing apples to oranges. > > > > > Yes, for which the transport layer, > implements the > > > > scsi device node for the SES device. 
It > doesn't > > > really > > > > matter if the SCSI commands sent to the SES > device go > > > > over SGPIO or FC or SAS or Bluetooth or I2C, > etc, the > > > > transport layer can implement that and > present the > > > > /dev/sgX node. > > > > > > But it does matter if the enclosure device > doesn't > > > speak SCSI. > > > > Enclosure management isn't as simple as you're > > portraying it here. The enclosure management > > device speaks either SES or SAF-TE. The transport > > protocol to access it could be SGPIO or I2C or... > > Look, just read the spec; SGPIO is a bus for driving > enclosures ... I thought Serial General Purpose Input Output (SGPIO) was a method to serialize general purpose IO signals. > it > doesn't require SES or SAF-TE or even any SCSI > protocol. That's true. And this is why I mentioned a couple of emails ago to simply export a sgpio device node *IF* this is what is needed. Of course devices that use SGPIO abstract it away for their functional purpose, e.g. enclosures, LED, etc, and provide a more general way to control it -- highly hardware specific on one side. Your abstraction currently deals with "SES" devices and I'd rather leave that to user-space. Alternatively, which I presume is what you're thinking, a HW specific core would be using your "abstraction" to provide some unified access to raw features, and that "unified access" isn't defined anywhere, and would likely not be. Alternatively that "unified access" is things like SES and SAF-TE, which is what vendors prefer to export, or they prefer to drive this directly via other means. That is, I fail to see the kernel bloat, for things that aren't necessary in the kernel. If you want your abstraction to fly, it first needs a common usage model to abstract, and the latter is missing _from the kernel_. Unless I don't know the details and you've been asked to implement this for a single vendor's HW solution. > > > SGPIO > > > isn't a SCSI protocol ... 
it's a general > purpose > > > serial bus protocol. > > > It's pretty simple and register based, but it > might (or > > > might not) be > > > accessible via a SCSI bridge. > > > > I see. You've just discovered SGPIO -- good for > you. > > > > At any rate, I told you already that what is needed > > is not what you've provided but a _device node_ > > exported by the kernel, either a processor or > > enclosure type. > > Wrong ... we don't export non-SCSI devices as SCSI > (with the single and > rather annoying exception of ATA via SAT). I didn't say you should do that. I had already mentioned that vendors export such controls as either enclosure or processor type devices, and this is why I told you that that is what needs to be exported, which incidentally is a device node of that type. Without a common usage model already in the kernel to abstract (e.g. sd for block device, since you brought that up) your abstraction seems redundant and arbitrary. Your kernel code already uses READ DIAGNOSTIC, etc, and I'd rather leave that to user-space. Luben - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at
Re: Integration of SCST in the mainstream Linux kernel
Jeff Garzik wrote: iSCSI is way, way too complicated. I fully agree. From one side, all that complexity is unavoidable for case of multiple connections per session, but for the regular case of one connection per session it must be a lot simpler. Actually, think about those multiple connections... we already had to implement fast-failover (and load bal) SCSI multi-pathing at a higher level. IMO that portion of the protocol is redundant: You need the same capability elsewhere in the OS _anyway_, if you are to support multi-pathing. I'm thinking about MC/S as about a way to improve performance using several physical links. There's no other way, except MC/S, to keep commands processing order in that case. So, it's really valuable property of iSCSI, although with a limited application. Vlad - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Integration of SCST in the mainstream Linux kernel
On Tue, 2008-02-05 at 21:59 +0300, Vladislav Bolkhovitin wrote: > >>Hmm, how can one write to an mmaped page and don't touch it? > > > > I meant from user space ... the writes are done inside the kernel. > > Sure, the mmap() approach agreed to be unpractical, but could you > elaborate more on this anyway, please? I'm just curious. Do you think > about implementing a new syscall, which would put pages with data in the > mmap'ed area? No, it has to do with the way invalidation occurs. When you mmap a region from a device or file, the kernel places page translations for that region into your vm_area. The regions themselves aren't backed until faulted. For write (i.e. incoming command to target) you specify the write flag and send the area off to receive the data. The gather, expecting the pages to be overwritten, backs them with pages marked dirty but doesn't fault in the contents (unless it already exists in the page cache). The kernel writes the data to the pages and the dirty pages go back to the user. msync() flushes them to the device. The disadvantage of all this is that the handle for the I/O if you will is a virtual address in a user process that doesn't actually care to see the data. non-x86 architectures will do flushes/invalidates on this address space as the I/O occurs. > > However, as Linus has pointed out, this discussion is getting a bit off > > topic. > > No, that isn't off topic. We've just proved that there is no good way to > implement zero-copy cached I/O for STGT. I see the only practical way > for that, proposed by FUJITA Tomonori some time ago: duplicating Linux > page cache in the user space. But will you like it? Well, there's no real evidence that zero copy or lack of it is a problem yet. James
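For readers following the mmap()/msync() description above, here is a minimal user-space sketch of the same sequence: map a region of the backing store with the write flag, let the data land in the pages, then msync() to push the dirty pages out. It is only an illustration under stated assumptions, not how STGT does it. In the scheme James describes, the kernel deposits the incoming data in the pages; here a plain memcpy() stands in for that step. The names mmap_store() and file_matches() are hypothetical helpers invented for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: store `len` bytes from `data` at offset 0 of the file
 * at `path` through a shared, writable mapping, then msync() the region so
 * the dirty pages are written to the backing store.
 * Returns 0 on success, -1 on failure.
 */
int mmap_store(const char *path, const void *data, size_t len)
{
    assert(len > 0);                 /* mmap() rejects zero-length mappings */

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    /* The file has to cover the whole region that gets mapped. */
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }

    /* Mapping only sets up translations; pages are backed when touched. */
    void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Assumption: stands in for the kernel writing the incoming data. */
    memcpy(map, data, len);

    /* Flush the dirty pages; a write-back failure would be reported here. */
    int rc = msync(map, len, MS_SYNC);

    munmap(map, len);
    close(fd);
    return rc;
}

/*
 * Hypothetical helper: read the file back with ordinary read() calls and
 * compare it with `data`. Returns 1 if the first `len` bytes match, else 0.
 */
int file_matches(const char *path, const void *data, size_t len)
{
    char *buf = malloc(len);
    if (!buf)
        return 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        free(buf);
        return 0;
    }

    size_t done = 0;
    while (done < len) {
        ssize_t n = read(fd, buf + done, len - done);
        if (n <= 0)
            break;
        done += (size_t)n;
    }
    close(fd);

    int same = (done == len) && memcmp(buf, data, len) == 0;
    free(buf);
    return same;
}
```

A caller would invoke mmap_store() once per incoming write and check the return value. The sketch does not address the harder point raised in this thread: a read fault on a mapped page that cannot be populated arrives as a signal rather than as a return code.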
Re: Integration of SCST in the mainstream Linux kernel
Vladislav Bolkhovitin wrote: Jeff Garzik wrote: iSCSI is way, way too complicated. I fully agree. From one side, all that complexity is unavoidable for case of multiple connections per session, but for the regular case of one connection per session it must be a lot simpler. Actually, think about those multiple connections... we already had to implement fast-failover (and load bal) SCSI multi-pathing at a higher level. IMO that portion of the protocol is redundant: You need the same capability elsewhere in the OS _anyway_, if you are to support multi-pathing. Jeff
Re: Integration of SCST in the mainstream Linux kernel
Erez Zilber wrote: Bart Van Assche wrote: As you probably know there is a trend in enterprise computing towards networked storage. This is illustrated by the emergence during the past few years of standards like SRP (SCSI RDMA Protocol), iSCSI (Internet SCSI) and iSER (iSCSI Extensions for RDMA). Two different pieces of software are necessary to make networked storage possible: initiator software and target software. As far as I know there exist three different SCSI target implementations for Linux: - The iSCSI Enterprise Target Daemon (IETD, http://iscsitarget.sourceforge.net/); - The Linux SCSI Target Framework (STGT, http://stgt.berlios.de/); - The Generic SCSI Target Middle Level for Linux project (SCST, http://scst.sourceforge.net/). Since I was wondering which SCSI target software would be best suited for an InfiniBand network, I started evaluating the STGT and SCST SCSI target implementations. Apparently the performance difference between STGT and SCST is small on 100 Mbit/s and 1 Gbit/s Ethernet networks, but the SCST target software outperforms the STGT software on an InfiniBand network. See also the following thread for the details: http://sourceforge.net/mailarchive/forum.php?thread_name=e2e108260801170127w2937b2afg9bef324efa945e43%40mail.gmail.com&forum_name=scst-devel. Sorry for the late response (but better late than never). One may claim that STGT should have lower performance than SCST because its data path is from userspace. However, your results show that for non-IB transports, they both show the same numbers. Furthermore, with IB there shouldn't be any additional difference between the 2 targets because data transfer from userspace is as efficient as data transfer from kernel space. And now consider if one target has zero-copy cached I/O. How much that will improve its performance? The only explanation that I see is that fine tuning for iSCSI & iSER is required. 
As was already mentioned in this thread, with SDR you can get ~900 MB/sec with iSER (on STGT). Erez
Re: Integration of SCST in the mainstream Linux kernel
On Feb 5, 2008 6:10 PM, Erez Zilber <[EMAIL PROTECTED]> wrote: > One may claim that STGT should have lower performance than SCST because > its data path is from userspace. However, your results show that for > non-IB transports, they both show the same numbers. Furthermore, with IB > there shouldn't be any additional difference between the 2 targets > because data transfer from userspace is as efficient as data transfer > from kernel space. > > The only explanation that I see is that fine tuning for iSCSI & iSER is > required. As was already mentioned in this thread, with SDR you can get > ~900 MB/sec with iSER (on STGT). My most recent measurements also show that one can get 900 MB/s with STGT + iSER on an SDR IB network, but only for very large block sizes (>= 100 MB). A quote from Linus Torvalds is relevant here (February 5, 2008): Block transfer sizes over about 64kB are totally irrelevant for 99% of all people. Please read my e-mail (posted earlier today) with a comparison for 4 KB - 64 KB block transfer sizes between SCST and STGT. Bart Van Assche.
Re: Integration of SCST in the mainstream Linux kernel
Jeff Garzik wrote: Alan Cox wrote: better. So for example, I personally suspect that ATA-over-ethernet is way better than some crazy SCSI-over-TCP crap, but I'm biased for simple and low-level, and against those crazy SCSI people to begin with. Current ATAoE isn't. It can't support NCQ. A variant that did NCQ and IP would probably trash iSCSI for latency if nothing else. AoE is truly a thing of beauty. It has a two/three page RFC (say no more!). But quite so... AoE is limited to MTU size, which really hurts. Can't really do tagged queueing, etc. iSCSI is way, way too complicated. I fully agree. From one side, all that complexity is unavoidable for case of multiple connections per session, but for the regular case of one connection per session it must be a lot simpler. And now think about iSER, which brings iSCSI to a whole new level of complexity ;) It's an Internet protocol designed by storage designers, what do you expect? For years I have been hoping that someone will invent a simple protocol (w/ strong auth) that can transit ATA and SCSI commands and responses. Heck, it would be almost trivial if the kernel had a TLS/SSL implementation. Jeff
Re: Integration of SCST in the mainstream Linux kernel
Linus Torvalds wrote: I'd assumed the move was primarily because of the difficulty of getting correct semantics on a shared filesystem .. not even shared. It was hard to get correct semantics full stop. Which is a traditional problem. The thing is, the kernel always has some internal state, and it's hard to expose all the semantics that the kernel knows about to user space. So no, performance is not the only reason to move to kernel space. It can easily be things like needing direct access to internal data queues (for a iSCSI target, this could be things like barriers or just tagged commands - yes, you can probably emulate things like that without access to the actual IO queues, but are you sure the semantics will be entirely right? The kernel/userland boundary is not just a performance boundary, it's an abstraction boundary too, and these kinds of protocols tend to break abstractions. NFS broke it by having "file handles" (which is not something that really exists in user space, and is almost impossible to emulate correctly), and I bet the same thing happens when emulating a SCSI target in user space. Yes, there is something like that for SCSI target as well. It's a "local initiator" or "local nexus", see http://thread.gmane.org/gmane.linux.scsi/31288 and http://news.gmane.org/find-root.php?message_id=%3c463F36AC.3010207%40vlnb.net%3e for more info about that. In fact, the existence of a local nexus is one more reason why SCST is better than STGT, because for STGT it's pretty hard to support it (all locally generated commands would have to be passed through its daemon, which would be a total disaster for performance), while for SCST it can be done relatively simply. Vlad
Re: Integration of SCST in the mainstream Linux kernel
Linus Torvalds wrote: So just going by what has happened in the past, I'd assume that iSCSI would eventually turn into "connecting/authentication in user space" with "data transfers in kernel space". This is exactly how iSCSI-SCST (iSCSI target driver for SCST) is implemented, credits to IET and Ardis target developers. Vlad
Re: Integration of SCST in the mainstream Linux kernel
James Bottomley wrote: On Mon, 2008-02-04 at 21:38 +0300, Vladislav Bolkhovitin wrote: James Bottomley wrote: On Mon, 2008-02-04 at 20:56 +0300, Vladislav Bolkhovitin wrote: James Bottomley wrote: On Mon, 2008-02-04 at 20:16 +0300, Vladislav Bolkhovitin wrote: James Bottomley wrote: So, James, what is your opinion on the above? Or the overall SCSI target project simplicity doesn't matter much for you and you think it's fine to duplicate Linux page cache in the user space to keep the in-kernel part of the project as small as possible? The answers were pretty much contained here http://marc.info/?l=linux-scsi&m=120164008302435 and here: http://marc.info/?l=linux-scsi&m=120171067107293 Weren't they? No, sorry, it doesn't look so for me. They are about performance, but I'm asking about the overall project's architecture, namely about one part of it: simplicity. Particularly, what do you think about duplicating Linux page cache in the user space to have zero-copy cached I/O? Or can you suggest another architectural solution for that problem in the STGT's approach? Isn't that an advantage of a user space solution? It simply uses the backing store of whatever device supplies the data. That means it takes advantage of the existing mechanisms for caching. No, please reread this thread, especially this message: http://marc.info/?l=linux-kernel&m=120169189504361&w=2. This is one of the advantages of the kernel space implementation. The user space implementation has to have data copied between the cache and user space buffer, but the kernel space one can use pages in the cache directly, without extra copy. Well, you've said it thrice (the bellman cried) but that doesn't make it true. The way a user space solution should work is to schedule mmapped I/O from the backing store and then send this mmapped region off for target I/O. For reads, the page gather will ensure that the pages are up to date from the backing store to the cache before sending the I/O out. 
For writes, You actually have to do a msync on the region to get the data secured to the backing store. James, have you checked how fast is mmaped I/O if work size > size of RAM? It's several times slower comparing to buffered I/O. It was many times discussed in LKML and, seems, VM people consider it unavoidable. Erm, but if you're using the case of work size > size of RAM, you'll find buffered I/O won't help because you don't have the memory for buffers either. James, just check and you will see, buffered I/O is a lot faster. So in an out of memory situation the buffers you don't have are a lot faster than the pages I don't have? There isn't OOM in both cases. Just pages reclamation/readahead work much better in the buffered case. So, using mmaped IO isn't an option for high performance. Plus, mmaped IO isn't an option for high reliability requirements, since it doesn't provide a practical way to handle I/O errors. I think you'll find it does ... the page gather returns -EFAULT if there's an I/O error in the gathered region. Err, to whom return? If you try to read from a mmaped page, which can't be populated due to I/O error, you will get SIGBUS or SIGSEGV, I don't remember exactly. It's quite tricky to get back to the faulted command from the signal handler. Or do you mean mmap(MAP_POPULATE)/munmap() for each command? Do you think that such mapping/unmapping is good for performance? msync does something similar if there's a write failure. You also have to pull tricks with the mmap region in the case of writes to prevent useless data being read in from the backing store. Can you be more exact and specify what kind of tricks should be done for that? Actually, just avoid touching it seems to do the trick with a recent kernel. Hmm, how can one write to an mmaped page and don't touch it? I meant from user space ... the writes are done inside the kernel. Sure, the mmap() approach agreed to be unpractical, but could you elaborate more on this anyway, please? 
I'm just curious. Do you think about implementing a new syscall, which would put pages with data in the mmap'ed area? However, as Linus has pointed out, this discussion is getting a bit off topic. No, that isn't off topic. We've just proved that there is no good way to implement zero-copy cached I/O for STGT. I see the only practical way for that, proposed by FUJITA Tomonori some time ago: duplicating Linux page cache in the user space. But will you like it? There's no actual evidence that copy problems are causing any performance issues for STGT. In fact, there's evidence that they're not for everything except IB networks. The zero-copy cached I/O has not yet been implemented in SCST, I simply so far have not had time for that. Currently SCST performs better than STGT, because of simpler processing path and fewer context switches per command. Memcpy() speed on modern systems is about t
Re: aic7xxx build failure
On Tue, Feb 05, 2008 at 07:40:24PM +0200, Adrian Bunk wrote: > Commit 8891fec65ac5b5a74b50c705e31b66c92c3eddeb broke aic7xxx > compilation: > > <-- snip --> > > $ make O=../out/x86-full > ... > SHIPPED drivers/scsi/aic7xxx/aic79xx_seq.h > SHIPPED drivers/scsi/aic7xxx/aic79xx_reg.h > CC drivers/scsi/aic7xxx/aic79xx_core.o > gcc: drivers/scsi/aic7xxx/aic79xx_core.c: No such file or directory > gcc: no input files > make[4]: *** [drivers/scsi/aic7xxx/aic79xx_core.o] Error 1 > > <-- snip --> > > Next "make" run brings the same failure in > drivers/scsi/aic7xxx/aic7xxx_core.c. > > With the third "make" it works. > > It might compile for people with SMP systems using -j? I can reproduce it and will fix it. Sam
Re: aic7xxx build failure
On Tue, 05 Feb 2008 12:30:56 -0600 > > make: *** [sub-make] Error 2 > > Do I assume from this that you have different source and object > directories? There shouldn't be a failure if this is building > in /home/bunk/linux/kernel-2.6/git/linux-2.6/ because the source file > should be there. > > James time to run a fsck as well just to rule stuff out? -- If you want to reach me at my work email, use [EMAIL PROTECTED] For development, discussion and tips for power savings, visit http://www.lesswatts.org
Re: Integration of SCST in the mainstream Linux kernel
This email somehow didn't manage to make it to the list (I suspect because it had html attachments). James --- From: Julian Satran <[EMAIL PROTECTED]> To: Nicholas A. Bellinger <[EMAIL PROTECTED]> Cc: Andrew Morton <[EMAIL PROTECTED]>, Alan Cox <[EMAIL PROTECTED]>, Bart Van Assche <[EMAIL PROTECTED]>, FUJITA Tomonori <[EMAIL PROTECTED]>, James Bottomley <[EMAIL PROTECTED]>, ... Subject: Re: Integration of SCST in the mainstream Linux kernel Date: Mon, 4 Feb 2008 21:31:48 -0500 (20:31 CST) Well stated. In fact the "layers" above ethernet do provide the services that make the TCP/IP stack compelling - a whole complement of services. ALL services required (naming, addressing, discovery, security etc.) will have to be recreated if you take the FcOE route. That makes good business for some but not necessary for the users. Those services BTW are not on the data path and are not "overhead". The TCP/IP stack pathlength is decently low. What makes most implementations poor is that they were naively extended in the SMP world. Recent implementations (published) from IBM and Intel show excellent performance (4-6 times the regular stack). Unfortunately I do not have latency numbers (as the community's major stress has been throughput) but I assume that RDMA (not necessarily hardware RDMA) and/or the use of infiniband or latency critical applications - within clusters may be the ultimate low latency solution. Ethernet has some inherent latency issues (the bridges) that are inherited by anything on ethernet (FcOE included). The IP protocol stack is not inherently slow but some implementations are somewhat sluggish. But instead of replacing them with new and half-baked contraptions we would be all better off improving what we have and understand. In the whole debate around FcOE I heard a single argument that may have some merit - building convertors iSCSI-FCP to support legacy islands of FCP (read storage products that do not support iSCSI natively) is expensive. 
It is correct technically - only that FcOE eliminates an expense at the wrong end of the wire - it reduces the cost of the storage box at the expense of added cost at the server (and usually there are many servers using a storage box). FcOE vendors are also bound to provide FCP like services for FcOE - naming, security, discovery etc. - that do not exist on Ethernet. It is a good business for FcOE vendors - a duplicate set of solutions for users. It should be apparent by now that if one speaks about a "converged" network we should speak about an IP network and not about Ethernet. If we take this route we might get perhaps also to an "infrastructure physical variants" that support very low latency better than ethernet and we might be able to use them with the same "stack" - a definite forward looking solution. IMHO it is foolish to insist on throwing away the whole stack whenever we make a slight improvement in the physical layer of the network. We have a substantial investment and body of knowledge in the protocol stack and nothing proposed improves on it - obviously not as in its total level of service nor in performance. Julo
Re: aic7xxx build failure
On Tue, Feb 05, 2008 at 12:30:56PM -0600, James Bottomley wrote: > > On Tue, 2008-02-05 at 20:24 +0200, Adrian Bunk wrote: > > On Tue, Feb 05, 2008 at 12:18:04PM -0600, James Bottomley wrote: > > > > > > On Tue, 2008-02-05 at 19:40 +0200, Adrian Bunk wrote: > > > > Commit 8891fec65ac5b5a74b50c705e31b66c92c3eddeb broke aic7xxx > > > > compilation: > > > > > > > > <-- snip --> > > > > > > > > $ make O=../out/x86-full > > > > ... > > > > SHIPPED drivers/scsi/aic7xxx/aic79xx_seq.h > > > > SHIPPED drivers/scsi/aic7xxx/aic79xx_reg.h > > > > CC drivers/scsi/aic7xxx/aic79xx_core.o > > > > gcc: drivers/scsi/aic7xxx/aic79xx_core.c: No such file or directory > > > > gcc: no input files > > > > make[4]: *** [drivers/scsi/aic7xxx/aic79xx_core.o] Error 1 > > > > > > Could you run this with V=1 to get us a verbose output of what the exact > > > files gcc is failing on are? > > > > make -f /home/bunk/linux/kernel-2.6/git/linux-2.6/scripts/Makefile.build > > obj=drivers/scsi > > make -f /home/bunk/linux/kernel-2.6/git/linux-2.6/scripts/Makefile.build > > obj=drivers/scsi/aacraid > > (cat /dev/null; ) > drivers/scsi/aacraid/modules.order > > make -f /home/bunk/linux/kernel-2.6/git/linux-2.6/scripts/Makefile.build > > obj=drivers/scsi/aic7xxx > > cat > > /home/bunk/linux/kernel-2.6/git/linux-2.6/drivers/scsi/aic7xxx/aic79xx_seq.h_shipped > > > drivers/scsi/aic7xxx/aic79xx_seq.h > > cat > > /home/bunk/linux/kernel-2.6/git/linux-2.6/drivers/scsi/aic7xxx/aic79xx_reg.h_shipped > > > drivers/scsi/aic7xxx/aic79xx_reg.h > > gcc -Wp,-MD,drivers/scsi/aic7xxx/.aic79xx_core.o.d -nostdinc -isystem > > /usr/lib/gcc/i486-linux-gnu/4.2.3/include -D__KERNEL__ -Iinclude -Iinclude2 > > -I/home/bunk/linux/kernel-2.6/git/linux-2.6/include -include > > include/linux/autoconf.h > > -I/home/bunk/linux/kernel-2.6/git/linux-2.6/drivers/scsi/aic7xxx > > -Idrivers/scsi/aic7xxx -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs > > -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Os > > 
-m32 -msoft-float -mregparm=3 -freg-struct-return > > -mpreferred-stack-boundary=2 -march=athlon -ffreestanding -DCONFIG_AS_CFI=1 > > -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -pipe -Wno-sign-compare > > -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow > > -I/home/bunk/linux/kernel-2.6/git/linux-2.6/include/asm-x86/mach-generic > > -Iinclude/asm-x86/mach-generic > > -I/home/bunk/linux/kernel-2.6/git/linux-2.6/include/asm-x86/mach-default > > -Iinclude/asm-x86/mach-default -fno-omit-frame-pointer > > -fno-optimize-sibling-calls -fno-stack-protector > > -Wdeclaration-after-statement -Wno-pointer-sign > > -I/home/bunk/linux/kernel-2.6/git/linux-2.6/drivers/scsi -Idrivers/scsi > > -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(aic79xx_core)" > > -D"KBUILD_MODNAME=KBUILD_STR(aic79xx)" -c -o > > drivers/scsi/aic7xxx/aic79xx_core.o drivers/scsi/aic7xxx/aic79xx_core.c > > gcc: drivers/scsi/aic7xxx/aic79xx_core.c: No such file or directory > > gcc: no input files > > make[4]: *** [drivers/scsi/aic7xxx/aic79xx_core.o] Error 1 > > make[3]: *** [drivers/scsi/aic7xxx] Error 2 > > make[2]: *** [drivers/scsi] Error 2 > > make[1]: *** [drivers] Error 2 > > make: *** [sub-make] Error 2 > > Do I assume from this that you have different source and object > directories? Yes, as I wrote in my bug report: make O=../out/x86-full > There shouldn't be a failure if this is building > in /home/bunk/linux/kernel-2.6/git/linux-2.6/ because the source file > should be there. > > James cu Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed
Re: aic7xxx build failure
On Tue, 2008-02-05 at 20:24 +0200, Adrian Bunk wrote: > On Tue, Feb 05, 2008 at 12:18:04PM -0600, James Bottomley wrote: > > > > On Tue, 2008-02-05 at 19:40 +0200, Adrian Bunk wrote: > > > Commit 8891fec65ac5b5a74b50c705e31b66c92c3eddeb broke aic7xxx > > > compilation: > > > > > > <-- snip --> > > > > > > $ make O=../out/x86-full > > > ... > > > SHIPPED drivers/scsi/aic7xxx/aic79xx_seq.h > > > SHIPPED drivers/scsi/aic7xxx/aic79xx_reg.h > > > CC drivers/scsi/aic7xxx/aic79xx_core.o > > > gcc: drivers/scsi/aic7xxx/aic79xx_core.c: No such file or directory > > > gcc: no input files > > > make[4]: *** [drivers/scsi/aic7xxx/aic79xx_core.o] Error 1 > > > > Could you run this with V=1 to get us a verbose output of what the exact > > files gcc is failing on are? > > make -f /home/bunk/linux/kernel-2.6/git/linux-2.6/scripts/Makefile.build > obj=drivers/scsi > make -f /home/bunk/linux/kernel-2.6/git/linux-2.6/scripts/Makefile.build > obj=drivers/scsi/aacraid > (cat /dev/null; ) > drivers/scsi/aacraid/modules.order > make -f /home/bunk/linux/kernel-2.6/git/linux-2.6/scripts/Makefile.build > obj=drivers/scsi/aic7xxx > cat > /home/bunk/linux/kernel-2.6/git/linux-2.6/drivers/scsi/aic7xxx/aic79xx_seq.h_shipped > > drivers/scsi/aic7xxx/aic79xx_seq.h > cat > /home/bunk/linux/kernel-2.6/git/linux-2.6/drivers/scsi/aic7xxx/aic79xx_reg.h_shipped > > drivers/scsi/aic7xxx/aic79xx_reg.h > gcc -Wp,-MD,drivers/scsi/aic7xxx/.aic79xx_core.o.d -nostdinc -isystem > /usr/lib/gcc/i486-linux-gnu/4.2.3/include -D__KERNEL__ -Iinclude -Iinclude2 > -I/home/bunk/linux/kernel-2.6/git/linux-2.6/include -include > include/linux/autoconf.h > -I/home/bunk/linux/kernel-2.6/git/linux-2.6/drivers/scsi/aic7xxx > -Idrivers/scsi/aic7xxx -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs > -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Os > -m32 -msoft-float -mregparm=3 -freg-struct-return > -mpreferred-stack-boundary=2 -march=athlon -ffreestanding -DCONFIG_AS_CFI=1 > 
-DCONFIG_AS_CFI_SIGNAL_FRAME=1 -pipe -Wno-sign-compare > -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow > -I/home/bunk/linux/kernel-2.6/git/linux-2.6/include/asm-x86/mach-generic > -Iinclude/asm-x86/mach-generic > -I/home/bunk/linux/kernel-2.6/git/linux-2.6/include/asm-x86/mach-default > -Iinclude/asm-x86/mach-default -fno-omit-frame-pointer > -fno-optimize-sibling-calls -fno-stack-protector > -Wdeclaration-after-statement -Wno-pointer-sign > -I/home/bunk/linux/kernel-2.6/git/linux-2.6/drivers/scsi -Idrivers/scsi > -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(aic79xx_core)" > -D"KBUILD_MODNAME=KBUILD_STR(aic79xx)" -c -o > drivers/scsi/aic7xxx/aic79xx_core.o drivers/scsi/aic7xxx/aic79xx_core.c > gcc: drivers/scsi/aic7xxx/aic79xx_core.c: No such file or directory > gcc: no input files > make[4]: *** [drivers/scsi/aic7xxx/aic79xx_core.o] Error 1 > make[3]: *** [drivers/scsi/aic7xxx] Error 2 > make[2]: *** [drivers/scsi] Error 2 > make[1]: *** [drivers] Error 2 > make: *** [sub-make] Error 2 Do I assume from this that you have different source and object directories? There shouldn't be a failure if this is building in /home/bunk/linux/kernel-2.6/git/linux-2.6/ because the source file should be there. James
Re: aic7xxx build failure
On Tue, Feb 05, 2008 at 12:18:04PM -0600, James Bottomley wrote: > > On Tue, 2008-02-05 at 19:40 +0200, Adrian Bunk wrote: > > Commit 8891fec65ac5b5a74b50c705e31b66c92c3eddeb broke aic7xxx > > compilation: > > > > <-- snip --> > > > > $ make O=../out/x86-full > > ... > > SHIPPED drivers/scsi/aic7xxx/aic79xx_seq.h > > SHIPPED drivers/scsi/aic7xxx/aic79xx_reg.h > > CC drivers/scsi/aic7xxx/aic79xx_core.o > > gcc: drivers/scsi/aic7xxx/aic79xx_core.c: No such file or directory > > gcc: no input files > > make[4]: *** [drivers/scsi/aic7xxx/aic79xx_core.o] Error 1 > > Could you run this with V=1 to get us a verbose output of what the exact > files gcc is failing on are? make -f /home/bunk/linux/kernel-2.6/git/linux-2.6/scripts/Makefile.build obj=drivers/scsi make -f /home/bunk/linux/kernel-2.6/git/linux-2.6/scripts/Makefile.build obj=drivers/scsi/aacraid (cat /dev/null; ) > drivers/scsi/aacraid/modules.order make -f /home/bunk/linux/kernel-2.6/git/linux-2.6/scripts/Makefile.build obj=drivers/scsi/aic7xxx cat /home/bunk/linux/kernel-2.6/git/linux-2.6/drivers/scsi/aic7xxx/aic79xx_seq.h_shipped > drivers/scsi/aic7xxx/aic79xx_seq.h cat /home/bunk/linux/kernel-2.6/git/linux-2.6/drivers/scsi/aic7xxx/aic79xx_reg.h_shipped > drivers/scsi/aic7xxx/aic79xx_reg.h gcc -Wp,-MD,drivers/scsi/aic7xxx/.aic79xx_core.o.d -nostdinc -isystem /usr/lib/gcc/i486-linux-gnu/4.2.3/include -D__KERNEL__ -Iinclude -Iinclude2 -I/home/bunk/linux/kernel-2.6/git/linux-2.6/include -include include/linux/autoconf.h -I/home/bunk/linux/kernel-2.6/git/linux-2.6/drivers/scsi/aic7xxx -Idrivers/scsi/aic7xxx -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Os -m32 -msoft-float -mregparm=3 -freg-struct-return -mpreferred-stack-boundary=2 -march=athlon -ffreestanding -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow 
-I/home/bunk/linux/kernel-2.6/git/linux-2.6/include/asm-x86/mach-generic -Iinclude/asm-x86/mach-generic -I/home/bunk/linux/kernel-2.6/git/linux-2.6/include/asm-x86/mach-default -Iinclude/asm-x86/mach-default -fno-omit-frame-pointer -fno-optimize-sibling-calls -fno-stack-protector -Wdeclaration-after-statement -Wno-pointer-sign -I/home/bunk/linux/kernel-2.6/git/linux-2.6/drivers/scsi -Idrivers/scsi -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(aic79xx_core)" -D"KBUILD_MODNAME=KBUILD_STR(aic79xx)" -c -o drivers/scsi/aic7xxx/aic79xx_core.o drivers/scsi/aic7xxx/aic79xx_core.c gcc: drivers/scsi/aic7xxx/aic79xx_core.c: No such file or directory gcc: no input files make[4]: *** [drivers/scsi/aic7xxx/aic79xx_core.o] Error 1 make[3]: *** [drivers/scsi/aic7xxx] Error 2 make[2]: *** [drivers/scsi] Error 2 make[1]: *** [drivers] Error 2 make: *** [sub-make] Error 2 > Thanks, > > James cu Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed
Re: Integration of SCST in the mainstream Linux kernel
On Tue, 5 Feb 2008, Bart Van Assche wrote:
>
> Results that I did not expect:
> * A block transfer size of 1 MB is not enough to measure the maximal
> throughput. The maximal throughput is only reached at much higher
> block sizes (about 10 MB for SCST + SRP and about 100 MB for STGT +
> iSER).

Block transfer sizes over about 64kB are totally irrelevant for 99% of all people.

Don't even bother testing anything more. Yes, bigger transfers happen, but a lot of common loads have *smaller* transfers than 64kB.

So benchmarks that try to find "theoretical throughput" by just making big transfers should just be banned. They give numbers, yes, but the numbers are pointless.

Linus
Re: aic7xxx build failure
On Tue, 2008-02-05 at 19:40 +0200, Adrian Bunk wrote:
> Commit 8891fec65ac5b5a74b50c705e31b66c92c3eddeb broke aic7xxx
> compilation:
>
> <-- snip -->
>
> $ make O=../out/x86-full
> ...
> SHIPPED drivers/scsi/aic7xxx/aic79xx_seq.h
> SHIPPED drivers/scsi/aic7xxx/aic79xx_reg.h
> CC drivers/scsi/aic7xxx/aic79xx_core.o
> gcc: drivers/scsi/aic7xxx/aic79xx_core.c: No such file or directory
> gcc: no input files
> make[4]: *** [drivers/scsi/aic7xxx/aic79xx_core.o] Error 1

Could you run this with V=1 so we get verbose output showing exactly which files gcc is failing on?

Thanks,

James
Re: aic7xxx build failure
On Tue, 2008-02-05 at 19:40 +0200, Adrian Bunk wrote:
> Commit 8891fec65ac5b5a74b50c705e31b66c92c3eddeb broke aic7xxx
> compilation:
>
> <-- snip -->
>
> $ make O=../out/x86-full
> ...
> SHIPPED drivers/scsi/aic7xxx/aic79xx_seq.h
> SHIPPED drivers/scsi/aic7xxx/aic79xx_reg.h
> CC drivers/scsi/aic7xxx/aic79xx_core.o
> gcc: drivers/scsi/aic7xxx/aic79xx_core.c: No such file or directory
> gcc: no input files
> make[4]: *** [drivers/scsi/aic7xxx/aic79xx_core.o] Error 1
>
> <-- snip -->
>
> Next "make" run brings the same failure in
> drivers/scsi/aic7xxx/aic7xxx_core.c.
>
> With the third "make" it works.
>
> It might compile for people with SMP systems using -j?

I'd just say "weird behaviour": the file being complained about is definitely part of the tree ... does it actually exist in your tree when gcc claims it doesn't? If so, I suspect some type of make path screwup here.

The commit in question is this:

commit 8891fec65ac5b5a74b50c705e31b66c92c3eddeb
Author: Sam Ravnborg <[EMAIL PROTECTED]>
Date:   Sun Feb 3 21:55:49 2008 +0100

    scsi: fix dependency bug in aic7 Makefile

    Building the aic7xxx driver includes the copy of an .h file from a
    _shipped file. In a highly parallel build Ingo saw that the build
    sometimes failed (included distcc usage).

    It was tracked down to a missing dependency from the .c source file
    to the generated .h file. We started to build the .c file before the
    copy (cat) operation of the .h file completed and we then only got
    half of the definitions from the copied .h file.

    Add an explicit dependency from the .c files to the generated .h
    files so make knows all dependencies and finsih the build of the .h
    files before it starts building the .o files.

    Ingo tested this fix and reported:
    good news: hundreds of successful kernel builds and no failures
    overnight.

    Signed-off-by: Sam Ravnborg <[EMAIL PROTECTED]>
    Acked-by: Ingo Molnar <[EMAIL PROTECTED]>
    Acked-by: James Bottomley <[EMAIL PROTECTED]>

diff --git a/drivers/scsi/aic7xxx/Makefile b/drivers/scsi/aic7xxx/Makefile
index e4f70c5..4c54954 100644
--- a/drivers/scsi/aic7xxx/Makefile
+++ b/drivers/scsi/aic7xxx/Makefile
@@ -44,13 +44,8 @@ clean-files += aic79xx_seq.h aic79xx_reg.h aic79xx_reg_print.c
 
 # Dependencies for generated files need to be listed explicitly
 
-$(obj)/aic7xxx_core.o: $(obj)/aic7xxx_seq.h
-$(obj)/aic7xxx_core.o: $(obj)/aic7xxx_reg.h
-$(obj)/aic79xx_core.o: $(obj)/aic79xx_seq.h
-$(obj)/aic79xx_core.o: $(obj)/aic79xx_reg.h
-
-$(addprefix $(obj)/,$(aic7xxx-y)): $(obj)/aic7xxx_seq.h
-$(addprefix $(obj)/,$(aic79xx-y)): $(obj)/aic79xx_seq.h
+$(addprefix $(src)/,$(aic7xxx-y:.o=.c)): $(obj)/aic7xxx_seq.h $(obj)/aic7xxx_reg.h
+$(addprefix $(src)/,$(aic79xx-y:.o=.c)): $(obj)/aic79xx_seq.h $(obj)/aic79xx_reg.h
 
 aic7xxx-gen-$(CONFIG_AIC7XXX_BUILD_FIRMWARE) := $(obj)/aic7xxx_reg.h
 aic7xxx-gen-$(CONFIG_AIC7XXX_REG_PRETTY_PRINT) += $(obj)/aic7xxx_reg_print.c

The last two additions look wrong: they make source files depend on headers, which isn't right. It's object files that depend on headers (we don't know how to rebuild the .c files because they're not auto generated). However, the commit log indicates the cause might be deeper.

James
Re: Integration of SCST in the mainstream Linux kernel
Olivier Galibert wrote: On Mon, Feb 04, 2008 at 05:57:47PM -0500, Jeff Garzik wrote: iSCSI and NBD were passe ideas at birth. :) Networked block devices are attractive because the concepts and implementation are more simple than networked filesystems... but usually you want to run some sort of filesystem on top. At that point you might as well run NFS or [gfs|ocfs|flavor-of-the-week], and ditch your networked block device (and associated complexity). Call me a sysadmin, but I find easier to plug in and keep in place an ethernet cable than these parallel scsi cables from hell. Every server has at least two ethernet ports by default, with rarely any surprises at the kernel level. Adding ethernet cards is inexpensive, and you pretty much never hear of compatibility problems between cards. So ethernet as a connection medium is really nice compared to scsi. Too bad iscsi is demented and ATAoE/NBD inexistant. Maybe external SAS will be nice, but I don't see it getting to the level of universality of ethernet any time soon. And it won't get the same amount of user-level compatibility testing in any case. Indeed, at the end of the day iSCSI is a bloated cabling standard. :) It has its uses, but I don't see it as ever coming close to replacing direct-to-network (perhaps backed with local cachefs) filesystems... which is how all the hype comes across to me. Cheap "Lintel" boxes everybody is familiar with _are_ the storage appliances. Until mass-produced ATA and SCSI devices start shipping with ethernet connectors, anyway. Jeff - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] bugfix for an underflow condition in usb storage & isd200.c
On Mon, Feb 04, 2008 at 03:05:58PM -0500, Alan Stern wrote:
> On Sun, 3 Feb 2008, Matthew Dharm wrote:
>
> I think the correct approach is to modify those routines so that they
> will never overrun the s-g buffer (like Boaz has done), and _document_
> this behavior. Then the callers can feel free to try and transfer as
> much as they want, knowing that an overrun can't occur. There won't
> be any need for a WARN_ON or anything else.

Six of one and a half-dozen of the other. All we're arguing over is the definition of "correct behavior" here. You want to change the API so that overrun is acceptable and handled; I prefer calling it a Bad Thing(tm). We both agree that the code shouldn't run off the end of the s-g list.

Since you've already committed to updating the patch, then we can do it your way. Just make sure it's very very clear in the comments.

Matt

--
Matthew Dharm  Home: [EMAIL PROTECTED]
Maintainer, Linux USB Mass Storage Driver

E: You run this ship with Windows?! YOU IDIOT!
L: Give me a break, it came bundled with the computer!
  -- ESR and Lan Solaris
  User Friendly, 12/8/1998
Re: Integration of SCST in the mainstream Linux kernel
Bart Van Assche wrote: On Feb 4, 2008 11:57 PM, Jeff Garzik <[EMAIL PROTECTED]> wrote: Networked block devices are attractive because the concepts and implementation are more simple than networked filesystems... but usually you want to run some sort of filesystem on top. At that point you might as well run NFS or [gfs|ocfs|flavor-of-the-week], and ditch your networked block device (and associated complexity). Running a filesystem on top of iSCSI results in better performance than NFS, especially if the NFS client conforms to the NFS standard (=synchronous writes). By searching the web search for the keywords NFS, iSCSI and performance I found the following (6 years old) document: http://www.technomagesinc.com/papers/ip_paper.html. A quote from the conclusion: Our results, generated by running some of industry standard benchmarks, show that iSCSI significantly outperforms NFS for situations when performing streaming, database like accesses and small file transactions. async performs better than sync... this is news? Furthermore, NFSv4 has not only async capability but delegation too (and RDMA if you like such things), so the comparison is not relevant to modern times. But a networked filesystem (note I'm using that term, not "NFS", from here on) is simply far more useful to the average user. A networked block device is a building block -- and a useful one. A networked filesystem is an immediately usable solution. For remotely accessing data, iSCSI+fs is quite simply more overhead than a networked fs. With iSCSI you are doing local VFS -> local blkdev -> network whereas a networked filesystem is local VFS -> network iSCSI+fs also adds new manageability issues, because unless the filesystem is single-computer (such as diskless iSCSI root fs), you still need to go across the network _once again_ to handle filesystem locking and coordination issues. There is no _fundamental_ reason why remote shared storage via iSCSI OSD is any faster than a networked filesystem. 
SCSI-over-IP has its uses. Absolutely. It needed to be standardized. But let's not pretend iSCSI is anything more than what it is. It's a bloated cat5 cabling standard :)

Jeff
aic7xxx build failure
Commit 8891fec65ac5b5a74b50c705e31b66c92c3eddeb broke aic7xxx compilation:

<-- snip -->

$ make O=../out/x86-full
...
  SHIPPED drivers/scsi/aic7xxx/aic79xx_seq.h
  SHIPPED drivers/scsi/aic7xxx/aic79xx_reg.h
  CC drivers/scsi/aic7xxx/aic79xx_core.o
gcc: drivers/scsi/aic7xxx/aic79xx_core.c: No such file or directory
gcc: no input files
make[4]: *** [drivers/scsi/aic7xxx/aic79xx_core.o] Error 1

<-- snip -->

Next "make" run brings the same failure in drivers/scsi/aic7xxx/aic7xxx_core.c.

With the third "make" it works.

It might compile for people with SMP systems using -j?

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
Re: [PATCH 21/24][RFC] scsi_tgt: use of sense accessors
FUJITA Tomonori wrote: On Tue, 5 Feb 2008 11:21:33 -0500 Pete Wyckoff <[EMAIL PROTECTED]> wrote: [EMAIL PROTECTED] wrote on Mon, 04 Feb 2008 19:53 +0200: FIXME: I need help with this driver (Pete?) I used scsi_sense() in a none const way. But since scsi_tgt is the ULD here, it can just access it's own sense buffer directly. I did not use scsi_eh_cpy_sense() because I did not want the extra copy. Pete will want to use a 260 bytes buffer here. Signed-off-by: Boaz Harrosh <[EMAIL PROTECTED]> Need-help-from: Pete Wyckoff <[EMAIL PROTECTED]> FYI, I never use scsi_tgt. Only just pure userspace on the target, and a dumb ethernet NIC that does not know it is speaking any form of SCSI. Seems that many people misunderstand STGT iSCSI (and iSER), FCoE, and SRP (not implemented yet) software target drivers. They don't use the tgt kernel module. They just run in user space like user-space nfs daemon. FWIW, some AHCI and other SATA chips implement ATA target mode. I'm watching this SCSI work with interest, hoping that many of the concepts (and code?) can be applied to SATA as well. If for no other reason than I can build a cheap ATA protocol analyzer, or bridge. Jeff - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Scst-devel] Integration of SCST in the mainstream Linux kernel
On 5-02-2008 14:38, "FUJITA Tomonori" <[EMAIL PROTECTED]> wrote:

> On Tue, 05 Feb 2008 08:14:01 +0100
> Tomasz Chmielewski <[EMAIL PROTECTED]> wrote:
>
>> James Bottomley wrote:
>>
>>> These are both features being independently worked on, are they not?
>>> Even if they weren't, the combination of the size of SCST in kernel plus
>>> the problem of having to find a migration path for the current STGT
>>> users still looks to me to involve the greater amount of work.
>>
>> I don't want to be mean, but does anyone actually use STGT in
>> production? Seriously?
>>
>> In the latest development version of STGT, it's only possible to stop
>> the tgtd target daemon using KILL / 9 signal - which also means all
>> iSCSI initiator connections are corrupted when tgtd target daemon is
>> started again (kernel upgrade, target daemon upgrade, server reboot etc.).
>
> I don't know what "iSCSI initiator connections are corrupted"
> mean. But if you reboot a server, how can an iSCSI target
> implementation keep iSCSI tcp connections?
>
>
>> Imagine you have to reboot all your NFS clients when you reboot your NFS
>> server. Not only that - your data is probably corrupted, or at least the
>> filesystem deserves checking...

I don't know if it matters, but in my setup (iSCSI on top of DRBD + heartbeat) rebooting the primary server doesn't affect my iSCSI traffic: SCST correctly handles a stop or crash by sending a unit attention to clients on reconnect. DRBD + heartbeat correctly manages those things too.

Still, from an end-user POV, I was able to reboot or survive a crash only with SCST; IETD still has reconnect problems and STGT is even worse.

Regards,
--matteo
Re: Integration of SCST in the mainstream Linux kernel
Bart Van Assche wrote: > As you probably know there is a trend in enterprise computing towards > networked storage. This is illustrated by the emergence during the > past few years of standards like SRP (SCSI RDMA Protocol), iSCSI > (Internet SCSI) and iSER (iSCSI Extensions for RDMA). Two different > pieces of software are necessary to make networked storage possible: > initiator software and target software. As far as I know there exist > three different SCSI target implementations for Linux: > - The iSCSI Enterprise Target Daemon (IETD, > http://iscsitarget.sourceforge.net/); > - The Linux SCSI Target Framework (STGT, http://stgt.berlios.de/); > - The Generic SCSI Target Middle Level for Linux project (SCST, > http://scst.sourceforge.net/). > Since I was wondering which SCSI target software would be best suited > for an InfiniBand network, I started evaluating the STGT and SCST SCSI > target implementations. Apparently the performance difference between > STGT and SCST is small on 100 Mbit/s and 1 Gbit/s Ethernet networks, > but the SCST target software outperforms the STGT software on an > InfiniBand network. See also the following thread for the details: > http://sourceforge.net/mailarchive/forum.php?thread_name=e2e108260801170127w2937b2afg9bef324efa945e43%40mail.gmail.com&forum_name=scst-devel. > > Sorry for the late response (but better late than never). One may claim that STGT should have lower performance than SCST because its data path is from userspace. However, your results show that for non-IB transports, they both show the same numbers. Furthermore, with IB there shouldn't be any additional difference between the 2 targets because data transfer from userspace is as efficient as data transfer from kernel space. The only explanation that I see is that fine tuning for iSCSI & iSER is required. As was already mentioned in this thread, with SDR you can get ~900 MB/sec with iSER (on STGT). 
Erez
Re: Integration of SCST in the mainstream Linux kernel
Bart Van Assche wrote:
> On Jan 30, 2008 12:32 AM, FUJITA Tomonori <[EMAIL PROTECTED]> wrote:
>
>> iSER has parameters to limit the maximum size of RDMA (it needs to
>> repeat RDMA with a poor configuration)?
>>
>
> Please specify which parameters you are referring to. As you know I
> had already repeated my tests with ridiculously high values for the
> following iSER parameters: FirstBurstLength, MaxBurstLength and
> MaxRecvDataSegmentLength (16 MB, which is more than the 1 MB block
> size specified to dd).
>
>

Using such large values for FirstBurstLength will give you poor performance numbers for WRITE commands (with iSER). FirstBurstLength specifies how much data is sent as unsolicited data (i.e. without RDMA). With a value that large, your WRITE commands were sent without RDMA.

Erez
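The arithmetic behind the FirstBurstLength point can be sketched in a few lines of C. This is only an illustration, not code from any initiator: split_write() and struct write_split are hypothetical names, and the sketch assumes the session has negotiated unsolicited data at all and ignores every other limit (MaxBurstLength, MaxRecvDataSegmentLength).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration type: how one WRITE is divided. */
struct write_split {
	uint64_t unsolicited; /* bytes pushed by the initiator, no RDMA */
	uint64_t solicited;   /* bytes left for the target to request */
};

/*
 * Assumption: unsolicited data is enabled, so up to first_burst_length
 * bytes of the WRITE go out unsolicited and only the remainder is
 * transferred on the target's request (RDMA in the iSER case).
 */
struct write_split split_write(uint64_t write_len, uint64_t first_burst_length)
{
	struct write_split s;

	s.unsolicited = write_len < first_burst_length ?
			write_len : first_burst_length;
	s.solicited = write_len - s.unsolicited;
	assert(s.unsolicited + s.solicited == write_len);
	return s;
}
```

With a 1 MB dd block size and FirstBurstLength set to 16 MB, the solicited part is zero, which is the situation described above; with a 64 kB FirstBurstLength most of the same WRITE would be left to RDMA.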
Re: [PATCH 21/24][RFC] scsi_tgt: use of sense accessors
On Tue, 5 Feb 2008 11:21:33 -0500 Pete Wyckoff <[EMAIL PROTECTED]> wrote:

> [EMAIL PROTECTED] wrote on Mon, 04 Feb 2008 19:53 +0200:
> > FIXME: I need help with this driver (Pete?)
> > I used scsi_sense() in a none const way. But since
> > scsi_tgt is the ULD here, it can just access it's own sense
> > buffer directly. I did not use scsi_eh_cpy_sense() because
> > I did not want the extra copy. Pete will want to use a 260
> > bytes buffer here.
> >
> > Signed-off-by: Boaz Harrosh <[EMAIL PROTECTED]>
> > Need-help-from: Pete Wyckoff <[EMAIL PROTECTED]>
>
> FYI, I never use scsi_tgt. Only just pure userspace on the target,
> and a dumb ethernet NIC that does not know it is speaking any form
> of SCSI.

It seems that many people misunderstand the STGT iSCSI (and iSER), FCoE, and SRP (not implemented yet) software target drivers. They don't use the tgt kernel module. They just run in user space, like a user-space NFS daemon.
Re: (fwd) Bug#11922: I/O error on blank tapes
On Mon, 4 Feb 2008, James Bottomley wrote:

>
> On Mon, 2008-02-04 at 22:28 +0100, Borislav Petkov wrote:
> > On Mon, Feb 04, 2008 at 03:22:06PM +0100, maximilian attems wrote:
> >
> > (Added Bart to CC)
> >
> > > hello borislav,
> > >
> > > may i forward you that *old* Debian kernel bug,
> > > have seen you working on ide-tape:
> > > http://bugs.debian.org/11922
> > > no we don't carry any ide patches anymore.
> > >
> > > maybe you've already fixed it in latest?
> > >
> > > thanks
> > >
> > > --
> > > maks
> > >
> > > - Forwarded message from Stephen Kitt <[EMAIL PROTECTED]> -
> > >
> > > Subject: Bug#11922: I/O error on blank tapes
> > > Date: Sat, 1 Dec 2007 19:06:18 +0100
> > > From: Stephen Kitt <[EMAIL PROTECTED]>
> > > To: [EMAIL PROTECTED]
> > >
> > > Hi,
> > >
> > > This does still occur with 2.6.22; with a blank tape in my HP DDS-4 drive:
> > >
> > > $ tar tzvf /dev/nst0
> > > tar: /dev/nst0: Cannot read: Input/output error
>
> That's a SCSI tape, not an IDE one. I cc'd the SCSI list
>

This is not a bug, it is a feature. There is _nothing_ on the tape and if you try to read something, you get an error. The same thing applies to reading after the last filemark.

Note that after writing a filemark at the beginning of the tape, the situation is different. Now there is a file and the normal EOF semantics apply although there still is no data.

I admit that the error return could be more descriptive but the st driver tries to be compatible with other Unices. The behavior can be changed if Linux does not match other Unices. I don't remember if I have tested just this with other Unices. I will try to test this with Tru64 tomorrow. If anyone has data on other Unices, it would be helpful.

--
Kai
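The distinction drawn above (blank tape versus a tape holding only a filemark) can be shown with a toy model in C. This is emphatically not the st driver: struct toy_tape, enum rec_kind and toy_tape_read() are made-up names, and the model assumes only the three outcomes described in the message (data, EOF at a filemark, error where nothing is recorded).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy model only: a tape is a sequence of data blocks and filemarks. */
enum rec_kind { REC_DATA, REC_FILEMARK };

struct toy_tape {
	const enum rec_kind *recs; /* what has been written; NULL if blank */
	size_t nrecs;              /* 0 for a blank tape */
	size_t pos;                /* index of the next record to read */
};

/*
 * Returns 1 when a data block is read, 0 when a filemark is crossed
 * (ordinary EOF), and -EIO when nothing is recorded at the current
 * position (blank tape, or past the last filemark).
 */
int toy_tape_read(struct toy_tape *t)
{
	assert(t != NULL);
	if (t->pos >= t->nrecs)
		return -EIO;
	return t->recs[t->pos++] == REC_DATA ? 1 : 0;
}
```

In this model a blank tape fails on the very first read, while a tape with a single filemark at the start gives one clean EOF before the next read fails.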
Re: [PATCH] bugfix for an underflow condition in usb storage & isd200.c
On Tue, Feb 05 2008 at 17:42 +0200, Alan Stern <[EMAIL PROTECTED]> wrote:
> On Tue, 5 Feb 2008, Boaz Harrosh wrote:
>
>>> However the interface to usb_stor_access_xfer_buf() will have to change
>>> slightly. Right now if it sees that *sgptr is NULL, it assumes this
>>> means it should start at the beginning of the s-g buffer. But with
>>> Boaz's change, *sgptr == NULL means the transfer has reached the end of
>>> the buffer. So I'll have to go through and audit all the callers.
>>>
>>> Alan Stern
>>>
>>> -
>> No it does not, this as not changed. Please look again.
>
> You look again. Your patched code goes like this:
>
>	struct scatterlist *sg = *sgptr;
>
>	if (!sg)
>		sg = (struct scatterlist *) srb->request_buffer;
>
> Hence if *sgptr is NULL upon entry, it is taken to mean that the
> transfer should start at the beginning of the s-g buffer.
>
>	/* This loop handles a single s-g list entry, which may
>	 * include multiple pages. Find the initial page structure
>	 * and the starting offset within the page, and update
>	 * the *offset and *index values for the next loop. */
>	cnt = 0;
>	while (cnt < buflen && sg) {
>
> Hence if sg is NULL, it indicates the end of the buffer has been
> reached. And then down near the end of the routine:
>
>	*sgptr = sg;
>
> Hence if the end is reached and the caller makes another call to try
> transferring more data, the additional data will get stored back at the
> beginning of the buffer.
>

That behavior did not change. In the likely event that the s-g length matches buflen, the last call to sg_next() will return NULL, and that NULL will be returned in *sgptr. The end condition for an outside caller is either the sum of returned counts reaching some target count, or *sgptr returning to NULL. The code before the sg change would have checked *indexptr >= some_sg_count, but now we do not have an index, we have a pointer, and the termination condition is *sgptr == NULL.

So I guess you are worried that the calling code that was converted from index to pointer was converted incorrectly: that a caller which used to check *indexptr >= some_sg_count does not check *sgptr == NULL now. If so, yes, you are welcome to check. I did not do the conversion, so I cannot comment.

>> Note that this patch was tested and working. It is a bug
>> in v2.2.24 and it should be accepted already. One way or
>> the other.
>>
>> Callers of usb_stor_access_xfer_buf() need not change.
>> Matthew Dharm should decide if he wants the WARN_ON in
>> usb_stor_set_xfer_buf() or not and be done with it.
>>
>> I have found and fixed the bug, but it is not a SCSI
>> related bug, and it is not do to any scsi changes. It
>> is a bug from the SG changes of early 2.6.24. Please
>> take it through the USB tree. Feel free to change it
>> the way you like it, and submit it.
>
> I will post a new version of this which handles all these issues.
> Expect it in a day or so.
>

Please do. Thanks, that would be better. Don't forget to also submit a patch for current head-of-line. It's exactly the same fix but has diff conflicts with surrounding code.

> Alan Stern
>

Boaz
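The two meanings of a NULL *sgptr argued over in this thread are easier to see in a small user-space sketch. It is not the usb-storage code: struct seg and seg_copy_in() are hypothetical stand-ins for struct scatterlist and usb_stor_access_xfer_buf(), and the sketch assumes a simple NULL-terminated chain rather than the kernel's sg_next() machinery.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one scatter-gather entry. */
struct seg {
	unsigned char *data;
	size_t len;
	struct seg *next; /* NULL terminates the chain */
};

/*
 * Copy up to buflen bytes from src into the chain, resuming at
 * (*segptr, *offset). On entry, *segptr == NULL means "start at head".
 * On return, *segptr == NULL means the chain is exhausted. The loop
 * never writes past the last segment; it returns the bytes stored.
 */
size_t seg_copy_in(struct seg *head, const unsigned char *src, size_t buflen,
		   struct seg **segptr, size_t *offset)
{
	struct seg *sg = *segptr;
	size_t cnt = 0;

	assert(head != NULL && src != NULL);
	if (!sg)
		sg = head;

	while (cnt < buflen && sg) {
		size_t room = sg->len - *offset;
		size_t n = buflen - cnt < room ? buflen - cnt : room;

		memcpy(sg->data + *offset, src + cnt, n);
		cnt += n;
		*offset += n;
		if (*offset == sg->len) { /* segment full: move on */
			sg = sg->next;
			*offset = 0;
		}
	}
	*segptr = sg;
	return cnt;
}
```

Asking for 10 bytes on an 8-byte chain stores 8 and leaves *segptr NULL, so the overrun cannot happen. A caller that ignores that NULL and calls again starts over at the head, which is exactly the wrap-around Alan describes, so callers have to treat *segptr == NULL as their termination condition.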
Re: [Scst-devel] Integration of SCST in the mainstream Linux kernel
On Tue, 05 Feb 2008 17:07:07 +0100 Tomasz Chmielewski <[EMAIL PROTECTED]> wrote:

> FUJITA Tomonori wrote:
> > On Tue, 05 Feb 2008 08:14:01 +0100
> > Tomasz Chmielewski <[EMAIL PROTECTED]> wrote:
> >
> >> James Bottomley wrote:
> >>
> >>> These are both features being independently worked on, are they not?
> >>> Even if they weren't, the combination of the size of SCST in kernel plus
> >>> the problem of having to find a migration path for the current STGT
> >>> users still looks to me to involve the greater amount of work.
> >> I don't want to be mean, but does anyone actually use STGT in
> >> production? Seriously?
> >>
> >> In the latest development version of STGT, it's only possible to stop
> >> the tgtd target daemon using KILL / 9 signal - which also means all
> >> iSCSI initiator connections are corrupted when tgtd target daemon is
> >> started again (kernel upgrade, target daemon upgrade, server reboot etc.).
> >
> > I don't know what "iSCSI initiator connections are corrupted"
> > mean. But if you reboot a server, how can an iSCSI target
> > implementation keep iSCSI tcp connections?
>
> The problem with tgtd is that you can't start it (configured) in an
> "atomic" way.
> Usually, one will start tgtd and it's configuration in a script (I
> replaced some parameters with "..." to make it shorter and more readable):

Thanks for the details. So the way to stop the daemon is not related to your problem.

It's easily fixable. Can you start a new thread about this on the stgt-devel mailing list? When we agree on the interface to start the daemon, I'll implement it.

> tgtd
> tgtadm --op new ...
> tgtadm --lld iscsi --op new ...

(snip)

> So the only way to start/restart tgtd reliably is to do hacks which are
> needed with yet another iSCSI kernel implementation (IET): use iptables.
>
> iptables
> tgtd
> sleep 1
> tgtadm --op new ...
> tgtadm --lld iscsi --op new ...
> iptables
>
>
> A bit ugly, isn't it?
> Having to tinker with a firewall in order to start a daemon is by no
> means a sign of a well-tested and mature project.
>
> That's why I asked how many people use stgt in a production environment
> - James was worried about a potential migration path for current users.

I don't know how many people use stgt in a production environment, but I'm not sure that this problem prevents many people from using it there.

You want to reboot a server running target devices while initiators are connected to it. Rebooting the target server underneath the initiators seldom works. System administrators in my workplace reboot storage devices once a year and tell us to shut down the initiator machines that use them beforehand.
Re: 2.6.24 regression w/ QLA2300
On Tue, 05 Feb 2008, Alan D. Brunelle wrote: > > and send the resultant kernel logs? > > Here's the output to the console (if there are other logs you need, > let me know). I'll try the patch next, and sorry, hadn't realized > merges were still coming in under 2.6.24 in Linus' tree... > > QLogic Fibre Channel HBA Driver > ACPI: PCI Interrupt :40:01.0[A] -> GSI 38 (level, low) -> IRQ 58 > qla2xxx :40:01.0: Found an ISP2312, irq 58, iobase 0xc000a0041000 > qla2xxx :40:01.0: Configuring PCI space... > qla2x00_get_flash_version(): Unrecognized code type ff at pcids da1c. > qla2x00_get_flash_version(): Unrecognized code type ff at pcids 1f61c. > qla2xxx :40:01.0: Configure NVRAM parameters... > qla2xxx :40:01.0: Verifying loaded RISC code... > scsi(14): Load RISC code > scsi(14): Verifying Checksum of loaded RISC code. > scsi(14): Checksum OK, start firmware. > qla2xxx :40:01.0: Allocated (412 KB) for firmware dump... > scsi(14): Issue init firmware. > qla2x00_mailbox_command(14): FAILED. mbx0=4001, mbx1=0, mbx2=ba8a, > cmd=48 Ok, this is what I would have expected with the linus' tree prior to the fix. I just double-checked, the fix in question has yet to make it's way to Linus' tree. It's currently in scsi-misc-2.6: http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commitdiff;h=a571fdf7caa010e17f6a70c0c52e0992e87af7db which should filter up to linux-2.6.git during Linus' next pull. thanks, av - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.24 regression w/ QLA2300
Andrew Vasquez wrote: > On Tue, 05 Feb 2008, Andrew Vasquez wrote: > >> On Tue, 05 Feb 2008, Alan D. Brunelle wrote: >> >>> commit 9b73e76f3cf63379dcf45fcd4f112f5812418d0a >>> Merge: 50d9a12... 23c3e29... >>> Author: Linus Torvalds <[EMAIL PROTECTED]> >>> Date: Fri Jan 25 17:19:08 2008 -0800 >>> >>> Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 >>> >>> * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: >>> (200 commits) >>> >>> I believe a regression was introduced. I'm running on a 4-way IA64, >>> with straight 2.6.24 and 2 dual-port cards: >>> >>> 40:01.0 Fibre Channel: QLogic Corp. QLA2312 Fibre Channel Adapter (rev 03) >>> 40:01.1 Fibre Channel: QLogic Corp. QLA2312 Fibre Channel Adapter (rev 03) >>> c0:01.0 Fibre Channel: QLogic Corp. QLA2312 Fibre Channel Adapter (rev 03) >>> c0:01.1 Fibre Channel: QLogic Corp. QLA2312 Fibre Channel Adapter (rev 03) >>> >>> the adapters failed initialization. In particular, I narrowed it down >>> to failing the qla2x00_mbox_command call within qla2x00_init_firmware >>> function. I went and removed the qla2x00-related parts of this (large-ish) >>> merge, and the 4 ports initialized just fine. >> Could you load the (default 2.6.24) driver with >> ql2xextended_error_logging modules parameter set: >> >> # insmod qla2xxx ql2xextended_error_logging=1 >> >> and send the resultant kernel logs? > > Could you tray the patch referenced here: > > qla2xxx: Correct issue where incorrect init-fw mailbox command was used on > non-NPIV capable ISPs. > http://article.gmane.org/gmane.linux.scsi/38240 > > Thanks, av The referenced patch worked fine Andrew, thanks much! Alan - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.24 regression w/ QLA2300
Andrew Vasquez wrote:
> On Tue, 05 Feb 2008, Alan D. Brunelle wrote:
>
>> commit 9b73e76f3cf63379dcf45fcd4f112f5812418d0a
>> Merge: 50d9a12... 23c3e29...
>> Author: Linus Torvalds <[EMAIL PROTECTED]>
>> Date: Fri Jan 25 17:19:08 2008 -0800
>>
>> Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
>>
>> * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (200
>> commits)
>>
>> I believe a regression was introduced. I'm running on a 4-way IA64,
>> with straight 2.6.24 and 2 dual-port cards:
>>
>> 40:01.0 Fibre Channel: QLogic Corp. QLA2312 Fibre Channel Adapter (rev 03)
>> 40:01.1 Fibre Channel: QLogic Corp. QLA2312 Fibre Channel Adapter (rev 03)
>> c0:01.0 Fibre Channel: QLogic Corp. QLA2312 Fibre Channel Adapter (rev 03)
>> c0:01.1 Fibre Channel: QLogic Corp. QLA2312 Fibre Channel Adapter (rev 03)
>>
>> the adapters failed initialization. In particular, I narrowed it down
>> to failing the qla2x00_mbox_command call within qla2x00_init_firmware
>> function. I went and removed the qla2x00-related parts of this (large-ish)
>> merge, and the 4 ports initialized just fine.
>
> Could you load the (default 2.6.24) driver with
> ql2xextended_error_logging modules parameter set:
>
> # insmod qla2xxx ql2xextended_error_logging=1
>
> and send the resultant kernel logs?

Here's the output to the console (if there are other logs you need, let
me know). I'll try the patch next, and sorry, hadn't realized merges were
still coming in under 2.6.24 in Linus' tree...

QLogic Fibre Channel HBA Driver
ACPI: PCI Interrupt :40:01.0[A] -> GSI 38 (level, low) -> IRQ 58
qla2xxx :40:01.0: Found an ISP2312, irq 58, iobase 0xc000a0041000
qla2xxx :40:01.0: Configuring PCI space...
qla2x00_get_flash_version(): Unrecognized code type ff at pcids da1c.
qla2x00_get_flash_version(): Unrecognized code type ff at pcids 1f61c.
qla2xxx :40:01.0: Configure NVRAM parameters...
qla2xxx :40:01.0: Verifying loaded RISC code...
scsi(14): Load RISC code
scsi(14): Verifying Checksum of loaded RISC code.
scsi(14): Checksum OK, start firmware.
qla2xxx :40:01.0: Allocated (412 KB) for firmware dump...
scsi(14): Issue init firmware.
qla2x00_mailbox_command(14): FAILED. mbx0=4001, mbx1=0, mbx2=ba8a, cmd=48
qla2x00_init_firmware(14): failed=102 mb0=4001.
scsi(14): Init firmware FAILED .
qla2xxx :40:01.0: Failed to initialize adapter
scsi(14): Failed to initialize adapter - Adapter flags 10.
ACPI: PCI Interrupt :40:01.1[B] -> GSI 39 (level, low) -> IRQ 59
qla2xxx :40:01.1: Found an ISP2312, irq 59, iobase 0xc000a004
qla2xxx :40:01.1: Configuring PCI space...
qla2x00_get_flash_version(): Unrecognized code type ff at pcids da1c.
qla2x00_get_flash_version(): Unrecognized code type ff at pcids 1f61c.
qla2xxx :40:01.1: Configure NVRAM parameters...
qla2xxx :40:01.1: Verifying loaded RISC code...
scsi(15): Load RISC code
scsi(15): Verifying Checksum of loaded RISC code.
scsi(15): Checksum OK, start firmware.
qla2xxx :40:01.1: Allocated (412 KB) for firmware dump...
scsi(15): Issue init firmware.
qla2x00_mailbox_command(15): FAILED. mbx0=4001, mbx1=0, mbx2=bac6, cmd=48
qla2x00_init_firmware(15): failed=102 mb0=4001.
scsi(15): Init firmware FAILED .
qla2xxx :40:01.1: Failed to initialize adapter
scsi(15): Failed to initialize adapter - Adapter flags 10.
ACPI: PCI Interrupt :c0:01.0[A] -> GSI 71 (level, low) -> IRQ 60
qla2xxx :c0:01.0: Found an ISP2312, irq 60, iobase 0xc000e0041000
qla2xxx :c0:01.0: Configuring PCI space...
qla2x00_get_flash_version(): Unrecognized code type ff at pcids c61c.
qla2x00_get_flash_version(): Unrecognized code type ff at pcids 1da1c.
qla2xxx :c0:01.0: Configure NVRAM parameters...
qla2xxx :c0:01.0: Verifying loaded RISC code...
scsi(16): Load RISC code
scsi(16): Verifying Checksum of loaded RISC code.
scsi(16): Checksum OK, start firmware.
qla2xxx :c0:01.0: Allocated (412 KB) for firmware dump...
scsi(16): Issue init firmware.
qla2x00_mailbox_command(16): FAILED. mbx0=4001, mbx1=0, mbx2=bae3, cmd=48
qla2x00_init_firmware(16): failed=102 mb0=4001.
scsi(16): Init firmware FAILED .
qla2xxx :c0:01.0: Failed to initialize adapter
scsi(16): Failed to initialize adapter - Adapter flags 10.
ACPI: PCI Interrupt :c0:01.1[B] -> GSI 72 (level, low) -> IRQ 61
qla2xxx :c0:01.1: Found an ISP2312, irq 61, iobase 0xc000e004
qla2xxx :c0:01.1: Configuring PCI space...
qla2x00_get_flash_version(): Unrecognized code type ff at pcids c61c.
qla2x00_get_flash_version(): Unrecognized code type ff at pcids 1da1c.
qla2xxx :c0:01.1: Configure NVRAM parameters...
qla2xxx :c0:01.1: Verifying loaded RISC code...
scsi(17): Load RISC code
scsi(17): Verifying Checksum of loaded RISC code.
scsi(17): Checksum OK, start firmware.
qla2xxx :c0:01.1: Allocated (412 KB) for firmware dump...
sc
Re: Integration of SCST in the mainstream Linux kernel
Regarding the performance tests I promised to perform: although so far I
have only been able to run two tests (STGT + iSER versus SCST + SRP), the
results are interesting. I will run the remaining test cases over the
next few days.

About the test setup: dd and xdd were used to transfer 2 GB of data
between an initiator system and a target system via direct I/O over an
SDR InfiniBand network (1 GB/s). The block size varied between 512 bytes
and 1 GB, but was always a power of two.

Expected results:
* The measurement results are consistent with the numbers I published
  earlier.
* During data transfers all data is transferred in blocks between 4 KB
  and 32 KB in size (according to the SCST statistics).
* For small and medium block sizes (<= 32 KB) transfer times can be
  modeled very well by the following formula:
  (transfer time) = (setup latency) + (bytes transferred)/(bandwidth).
  The correlation numbers are very close to one.
* The latency and bandwidth parameters depend on the test tool (dd versus
  xdd), on the kind of test performed (reading versus writing), on the
  SCSI target and on the communication protocol.
* When using RDMA (iSER or SRP), SCST has a lower latency and higher
  bandwidth than STGT (results from linear regression for block sizes
  <= 32 KB):

  Test                   Latency (us)  Bandwidth (MB/s)  Correlation
  STGT+iSER, read, dd         64             560             0.95
  STGT+iSER, read, xdd        65             556             0.94
  STGT+iSER, write, dd        53             394             0.71
  STGT+iSER, write, xdd       54             445             0.59
  SCST+SRP, read, dd          39             657             0.83
  SCST+SRP, read, xdd         41             668             0.87
  SCST+SRP, write, dd         52             449             0.62
  SCST+SRP, write, xdd        52             516             0.77

Results that I did not expect:
* A block transfer size of 1 MB is not enough to measure the maximal
  throughput. The maximal throughput is only reached at much higher block
  sizes (about 10 MB for SCST + SRP and about 100 MB for STGT + iSER).
* There is one case where dd and xdd results are inconsistent: when
  reading via SCST + SRP and for block sizes of about 1 MB.
* For block sizes > 64 KB the measurements differ from the model. This is
  probably because all initiator-target transfers happen in blocks of
  32 KB or less.

For the details and some graphs, see also
http://software.qlayer.com/display/iSCSI/Measurements .

Bart Van Assche.
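Editorial sketch of the model quoted above, (transfer time) = (setup latency) + (bytes transferred)/(bandwidth), fitted with an ordinary least-squares line. This is not the script used for the measurements; the function names are hypothetical, and treating 1 MB/s as 10^6 bytes/s is an assumption.

```python
def fit_latency_bandwidth(sizes_bytes, times_s):
    """Least-squares fit of t = latency + size / bandwidth.

    Returns (latency in seconds, bandwidth in bytes/s, correlation).
    """
    n = len(sizes_bytes)
    mean_x = sum(sizes_bytes) / n
    mean_y = sum(times_s) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes_bytes)
    syy = sum((y - mean_y) ** 2 for y in times_s)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(sizes_bytes, times_s))
    slope = sxy / sxx                    # seconds per byte = 1 / bandwidth
    latency = mean_y - slope * mean_x    # intercept = setup latency
    correlation = sxy / (sxx * syy) ** 0.5
    return latency, 1.0 / slope, correlation


def predicted_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Evaluate the model for one transfer of size_bytes."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


# Synthetic points generated from the SCST+SRP "read, dd" row of the table
# (39 us, 657 MB/s), for block sizes from 512 bytes to 32 KB.
sizes = [512 * 2 ** k for k in range(7)]
times = [predicted_time(s, 39e-6, 657e6) for s in sizes]
latency, bandwidth, corr = fit_latency_bandwidth(sizes, times)
```

On noise-free synthetic data the fit simply recovers the parameters it was generated from; on real measurements the correlation value indicates how well the two-parameter model holds.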
Re: 2.6.24 regression w/ QLA2300
On Tue, 05 Feb 2008, Andrew Vasquez wrote:
> > Could you load the (default 2.6.24) driver with
> > ql2xextended_error_logging modules parameter set:
> >
> > # insmod qla2xxx ql2xextended_error_logging=1
> >
> > and send the resultant kernel logs?
>
> Could you try the patch referenced here:
>
> qla2xxx: Correct issue where incorrect init-fw mailbox command was used on
> non-NPIV capable ISPs.
> http://article.gmane.org/gmane.linux.scsi/38240

BTW: the regression in question is not present in vanilla 2.6.24.
Instead it was introduced early on in the 2.6.25 merge-window. Linus'
tree currently has the patch referenced above as well.

-- av
Re: [Scst-devel] Integration of SCST in the mainstream Linux kernel
On Tue, 2008-02-05 at 17:07 +0100, Tomasz Chmielewski wrote:
> FUJITA Tomonori wrote:
> > On Tue, 05 Feb 2008 08:14:01 +0100
> > Tomasz Chmielewski <[EMAIL PROTECTED]> wrote:
> >
> >> James Bottomley wrote:
> >>
> >>> These are both features being independently worked on, are they not?
> >>> Even if they weren't, the combination of the size of SCST in kernel plus
> >>> the problem of having to find a migration path for the current STGT
> >>> users still looks to me to involve the greater amount of work.
> >> I don't want to be mean, but does anyone actually use STGT in
> >> production? Seriously?
> >>
> >> In the latest development version of STGT, it's only possible to stop
> >> the tgtd target daemon using KILL / 9 signal - which also means all
> >> iSCSI initiator connections are corrupted when tgtd target daemon is
> >> started again (kernel upgrade, target daemon upgrade, server reboot etc.).
> >
> > I don't know what "iSCSI initiator connections are corrupted"
> > mean. But if you reboot a server, how can an iSCSI target
> > implementation keep iSCSI tcp connections?
>
> The problem with tgtd is that you can't start it (configured) in an
> "atomic" way.
> Usually, one will start tgtd and its configuration in a script (I
> replaced some parameters with "..." to make it shorter and more readable):
>
>
> tgtd
> tgtadm --op new ...
> tgtadm --lld iscsi --op new ...
>
>
> However, this won't work - tgtd goes immediately in the background as it
> is still starting, and the first tgtadm commands will fail:

This should be an easy fix: start tgtd, have the forked process set up
the port, then signal the parent that it is ready so the parent can exit.
Alternatively, set up the port in the parent, then fork and pass it to
the daemon.

>
> # bash -x tgtd-start
> + tgtd
> + tgtadm --op new --mode target ...
> tgtadm: can't connect to the tgt daemon, Connection refused
> tgtadm: can't send the request to the tgt daemon, Transport endpoint is not connected
> + tgtadm --lld iscsi --op new --mode account ...
> tgtadm: can't connect to the tgt daemon, Connection refused
> tgtadm: can't send the request to the tgt daemon, Transport endpoint is not connected
> + tgtadm --lld iscsi --op bind --mode account --tid 1 ...
> tgtadm: can't find the target
> + tgtadm --op new --mode logicalunit --tid 1 --lun 1 ...
> tgtadm: can't find the target
> + tgtadm --op bind --mode target --tid 1 -I ALL
> tgtadm: can't find the target
> + tgtadm --op new --mode target --tid 2 ...
> + tgtadm --op new --mode logicalunit --tid 2 --lun 1 ...
> + tgtadm --op bind --mode target --tid 2 -I ALL
>
>
> OK, if tgtd takes longer to start, perhaps it's a good idea to sleep a
> second right after tgtd?
>
> tgtd
> sleep 1
> tgtadm --op new ...
> tgtadm --lld iscsi --op new ...
>
>
> No, it is not a good idea - if tgtd listens on port 3260 *and* is
> unconfigured yet, any reconnecting initiator will fail, like below:

This is another easy fix: tgtd could start in an unconfigured state, and
tgtadm could then configure it and switch it to a ready state.

These are really minor usability issues (I know they are painful for
users, I agree). The major question here is architectural: which one is
better? The Linux kernel should have one implementation that is good from
the foundation up.

>
> end_request: I/O error, dev sdb, sector 7045192
> Buffer I/O error on device sdb, logical block 880649
> lost page write due to I/O error on sdb
> Aborting journal on device sdb.
> ext3_abort called.
> EXT3-fs error (device sdb): ext3_journal_start_sb: Detected aborted journal
> Remounting filesystem read-only
> end_request: I/O error, dev sdb, sector 7045880
> Buffer I/O error on device sdb, logical block 880735
> lost page write due to I/O error on sdb
> end_request: I/O error, dev sdb, sector 6728
> Buffer I/O error on device sdb, logical block 841
> lost page write due to I/O error on sdb
> end_request: I/O error, dev sdb, sector 7045192
> Buffer I/O error on device sdb, logical block 880649
> lost page write due to I/O error on sdb
> end_request: I/O error, dev sdb, sector 7045880
> Buffer I/O error on device sdb, logical block 880735
> lost page write due to I/O error on sdb
> __journal_remove_journal_head: freeing b_frozen_data
> __journal_remove_journal_head: freeing b_frozen_data
>
>
> Ouch.
>
> So the only way to start/restart tgtd reliably is to do hacks which are
> needed with yet another iSCSI kernel implementation (IET): use iptables.
>
> iptables
> tgtd
> sleep 1
> tgtadm --op new ...
> tgtadm --lld iscsi --op new ...
> iptables
>
>
> A bit ugly, isn't it?
> Having to tinker with a firewall in order to start a daemon is by no
> means a sign of a well-tested and mature project.
>
> That's why I asked how many people use stgt in a production environment
> - James was worried about a potential migration path for current users.
>
>
>
> --
> Tomasz Chmielewski
> http://wpkg.org
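Editorial sketch of the first suggestion in the reply above (the forked process finishes its setup, then tells the parent it may exit). It is a generic fork-and-pipe readiness handshake, not tgtd code; `start_daemon_when_ready`, `setup` and `serve` are hypothetical names, and the sketch assumes a POSIX system where `os.fork()` is available.

```python
import os


def start_daemon_when_ready(setup, serve):
    """Fork a child that runs setup() and then serve(state).

    The parent blocks until the child reports that setup() finished, so
    whatever the caller runs next cannot race with daemon startup.
    Returns (child_pid, ready); ready is False if setup() failed.
    """
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the future daemon.
        os.close(read_end)
        try:
            state = setup()             # e.g. bind and listen on sockets
            os.write(write_end, b"1")   # tell the parent setup is done
            os.close(write_end)
            serve(state)                # main loop of the daemon
        finally:
            os._exit(0)                 # never fall back into parent code
    # Parent: wait for one byte, or EOF if the child died during setup.
    os.close(write_end)
    ready = os.read(read_end, 1) == b"1"
    os.close(read_end)
    return pid, ready
```

With this shape, a start script only runs its configuration commands after the call returns with `ready` set, which removes the need for a fixed `sleep 1`. It does not address the second problem raised in the thread, an initiator reconnecting while the target is listening but still unconfigured.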
Re: [PATCH 21/24][RFC] scsi_tgt: use of sense accessors
[EMAIL PROTECTED] wrote on Mon, 04 Feb 2008 19:53 +0200:
> FIXME: I need help with this driver (Pete?)
> I used scsi_sense() in a non-const way. But since
> scsi_tgt is the ULD here, it can just access its own sense
> buffer directly. I did not use scsi_eh_cpy_sense() because
> I did not want the extra copy. Pete will want to use a 260
> bytes buffer here.
>
> Signed-off-by: Boaz Harrosh <[EMAIL PROTECTED]>
> Need-help-from: Pete Wyckoff <[EMAIL PROTECTED]>

FYI, I never use scsi_tgt. I only use pure userspace on the target, and
a dumb ethernet NIC that does not know it is speaking any form of SCSI.

People who need scsi_tgt have real target-enabled NICs like the fancy
qla4xxx. Those act as SCSI targets across FC or IP or whatever and bring
commands into the kernel, which then relays them to a userspace tgtd
process, which does the read/write as necessary, and returns a result
code to the NIC to ship back across FC.

So sorry, I won't take a guess at what has to happen here. But yeah, you
are right that an OSD target implementation would at times need a sense
buffer bigger than 96. Protocol maximum length for all sense data is 264.

-- Pete
Re: 2.6.24 regression w/ QLA2300
On Tue, 05 Feb 2008, Andrew Vasquez wrote:
> On Tue, 05 Feb 2008, Alan D. Brunelle wrote:
>
> > commit 9b73e76f3cf63379dcf45fcd4f112f5812418d0a
> > Merge: 50d9a12... 23c3e29...
> > Author: Linus Torvalds <[EMAIL PROTECTED]>
> > Date: Fri Jan 25 17:19:08 2008 -0800
> >
> > Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
> >
> > * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6:
> > (200 commits)
> >
> > I believe a regression was introduced. I'm running on a 4-way IA64,
> > with straight 2.6.24 and 2 dual-port cards:
> >
> > 40:01.0 Fibre Channel: QLogic Corp. QLA2312 Fibre Channel Adapter (rev 03)
> > 40:01.1 Fibre Channel: QLogic Corp. QLA2312 Fibre Channel Adapter (rev 03)
> > c0:01.0 Fibre Channel: QLogic Corp. QLA2312 Fibre Channel Adapter (rev 03)
> > c0:01.1 Fibre Channel: QLogic Corp. QLA2312 Fibre Channel Adapter (rev 03)
> >
> > the adapters failed initialization. In particular, I narrowed it down
> > to failing the qla2x00_mbox_command call within qla2x00_init_firmware
> > function. I went and removed the qla2x00-related parts of this (large-ish)
> > merge, and the 4 ports initialized just fine.
>
> Could you load the (default 2.6.24) driver with
> ql2xextended_error_logging modules parameter set:
>
> # insmod qla2xxx ql2xextended_error_logging=1
>
> and send the resultant kernel logs?

Could you try the patch referenced here:

qla2xxx: Correct issue where incorrect init-fw mailbox command was used on
non-NPIV capable ISPs.
http://article.gmane.org/gmane.linux.scsi/38240

Thanks, av

---

qla2xxx: Correct issue where incorrect init-fw mailbox command was used on non-NPIV capable ISPs.

BIT_2 of the firmware attributes is only valid on FW-interface-2 type
HBAs. Code in commit c48339decceec8e011498b0fc4c7c7d8b2ea06c1 would
cause the incorrect initialize-firmware mailbox command to be issued
for non-NPIV capable ISPs. Correct this by reverting to previously
used (and correct) pre-condition 'if' check.

Signed-off-by: Andrew Vasquez <[EMAIL PROTECTED]>
---
 drivers/scsi/qla2xxx/qla_mbx.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_mbx.c b/drivers/scsi/qla2xxx/qla_mbx.c
index 0c10c0b..99d29ff 100644
--- a/drivers/scsi/qla2xxx/qla_mbx.c
+++ b/drivers/scsi/qla2xxx/qla_mbx.c
@@ -980,7 +980,7 @@ qla2x00_init_firmware(scsi_qla_host_t *ha, uint16_t size)
 	DEBUG11(printk("qla2x00_init_firmware(%ld): entered.\n",
 	    ha->host_no));
 
-	if (ha->fw_attributes & BIT_2)
+	if (ha->flags.npiv_supported)
 		mcp->mb[0] = MBC_MID_INITIALIZE_FIRMWARE;
 	else
 		mcp->mb[0] = MBC_INITIALIZE_FIRMWARE;
-- 
1.5.4.rc5.5.gab98
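Editorial sketch of what the one-line change does, written as a small Python model rather than driver code. The `Hba` class and both helper functions are hypothetical. The opcode values are assumptions for illustration: 0x48 is chosen to match the "cmd=48" in the failure log posted in this thread, and 0x60 is assumed for the plain initialize-firmware command.

```python
from dataclasses import dataclass

# Assumed opcode values, for illustration only (see the note above).
MBC_MID_INITIALIZE_FIRMWARE = 0x48
MBC_INITIALIZE_FIRMWARE = 0x60
BIT_2 = 1 << 2


@dataclass
class Hba:
    """Hypothetical stand-in for the two fields the patch touches."""
    fw_attributes: int     # raw firmware attribute bits
    npiv_supported: bool   # driver flag, set only for NPIV-capable ISPs


def init_fw_cmd_regressed(ha: Hba) -> int:
    # Regressed check: trusts BIT_2 of the firmware attributes on every ISP.
    if ha.fw_attributes & BIT_2:
        return MBC_MID_INITIALIZE_FIRMWARE
    return MBC_INITIALIZE_FIRMWARE


def init_fw_cmd_fixed(ha: Hba) -> int:
    # Restored check: only HBAs flagged as NPIV capable get the MID variant.
    if ha.npiv_supported:
        return MBC_MID_INITIALIZE_FIRMWARE
    return MBC_INITIALIZE_FIRMWARE


# A non-NPIV adapter whose attribute word happens to have BIT_2 set; the
# attribute value is made up to reproduce the failing case.
isp2312 = Hba(fw_attributes=BIT_2, npiv_supported=False)
```

For `isp2312` the regressed check selects the MID variant, while the fixed check selects the plain command, which is the behavior the commit message describes.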
Re: [Scst-devel] Integration of SCST in the mainstream Linux kernel
FUJITA Tomonori wrote:
> On Tue, 05 Feb 2008 08:14:01 +0100
> Tomasz Chmielewski <[EMAIL PROTECTED]> wrote:
>
>> James Bottomley wrote:
>>
>>> These are both features being independently worked on, are they not?
>>> Even if they weren't, the combination of the size of SCST in kernel plus
>>> the problem of having to find a migration path for the current STGT
>>> users still looks to me to involve the greater amount of work.
>> I don't want to be mean, but does anyone actually use STGT in
>> production? Seriously?
>>
>> In the latest development version of STGT, it's only possible to stop
>> the tgtd target daemon using KILL / 9 signal - which also means all
>> iSCSI initiator connections are corrupted when tgtd target daemon is
>> started again (kernel upgrade, target daemon upgrade, server reboot etc.).
>
> I don't know what "iSCSI initiator connections are corrupted"
> mean. But if you reboot a server, how can an iSCSI target
> implementation keep iSCSI tcp connections?

The problem with tgtd is that you can't start it (configured) in an
"atomic" way.
Usually, one will start tgtd and its configuration in a script (I
replaced some parameters with "..." to make it shorter and more readable):

    tgtd
    tgtadm --op new ...
    tgtadm --lld iscsi --op new ...

However, this won't work - tgtd goes immediately in the background as it
is still starting, and the first tgtadm commands will fail:

    # bash -x tgtd-start
    + tgtd
    + tgtadm --op new --mode target ...
    tgtadm: can't connect to the tgt daemon, Connection refused
    tgtadm: can't send the request to the tgt daemon, Transport endpoint is not connected
    + tgtadm --lld iscsi --op new --mode account ...
    tgtadm: can't connect to the tgt daemon, Connection refused
    tgtadm: can't send the request to the tgt daemon, Transport endpoint is not connected
    + tgtadm --lld iscsi --op bind --mode account --tid 1 ...
    tgtadm: can't find the target
    + tgtadm --op new --mode logicalunit --tid 1 --lun 1 ...
    tgtadm: can't find the target
    + tgtadm --op bind --mode target --tid 1 -I ALL
    tgtadm: can't find the target
    + tgtadm --op new --mode target --tid 2 ...
    + tgtadm --op new --mode logicalunit --tid 2 --lun 1 ...
    + tgtadm --op bind --mode target --tid 2 -I ALL

OK, if tgtd takes longer to start, perhaps it's a good idea to sleep a
second right after tgtd?

    tgtd
    sleep 1
    tgtadm --op new ...
    tgtadm --lld iscsi --op new ...

No, it is not a good idea - if tgtd listens on port 3260 *and* is
unconfigured yet, any reconnecting initiator will fail, like below:

    end_request: I/O error, dev sdb, sector 7045192
    Buffer I/O error on device sdb, logical block 880649
    lost page write due to I/O error on sdb
    Aborting journal on device sdb.
    ext3_abort called.
    EXT3-fs error (device sdb): ext3_journal_start_sb: Detected aborted journal
    Remounting filesystem read-only
    end_request: I/O error, dev sdb, sector 7045880
    Buffer I/O error on device sdb, logical block 880735
    lost page write due to I/O error on sdb
    end_request: I/O error, dev sdb, sector 6728
    Buffer I/O error on device sdb, logical block 841
    lost page write due to I/O error on sdb
    end_request: I/O error, dev sdb, sector 7045192
    Buffer I/O error on device sdb, logical block 880649
    lost page write due to I/O error on sdb
    end_request: I/O error, dev sdb, sector 7045880
    Buffer I/O error on device sdb, logical block 880735
    lost page write due to I/O error on sdb
    __journal_remove_journal_head: freeing b_frozen_data
    __journal_remove_journal_head: freeing b_frozen_data

Ouch.

So the only way to start/restart tgtd reliably is to do hacks which are
needed with yet another iSCSI kernel implementation (IET): use iptables.

    iptables
    tgtd
    sleep 1
    tgtadm --op new ...
    tgtadm --lld iscsi --op new ...
    iptables

A bit ugly, isn't it?
Having to tinker with a firewall in order to start a daemon is by no
means a sign of a well-tested and mature project.

That's why I asked how many people use stgt in a production environment
- James was worried about a potential migration path for current users.

--
Tomasz Chmielewski
http://wpkg.org
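Editorial sketch related to the "sleep 1" workaround described above: instead of sleeping for a fixed time, a start script can poll until the daemon actually accepts connections. This is a generic readiness probe over TCP, not part of STGT; `wait_until_accepting` is a hypothetical helper, and which endpoint tgtadm really talks to is not stated in the message, so the host and port are left to the caller.

```python
import socket
import time


def wait_until_accepting(host, port, timeout=5.0, interval=0.05):
    """Poll until a TCP connect to (host, port) succeeds.

    Returns True as soon as one connection is accepted, or False if the
    timeout expires first. The probe connection is closed immediately.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)   # not up yet: back off briefly and retry
    return False
```

A probe like this only covers the first failure in the message (configuration commands running before the daemon is reachable). It does nothing for the second one, where initiators reconnect to a daemon that is already listening but not yet configured.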
Re: 2.6.24 regression w/ QLA2300
On Tue, 05 Feb 2008, Alan D. Brunelle wrote:
> commit 9b73e76f3cf63379dcf45fcd4f112f5812418d0a
> Merge: 50d9a12... 23c3e29...
> Author: Linus Torvalds <[EMAIL PROTECTED]>
> Date: Fri Jan 25 17:19:08 2008 -0800
>
> Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
>
> * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (200
> commits)
>
> I believe a regression was introduced. I'm running on a 4-way IA64,
> with straight 2.6.24 and 2 dual-port cards:
>
> 40:01.0 Fibre Channel: QLogic Corp. QLA2312 Fibre Channel Adapter (rev 03)
> 40:01.1 Fibre Channel: QLogic Corp. QLA2312 Fibre Channel Adapter (rev 03)
> c0:01.0 Fibre Channel: QLogic Corp. QLA2312 Fibre Channel Adapter (rev 03)
> c0:01.1 Fibre Channel: QLogic Corp. QLA2312 Fibre Channel Adapter (rev 03)
>
> the adapters failed initialization. In particular, I narrowed it down
> to failing the qla2x00_mbox_command call within qla2x00_init_firmware
> function. I went and removed the qla2x00-related parts of this (large-ish)
> merge, and the 4 ports initialized just fine.

Could you load the (default 2.6.24) driver with
ql2xextended_error_logging modules parameter set:

    # insmod qla2xxx ql2xextended_error_logging=1

and send the resultant kernel logs?

> Specifically, reverting the "patch" below enabled the devices to initialize
> properly.
>
> If need be, I'm certainly willing to help narrow down to the specific part in
> this patch...

That's a rather large patch... :( Any chance you could git-bisect?
Also, could you send your .config file you are using?

Thanks,
Andrew Vasquez
Re: [PATCH 1/24][RFC] scsi_eh: Define new API for sense handling
On Mon, Feb 04 2008 at 19:33 +0200, James Bottomley <[EMAIL PROTECTED]> wrote:
> On Mon, 2008-02-04 at 17:30 +0200, Boaz Harrosh wrote:
>> This patch defines a new API for sense handling. All drivers will
>> be converted to this API, before the sense handling implementation will
>> change. API is as follows:
>>
>> void scsi_eh_cpy_sense(struct scsi_cmnd *cmd, void* sense,
>>                        unsigned sense_bytes);
>>     To be used by drivers, when they have sense-bits
>>     and want to send them to upper layer. Max size
>>     need not be a concern. If upper layer does not have
>>     enough space it will be automatically truncated.
>>
>> u8 *scsi_make_sense(struct scsi_cmnd *cmd);
>>     To be used by drivers, and scsi-midlayer. Returns a DMA-able
>>     sense buffer. Must be returned by scsi_return_sense(). It should
>>     never fail if .pre_allocate_sense && .sense_buffsize in host
>>     template were properly set.
>>     The buffer is shost->sense_buffsize long.
>>
>> void *scsi_return_sense(struct scsi_cmnd *cmd, u8 *sb);
>>     Frees and returns the sense to the upper layer,
>>     copying only what's necessary.
>>
>> void scsi_eh_reset_sense(struct scsi_cmnd *cmd)
>>     Should not be used or necessary.
>>
>> const u8 *scsi_sense(struct scsi_cmnd *cmd)
>>     Used by ULDs and for inspecting the returned sense, can not
>>     be modified. It is only valid after a call to
>>     scsi_eh_cpy_sense() or a call to scsi_return_sense(). Before
>>     that it will/should return an empty buffer.
>>
>> New members at scsi host template:
>>     .sense_buffsize - if a driver calls scsi_make_sense() or
>>         scsi_eh_prep_cmnd(), this value should be non-zero,
>>         indicating the max sense size the driver
>>         supports. In most cases it should be
>>         SCSI_SENSE_BUFFERSIZE.
>>         If this value is zero the driver will only call
>>         scsi_eh_cpy_sense().
>>
>>     .pre_allocate_sense - if a driver calls scsi_make_sense()
>>         in .queuecommand for every cmnd, this
>>         should be set to true. In which case
>>         scsi_make_sense() will not fail because
>>         midlayer will fail the command allocation.
>>         If the driver calls scsi_eh_prep_cmnd()
>>         then sense_buffsize is not zero but this
>>         here is set to false.
>
> My initial reaction to this is that you're doing too many contortions to
> ensure something we don't particularly care about: whether we can
> allocate a sense buffer atomically or not.

I hope that now, once you've actually seen the implementation, my
motivation is clearer. Perhaps I explained it badly, but the actual code
is pretty simple and contortions is not how I would describe it. The API
above is just a way for drivers to say how they intend to behave, and the
midlayer will accommodate easily. None of the solutions are hard and they
are all simpler than what exists today. The only added complexity
introduced is the initial choice.

>
> What all this code should be doing is simply allocating the sense buffer
> in scsi_eh_prep_cmnd() using tomo's existing slab (and GFP_ATOMIC)

This is what we are doing. Only allocating the sense buffer in the very
unlikely event of the call to scsi_eh_prep_cmnd(). So we are in agreement
here.

> if that fails, we need a return from scsi_eh_prep_cmnd() telling us. At
> that point, the driver should abandon the auto request sense attempt and
> instead just return the CC/UA without the DRIVER_SENSE bit set which
> will trigger the eh to collect the sense for us.
>

This is a nightmare and a serious regression. It will cause an IO
deadlock in the event of an IO error during an IO-to-free-memory
scenario.

The memory footprint of a system running with my patchset, after the very
first request, is the same as with the current (Post Tomo) code. Only
thing is, my system will preallocate a bit more memory, 96 bytes, per
device scanned. This happens anyway, currently, with Tomo's code as soon
as the device is used the first time.

Preallocating the sense buffer during initialization eliminates the need
to allocate it for every command, providing considerable performance and
memory consumption benefits. All that without compromising robustness in
the event of an IO error on heavily loaded systems.

> Ideally, doing it this way might mean we could even dump the
> sense_buffer pointer from the command (although I don't see that as
> necessary).
>
> This solves the 99% case without getting into preallocation contortions.
>

After the final patch you can see that I have ditched the sense_buffer
pointer without sacrificing anything in reliability, and absolutely got
rid of any sense al
Re: [PATCH] bugfix for an underflow condition in usb storage & isd200.c
On Tue, 5 Feb 2008, Boaz Harrosh wrote:

> > However the interface to usb_stor_access_xfer_buf() will have to change
> > slightly. Right now if it sees that *sgptr is NULL, it assumes this
> > means it should start at the beginning of the s-g buffer. But with
> > Boaz's change, *sgptr == NULL means the transfer has reached the end of
> > the buffer. So I'll have to go through and audit all the callers.
> >
> > Alan Stern
> >
> No it does not, this has not changed. Please look again.

You look again. Your patched code goes like this:

	struct scatterlist *sg = *sgptr;

	if (!sg)
		sg = (struct scatterlist *) srb->request_buffer;

Hence if *sgptr is NULL upon entry, it is taken to mean that the
transfer should start at the beginning of the s-g buffer.

	/* This loop handles a single s-g list entry, which may
	 * include multiple pages. Find the initial page structure
	 * and the starting offset within the page, and update
	 * the *offset and *index values for the next loop. */
	cnt = 0;
	while (cnt < buflen && sg) {

Hence if sg is NULL, it indicates the end of the buffer has been
reached. And then down near the end of the routine:

	*sgptr = sg;

Hence if the end is reached and the caller makes another call to try
transferring more data, the additional data will get stored back at the
beginning of the buffer.

> Note that this patch was tested and working. It is a bug
> in v2.6.24 and it should be accepted already. One way or
> the other.
>
> Callers of usb_stor_access_xfer_buf() need not change.
> Matthew Dharm should decide if he wants the WARN_ON in
> usb_stor_set_xfer_buf() or not and be done with it.
>
> I have found and fixed the bug, but it is not a SCSI
> related bug, and it is not due to any scsi changes. It
> is a bug from the SG changes of early 2.6.24. Please
> take it through the USB tree. Feel free to change it
> the way you like it, and submit it.

I will post a new version of this which handles all these issues. Expect
it in a day or so.

Alan Stern
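Editorial sketch of the ambiguity Alan describes, as a toy Python model rather than the usb-storage code. The cursor plays the role of *sgptr: None on entry is read as "start at the beginning", and None is also what gets stored when the end of the buffer is reached. `access_xfer_buf` and its argument layout are hypothetical.

```python
def access_xfer_buf(segments, cursor, data):
    """Copy data into a list of bytearray segments.

    cursor is (segment_index, offset) or None. Returns the number of
    bytes copied and the cursor to pass to the next call.
    """
    if cursor is None:            # None on entry: start at the beginning
        cursor = (0, 0)
    idx, off = cursor
    cnt = 0
    while cnt < len(data) and idx is not None:
        seg = segments[idx]
        n = min(len(seg) - off, len(data) - cnt)
        seg[off:off + n] = data[cnt:cnt + n]
        cnt += n
        off += n
        if off == len(seg):       # segment full: advance to the next one
            idx = idx + 1 if idx + 1 < len(segments) else None
            off = 0
    # None on exit: the end of the buffer was reached
    new_cursor = None if idx is None else (idx, off)
    return cnt, new_cursor


segs = [bytearray(4), bytearray(4)]
copied, cur = access_xfer_buf(segs, None, b"ABCDEFGH")   # fills the buffer
extra, cur2 = access_xfer_buf(segs, cur, b"XY")          # one call too many
```

After the second call the first segment starts with "XY": because the end-of-buffer cursor is indistinguishable from the start cursor, the extra data wraps around and overwrites the beginning instead of being rejected.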
[PATCH] [SCSI] fix BUG when sum(scatterlist) > bufflen
When sending a SCSI command to a tape drive via the SCSI Generic (sg)
driver, if the command has a data transfer length more than
scatter_elem_sz (32 KB default) and not a multiple of 512, then I either
hit BUG_ON(!valid_dma_direction(direction)) in dma_unmap_sg() or else
the command never completes (depending on the LLDD).

When constructing scatterlists, the sg driver rounds up the scatterlist
element sizes to be a multiple of 512. This can result in
sum(scatterlist lengths) > bufflen. In this case, scsi_req_map_sg()
incorrectly sets bio->bi_size to sum(scatterlist lengths) rather than to
bufflen. When the command completes, req_bio_endio() detects that
bio->bi_size != 0, and so it doesn't call bio_endio(). This causes the
command to be resubmitted, resulting in BUG_ON or the command never
completing.

This patch makes scsi_req_map_sg() set bio->bi_size to bufflen rather
than to sum(scatterlist lengths), which fixes the problem.

Signed-off-by: Tony Battersby <[EMAIL PROTECTED]>
---
--- linux-2.6.24-git14/drivers/scsi/scsi_lib.c.orig	2008-02-05 09:33:05.0 -0500
+++ linux-2.6.24-git14/drivers/scsi/scsi_lib.c	2008-02-05 09:33:10.0 -0500
@@ -301,7 +301,6 @@ static int scsi_req_map_sg(struct reques
 		page = sg_page(sg);
 		off = sg->offset;
 		len = sg->length;
-		data_len += len;
 
 		while (len > 0 && data_len > 0) {
 			/*
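Editorial sketch of the arithmetic behind this bug report, as a simplified Python model rather than kernel code. The helper names are hypothetical, and the way elements are sized here (full elements of scatter_elem_sz, with the total rounded up to a multiple of 512) is an assumption made only to reproduce the mismatch described above.

```python
SECTOR = 512


def sg_element_lengths(bufflen, elem_sz=32 * 1024):
    """Scatterlist element lengths for a transfer of bufflen bytes.

    Simplified: the total is rounded up to a multiple of 512 and then
    split into elements of at most elem_sz bytes.
    """
    remaining = (bufflen + SECTOR - 1) // SECTOR * SECTOR
    lengths = []
    while remaining > 0:
        n = min(elem_sz, remaining)
        lengths.append(n)
        remaining -= n
    return lengths


def accounted_bytes(lengths, bufflen, clamp_to_bufflen):
    """Byte count charged to the bio (bi_size in the description)."""
    total = sum(lengths)
    return min(total, bufflen) if clamp_to_bufflen else total


bufflen = 32 * 1024 + 100           # larger than one element, not a 512 multiple
lengths = sg_element_lengths(bufflen)
before = accounted_bytes(lengths, bufflen, clamp_to_bufflen=False)
after = accounted_bytes(lengths, bufflen, clamp_to_bufflen=True)
leftover = before - bufflen         # bytes still outstanding after completion
```

With these numbers the unclamped count exceeds bufflen, so a completion of bufflen bytes leaves a nonzero remainder, which is the condition the description says prevents bio_endio() from being called.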
Re: [PATCH] enclosure: add support for enclosure services
On Mon, 2008-02-04 at 21:35 -0800, Luben Tuikov wrote:
> > > I guess the same could be said for STGT and SCST, right?
> >
> > You mean both of their kernel pieces are modular?
> > That's correct.
>
> No, you know very well what I mean.
>
> By the same logic you're preaching to include your
> solution part of the kernel, you can also apply to
> SCST.

Ah, but it's not ... the current patch is merely exporting an interface.
The debate in STGT vs SCST is not whether to export an interface but
where to draw the line.

You could also argue in the same vein that sd is redundant because a
filesystem could talk directly to the device via /dev/sgX (in fact OSD
based filesystems already do this). The argument is true, but misses the
bigger picture that the interfaces exported by sd are more portable
(apply to non-SCSI block devices) and easier to use.

> > > Yes, for which the transport layer, implements the
> > > scsi device node for the SES device. It doesn't really
> > > matter if the SCSI commands sent to the SES device go
> > > over SGPIO or FC or SAS or Bluetooth or I2C, etc, the
> > > transport layer can implement that and present the
> > > /dev/sgX node.
> >
> > But it does matter if the enclosure device doesn't
> > speak SCSI.
>
> Enclosure management isn't as simple as you're
> portraying it here. The enclosure management
> device speaks either SES or SAF-TE. The transport
> protocol to access it could be SGPIO or I2C or...

Look, just read the spec; SGPIO is a bus for driving enclosures ... it
doesn't require SES or SAF-TE or even any SCSI protocol.

> > SGPIO isn't a SCSI protocol ... it's a general purpose
> > serial bus protocol.
> > It's pretty simple and register based, but it might (or
> > might not) be accessible via a SCSI bridge.
>
> I see. You've just discovered SGPIO -- good for you.
>
> At any rate, I told you already that what is needed
> is not what you've provided but a _device node_
> exported by the kernel, either a processor or
> enclosure type.

Wrong ... we don't export non-SCSI devices as SCSI (with the single and
rather annoying exception of ATA via SAT).

James
Re: [patch 01/13] git-scsi-misc: fix isa/pcmcia compile problem
On Mon, 2008-02-04 at 23:53 -0800, [EMAIL PROTECTED] wrote: > From: Tejun Heo <[EMAIL PROTECTED]> > > aha152x.c and fdomain are built twice - once for the isa driver and once > for the PCMCIA one. Through #ifdefs, the compiled codes are slightly > different; thus, global symbols need to be given different names depending > on which flavor is being built. This patch adds GLOBAL() macro to > aha152x.h and fdomain.h which change the symbol depending on PCMCIA. > > This bug has always existed but has been masked by the fact the > drivers/scsi/pcmcia used subdir-(y|m) instead of obj-(y|m) which made > drivers/scsi/pcmcia/built_in.o not linked into the kernel and thus avoided > the duplicate symbols during compilation. > > [EMAIL PROTECTED]: coding-style fixes] > Signed-off-by: Tejun Heo <[EMAIL PROTECTED]> > Tested-by: Kamalesh Babulal <[EMAIL PROTECTED]> > Cc: James Bottomley <[EMAIL PROTECTED]> > Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> An alternative fix for this is already in. Author: James Bottomley <[EMAIL PROTECTED]> Date: Fri Jan 18 17:47:56 2008 -0600 [SCSI] fix pcmcia compile problem James
Re: [Scst-devel] Integration of SCST in the mainstream Linux kernel
On Mon, 4 Feb 2008 20:07:01 -0600 "Chris Weiss" <[EMAIL PROTECTED]> wrote: > On Feb 4, 2008 11:30 AM, Douglas Gilbert <[EMAIL PROTECTED]> wrote: > > Alan Cox wrote: > > >> better. So for example, I personally suspect that ATA-over-ethernet is > > >> way > > >> better than some crazy SCSI-over-TCP crap, but I'm biased for simple and > > >> low-level, and against those crazy SCSI people to begin with. > > > > > > Current ATAoE isn't. It can't support NCQ. A variant that did NCQ and IP > > > would probably trash iSCSI for latency if nothing else. > > > > And a variant that doesn't do ATA or IP: > > http://www.fcoe.com/ > > > > however, and interestingly enough, the open-fcoe software target > depends on scst (for now anyway) STGT also supports a software FCoE target driver, though it is still experimental. http://www.mail-archive.com/linux-scsi@vger.kernel.org/msg12705.html It works in user space like STGT's iSCSI (and iSER) target driver (i.e. no kernel/user space interaction).
Re: new scsi sense handling
On Mon, 4 Feb 2008 18:39:22 -0800 (PST) Luben Tuikov <[EMAIL PROTECTED]> wrote: > --- On Mon, 2/4/08, Boaz Harrosh <[EMAIL PROTECTED]> wrote: > > There are 3 usages of sense handling in drivers > > > > 1. sense is available in driver internal structure and is > > mem-copied to upper level > > 2. A CHECK_CONDITION status was returned and the driver > > uses the scsi_eh_prep_cmnd() > > for a REQUEST_SENSE invocation to the target. Then > > returning the sense in > > scsi_eh_return_cmnd(). A variation on this is when the > > driver does nothing the queue > > is frozen an the scsi watchdog timer does the above. > > 3. The underline host adapter does the REQUEST_SENSE and a > > pre-allocated and DMA mapped > > sense buffer receives the sense information from HW. > > Many years ago when "ACA" had a constructive meaning, > so did "Autosense". Then about 5 years ago, "Autosense" > disappeared completely since it became the de facto > implementation of the then SCSI Execute Command "RPC", > now just SCSI Execute Command procedure call. > > At that point in time, the SCSI mid-layer decided > to embrace this model and give the LLDD a scsi command > structure which included the sense data buffer to > a size that the SCSI mid-layer was interested in, > at the moment 96 bytes, macro defined in > include/scsi/scsi_cmnd.h. > > The concept of "Autosense" was off-loaded to LLDD > to emulate it if the specific target device to > which the command was issued, didn't supply the > sense data on CHECK CONDITION, and more so > relevant to target devices which implemented > queuing, thus the ACA. > > And the mid-layer would consider extracting > the sense data via REQUEST SENSE command > as a _special case_ if the LLDD/transport layer > didn't implement the "autosense" model. Only SPI and USB? Most of the LLDs using the transport protocols that we care about today keep the sense buffer in their own internal structure.
I think that the issue to solve in order to kill scsi_cmnd:sense_buffer is how to share (or export) such a sense buffer with the SCSI mid-layer. For the old transport protocols, we could do what James suggested in this thread to kill scsi_cmnd:sense_buffer.
RE: [PATCH 5/24][RFC] dpt_i2o: Use new scsi_eh_cpy_sense()
I do not think the midlayer needs to be fixed. I think this was a bug/feature that presented itself in the 2.2 tree when we were developing this driver in 1996... Sincerely -- Mark Salyzyn > -Original Message- > From: Boaz Harrosh [mailto:[EMAIL PROTECTED] > Sent: Tuesday, February 05, 2008 3:52 AM > To: Salyzyn, Mark > Cc: James Bottomley; FUJITA Tomonori; Christoph Hellwig; Jens > Axboe; Jeff Garzik; linux-scsi; Andrew Morton > Subject: Re: [PATCH 5/24][RFC] dpt_i2o: Use new scsi_eh_cpy_sense() > > On Mon, Feb 04 2008 at 20:32 +0200, "Salyzyn, Mark" > <[EMAIL PROTECTED]> wrote: > > ACK with condition that community accepts the RFC's entire premise. > > > > The removed code that shunted the REQUEST_SENSE was based > on the assumption > > that the sense data in the current scsi command packet was > left over from the > > previous command's execution with a check condition as the > scsi command packet > > is reused to issue the REQUEST_SENSE. For a new, or second > from the target's point > > of view, request sense to the target issued by these older > kernels would always > > return an erased sense. The dpt_i2o driver does not itself > maintain the sense history, > > nor does the Firmware. This behavior, I believe, is not the > case for current kernels so > > the code fragment made little sense (pun not intended). If > my historical knowledge is > > correct, this (now removed) workaround makes no more sense > because the scsi layer correctly > > manages adapters that produce auto-request sense and does > not ever turn around the command > > and send a second request for sense information. > > > Given this understanding, I have no problem with the > removed fragment of REQUEST_SENSE shunting. > > However, I do urge some target error recovery testing, tape > drives being the likely type of target > > affected by this change. I have no such hardware to confirm... 
> > Sincerely -- Mark Salyzyn > > I have removed this test because the midlayer does a > scsi_eh_reset_sense() just before > the new invocation of a command. So even if the second bad > REQUEST_SENSE comes this > will not filter it out anymore. If such a thing still > happens? A driver state machine > must be used to filter it out, or of course midlayer should be fixed. > > Boaz > - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Scst-devel] Integration of SCST in the mainstream Linux kernel
On Tue, 05 Feb 2008 05:43:10 +0100 Matteo Tescione <[EMAIL PROTECTED]> wrote: > Hi all, > And sorry for intrusion, i am not a developer but i work everyday with iscsi > and i found it fantastic. > Altough Aoe, Fcoe and so on could be better, we have to look in real world > implementations what is needed *now*, and if we look at vmware world, > virtual iron, microsoft clustering etc, the answer is iSCSI. > And now, SCST is the best open-source iSCSI target. So, from an end-user > point of view, what are the really problems to not integrate scst in the > mainstream kernel? Currently, the best open-source iSCSI target implementation in Linux is Nicholas's LIO, I guess.
Re: [Scst-devel] Integration of SCST in the mainstream Linux kernel
On Tue, 05 Feb 2008 08:14:01 +0100 Tomasz Chmielewski <[EMAIL PROTECTED]> wrote: > James Bottomley wrote: > > > These are both features being independently worked on, are they not? > > Even if they weren't, the combination of the size of SCST in kernel plus > > the problem of having to find a migration path for the current STGT > > users still looks to me to involve the greater amount of work. > > I don't want to be mean, but does anyone actually use STGT in > production? Seriously? > > In the latest development version of STGT, it's only possible to stop > the tgtd target daemon using KILL / 9 signal - which also means all > iSCSI initiator connections are corrupted when tgtd target daemon is > started again (kernel upgrade, target daemon upgrade, server reboot etc.). I don't know what "iSCSI initiator connections are corrupted" means. But if you reboot a server, how can an iSCSI target implementation keep iSCSI TCP connections? > Imagine you have to reboot all your NFS clients when you reboot your NFS > server. Not only that - your data is probably corrupted, or at least the > filesystem deserves checking...
Re: [PATCH] Marvell 6440 SAS/SATA driver
Added support for hotplug and wide port. Signed-off-by: Ke Wei <[EMAIL PROTECTED]> --- drivers/scsi/mvsas.c | 445 ++ 1 files changed, 339 insertions(+), 106 deletions(-) diff --git a/drivers/scsi/mvsas.c b/drivers/scsi/mvsas.c index 3bf009b..e5cf3ad 100755 --- a/drivers/scsi/mvsas.c +++ b/drivers/scsi/mvsas.c @@ -26,6 +26,12 @@ structures. this permits elimination of all the le32_to_cpu() and cpu_to_le32() conversions. + Changelog: + 2008-02-05 0.4 Added support for hotplug and wide port. + 2008-01-22 0.3 Added support for SAS HD and SATA Devices. + 2008-01-09 0.2 detect SAS disk. + 2007-09-95 0.1 rough draft, Initial version. + */ #include @@ -39,13 +45,13 @@ #include #define DRV_NAME "mvsas" -#define DRV_VERSION"0.3" +#define DRV_VERSION"0.4" #define _MV_DUMP 0 #define MVS_DISABLE_NVRAM #define mr32(reg) readl(regs + MVS_##reg) #define mw32(reg,val) writel((val), regs + MVS_##reg) -#define mw32_f(reg,val)do {\ +#define mw32_f(reg,val)do {\ writel((val), regs + MVS_##reg);\ readl(regs + MVS_##reg);\ } while (0) @@ -54,13 +60,19 @@ #define MVS_CHIP_SLOT_SZ (1U << mvi->chip->slot_width) /* offset for D2H FIS in the Received FIS List Structure */ -#define SATA_RECEIVED_D2H_FIS(reg_set) \ +#define SATA_RECEIVED_D2H_FIS(reg_set) \ ((void *) mvi->rx_fis + 0x400 + 0x100 * reg_set + 0x40) -#define SATA_RECEIVED_PIO_FIS(reg_set) \ +#define SATA_RECEIVED_PIO_FIS(reg_set) \ ((void *) mvi->rx_fis + 0x400 + 0x100 * reg_set + 0x20) -#define UNASSOC_D2H_FIS(id) \ +#define UNASSOC_D2H_FIS(id)\ ((void *) mvi->rx_fis + 0x100 * id) +#define for_each_phy(__lseq_mask, __mc, __lseq, __rest) \ + for ((__mc) = (__lseq_mask), (__lseq) = 0; \ + (__mc) != 0 && __rest; \ + (++__lseq), (__mc) >>= 1) \ + if (((__mc) & 1)) + /* driver compile-time configuration */ enum driver_configuration { MVS_TX_RING_SZ = 1024, /* TX ring size (12-bit) */ @@ -130,6 +142,7 @@ enum hw_registers { MVS_INT_STAT= 0x150, /* Central int status */ MVS_INT_MASK= 0x154, /* Central int enable */ MVS_INT_STAT_SRS= 
0x158, /* SATA register set status */ + MVS_INT_MASK_SRS= 0x15C, /* ports 1-3 follow after this */ MVS_P0_INT_STAT = 0x160, /* port0 interrupt status */ @@ -223,7 +236,7 @@ enum hw_register_bits { /* shl for ports 1-3 */ CINT_PORT_STOPPED = (1U << 16), /* port0 stopped */ - CINT_PORT = (1U << 8),/* port0 event */ + CINT_PORT = (1U << 8),/* port0 event */ CINT_PORT_MASK_OFFSET = 8, CINT_PORT_MASK = (0xFF << CINT_PORT_MASK_OFFSET), @@ -300,6 +313,7 @@ enum hw_register_bits { PHY_READY_MASK = (1U << 20), /* MVS_Px_INT_STAT, MVS_Px_INT_MASK (per-phy events) */ + PHYEV_DEC_ERR = (1U << 24), /* Phy Decoding Error */ PHYEV_UNASSOC_FIS = (1U << 19), /* unassociated FIS rx'd */ PHYEV_AN= (1U << 18), /* SATA async notification */ PHYEV_BIST_ACT = (1U << 17), /* BIST activate FIS */ @@ -501,6 +515,9 @@ enum status_buffer { SB_RFB_MAX = 0x400, /* RFB size*/ }; +enum error_info_rec { + CMD_ISS_STPD= (1U << 31), /* Cmd Issue Stopped */ +}; struct mvs_chip_info { u32 n_phy; @@ -534,6 +551,7 @@ struct mvs_cmd_hdr { struct mvs_slot_info { struct sas_task *task; u32 n_elem; + u32 tx; /* DMA buffer for storing cmd tbl, open addr frame, status buffer, * and PRD table @@ -546,23 +564,28 @@ struct mvs_slot_info { struct mvs_port { struct asd_sas_port sas_port; - u8 taskfileset; + u8 port_attached; + union { + u8 taskfileset; + u8 wide_port_phymap; + }; }; struct mvs_phy { struct mvs_port *port; struct asd_sas_phy sas_phy; - struct sas_identify identify; + struct sas_identify identify; + struct scsi_device *sdev; u64 dev_sas_addr; u64 att_dev_sas_addr; u32 att_dev_info; u32 dev_info; - u32 type; + u32 phy_type; u32 phy_status;
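Editorial sketch, not part of the posted patch: the for_each_phy() macro added above walks a phy bitmask from bit 0 upward and runs its body only for set bits, for as long as the extra condition passed as the last argument holds. The macro text below is copied from the patch; collect_phys() and its parameters are hypothetical names introduced only to exercise it in user space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Macro as it appears in the patch: iterate over the set bits of a mask,
 * lowest bit first, while the caller-supplied condition __rest is true. */
#define for_each_phy(__lseq_mask, __mc, __lseq, __rest)	\
	for ((__mc) = (__lseq_mask), (__lseq) = 0;	\
	     (__mc) != 0 && __rest;			\
	     (++__lseq), (__mc) >>= 1)			\
		if (((__mc) & 1))

/* Hypothetical helper (not in the driver): stores the index of every set
 * bit of phy_mask into out[], up to cap entries, and returns the count. */
static size_t collect_phys(uint8_t phy_mask, int *out, size_t cap)
{
	uint32_t mc;	/* working copy of the mask, shifted right each step */
	int phy;	/* bit position currently being examined */
	size_t n = 0;

	assert(out != NULL);
	for_each_phy(phy_mask, mc, phy, n < cap)
		out[n++] = phy;
	return n;
}
```

Because the macro ends in a bare if, a following else would bind to that hidden if; callers that need an else branch would probably want braces around the loop body.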
Re: Integration of SCST in the mainstream Linux kernel
On Mon, Feb 04, 2008 at 05:57:47PM -0500, Jeff Garzik wrote: > iSCSI and NBD were passe ideas at birth. :) > > Networked block devices are attractive because the concepts and > implementation are more simple than networked filesystems... but usually > you want to run some sort of filesystem on top. At that point you might > as well run NFS or [gfs|ocfs|flavor-of-the-week], and ditch your > networked block device (and associated complexity). Call me a sysadmin, but I find it easier to plug in and keep in place an ethernet cable than these parallel scsi cables from hell. Every server has at least two ethernet ports by default, with rarely any surprises at the kernel level. Adding ethernet cards is inexpensive, and you pretty much never hear of compatibility problems between cards. So ethernet as a connection medium is really nice compared to scsi. Too bad iscsi is demented and ATAoE/NBD nonexistent. Maybe external SAS will be nice, but I don't see it getting to the level of universality of ethernet any time soon. And it won't get the same amount of user-level compatibility testing in any case. OG.
diskdump fails on x3850 (x86_64)
Hello folks, Diskdump on an x3850 with the adp94xx driver is not working (Red Hat Enterprise Linux 4.6). 'service diskdump restart' fails. Is there any plan to support it? Thanks Ciju
RE: [PATCH 0/7] blk_end_request: full I/O completion handler
Hello, We would like to know in which kernel version these patches are available. Thanks, Chandrakala -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jens Axboe Sent: Monday, September 03, 2007 1:16 PM To: Kiyoshi Ueda Cc: [EMAIL PROTECTED]; linux-scsi@vger.kernel.org; [EMAIL PROTECTED]; Miller, Mike (OS Dev); [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: Re: [PATCH 0/7] blk_end_request: full I/O completion handler On Fri, Aug 31 2007, Kiyoshi Ueda wrote: > Hello, > > This set of patches changes request completion interface between > device drivers and block layer to 1 step procedure from current 2 step > procedures using end_that_request_{first/chunk} and > end_that_request_last(). > > This change allows request-based multipath to hook in before > completing each chunk of request, check errors for it and retry it > using another path if error is detected. > > Summaries of each patch are below: > 1/7: add new request completion interface, blk_end_request() > 2/7: add some macros to get the size of request in bytes > 3/7: convert normal drivers to use blk_end_request() > 4/7: convert odd drivers like cciss/cpqarray/xsysace to use >blk_end_request() > 5/7: convert ide-cd (cdrom_newpc_intr) to use blk_end_request() > 6/7: remove/unexport no longer needed end_that_request_* > 7/7: change rq->end_io to cover request completion as a whole > > I have tested the patch on two machines, ia64+QLA1280+QLA2200 and > x86_64+SATA+IDE-CDROM. > I can't test other device drivers for which I don't have hardware. > So testing help and any comments would be very much appreciated. > > The interface change causes code modifications of *ALL DEVICE DRIVERS* > which are using end_that_request_{first/chunk/last} to complete request. > But it should not affect the behavior. > > Please review and apply if no problem. > This patch-set should be applied on top of 2.6.23-rc3-mm1. 
> > > BACKGROUND > == > The patch is necessary to allow device stacking at request level, that > is request-based device-mapper multipath. > Currently, device-mapper is implemented as a stacking block device at > BIO level. OTOH, request-based DM will stack at request level to > allow better multipathing decision. > To allow device stacking at request level, the completion procedure > need to provide a hook for it. > For example, dm-multipath has to check errors and retry with other > paths if necessary before returning the I/O result to upper layer. > struct request has 'end_io' hook currently. But it's called at the > very late stage of completion handling where the I/O result is already > returned to the upper layer. > So we need something here. > > The first approach to hook in completion of each chunk of request was > adding a new rq->end_io_first() hook and calling it on the top of > __end_that_request_first(). > - http://marc.theaimsgroup.com/?l=linux-scsi&m=115520444515914&w=2 > - http://marc.theaimsgroup.com/?l=linux-kernel&m=116656637425880&w=2 > However, Jens pointed out that redesigning rq->end_io() as a full > completion handler would be better: > > On Thu, 21 Dec 2006 08:49:47 +0100, Jens Axboe <[EMAIL PROTECTED]> wrote: > > Ok, I see what you are getting at. The current ->end_io() is called > > when the request has fully completed, you want notification for each > > chunk potentially completed. > > > > I think a better design here would be to use ->end_io() as the full > > completion handler, similar to how bio->bi_end_io() works. A request > > originating from __make_request() would set something ala: > . > > instead of calling the functions manually. That would allow you to > > get notification right at the beginning and do what you need, > > without adding a special hook for this. > > I thought his comment was reasonable. > So I modified the patches based on his suggestion. 
> > > WHAT IS CHANGED > === > The change is basically illustlated by the following pseudo code: > > [Before] > if (end_that_request_{first/chunk} succeeds) { <-- completes bios > > end_that_request_last() <-- calls end_io() > > } else { > > } > > [After] > if (blk_end_request() succeeds) { <-- calls end_io(), completes bios > > } else { > > } > > > In detail, request completion procedures are changed like below. > > [Before] > o 2 steps completion using end_that_request_{first/chunk} > and end_that_request_last(). > o Device drivers have ownership of a request until they > call end_that_request_last(). > o rq->end_io() is called at the last stage of > end_that_request_last() for some block layer codes need > specific request handling when completing it. > > [After] > o 1 step completion using blk_end_request(). > (end_that_request_* are no longer used from device drivers.) > o Device drivers give over ownership of a request > when call
Re: [PATCH 0/7] blk_end_request: full I/O completion handler
On Tue, Feb 05 2008, S, Chandrakala (STSD) wrote: > Hello, > > We would like to know in which kernel version these patches are > available. They were merged after 2.6.24 was released, so they will show up in the 2.6.25 kernel. -- Jens Axboe
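Editorial sketch: the one-step completion described in the patch summary quoted earlier in this thread can be modelled in user space as below. This is a toy under stated assumptions, not the block layer code. struct toy_request, toy_end_request() and the counters are hypothetical, and the return convention (0 when the request is finished, 1 when bytes remain) is an assumption made for the model.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a block request; all names here are hypothetical. */
struct toy_request {
	unsigned int bytes_left;	/* bytes not yet completed */
	int end_io_calls;		/* how often final completion ran */
	int hook_calls;			/* how often the per-chunk hook ran */
};

/*
 * Models a one-step completion call: account for nr_bytes and, if nothing
 * is left, finish the request in the same call. A stacking driver such as
 * request-based multipath would inspect errors at the hook point, before
 * the result is handed to the upper layer.
 * Returns 0 when the request is fully completed, 1 when bytes remain.
 */
static int toy_end_request(struct toy_request *rq, unsigned int nr_bytes)
{
	assert(rq != NULL);

	rq->hook_calls++;		/* per-chunk notification point */

	if (nr_bytes >= rq->bytes_left) {
		rq->bytes_left = 0;
		rq->end_io_calls++;	/* final completion, same call */
		return 0;
	}

	rq->bytes_left -= nr_bytes;
	return 1;
}
```

With the older two-step scheme the driver itself had to call a second function once the last chunk was done; in this model the caller only checks the return value.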
Re: [PATCH 5/24][RFC] dpt_i2o: Use new scsi_eh_cpy_sense()
On Mon, Feb 04 2008 at 20:32 +0200, "Salyzyn, Mark" <[EMAIL PROTECTED]> wrote: > ACK with condition that community accepts the RFC's entire premise. > > The removed code that shunted the REQUEST_SENSE was based on the assumption > that the sense data in the current scsi command packet was left over from the > previous command's execution with a check condition as the scsi command > packet > is reused to issue the REQUEST_SENSE. For a new, or second from the target's > point > of view, request sense to the target issued by these older kernels would > always > return an erased sense. The dpt_i2o driver does not itself maintain the sense > history, > nor does the Firmware. This behavior, I believe, is not the case for current > kernels so > the code fragment made little sense (pun not intended). If my historical > knowledge is > correct, this (now removed) workaround makes no more sense because the scsi > layer correctly > manages adapters that produce auto-request sense and does not ever turn > around the command > and send a second request for sense information. > Given this understanding, I have no problem with the removed fragment of > REQUEST_SENSE shunting. > However, I do urge some target error recovery testing, tape drives being the > likely type of target > affected by this change. I have no such hardware to confirm... > Sincerely -- Mark Salyzyn I have removed this test because the midlayer does a scsi_eh_reset_sense() just before the new invocation of a command. So even if the second bad REQUEST_SENSE comes, this will not filter it out anymore. If such a thing still happens, a driver state machine must be used to filter it out, or of course the midlayer should be fixed. Boaz
Re: [PATCH] bugfix for an underflow condition in usb storage & isd200.c
On Mon, Feb 04 2008 at 22:05 +0200, Alan Stern <[EMAIL PROTECTED]> wrote: > On Sun, 3 Feb 2008, Matthew Dharm wrote: > >> But, the modifications to usb_stor_access_xfer_buf() look good -- no >> request from a sub-driver should be allowed to scribble into memory. The >> current code does make the implicit assumption that there is enough >> storage, and will walk right off the end of the sg list if there isn't. >> >> I'm not sure I like the mods to usb_stor_set_xfer_buf(). Any place we set >> a status that we know is going to be thrown away is an invitation for a >> problem later if someone changes the code to preserve that status. It's a >> jack-in-the-box, waiting to spring open in our face later. The limit check >> (which mirrors the usb_stor_access_xfer_buf modification) and WARN_ON() are >> probably good. >> >> In a strictly technical sense, the change to protocol.c are sufficient. >> That is, they will prevent a serious error. There is a justification tho >> to fix all of the users of usb_stor_access_buf() to not attempt to use more >> SCSI buffer than exists. >> >> My opinion is this: Let's make the protocol.c mods (modulo my comments >> about setting useless status bits) now. Then, let's decide if we're going >> to patch all the other users of the usb_stor_*_xfer_buf() functions as a >> separate discussion. > > I think the correct approach is to modify those routines so that they > will never overrun the s-g buffer (like Boaz has done), and _document_ > this behavior. Then the callers can feel free to try and transfer as > much as they want, knowing that an overrun can't occur. There won't > be any need for a WARN_ON or anything else. > > However the interface to usb_stor_access_xfer_buf() will have to change > slightly. Right now if it sees that *sgptr is NULL, it assumes this > means it should start at the beginning of the s-g buffer. But with > Boaz's change, *sgptr == NULL means the transfer has reached the end of > the buffer. 
So I'll have to go through and audit all the callers. > > Alan Stern > > - No, it does not; this has not changed. Please look again. Note that this patch was tested and works. It is a bug in v2.6.24 and it should be accepted already, one way or the other. Callers of usb_stor_access_xfer_buf() need not change. Matthew Dharm should decide if he wants the WARN_ON in usb_stor_set_xfer_buf() or not and be done with it. I have found and fixed the bug, but it is not a SCSI-related bug, and it is not due to any SCSI changes. It is a bug from the SG changes of early 2.6.24. Please take it through the USB tree. Feel free to change it the way you like it, and submit it. Boaz
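Editorial sketch: the property debated in this thread, a transfer helper that can never run past the end of a scatter-gather buffer however much the caller asks for, can be illustrated with a user-space toy. struct toy_seg and toy_copy_to_segs() are hypothetical names; this is not the usb-storage code, only a model of clamping each copy to the space that remains.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy scatter-gather segment: a buffer and its length (hypothetical). */
struct toy_seg {
	unsigned char *buf;
	size_t len;
};

/*
 * Copies up to cnt bytes from src into the segment list, filling segments
 * in order. The copy stops at the end of the list even if cnt is larger,
 * so it cannot write past the last segment. Returns the number of bytes
 * actually copied, which the caller can compare against cnt.
 */
static size_t toy_copy_to_segs(struct toy_seg *segs, size_t nsegs,
			       const unsigned char *src, size_t cnt)
{
	size_t done = 0;
	size_t i;

	assert(segs != NULL && src != NULL);

	for (i = 0; i < nsegs && done < cnt; i++) {
		size_t n = cnt - done;

		if (n > segs[i].len)
			n = segs[i].len;	/* clamp to this segment */
		memcpy(segs[i].buf, src + done, n);
		done += n;
	}
	return done;
}
```

Returning the count actually copied lets a caller detect a short transfer without the helper needing a WARN_ON of its own; whether to warn is the separate decision discussed above.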
Re: Integration of SCST in the mainstream Linux kernel
On Feb 4, 2008 11:57 PM, Jeff Garzik <[EMAIL PROTECTED]> wrote: > Networked block devices are attractive because the concepts and > implementation are more simple than networked filesystems... but usually > you want to run some sort of filesystem on top. At that point you might > as well run NFS or [gfs|ocfs|flavor-of-the-week], and ditch your > networked block device (and associated complexity). Running a filesystem on top of iSCSI results in better performance than NFS, especially if the NFS client conforms to the NFS standard (=synchronous writes). By searching the web for the keywords NFS, iSCSI and performance I found the following (6 years old) document: http://www.technomagesinc.com/papers/ip_paper.html. A quote from the conclusion: Our results, generated by running some of industry standard benchmarks, show that iSCSI significantly outperforms NFS for situations when performing streaming, database like accesses and small file transactions. Bart Van Assche.
Re: [patch 08/13] sg: nopage
On Mon, 04 Feb 2008 23:53:21 -0800 [EMAIL PROTECTED] wrote: > From: Nick Piggin <[EMAIL PROTECTED]> > > Convert SG from nopage to fault. > Please give this some additional attention. We'd like to remove vm_operations_struct.nopage() altogether and we can't do that while it's hanging around in various subsystems. Thanks.
Re: [PATCH][drivers/scsi/u14-34f.c] duplicate test 'SCpnt->sc_data_direction == DMA_FROM_DEVICE'
[EMAIL PROTECTED] wrote: > Good to know that somebody still uses the Ultrastor 14f board :). > Yes, this typo was introduced by somebody doing massive editing to all > scsi drivers long ago. > Cheers, > --db Actually, I do not own an Ultrastor 14f board. I found this by searching for if (test) ... else if (exactly same test) ... Thanks, Roel
RE: [PATCH][drivers/scsi/u14-34f.c] duplicate test 'SCpnt->sc_data_direction == DMA_FROM_DEVICE'
Good to know that somebody still uses the Ultrastor 14f board :). Yes, this typo was introduced by somebody doing massive editing to all scsi drivers long ago. Cheers, --db -----Original Message----- From: Roel Kluin [mailto:[EMAIL PROTECTED] Sent: Monday, February 04, 2008 11:37 PM To: Ballabio, Dario Cc: linux-scsi@vger.kernel.org; lkml Subject: [PATCH][drivers/scsi/u14-34f.c] duplicate test 'SCpnt->sc_data_direction == DMA_FROM_DEVICE' It should be like this I guess? this patch was not yet tested, please confirm. -- Note the duplicate test 'SCpnt->sc_data_direction == DMA_FROM_DEVICE' from Documentation/DMA-API.txt: DMA_TO_DEVICE = PCI_DMA_TODEVICE data is going from the memory to the device DMA_FROM_DEVICE = PCI_DMA_FROMDEVICE data is coming from the device to the Signed-off-by: Roel Kluin <[EMAIL PROTECTED]> --- diff --git a/drivers/scsi/u14-34f.c b/drivers/scsi/u14-34f.c index 662c004..1e704f9 100644 --- a/drivers/scsi/u14-34f.c +++ b/drivers/scsi/u14-34f.c @@ -1208,15 +1208,15 @@ static void scsi_to_dev_dir(unsigned int i, unsigned int j) { }; struct mscp *cpp; struct scsi_cmnd *SCpnt; cpp = &HD(j)->cp[i]; SCpnt = cpp->SCpnt; - if (SCpnt->sc_data_direction == DMA_FROM_DEVICE) { + if (SCpnt->sc_data_direction == DMA_TO_DEVICE) { cpp->xdir = DTD_IN; return; } else if (SCpnt->sc_data_direction == DMA_FROM_DEVICE) { cpp->xdir = DTD_OUT; return; }
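Editorial sketch: the bug class reported here is an if / else if chain that tests the same condition twice, so the second branch can never run. The user-space toy below shows a chain in which each direction is tested exactly once. All TOY_* names and struct toy_mscp are hypothetical stand-ins, and the pairing of "in" with data arriving from the device is an assumption this thread does not confirm; note that it differs from the untested patch as posted, which pairs DMA_TO_DEVICE with DTD_IN.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins; the real constants live in the kernel headers
 * and in the driver, and their values are not reproduced here. */
enum toy_dma_dir { TOY_DMA_TO_DEVICE, TOY_DMA_FROM_DEVICE, TOY_DMA_NONE };
enum toy_xdir { TOY_DTD_IN, TOY_DTD_OUT, TOY_DTD_NONE };

/* Hypothetical miniature of the driver's per-command control block. */
struct toy_mscp {
	enum toy_xdir xdir;
};

/* Maps a DMA direction to a transfer direction, one test per direction. */
static void toy_set_xdir(struct toy_mscp *cpp, enum toy_dma_dir dir)
{
	assert(cpp != NULL);

	if (dir == TOY_DMA_FROM_DEVICE)
		cpp->xdir = TOY_DTD_IN;		/* assumed: device to memory */
	else if (dir == TOY_DMA_TO_DEVICE)
		cpp->xdir = TOY_DTD_OUT;	/* assumed: memory to device */
	else
		cpp->xdir = TOY_DTD_NONE;	/* no data phase */
}
```

Which of the two duplicated tests should change in the real driver depends on what DTD_IN and DTD_OUT mean to the hardware, which is why the poster asks for confirmation.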