Re: Wrong free clusters count on FAT32
DervishD wrote: Hi all :) I have a portable device with a FAT32 formatted hard disk in it, and everytime I delete a file in the device *using the device itself to do it* the device increases its count of free space and if I plug the device in a Windows system, Windows agrees on the free space. Linux doesn't. Linux believes that the files are still there ocuppying space, and I have to run fsck.vfat to fix the problem. As far as I've seen, the device is probably not updating correctly the list of free clusters, but it doesn't seem to worry about it, neither does Windows. So my question is: is there any way of making Linux bug compatible with Windows? If Windows itself don't worry about the free cluster count and computes the free space in some other way, then it can be done. I haven't seen anything in mount(8) neither googling. Apart from not using the device itself to delete files (and probably not using Windows for that, either) and to run fsck.vfat now and then, is anything I can do to avoid this problem? Thanks a lot in advance :) Raúl Núñez de Arenas Coronado Not that I know how to fix it. But have you tried running chkdsk on a device deleted files in windows. Just that I know, that windows would silently ignore fat errors and will only report them in chkdsk. Deleting files in windows works fine. There is no problems with it and windows updates everything. I do that all the time. (I would even theorize that if you delete files on the device and than farther delete more files on windows, than windows will fix the problem on the fly) So it is bug compatible with the device, and error recovery compatible with windows. Boaz - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: mcdx -- do_request(): non-read command to cd!!
Pekka J Enberg wrote: On Mon, 2 Apr 2007, Jens Axboe wrote: But as I wrote earlier, it'll take lots more to make this driver close to functional. Heh, looking at the driver, that's quite an understatement =). Rene, here's an untested attempt to use a mutex instead of the horribly broken waitqueue synchronization in mcdx. It may or may not help so give it a spin if you want. Pekka --- drivers/cdrom/mcdx.c | 121 ++- 1 file changed, 44 insertions(+), 77 deletions(-) Index: 2.6/drivers/cdrom/mcdx.c === --- 2.6.orig/drivers/cdrom/mcdx.c 2007-04-02 11:50:40.0 +0300 +++ 2.6/drivers/cdrom/mcdx.c 2007-04-02 11:51:20.0 +0300 @@ -58,6 +58,7 @@ = $Id: mcdx.c,v 1.21 1997/01/26 07: #include linux/module.h +#include linux/delay.h #include linux/errno.h #include linux/interrupt.h #include linux/fs.h @@ -151,15 +152,14 @@ struct s_version { /* Per drive/controller stuff **/ struct s_drive_stuff { + struct mutex mutex; + /* waitqueues */ wait_queue_head_t busyq; - wait_queue_head_t lockq; - wait_queue_head_t sleepq; /* flags */ volatile int introk;/* status of last irq operation */ volatile int busy; /* drive performs an operation */ - volatile int lock; /* exclusive usage */ /* cd infos */ struct s_diskinfo di; @@ -266,7 +266,6 @@ static unsigned int uint2bcd(unsigned in static unsigned int bcd2uint(unsigned char); static unsigned port(int *); static int irq(int *); -static void mcdx_delay(struct s_drive_stuff *, long jifs); static int mcdx_transfer(struct s_drive_stuff *, char *buf, int sector, int nr_sectors); static int mcdx_xfer(struct s_drive_stuff *, char *buf, int sector, @@ -287,7 +286,7 @@ static int mcdx_requestmultidiskinfo(str static int mcdx_requesttocdata(struct s_drive_stuff *, struct s_diskinfo *, int); static int mcdx_getstatus(struct s_drive_stuff *, int); -static int mcdx_getval(struct s_drive_stuff *, int to, int delay, char *); +static int mcdx_getval(struct s_drive_stuff *, int to, int delay_sec, char *); static int mcdx_talk(struct s_drive_stuff *, const unsigned char *cmd, size_t, void *buffer, size_t size, unsigned int timeout, int); @@ -577,56 +576,57 @@ static void do_mcdx_request(request_queu if (!req) return; + if (!blk_fs_request(req)) { + end_request(req, 0); + goto again; + } + stuffp = req-rq_disk-private_data; if (!stuffp-present) { xwarn(do_request(): bad device: %s\n,req-rq_disk-disk_name); xtrace(REQUEST, end_request(0): bad device\n); end_request(req, 0); - return; + goto again; } if (stuffp-audio) { xwarn(do_request() attempt to read from audio cd\n); xtrace(REQUEST, end_request(0): read from audio\n); end_request(req, 0); - return; + goto again; } xtrace(REQUEST, do_request() (%lu + %lu)\n, req-sector, req-nr_sectors); - if (req-cmd != READ) { + if (rq_data_dir(req) != READ) { xwarn(do_request(): non-read command to cd!!\n); xtrace(REQUEST, end_request(0): write\n); end_request(req, 0); - return; - } - else { - stuffp-status = 0; - while (req-nr_sectors) { - int i; - - i = mcdx_transfer(stuffp, - req-buffer, - req-sector, - req-nr_sectors); - - if (i == -1) { - end_request(req, 0); - goto again; - } - req-sector += i; - req-nr_sectors -= i; - req-buffer += (i * 512); - } - end_request(req, 1); goto again; - - xtrace(REQUEST, end_request(1)\n); - end_request(req, 1); } + stuffp-status = 0; + while (req-nr_sectors) { + int i; + + spin_unlock_irq(q-queue_lock); + i = mcdx_transfer(stuffp, + req-buffer, + req-sector, + req-nr_sectors); + spin_lock_irq(q-queue_lock); + + if (i == -1) { + end_request(req, 0); + goto again; + } + req-sector +=
Re: [PATCH 2/2] scsi: Use new __dma_buffer to align sense buffer in scsi_cmnd
On Mon, Jan 07 2008 at 8:53 +0200, FUJITA Tomonori [EMAIL PROTECTED] wrote: On Sun, 23 Dec 2007 13:09:05 +0200 Boaz Harrosh [EMAIL PROTECTED] wrote: On Fri, Dec 21 2007 at 4:30 +0200, Benjamin Herrenschmidt [EMAIL PROTECTED] wrote: The sense buffer ins scsi_cmnd can nowadays be DMA'ed into directly by some low level drivers (that typically happens with USB mass storage). This is a problem on non cache coherent architectures such as embedded PowerPCs where the sense buffer can share cache lines with other structure members, which leads to various forms of corruption. This uses the newly defined __dma_buffer annotation to enforce that on such platforms, the sense_buffer is contained within its own cache line. This has no effect on cache coherent architectures. Signed-off-by: Benjamin Herrenschmidt [EMAIL PROTECTED] --- include/scsi/scsi_cmnd.h |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- linux-merge.orig/include/scsi/scsi_cmnd.h 2007-12-21 13:07:14.0 +1100 +++ linux-merge/include/scsi/scsi_cmnd.h2007-12-21 13:07:29.0 +1100 @@ -88,7 +88,7 @@ struct scsi_cmnd { working on */ #define SCSI_SENSE_BUFFERSIZE 96 - unsigned char sense_buffer[SCSI_SENSE_BUFFERSIZE]; + unsigned char sense_buffer[SCSI_SENSE_BUFFERSIZE] __dma_buffer; /* obtained by REQUEST SENSE when * CHECK CONDITION is received on original * command (auto-sense) */ This has the potential of leaving a big fat ugly hole in the middle of scsi_cmnd. I would suggest of *just* moving the sense_buffer array to be the *first member* of struct scsi_cmnd. The command itself is already cache aligned, allocated by the proper flags to it's slab. And put a fat comment near it's definition. This is until a proper fix is sent. I have in my Q a proposition for a more prominent solution, which I will send next month. Do to short comings in the sense handling and optimizations, but should definitely cover this problem. The code should have time to be discussed and tested, so it is only 2.6.26 material. For the duration of the 2.6.25 kernel we can live with a reorder of scsi_cmnd members, if it solves such a grave bug for some ARCHs. Boaz [RFD below] My proposed solution will be has follows: 1. Since the scsi protocol mandates an immediate REQUEST_SENSE after an error in effect the Q is frozen until the REQUEST_SENSE command returns. 2. The scsi-ml needs the sense buffer for its normal operation, independent from the ULD's request of the sence_buffer or not at request-sense. But in effect, 90% of all scsi-requests come with ULD's allocated buffer for sense, that is copied to, on command completion. 3. 99% of all commands complete successfully, so if an optimization is proposed for the successful case, sacrificing a few cycles for the error case than thats a good thing. My suggestion is to have a per Q, driver-overridable, sense buffer that is DMAed/written to by drivers. At the end of the REQUEST_SENSE command one of 2 things will be done. Either copy the sense to the ULD's supplied buffer, or if not available, allocate one from a dedicated mem_cache pool. So we are completely saving 92 bytes from scsi_cmnd by replacing the buffer with a pointer. We can always read the sense into a per Q buffer. And 10% of %1 of the times we will need to allocate a sense buffer from a dedicated mem_cache I would say thats a nice optimization. The changes to scsi_error/scsi_cmnd and friends, is pretty strait forward. But it depends on a conversion of 4/5 drivers to the new scsi_eh API for REQUEST_SENSE. I have only converted these drivers that interfered with the accessors effort + 1 other places. But there are a few more places that are not converted. Once done. The implementation can easily change with no affect on drivers. I think that removing the sense_buffer array from scsi_cmnd effects lots of LLDs. As I wrote in other mail, many LLDs assume that scsi_cmnd:sense_buffer is always available. Another big task is to take care about auto sense. Have you already had some patches? I've just started to work on this and I'd like to push that fix into 2.6.25. Tomo Hi. Since you ask to push this into 2.6.25, I have ask permission to prioritize this effort, as until now it was on a back burner. I have only done 3 drivers up to now. (out of something like 15) I have seen 4 patterns of sense use. 1. Driver allocated sense that is memcpy'ed in interrupt time to cmnd-sense. The sense is automatically fetched by controller and is usually pre-mapped. 2. dma-map of cmnd-sense prior to each command. similar to case-1 but DMAed directly into cmnd-sense buffer. (AutoSense) 3. Synchronous request for sense, like the places I sent patches
Re: [patch 1/1] Switch ioctl functions of drivers/scsi/sg.c to unlocked_ioctl
On Thu, Jan 10 2008 at 21:45 +0200, Andre Noll [EMAIL PROTECTED] wrote: On 20:29, Andi Kleen wrote: Sure, I can do that if James likes the idea. Since not all case statements need the BKL, we could add it only to those for which it isn't clear that it is unnecessary. And this would actually improve something. I still think it would be a good strategy to first add it to all (in a essentially nop semantics patch) and then later eliminate it from the cases that obviously don't need it. James, would you accept such a patch? Andre Andre hi. All the scsi calls do not need any locks. The scsi LLDS never see these threads since commands are queued through the block layer. What's left is what you see, here in sg.c. you must have the best knowledge about the possible races between ioctl and open/release and probe/remove. And all these put_user() copy_user() etc... Why don't you have a hard look and fix them properly. please don't *lock_kernel();* for scsi's sake. Also note that sg is not a scsi device it is a block device. It used to Q commands directly to scsi but it does not do that any more. Again I think SCSI neto is safe and tested. My $0.02. Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] ata_sg_setup_one vs ata_sg_setup?
On Wed, Nov 14 2007 at 6:40 +0200, Rusty Russell [EMAIL PROTECTED] wrote: Hi Jeff, Was looking through libata, and it seems to me that ata_sg_setup is a superset of ata_sg_setup_one. Am I missing something? Seems like it could be simplified. My machine never seems to do an ata_sg_setup_one, so this patch isn't really tested... It's not only your machine, it's every ones machines. Since 2.6.18. But in 2.6.24 the last code user of ata_sg_setup_one was also removed from libata-scsi.c : (http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e10b8c3f5f23188e065b1845ba732570eca007fe) Please see in above commit's comment about this issue. Thanks, Rusty. --- Subject: libata: fold ata_queued_cmd single and sg logic libata separates the single buffer case from the scatterlist case internally. It's not clear that this is necessary. Remove the ATA_QCFLAG_SINGLE flag, and buf_virt pointer, and always initialize qc-nbytes in ata_sg_init(). It's possible that the ATA_QCFLAG_SG and ATA_QCFLAG_DMAMAP flags could be entirely removed, and we could use whether qc-__sg is NULL or not. Yes these flags can be removed and you will find that qc-__sg will never be NULL, unless it is a DMA_NONE command. Signed-off-by: Rusty Russell [EMAIL PROTECTED] diff -r 8b1075c7ad47 drivers/ata/libata-core.c --- a/drivers/ata/libata-core.c Tue Nov 13 21:00:47 2007 +1100 +++ b/drivers/ata/libata-core.c Wed Nov 14 15:31:07 2007 +1100 @@ -1648,16 +1648,8 @@ unsigned ata_exec_internal_sg(struct ata snip Thanks for doing this. Boaz - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-rc2-mm1
On Thu, Nov 15 2007 at 19:15 +0200, Matthew Dharm [EMAIL PROTECTED] wrote: On Wed, Nov 14, 2007 at 10:23:09AM +0100, Gabriel C wrote: Matthew Dharm wrote: On Wed, Nov 14, 2007 at 06:33:39AM +0100, Gabriel C wrote: Matthew Dharm wrote: On Tue, Nov 13, 2007 at 07:49:24PM -0800, Greg KH wrote: Matt, are these the errors you were worried about with the patch we were just talking about tha tis in my tree? I can't tell from these logs. There is the dmesg with CONFIG_USB_STORAGE_DEBUG : http://194.231.229.228/dmesg-2.6.24-rc2-mm1 Good news: This isn't the bug Greg was worried about. Bad news: Something is seriously strange here. Note the following from the logs: Nov 14 06:07:43 lara [ 41.890614] usb-storage: Bulk Status S 0x53425355 T 0xd R 0 Stat 0x0 Nov 14 06:07:43 lara [ 41.890616] usb-storage: -- unexpectedly short transfer Note the 'R' value of zero -- this is the residue value. It indicates a complete transfer, and that matches the log lines immediately previous which indicate a 4K transfer which completed properly. If residue is zero, then srb-resid should be zero. Take a look in linux/usb/storage/transport.c in usb_stor_Bulk_transport() If srb-resid is zero, then you should NEVER get the unexpectedly short transfer message. Look at usb_stor_invoke_transport() in the same file. That code got replaced recently but I have no idea about it. ( http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=shortlog see the patches from Boaz Harrosh) srb-resid got replaced by scsi_get_resid() it I see that right. I'm CC'ing the author , he will know I think. The replacement looks, to my eye, to be logically correct. The patch was pretty clean. Then again, I haven't looked at what is under the hood of the accessor functions. Perhaps there is a side-effect somewhere in there? Perhaps a quick debugging test -- print the value of scsi_get_resid(srb) just after it's initialized to zero at the top of usb_stor_invoke_transport(), and then just after the call to us-transport(). I have found the bug. My bad sorry about that. Patch below It is because I switched from use of usb_stor_bulk_transfer_sg() to usb_stor_bulk_transfer_sglist, but forgot the residual handling. (Please send scsi bugs to scsi list. My lkml mental filters are much higher, Sorry for not seeing this yesterday) From: Boaz Harrosh [EMAIL PROTECTED] Date: Thu, 15 Nov 2007 20:07:56 +0200 Subject: [PATCH] Fix bug in last usb accessor patch Bad news: Something is seriously strange here. Note the following from the logs: Nov 14 06:07:43 lara [ 41.890614] usb-storage: Bulk Status S 0x53425355 T 0xd R 0 Stat 0x0 Nov 14 06:07:43 lara [ 41.890616] usb-storage: -- unexpectedly short transfer Note the 'R' value of zero -- this is the residue value. It indicates a complete transfer, and that matches the log lines immediately previous which indicate a 4K transfer which completed properly. If residue is zero, then srb-resid should be zero. Take a look in linux/usb/storage/transport.c in usb_stor_Bulk_transport() If srb-resid is zero, then you should NEVER get the unexpectedly short transfer message. Look at usb_stor_invoke_transport() in the same file. That code got replaced recently but I have no idea about it. wrong resid handling fixed Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] --- drivers/usb/storage/transport.c |7 --- 1 files changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/usb/storage/transport.c b/drivers/usb/storage/transport.c index d3a84a2..d9f4912 100644 --- a/drivers/usb/storage/transport.c +++ b/drivers/usb/storage/transport.c @@ -465,11 +465,12 @@ static int usb_stor_bulk_transfer_sglist(struct us_data *us, unsigned int pipe, int usb_stor_bulk_srb(struct us_data* us, unsigned int pipe, struct scsi_cmnd* srb) { - int resid = scsi_get_resid(srb); + unsigned int partial; int result = usb_stor_bulk_transfer_sglist(us, pipe, scsi_sglist(srb), scsi_sg_count(srb), scsi_bufflen(srb), - resid); - scsi_set_resid(srb, resid); + partial); + + scsi_set_resid(srb, scsi_bufflen(srb) - partial); return result; } -- 1.5.3.1 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: INITIO scsi driver fails to work properly
On Mon, Dec 17 2007 at 13:41 +0200, Filippos Papadopoulos [EMAIL PROTECTED] wrote: On Dec 17, 2007 1:18 PM, Andrew Morton [EMAIL PROTECTED] wrote: On Mon, 17 Dec 2007 11:39:47 +0200 Filippos Papadopoulos [EMAIL PROTECTED] wrote: Hi, I have got an INITIO 9100 UW SCSI Controller with an IBM IC35L036UWD210-0 scsi hard disk on a 32 bit x86 system. Currently i have SUSE 10.1 (Kernel 2.6.16). I tried to install OpenSUSE 10.3 (kernel 2.6.22.5) and the latest OpenSUSE 11.0 Alpha 0 (kernel 2.6.24-rc4) but although the initio driver gets loaded during the installation process, yast reports that no hard disk is found. I believe that this isnt a bug in suse's yast but a problem in the initio scsi driver because i also tried to install Fedora 8 (kernel 2.6.23) with the same problem. I have seen the relevant thread Conflict when loading initio driver and i suppose that the initio driver isnt fixed yet. I can help testing the new patches in the initio driver if someone is interested. initio doesn't seem to have a maintainer... Are you able to identify any earlier kernel which worked OK? I have this PC configuration since 2002. The initio driver worked perfectly with 2.4 kernel series. With the release of 2.6 kernel series the driver had been marked as BROKEN and fixed at 2.6.9 (see at http://www.gossamer-threads.com/lists/linux/kernel/482582?search_string=SCSI%20updates%20for%202.6.9;#482582 Christoph Hellwig -don't mark the initio 9100 driver broken) Maybe it's a new device? If you can get the `lspci -vvxx' output for that device we can take a look. No its not a new device. - I have found one problem. Please try patch [2] below and report. If it still fails try to enable debugging by setting with patch [1] these values at top of drivers/scsi/initio.c. And send dmsgs. Boaz patch [1] diff --git a/drivers/scsi/initio.c b/drivers/scsi/initio.c index 4c4465d..61edcd2 100644 --- a/drivers/scsi/initio.c +++ b/drivers/scsi/initio.c @@ -138,10 +138,10 @@ static struct pci_device_id i91u_pci_devices[] = { }; MODULE_DEVICE_TABLE(pci, i91u_pci_devices); -#define DEBUG_INTERRUPT 0 -#define DEBUG_QUEUE 0 -#define DEBUG_STATE 0 -#define INT_DISC 0 +#define DEBUG_INTERRUPT 1 +#define DEBUG_QUEUE 1 +#define DEBUG_STATE 1 +#define INT_DISC 1 /*--- forward references ---*/ static struct scsi_ctrl_blk *initio_find_busy_scb(struct initio_host * host, u16 tarlun); --- patch [2] --- git-diff --stat -p drivers/scsi/initio.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/drivers/scsi/initio.c b/drivers/scsi/initio.c index 4c4465d..61595f6 100644 --- a/drivers/scsi/initio.c +++ b/drivers/scsi/initio.c @@ -2616,6 +2616,7 @@ static void initio_build_scb(struct initio_host * host, struct scsi_ctrl_blk * c scsi_for_each_sg(cmnd, sglist, cblk-sglen, i) { sg-data = cpu_to_le32((u32)sg_dma_address(sglist)); total_len += sg-len = cpu_to_le32((u32)sg_dma_len(sglist)); + sg++; } cblk-buflen = (scsi_bufflen(cmnd) total_len) ? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: INITIO scsi driver fails to work properly
On Mon, Dec 17 2007 at 14:18 +0200, Boaz Harrosh [EMAIL PROTECTED] wrote: On Mon, Dec 17 2007 at 13:41 +0200, Filippos Papadopoulos [EMAIL PROTECTED] wrote: On Dec 17, 2007 1:18 PM, Andrew Morton [EMAIL PROTECTED] wrote: On Mon, 17 Dec 2007 11:39:47 +0200 Filippos Papadopoulos [EMAIL PROTECTED] wrote: Hi, I have got an INITIO 9100 UW SCSI Controller with an IBM IC35L036UWD210-0 scsi hard disk on a 32 bit x86 system. Currently i have SUSE 10.1 (Kernel 2.6.16). I tried to install OpenSUSE 10.3 (kernel 2.6.22.5) and the latest OpenSUSE 11.0 Alpha 0 (kernel 2.6.24-rc4) but although the initio driver gets loaded during the installation process, yast reports that no hard disk is found. I believe that this isnt a bug in suse's yast but a problem in the initio scsi driver because i also tried to install Fedora 8 (kernel 2.6.23) with the same problem. I have seen the relevant thread Conflict when loading initio driver and i suppose that the initio driver isnt fixed yet. I can help testing the new patches in the initio driver if someone is interested. initio doesn't seem to have a maintainer... Are you able to identify any earlier kernel which worked OK? I have this PC configuration since 2002. The initio driver worked perfectly with 2.4 kernel series. With the release of 2.6 kernel series the driver had been marked as BROKEN and fixed at 2.6.9 (see at http://www.gossamer-threads.com/lists/linux/kernel/482582?search_string=SCSI%20updates%20for%202.6.9;#482582 Christoph Hellwig -don't mark the initio 9100 driver broken) Maybe it's a new device? If you can get the `lspci -vvxx' output for that device we can take a look. No its not a new device. - I have found one problem. Please try patch [2] below and report. If it still fails try to enable debugging by setting with patch [1] these values at top of drivers/scsi/initio.c. And send dmsgs. Boaz I forgot to ask. 2.6.22 should work. If still fails could you try this 2.6.22.xx kernel? Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: INITIO scsi driver fails to work properly
On Mon, Dec 17 2007 at 15:05 +0200, Alan Cox [EMAIL PROTECTED] wrote: initio doesn't seem to have a maintainer... Are you able to identify any earlier kernel which worked OK? Maybe it's a new device? If you can get the `lspci -vvxx' output for that device we can take a look. If I remember rightly the fixes for this went into the scsi tree a couple of months ago. The patch is in the -mm tree as well. No idea why its gotten stuck as an obvious one liner. Alan - You mean this one: http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commitdiff;h=ba2c270154cc90c9a8bfc45b7bed4cca78c75aaf It's only queued for 2.6.25 via scsi-misc. I have found another bug. (See other mail in thread). I Will wait for testing and submit a proper patch. Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: INITIO scsi driver fails to work properly
On Mon, Dec 17 2007 at 17:03 +0200, James Bottomley [EMAIL PROTECTED] wrote: On Mon, 2007-12-17 at 14:36 +, Alan Cox wrote: On Mon, 17 Dec 2007 16:40:53 +0200 Boaz Harrosh [EMAIL PROTECTED] wrote: On Mon, Dec 17 2007 at 15:05 +0200, Alan Cox [EMAIL PROTECTED] wrote: initio doesn't seem to have a maintainer... Are you able to identify any earlier kernel which worked OK? Maybe it's a new device? If you can get the `lspci -vvxx' output for that device we can take a look. If I remember rightly the fixes for this went into the scsi tree a couple of months ago. The patch is in the -mm tree as well. No idea why its gotten stuck as an obvious one liner. Alan - You mean this one: http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commitdiff;h=ba2c270154cc90c9a8bfc45b7bed4cca78c75aaf It's only queued for 2.6.25 via scsi-misc. I have found another bug. (See other mail in thread). I Will wait for testing and submit a proper patch. That one yes - which really should have gone straight into the main tree as the initio driver has been broken all the time it sits queued for future patches. It can't make the problem any worse - the driver does not work. Well, the change log isn't very committal for rush me immediately into main line plus, as far as I could dig out, there was no confirmation that it actually worked. This way, I can now say please try the current -mm kernel to the bug reporter and we get to see if this fixes the problem. James Below fixes a deadly typo. Might as well be included in 2.6.24 Boaz From fdf8ca414f9bb9a5a2cab602991cbac0b128ea65 Mon Sep 17 00:00:00 2001 From: Boaz Harrosh [EMAIL PROTECTED] Date: Mon, 17 Dec 2007 18:04:11 +0200 Subject: [PATCH] initio: bugfix for accessors patch patch: [SCSI] initio: convert to use the data buffer accessors had a small but fatal bug. Fixed here. Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] --- drivers/scsi/initio.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/drivers/scsi/initio.c b/drivers/scsi/initio.c index 769a7a8..01bf018 100644 --- a/drivers/scsi/initio.c +++ b/drivers/scsi/initio.c @@ -2616,6 +2616,7 @@ static void initio_build_scb(struct initio_host * host, struct scsi_ctrl_blk * c scsi_for_each_sg(cmnd, sglist, cblk-sglen, i) { sg-data = cpu_to_le32((u32)sg_dma_address(sglist)); total_len += sg-len = cpu_to_le32((u32)sg_dma_len(sglist)); + ++sg; } cblk-buflen = (scsi_bufflen(cmnd) total_len) ? -- 1.5.3.3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: INITIO scsi driver fails to work properly
On Mon, Dec 17 2007 at 18:20 +0200, Olivier Galibert [EMAIL PROTECTED] wrote: On Mon, Dec 17, 2007 at 06:08:59PM +0200, Boaz Harrosh wrote: Below fixes a deadly typo. Might as well be included in 2.6.24 You're sure ? scsi_for_each_sg includes a (sg)++ already... scsi_for_each_sg(cmnd, sglist, cblk-sglen, i) { sg-data = cpu_to_le32((u32)sg_dma_address(sglist)); total_len += sg-len = cpu_to_le32((u32)sg_dma_len(sglist)); +++sg; } OG. -- Don't mix up between the here sg that points to a driver specific struct sg_entry and the here sglist which points to struct scatterlist, and is named sg inside the scsi_for_each_sg() macro. Please inspect the full code, the patch does not show the complete information. I admit it's confusing. Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: INITIO scsi driver fails to work properly
On Wed, Dec 19 2007 at 10:48 +0200, Filippos Papadopoulos [EMAIL PROTECTED] wrote: On Dec 17, 2007 2:18 PM, Boaz Harrosh [EMAIL PROTECTED] wrote: I have found one problem. Please try patch [2] below and report. If it still fails try to enable debugging by setting with patch [1] these values at top of drivers/scsi/initio.c. And send dmsgs. Boaz I tried patch[2] (addition of sg++) at 2.6.24-rc5-mm1 but the system hangs after some seconds when the initio driver loads. I will try patch[1] next week to see what happens. Please first pull from scsi-rc-fixes git-tree first. it has a couple of other fixes for initio plus patch[2] included. (maybe its already in -mm tree I'm not sure). If still don't work try enable debug with patch[1] and send messages Would it be better to open a bug report at bugzilla? I would prefer linux-scsi ml snip Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5] sg_ring for scsi
On Thu, Dec 20 2007 at 9:58 +0200, Jens Axboe [EMAIL PROTECTED] wrote: On Thu, Dec 20 2007, Rusty Russell wrote: On Thursday 20 December 2007 18:07:41 FUJITA Tomonori wrote: On Thu, 20 Dec 2007 16:45:18 +1100 Rusty Russell [EMAIL PROTECTED] wrote: OK, some fixes since last time, as I wade through more SCSI drivers. Some drivers use use_sg as a flag to know whether the request_buffer is a scatterlist: I don't need the counter, but I still need the flag, so I fixed that in a more intuitive way (an explicit -sg pointer in the cmd). use_sg and the request_buffer will be removed shortly. http://marc.info/?l=linux-scsim=119754650614813w=2 Thanks! Is there a git tree somewhere with these changes? I think that we tried the similar idea before, scsi_sgtable, but we seem to settle in the current simple approach. Yes, a scsi-specific solution is a bad idea: other people use sg. Manipulating the magic chains is horrible; it looks simple to the places which simply want to iterate through it, but it's awful for code which wants to create them. The current code looks like that to minimize impact on 2.6.24, see this branch: http://git.kernel.dk/?p=linux-2.6-block.git;a=shortlog;h=sg for how it folds into lib/sg.c and the magic disappears from SCSI. Rusty, nobody claimed the sg code in 2.6.24 is perfect. I like to get things incrementally there. Dear Jens. Is this code scheduled for 2.6.25? I cannot find it in mm tree. Because above code conflicts with the scsi_data_buffer patch, that is in mm tree and I hope will get accepted into 2.6.25. Now the concepts are not at all conflicting, only that they both bang in the same places. (And by the way it does not apply on scsi-misc either. And it did not compile in your tree, a missing bit in ide-scsi.c) I have rebase and fixed your code on top of scsi_data_buffer patchset. Please review. (Patchset posted as reply to this mail) They are totaly untested, based on -mm tree. We should decide the order of these patches and rebase accordingly. AND ... Please send, to-be-included-in-next-kernel patches to Morton. This way we can account for them. Also I do not see Ack-by: of the scsi maintainer in the scsi bits of your patches. Is it not a costume to Ack-on bits that belong to other maintainers, even for maintainers? Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/3] SG: Move functions to lib/scatterlist.c and add sg chaining allocator helpers
Manually doing chained sg lists is not trivial, so add some helpers to make sure that drivers get it right. Signed-off-by: Jens Axboe [EMAIL PROTECTED] --- include/linux/scatterlist.h | 125 --- lib/Makefile|2 +- lib/scatterlist.c | 281 +++ 3 files changed, 307 insertions(+), 101 deletions(-) create mode 100644 lib/scatterlist.c diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h index 416e000..c3ca848 100644 --- a/include/linux/scatterlist.h +++ b/include/linux/scatterlist.h @@ -7,6 +7,12 @@ #include linux/string.h #include asm/io.h +struct sg_table { + struct scatterlist *sgl;/* the list */ + unsigned int nents; /* number of mapped entries */ + unsigned int orig_nents;/* original size of list */ +}; + /* * Notes on SG table design. * @@ -106,31 +112,6 @@ static inline void sg_set_buf(struct scatterlist *sg, const void *buf, sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf)); } -/** - * sg_next - return the next scatterlist entry in a list - * @sg:The current sg entry - * - * Description: - * Usually the next entry will be @sg@ + 1, but if this sg element is part - * of a chained scatterlist, it could jump to the start of a new - * scatterlist array. - * - **/ -static inline struct scatterlist *sg_next(struct scatterlist *sg) -{ -#ifdef CONFIG_DEBUG_SG - BUG_ON(sg-sg_magic != SG_MAGIC); -#endif - if (sg_is_last(sg)) - return NULL; - - sg++; - if (unlikely(sg_is_chain(sg))) - sg = sg_chain_ptr(sg); - - return sg; -} - /* * Loop over each sg element, following the pointer to a new list if necessary */ @@ -138,40 +119,6 @@ static inline struct scatterlist *sg_next(struct scatterlist *sg) for (__i = 0, sg = (sglist); __i (nr); __i++, sg = sg_next(sg)) /** - * sg_last - return the last scatterlist entry in a list - * @sgl: First entry in the scatterlist - * @nents: Number of entries in the scatterlist - * - * Description: - * Should only be used casually, it (currently) scan the entire list - * to get the last entry. - * - * Note that the @sgl@ pointer passed in need not be the first one, - * the important bit is that @nents@ denotes the number of entries that - * exist from @[EMAIL PROTECTED] - * - **/ -static inline struct scatterlist *sg_last(struct scatterlist *sgl, - unsigned int nents) -{ -#ifndef ARCH_HAS_SG_CHAIN - struct scatterlist *ret = sgl[nents - 1]; -#else - struct scatterlist *sg, *ret = NULL; - unsigned int i; - - for_each_sg(sgl, sg, nents, i) - ret = sg; - -#endif -#ifdef CONFIG_DEBUG_SG - BUG_ON(sgl[0].sg_magic != SG_MAGIC); - BUG_ON(!sg_is_last(ret)); -#endif - return ret; -} - -/** * sg_chain - Chain two sglists together * @prv: First scatterlist * @prv_nents: Number of entries in prv @@ -223,47 +170,6 @@ static inline void sg_mark_end(struct scatterlist *sg) } /** - * sg_init_table - Initialize SG table - * @sgl: The SG table - * @nents:Number of entries in table - * - * Notes: - * If this is part of a chained sg table, sg_mark_end() should be - * used only on the last table part. - * - **/ -static inline void sg_init_table(struct scatterlist *sgl, unsigned int nents) -{ - memset(sgl, 0, sizeof(*sgl) * nents); -#ifdef CONFIG_DEBUG_SG - { - unsigned int i; - for (i = 0; i nents; i++) - sgl[i].sg_magic = SG_MAGIC; - } -#endif - sg_mark_end(sgl[nents - 1]); -} - -/** - * sg_init_one - Initialize a single entry sg list - * @sg: SG entry - * @buf:Virtual address for IO - * @buflen: IO length - * - * Notes: - * This should not be used on a single entry that is part of a larger - * table. Use sg_init_table() for that. - * - **/ -static inline void sg_init_one(struct scatterlist *sg, const void *buf, - unsigned int buflen) -{ - sg_init_table(sg, 1); - sg_set_buf(sg, buf, buflen); -} - -/** * sg_phys - Return physical address of an sg entry * @sg: SG entry * @@ -293,4 +199,23 @@ static inline void *sg_virt(struct scatterlist *sg) return page_address(sg_page(sg)) + sg-offset; } +struct scatterlist *sg_next(struct scatterlist *); +struct scatterlist *sg_last(struct scatterlist *s, unsigned int); +void sg_init_table(struct scatterlist *, unsigned int); +void sg_init_one(struct scatterlist *, const void *, unsigned int); + +typedef struct scatterlist *(sg_alloc_fn)(unsigned int, gfp_t); +typedef void (sg_free_fn)(struct scatterlist *, unsigned int); + +void __sg_free_table(struct sg_table *, sg_free_fn *); +void sg_free_table(struct sg_table *); +int __sg_alloc_table(struct sg_table *,
[PATCH 2/3] SG: Convert SCSI to use scatterlist helpers for sg chaining
From: Jens Axboe [EMAIL PROTECTED] Signed-off-by: Jens Axboe [EMAIL PROTECTED] --- drivers/scsi/libsrp.c|2 +- drivers/scsi/scsi_error.c|4 +- drivers/scsi/scsi_lib.c | 150 +- drivers/usb/storage/isd200.c |4 +- include/scsi/scsi_cmnd.h |9 +-- 5 files changed, 24 insertions(+), 145 deletions(-) diff --git a/drivers/scsi/libsrp.c b/drivers/scsi/libsrp.c index 8a8562a..b81350d 100644 --- a/drivers/scsi/libsrp.c +++ b/drivers/scsi/libsrp.c @@ -427,7 +427,7 @@ int srp_cmd_queue(struct Scsi_Host *shost, struct srp_cmd *cmd, void *info, sc-SCp.ptr = info; memcpy(sc-cmnd, cmd-cdb, MAX_COMMAND_SIZE); sc-sdb.length = len; - sc-sdb.sglist = (void *) (unsigned long) addr; + sc-sdb.sgt.sgl = (void *) (unsigned long) addr; sc-tag = tag; err = scsi_tgt_queue_command(sc, itn_id, (struct scsi_lun *)cmd-lun, cmd-tag); diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 5c8ba6a..1fd2a8c 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -629,9 +629,9 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd, struct scsi_eh_save *ses, sizeof(scmd-sense_buffer), sense_bytes); sg_init_one(ses-sense_sgl, scmd-sense_buffer, scmd-sdb.length); - scmd-sdb.sglist = ses-sense_sgl; + scmd-sdb.sgt.sgl = ses-sense_sgl; scmd-sc_data_direction = DMA_FROM_DEVICE; - scmd-sdb.sg_count = 1; + scmd-sdb.sgt.nents = 1; memset(scmd-cmnd, 0, sizeof(scmd-cmnd)); scmd-cmnd[0] = REQUEST_SENSE; scmd-cmnd[4] = scmd-sdb.length; diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index a6aae56..c7107f1 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -750,136 +750,15 @@ static inline unsigned int scsi_sgtable_index(unsigned short nents) return index; } -static int scsi_alloc_sgtable(struct scsi_data_buffer *sdb, - unsigned short sg_count, gfp_t gfp_mask) +static struct scatterlist *scsi_sg_alloc(unsigned int nents, gfp_t gfp_mask) { - struct scsi_host_sg_pool *sgp; - struct scatterlist *sgl, *prev, *ret; - unsigned int index; - int this, left; - - left = sg_count; - ret = prev = NULL; - do { - this = left; - if (this SCSI_MAX_SG_SEGMENTS) { - this = SCSI_MAX_SG_SEGMENTS - 1; - index = SG_MEMPOOL_NR - 1; - } else - index = scsi_sgtable_index(this); - - left -= this; - - sgp = scsi_sg_pools + index; - - sgl = mempool_alloc(sgp-pool, gfp_mask); - if (unlikely(!sgl)) - goto enomem; - - sg_init_table(sgl, sgp-size); - - /* -* first loop through, set initial index and return value -*/ - if (!ret) - ret = sgl; - - /* -* chain previous sglist, if any. we know the previous -* sglist must be the biggest one, or we would not have -* ended up doing another loop. -*/ - if (prev) - sg_chain(prev, SCSI_MAX_SG_SEGMENTS, sgl); - - /* -* if we have nothing left, mark the last segment as -* end-of-list -*/ - if (!left) - sg_mark_end(sgl[this - 1]); - - /* -* don't allow subsequent mempool allocs to sleep, it would -* violate the mempool principle. -*/ - gfp_mask = ~__GFP_WAIT; - gfp_mask |= __GFP_HIGH; - prev = sgl; - } while (left); - - /* -* -use_sg may get modified after dma mapping has potentially -* shrunk the number of segments, so keep a copy of it for free. -*/ - sdb-alloc_sg_count = sdb-sg_count = sg_count; - sdb-sglist = ret; - return 0; -enomem: - if (ret) { - /* -* Free entries chained off ret. Since we were trying to -* allocate another sglist, we know that all entries are of -* the max size. -*/ - sgp = scsi_sg_pools + SG_MEMPOOL_NR - 1; - prev = ret; - ret = ret[SCSI_MAX_SG_SEGMENTS - 1]; - - while ((sgl = sg_chain_ptr(ret)) != NULL) { - ret = sgl[SCSI_MAX_SG_SEGMENTS - 1]; - mempool_free(sgl, sgp-pool); - } - - mempool_free(prev, sgp-pool); -
[PATCH 3/3] SG: Update ide/ to use sg_table
From: Jens Axboe [EMAIL PROTECTED] Signed-off-by: Jens Axboe [EMAIL PROTECTED] --- drivers/ide/arm/icside.c |6 +++--- drivers/ide/cris/ide-cris.c |2 +- drivers/ide/ide-dma.c |8 drivers/ide/ide-io.c |2 +- drivers/ide/ide-probe.c |6 +- drivers/ide/ide-taskfile.c|2 +- drivers/ide/ide.c |2 +- drivers/ide/mips/au1xxx-ide.c |8 drivers/ide/pci/sgiioc4.c |4 ++-- drivers/ide/ppc/pmac.c|6 +++--- drivers/scsi/ide-scsi.c |2 +- include/linux/ide.h |2 +- 12 files changed, 23 insertions(+), 27 deletions(-) diff --git a/drivers/ide/arm/icside.c b/drivers/ide/arm/icside.c index 93f71fc..a48a6bd 100644 --- a/drivers/ide/arm/icside.c +++ b/drivers/ide/arm/icside.c @@ -210,7 +210,7 @@ static void icside_build_sglist(ide_drive_t *drive, struct request *rq) { ide_hwif_t *hwif = drive-hwif; struct icside_state *state = hwif-hwif_data; - struct scatterlist *sg = hwif-sg_table; + struct scatterlist *sg = hwif-sg_table.sgl; ide_map_sg(drive, rq); @@ -319,7 +319,7 @@ static int icside_dma_end(ide_drive_t *drive) disable_dma(ECARD_DEV(state-dev)-dma); /* Teardown mappings after DMA has completed. */ - dma_unmap_sg(state-dev, hwif-sg_table, hwif-sg_nents, + dma_unmap_sg(state-dev, hwif-sg_table.sgl, hwif-sg_nents, hwif-sg_dma_direction); return get_dma_residue(ECARD_DEV(state-dev)-dma) != 0; @@ -373,7 +373,7 @@ static int icside_dma_setup(ide_drive_t *drive) * Tell the DMA engine about the SG table and * data direction. */ - set_dma_sg(ECARD_DEV(state-dev)-dma, hwif-sg_table, hwif-sg_nents); + set_dma_sg(ECARD_DEV(state-dev)-dma, hwif-sg_table.sgl, hwif-sg_nents); set_dma_mode(ECARD_DEV(state-dev)-dma, dma_mode); drive-waiting_for_dma = 1; diff --git a/drivers/ide/cris/ide-cris.c b/drivers/ide/cris/ide-cris.c index 476e0d6..3701aca 100644 --- a/drivers/ide/cris/ide-cris.c +++ b/drivers/ide/cris/ide-cris.c @@ -919,7 +919,7 @@ static int cris_ide_build_dmatable (ide_drive_t *drive) unsigned int count = 0; int i = 0; - sg = hwif-sg_table; + sg = hwif-sg_table.sgl; ata_tot_size = 0; diff --git a/drivers/ide/ide-dma.c b/drivers/ide/ide-dma.c index 4703837..33a2f56 100644 --- a/drivers/ide/ide-dma.c +++ b/drivers/ide/ide-dma.c @@ -190,7 +190,7 @@ static int ide_dma_good_drive(ide_drive_t *drive) int ide_build_sglist(ide_drive_t *drive, struct request *rq) { ide_hwif_t *hwif = HWIF(drive); - struct scatterlist *sg = hwif-sg_table; + struct scatterlist *sg = hwif-sg_table.sgl; BUG_ON((rq-cmd_type == REQ_TYPE_ATA_TASKFILE) rq-nr_sectors 256); @@ -234,7 +234,7 @@ int ide_build_dmatable (ide_drive_t *drive, struct request *rq) if (!i) return 0; - sg = hwif-sg_table; + sg = hwif-sg_table.sgl; while (i) { u32 cur_addr; u32 cur_len; @@ -293,7 +293,7 @@ int ide_build_dmatable (ide_drive_t *drive, struct request *rq) printk(KERN_ERR %s: empty DMA table?\n, drive-name); use_pio_instead: pci_unmap_sg(hwif-pci_dev, -hwif-sg_table, +hwif-sg_table.sgl, hwif-sg_nents, hwif-sg_dma_direction); return 0; /* revert to PIO for this request */ @@ -315,7 +315,7 @@ EXPORT_SYMBOL_GPL(ide_build_dmatable); void ide_destroy_dmatable (ide_drive_t *drive) { struct pci_dev *dev = HWIF(drive)-pci_dev; - struct scatterlist *sg = HWIF(drive)-sg_table; + struct scatterlist *sg = HWIF(drive)-sg_table.sgl; int nents = HWIF(drive)-sg_nents; pci_unmap_sg(dev, sg, nents, HWIF(drive)-sg_dma_direction); diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c index bef781f..638e2db 100644 --- a/drivers/ide/ide-io.c +++ b/drivers/ide/ide-io.c @@ -819,7 +819,7 @@ static ide_startstop_t do_special (ide_drive_t *drive) void ide_map_sg(ide_drive_t *drive, struct request *rq) { ide_hwif_t *hwif = drive-hwif; - struct scatterlist *sg = hwif-sg_table; + struct scatterlist *sg = hwif-sg_table.sgl; if (hwif-sg_mapped)/* needed by ide-scsi */ return; diff --git a/drivers/ide/ide-probe.c b/drivers/ide/ide-probe.c index 2994523..770f8cf 100644 --- a/drivers/ide/ide-probe.c +++ b/drivers/ide/ide-probe.c @@ -1312,15 +1312,11 @@ static int hwif_init(ide_hwif_t *hwif) if (!hwif-sg_max_nents) hwif-sg_max_nents = PRD_ENTRIES; - hwif-sg_table = kmalloc(sizeof(struct scatterlist)*hwif-sg_max_nents, -GFP_KERNEL); - if (!hwif-sg_table) { + if (sg_alloc_table(hwif-sg_table, hwif-sg_max_nents, GFP_KERNEL)) { printk(KERN_ERR %s: unable to allocate SG
Re: [PATCH 1/3] SG: Move functions to lib/scatterlist.c and add sg chaining allocator helpers
A small comment in body On Thu, Dec 20 2007 at 16:00 +0200, Boaz Harrosh [EMAIL PROTECTED] wrote: Manually doing chained sg lists is not trivial, so add some helpers to make sure that drivers get it right. Signed-off-by: Jens Axboe [EMAIL PROTECTED] --- include/linux/scatterlist.h | 125 --- lib/Makefile|2 +- lib/scatterlist.c | 281 +++ 3 files changed, 307 insertions(+), 101 deletions(-) create mode 100644 lib/scatterlist.c diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h index 416e000..c3ca848 100644 --- a/include/linux/scatterlist.h +++ b/include/linux/scatterlist.h @@ -7,6 +7,12 @@ #include linux/string.h #include asm/io.h +struct sg_table { + struct scatterlist *sgl;/* the list */ + unsigned int nents; /* number of mapped entries */ + unsigned int orig_nents;/* original size of list */ +}; + /* * Notes on SG table design. * @@ -106,31 +112,6 @@ static inline void sg_set_buf(struct scatterlist *sg, const void *buf, sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf)); } -/** - * sg_next - return the next scatterlist entry in a list - * @sg: The current sg entry - * - * Description: - * Usually the next entry will be @sg@ + 1, but if this sg element is part - * of a chained scatterlist, it could jump to the start of a new - * scatterlist array. - * - **/ -static inline struct scatterlist *sg_next(struct scatterlist *sg) -{ -#ifdef CONFIG_DEBUG_SG - BUG_ON(sg-sg_magic != SG_MAGIC); -#endif - if (sg_is_last(sg)) - return NULL; - - sg++; - if (unlikely(sg_is_chain(sg))) - sg = sg_chain_ptr(sg); - - return sg; -} - /* * Loop over each sg element, following the pointer to a new list if necessary */ @@ -138,40 +119,6 @@ static inline struct scatterlist *sg_next(struct scatterlist *sg) for (__i = 0, sg = (sglist); __i (nr); __i++, sg = sg_next(sg)) /** - * sg_last - return the last scatterlist entry in a list - * @sgl: First entry in the scatterlist - * @nents: Number of entries in the scatterlist - * - * Description: - * Should only be used casually, it (currently) scan the entire list - * to get the last entry. - * - * Note that the @sgl@ pointer passed in need not be the first one, - * the important bit is that @nents@ denotes the number of entries that - * exist from @[EMAIL PROTECTED] - * - **/ -static inline struct scatterlist *sg_last(struct scatterlist *sgl, - unsigned int nents) -{ -#ifndef ARCH_HAS_SG_CHAIN - struct scatterlist *ret = sgl[nents - 1]; -#else - struct scatterlist *sg, *ret = NULL; - unsigned int i; - - for_each_sg(sgl, sg, nents, i) - ret = sg; - -#endif -#ifdef CONFIG_DEBUG_SG - BUG_ON(sgl[0].sg_magic != SG_MAGIC); - BUG_ON(!sg_is_last(ret)); -#endif - return ret; -} - -/** * sg_chain - Chain two sglists together * @prv: First scatterlist * @prv_nents: Number of entries in prv @@ -223,47 +170,6 @@ static inline void sg_mark_end(struct scatterlist *sg) } /** - * sg_init_table - Initialize SG table - * @sgl:The SG table - * @nents: Number of entries in table - * - * Notes: - * If this is part of a chained sg table, sg_mark_end() should be - * used only on the last table part. - * - **/ -static inline void sg_init_table(struct scatterlist *sgl, unsigned int nents) -{ - memset(sgl, 0, sizeof(*sgl) * nents); -#ifdef CONFIG_DEBUG_SG - { - unsigned int i; - for (i = 0; i nents; i++) - sgl[i].sg_magic = SG_MAGIC; - } -#endif - sg_mark_end(sgl[nents - 1]); -} - -/** - * sg_init_one - Initialize a single entry sg list - * @sg: SG entry - * @buf: Virtual address for IO - * @buflen: IO length - * - * Notes: - * This should not be used on a single entry that is part of a larger - * table. Use sg_init_table() for that. - * - **/ -static inline void sg_init_one(struct scatterlist *sg, const void *buf, -unsigned int buflen) -{ - sg_init_table(sg, 1); - sg_set_buf(sg, buf, buflen); -} - -/** * sg_phys - Return physical address of an sg entry * @sg: SG entry * @@ -293,4 +199,23 @@ static inline void *sg_virt(struct scatterlist *sg) return page_address(sg_page(sg)) + sg-offset; } +struct scatterlist *sg_next(struct scatterlist *); I don't like that this became exported. I would prefer if it could stay inline. if at all possible? +struct scatterlist *sg_last(struct scatterlist *s, unsigned int); +void sg_init_table(struct scatterlist *, unsigned int); +void sg_init_one(struct scatterlist *, const void
Re: [PATCH 2/3] SG: Convert SCSI to use scatterlist helpers for sg chaining
On Thu, Dec 20 2007 at 16:03 +0200, Boaz Harrosh [EMAIL PROTECTED] wrote: From: Jens Axboe [EMAIL PROTECTED] Signed-off-by: Jens Axboe [EMAIL PROTECTED] --- drivers/scsi/libsrp.c|2 +- drivers/scsi/scsi_error.c|4 +- drivers/scsi/scsi_lib.c | 150 +- drivers/usb/storage/isd200.c |4 +- include/scsi/scsi_cmnd.h |9 +-- 5 files changed, 24 insertions(+), 145 deletions(-) diff --git a/drivers/scsi/libsrp.c b/drivers/scsi/libsrp.c index 8a8562a..b81350d 100644 --- a/drivers/scsi/libsrp.c +++ b/drivers/scsi/libsrp.c @@ -427,7 +427,7 @@ int srp_cmd_queue(struct Scsi_Host *shost, struct srp_cmd *cmd, void *info, sc-SCp.ptr = info; memcpy(sc-cmnd, cmd-cdb, MAX_COMMAND_SIZE); sc-sdb.length = len; - sc-sdb.sglist = (void *) (unsigned long) addr; + sc-sdb.sgt.sgl = (void *) (unsigned long) addr; sc-tag = tag; err = scsi_tgt_queue_command(sc, itn_id, (struct scsi_lun *)cmd-lun, cmd-tag); diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 5c8ba6a..1fd2a8c 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -629,9 +629,9 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd, struct scsi_eh_save *ses, sizeof(scmd-sense_buffer), sense_bytes); sg_init_one(ses-sense_sgl, scmd-sense_buffer, scmd-sdb.length); - scmd-sdb.sglist = ses-sense_sgl; + scmd-sdb.sgt.sgl = ses-sense_sgl; scmd-sc_data_direction = DMA_FROM_DEVICE; - scmd-sdb.sg_count = 1; + scmd-sdb.sgt.nents = 1; memset(scmd-cmnd, 0, sizeof(scmd-cmnd)); scmd-cmnd[0] = REQUEST_SENSE; scmd-cmnd[4] = scmd-sdb.length; diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index a6aae56..c7107f1 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -750,136 +750,15 @@ static inline unsigned int scsi_sgtable_index(unsigned short nents) return index; } -static int scsi_alloc_sgtable(struct scsi_data_buffer *sdb, - unsigned short sg_count, gfp_t gfp_mask) +static struct scatterlist *scsi_sg_alloc(unsigned int nents, gfp_t gfp_mask) { - struct scsi_host_sg_pool *sgp; - struct scatterlist *sgl, *prev, *ret; - unsigned int index; - int this, left; - - left = sg_count; - ret = prev = NULL; - do { - this = left; - if (this SCSI_MAX_SG_SEGMENTS) { - this = SCSI_MAX_SG_SEGMENTS - 1; - index = SG_MEMPOOL_NR - 1; - } else - index = scsi_sgtable_index(this); - - left -= this; - - sgp = scsi_sg_pools + index; - - sgl = mempool_alloc(sgp-pool, gfp_mask); - if (unlikely(!sgl)) - goto enomem; - - sg_init_table(sgl, sgp-size); - - /* - * first loop through, set initial index and return value - */ - if (!ret) - ret = sgl; - - /* - * chain previous sglist, if any. we know the previous - * sglist must be the biggest one, or we would not have - * ended up doing another loop. - */ - if (prev) - sg_chain(prev, SCSI_MAX_SG_SEGMENTS, sgl); - - /* - * if we have nothing left, mark the last segment as - * end-of-list - */ - if (!left) - sg_mark_end(sgl[this - 1]); - - /* - * don't allow subsequent mempool allocs to sleep, it would - * violate the mempool principle. - */ - gfp_mask = ~__GFP_WAIT; - gfp_mask |= __GFP_HIGH; - prev = sgl; - } while (left); - - /* - * -use_sg may get modified after dma mapping has potentially - * shrunk the number of segments, so keep a copy of it for free. - */ - sdb-alloc_sg_count = sdb-sg_count = sg_count; - sdb-sglist = ret; - return 0; -enomem: - if (ret) { - /* - * Free entries chained off ret. Since we were trying to - * allocate another sglist, we know that all entries are of - * the max size. - */ - sgp = scsi_sg_pools + SG_MEMPOOL_NR - 1; - prev = ret; - ret = ret[SCSI_MAX_SG_SEGMENTS - 1]; - - while ((sgl = sg_chain_ptr(ret)) != NULL) { - ret = sgl[SCSI_MAX_SG_SEGMENTS - 1]; - mempool_free(sgl, sgp-pool
Re: [PATCH 2/2] scsi: Use new __dma_buffer to align sense buffer in scsi_cmnd
On Fri, Dec 21 2007 at 4:30 +0200, Benjamin Herrenschmidt [EMAIL PROTECTED] wrote: The sense buffer ins scsi_cmnd can nowadays be DMA'ed into directly by some low level drivers (that typically happens with USB mass storage). This is a problem on non cache coherent architectures such as embedded PowerPCs where the sense buffer can share cache lines with other structure members, which leads to various forms of corruption. This uses the newly defined __dma_buffer annotation to enforce that on such platforms, the sense_buffer is contained within its own cache line. This has no effect on cache coherent architectures. Signed-off-by: Benjamin Herrenschmidt [EMAIL PROTECTED] --- include/scsi/scsi_cmnd.h |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- linux-merge.orig/include/scsi/scsi_cmnd.h 2007-12-21 13:07:14.0 +1100 +++ linux-merge/include/scsi/scsi_cmnd.h 2007-12-21 13:07:29.0 +1100 @@ -88,7 +88,7 @@ struct scsi_cmnd { working on */ #define SCSI_SENSE_BUFFERSIZE96 - unsigned char sense_buffer[SCSI_SENSE_BUFFERSIZE]; + unsigned char sense_buffer[SCSI_SENSE_BUFFERSIZE] __dma_buffer; /* obtained by REQUEST SENSE when * CHECK CONDITION is received on original * command (auto-sense) */ This has the potential of leaving a big fat ugly hole in the middle of scsi_cmnd. I would suggest of *just* moving the sense_buffer array to be the *first member* of struct scsi_cmnd. The command itself is already cache aligned, allocated by the proper flags to it's slab. And put a fat comment near it's definition. This is until a proper fix is sent. I have in my Q a proposition for a more prominent solution, which I will send next month. Do to short comings in the sense handling and optimizations, but should definitely cover this problem. The code should have time to be discussed and tested, so it is only 2.6.26 material. For the duration of the 2.6.25 kernel we can live with a reorder of scsi_cmnd members, if it solves such a grave bug for some ARCHs. Boaz [RFD below] My proposed solution will be has follows: 1. Since the scsi protocol mandates an immediate REQUEST_SENSE after an error in effect the Q is frozen until the REQUEST_SENSE command returns. 2. The scsi-ml needs the sense buffer for its normal operation, independent from the ULD's request of the sence_buffer or not at request-sense. But in effect, 90% of all scsi-requests come with ULD's allocated buffer for sense, that is copied to, on command completion. 3. 99% of all commands complete successfully, so if an optimization is proposed for the successful case, sacrificing a few cycles for the error case than thats a good thing. My suggestion is to have a per Q, driver-overridable, sense buffer that is DMAed/written to by drivers. At the end of the REQUEST_SENSE command one of 2 things will be done. Either copy the sense to the ULD's supplied buffer, or if not available, allocate one from a dedicated mem_cache pool. So we are completely saving 92 bytes from scsi_cmnd by replacing the buffer with a pointer. We can always read the sense into a per Q buffer. And 10% of %1 of the times we will need to allocate a sense buffer from a dedicated mem_cache I would say thats a nice optimization. The changes to scsi_error/scsi_cmnd and friends, is pretty strait forward. But it depends on a conversion of 4/5 drivers to the new scsi_eh API for REQUEST_SENSE. I have only converted these drivers that interfered with the accessors effort + 1 other places. But there are a few more places that are not converted. Once done. The implementation can easily change with no affect on drivers. The reason I've started with this work is because the SCSI standard permits up to 260 bytes of sense, which you guest right, is needed by the OSD command set. Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-rc3-mm2: Result: hostbyte=0x01 driverbyte=0x00\nend_request: I/O error
On Thu, Nov 29 2007 at 1:36 +0200, Andrew Morton [EMAIL PROTECTED] wrote: On Wed, 28 Nov 2007 16:14:21 -0700 Matthew Wilcox [EMAIL PROTECTED] wrote: On Wed, Nov 28, 2007 at 01:40:36PM -0800, Andrew Morton wrote: On Wed, 28 Nov 2007 23:01:31 +0300 Alexey Dobriyan [EMAIL PROTECTED] wrote: Reliably spams dmesg with end_request() horrors. This happens when git starts checking out linux tree to fresh ext2 partition. Disk is several month old and there were no prolems with, say, 2.6.24-rc3: Could you try reverting 6f5391c283d7fdcf24bf40786ea79061919d1e1d and see if the problem still exists? That's not completely trivial.. I did a hand-made revert against 2.6.24-rc3-mm2 (below) but some other patch in there causes: drivers/scsi/scsi_lib.c: In function 'scsi_blk_pc_done': drivers/scsi/scsi_lib.c:1251: error: 'struct scsi_cmnd' has no member named 'request_bufflen' That would be the bidi patches. You need to use scsi_bufflen(cmd) instead of cmd-request_bufflen. (See below) Do you need that I send a patch? --- a/drivers/scsi/scsi.c~revert-6f5391c283d7fdcf24bf40786ea79061919d1e1d +++ a/drivers/scsi/scsi.c @@ -59,7 +59,6 @@ #include scsi/scsi_cmnd.h #include scsi/scsi_dbg.h #include scsi/scsi_device.h -#include scsi/scsi_driver.h #include scsi/scsi_eh.h #include scsi/scsi_host.h #include scsi/scsi_tcq.h @@ -379,8 +378,9 @@ void scsi_log_send(struct scsi_cmnd *cmd scsi_print_command(cmd); if (level 3) { printk(KERN_INFO buffer = 0x%p, bufflen = %d, - queuecommand 0x%p\n, + done = 0x%p, queuecommand 0x%p\n, scsi_sglist(cmd), scsi_bufflen(cmd), + cmd-done, cmd-device-host-hostt-queuecommand); } @@ -667,12 +667,6 @@ void __scsi_done(struct scsi_cmnd *cmd) blk_complete_request(rq); } -/* Move this to a header if it becomes more generally useful */ -static struct scsi_driver *scsi_cmd_to_driver(struct scsi_cmnd *cmd) -{ - return *(struct scsi_driver **)cmd-request-rq_disk-private_data; -} - /** * scsi_finish_command - cleanup and pass command back to upper layer * @cmd: the command @@ -685,8 +679,6 @@ void scsi_finish_command(struct scsi_cmn { struct scsi_device *sdev = cmd-device; struct Scsi_Host *shost = sdev-host; - struct scsi_driver *drv; - unsigned int good_bytes; scsi_device_unbusy(sdev); @@ -712,13 +704,7 @@ void scsi_finish_command(struct scsi_cmn Notifying upper driver of completion (result %x)\n, cmd-result)); - good_bytes = scsi_bufflen(cmd); -if (cmd-request-cmd_type != REQ_TYPE_BLOCK_PC) { - drv = scsi_cmd_to_driver(cmd); - if (drv-done) - good_bytes = drv-done(cmd); - } - scsi_io_completion(cmd, good_bytes); + cmd-done(cmd); } EXPORT_SYMBOL(scsi_finish_command); diff -puN drivers/scsi/scsi_error.c~revert-6f5391c283d7fdcf24bf40786ea79061919d1e1d drivers/scsi/scsi_error.c --- a/drivers/scsi/scsi_error.c~revert-6f5391c283d7fdcf24bf40786ea79061919d1e1d +++ a/drivers/scsi/scsi_error.c @@ -1697,6 +1697,7 @@ scsi_reset_provider(struct scsi_device * scmd-scsi_done = scsi_reset_provider_done_command; memset(scmd-sdb, 0, sizeof(scmd-sdb)); + scmd-done = NULL; scmd-cmd_len = 0; diff -puN drivers/scsi/scsi_lib.c~revert-6f5391c283d7fdcf24bf40786ea79061919d1e1d drivers/scsi/scsi_lib.c --- a/drivers/scsi/scsi_lib.c~revert-6f5391c283d7fdcf24bf40786ea79061919d1e1d +++ a/drivers/scsi/scsi_lib.c @@ -944,6 +944,7 @@ void scsi_end_bidi_request(struct scsi_c scsi_finalize_request(cmd, 1); } +EXPORT_SYMBOL(scsi_io_completion); /* * Function:scsi_io_completion() @@ -1238,6 +1239,18 @@ static struct scsi_cmnd *scsi_get_cmd_fr return cmd; } +static void scsi_blk_pc_done(struct scsi_cmnd *cmd) +{ + BUG_ON(!blk_pc_request(cmd-request)); + /* + * This will complete the whole command with uptodate=1 so + * as far as the block layer is concerned the command completed + * successfully. Since this is a REQ_BLOCK_PC command the + * caller should check the request's errors value + */ + scsi_io_completion(cmd, cmd-request_bufflen); + scsi_io_completion(cmd, scsi_bufflen(cmd)); +} + int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req) { struct scsi_cmnd *cmd; @@ -1285,6 +1298,7 @@ int scsi_setup_blk_pc_cmnd(struct scsi_d cmd-transfersize = req-data_len; cmd-allowed = req-retries; cmd-timeout_per_command = req-timeout; + cmd-done =
Re: [PATCH 27/28] blk_end_request: changing scsi mid-layer for bidi (take 3)
On Sat, Dec 01 2007 at 1:35 +0200, Kiyoshi Ueda [EMAIL PROTECTED] wrote: This patch converts bidi of scsi mid-layer to use blk_end_request(). rq-next_rq represents a pair of bidi requests. (There are no other use of 'next_rq' of struct request.) For both requests in the pair, end_that_request_chunk() should be called before end_that_request_last() is called for one of them. Since the calls to end_that_request_first()/chunk() and end_that_request_last() are packaged into blk_end_request(), the handling of next_rq completion has to be moved into blk_end_request(), too. Bidi sets its specific value to rq-data_len before the request is completed so that upper-layer can read it. This setting must be between end_that_request_chunk() and end_that_request_last(), because rq-data_len may be used in end_that_request_chunk() by blk_trace and so on. To satisfy the requirement, use blk_end_request_callback() which is added in PATCH 25 only for the tricky drivers. If bidi didn't reuse rq-data_len and added new members to request for the specific value, it could set before end_that_request_chunk() and use the standard blk_end_request() like below. void scsi_end_bidi_request(struct scsi_cmnd *cmd) { struct request *req = cmd-request; rq-resid = scsi_out(cmd)-resid; rq-next_rq-resid = scsi_in(cmd)-resid; if (blk_end_request(req, 1, req-data_len)) BUG(); scsi_release_buffers(cmd); scsi_next_command(cmd); } Signed-off-by: Kiyoshi Ueda [EMAIL PROTECTED] Signed-off-by: Jun'ichi Nomura [EMAIL PROTECTED] --- block/ll_rw_blk.c | 18 + drivers/scsi/scsi_lib.c | 66 2 files changed, 52 insertions(+), 32 deletions(-) Index: 2.6.24-rc3-mm2/drivers/scsi/scsi_lib.c === --- 2.6.24-rc3-mm2.orig/drivers/scsi/scsi_lib.c +++ 2.6.24-rc3-mm2/drivers/scsi/scsi_lib.c @@ -629,28 +629,6 @@ void scsi_run_host_queues(struct Scsi_Ho scsi_run_queue(sdev-request_queue); } -static void scsi_finalize_request(struct scsi_cmnd *cmd, int uptodate) -{ - struct request_queue *q = cmd-device-request_queue; - struct request *req = cmd-request; - unsigned long flags; - - add_disk_randomness(req-rq_disk); - - spin_lock_irqsave(q-queue_lock, flags); - if (blk_rq_tagged(req)) - blk_queue_end_tag(q, req); - - end_that_request_last(req, uptodate); - spin_unlock_irqrestore(q-queue_lock, flags); - - /* - * This will goose the queue request function at the end, so we don't - * need to worry about launching another command. - */ - scsi_next_command(cmd); -} - /* * Function:scsi_end_request() * @@ -921,6 +899,20 @@ void scsi_release_buffers(struct scsi_cm EXPORT_SYMBOL(scsi_release_buffers); /* + * Called from blk_end_request_callback() after all DATA in rq and its next_rq + * are completed before rq is completed/freed. + */ +static int scsi_end_bidi_request_cb(struct request *rq) +{ + struct scsi_cmnd *cmd = rq-special; + + rq-data_len = scsi_out(cmd)-resid; + rq-next_rq-data_len = scsi_in(cmd)-resid; + + return 0; +} + +/* * Bidi commands Must be complete as a whole, both sides at once. * If part of the bytes were written and lld returned * scsi_in()-resid and/or scsi_out()-resid this information will be left @@ -931,22 +923,32 @@ void scsi_end_bidi_request(struct scsi_c { struct request *req = cmd-request; - end_that_request_chunk(req, 1, req-data_len); - req-data_len = scsi_out(cmd)-resid; - - end_that_request_chunk(req-next_rq, 1, req-next_rq-data_len); - req-next_rq-data_len = scsi_in(cmd)-resid; - - scsi_release_buffers(cmd); - /* *FIXME: If ll_rw_blk.c is changed to also put_request(req-next_rq) - * in end_that_request_last() then this WARN_ON must be removed. + * in blk_end_request() then this WARN_ON must be removed. * for now, upper-driver must have registered an end_io. */ WARN_ON(!req-end_io); - scsi_finalize_request(cmd, 1); + /* + * blk_end_request() family take care of data completion of next_rq. + * + * req-data_len and req-next_rq-data_len must be set after + * all data are completed, since they may be referenced during + * the data completion process. + * So use the callback feature of blk_end_request() here. + * + * NOTE: If bidi doesn't reuse the data_len field for upper-layer's + * reference (e.g. adds new members for it to struct request), + * we can use the standard blk_end_request() interface here. + */ + if (blk_end_request_callback(req, 1, req-data_len, + scsi_end_bidi_request_cb)) + /*
Re: [PATCH 01/28] blk_end_request: add new request completion interface (take 3)
On Sat, Dec 01 2007 at 1:24 +0200, Kiyoshi Ueda [EMAIL PROTECTED] wrote: This patch adds 2 new interfaces for request completion: o blk_end_request() : called without queue lock o __blk_end_request() : called with queue lock held Some device drivers call some generic functions below between end_that_request_{first/chunk} and end_that_request_last(). o add_disk_randomness() o blk_queue_end_tag() o blkdev_dequeue_request() These are called in the blk_end_request() as a part of generic request completion. So all device drivers become to call above functions. Normal drivers can be converted to use blk_end_request() in a standard way shown below. a) end_that_request_{chunk/first} spin_lock_irqsave() (add_disk_randomness(), blk_queue_end_tag(), blkdev_dequeue_request()) end_that_request_last() spin_unlock_irqrestore() = blk_end_request() b) spin_lock_irqsave() end_that_request_{chunk/first} (add_disk_randomness(), blk_queue_end_tag(), blkdev_dequeue_request()) end_that_request_last() spin_unlock_irqrestore() = spin_lock_irqsave() __blk_end_request() spin_unlock_irqsave() c) end_that_request_last() = __blk_end_request() Signed-off-by: Kiyoshi Ueda [EMAIL PROTECTED] Signed-off-by: Jun'ichi Nomura [EMAIL PROTECTED] comments in body --- block/ll_rw_blk.c | 67 + include/linux/blkdev.h |2 + 2 files changed, 69 insertions(+) Index: 2.6.24-rc3-mm2/block/ll_rw_blk.c === --- 2.6.24-rc3-mm2.orig/block/ll_rw_blk.c +++ 2.6.24-rc3-mm2/block/ll_rw_blk.c @@ -3769,6 +3769,73 @@ void end_request(struct request *req, in } EXPORT_SYMBOL(end_request); +static void complete_request(struct request *rq, int uptodate) +{ + if (blk_rq_tagged(rq)) + blk_queue_end_tag(rq-q, rq); + + /* rq-queuelist of dequeued request should be list_empty() */ + if (!list_empty(rq-queuelist)) + blkdev_dequeue_request(rq); + + end_that_request_last(rq, uptodate); +} + +/** + * blk_end_request - Helper function for drivers to complete the request. + * @rq: the request being processed + * @uptodate: 1 for success, 0 for I/O error, 0 for specific error + * @nr_bytes: number of bytes to complete + * + * Description: + * Ends I/O on a number of bytes attached to @rq. + * If @rq has leftover, sets it up for the next range of segments. + * + * Return: + * 0 - we are done with this request + * 1 - still buffers pending for this request + **/ +int blk_end_request(struct request *rq, int uptodate, int nr_bytes) I always hated that uptodate boolean with possible negative error value. I guess it was done for backward compatibility of then users of end_that_request_first(). But since you are introducing a new API then this is not the case. Just have regular status int where 0 means ALL_OK and negative value means error code. Just my $0.02. +{ + struct request_queue *q = rq-q; + unsigned long flags = 0UL; + + if (blk_fs_request(rq) || blk_pc_request(rq)) { + if (__end_that_request_first(rq, uptodate, nr_bytes)) + return 1; + } + + add_disk_randomness(rq-rq_disk); + + spin_lock_irqsave(q-queue_lock, flags); + complete_request(rq, uptodate); + spin_unlock_irqrestore(q-queue_lock, flags); + + return 0; +} +EXPORT_SYMBOL_GPL(blk_end_request); + +/** + * __blk_end_request - Helper function for drivers to complete the request. + * + * Description: + * Must be called with queue lock held unlike blk_end_request(). + **/ +int __blk_end_request(struct request *rq, int uptodate, int nr_bytes) +{ + if (blk_fs_request(rq) || blk_pc_request(rq)) { + if (__end_that_request_first(rq, uptodate, nr_bytes)) + return 1; + } + + add_disk_randomness(rq-rq_disk); + + complete_request(rq, uptodate); + + return 0; +} +EXPORT_SYMBOL_GPL(__blk_end_request); + static void blk_rq_bio_prep(struct request_queue *q, struct request *rq, struct bio *bio) { Index: 2.6.24-rc3-mm2/include/linux/blkdev.h === --- 2.6.24-rc3-mm2.orig/include/linux/blkdev.h +++ 2.6.24-rc3-mm2/include/linux/blkdev.h @@ -725,6 +725,8 @@ static inline void blk_run_address_space * for parts of the original function. This prevents * code duplication in drivers. */ +extern int blk_end_request(struct request *rq, int uptodate, int nr_bytes); +extern int __blk_end_request(struct request *rq, int uptodate, int nr_bytes); extern int end_that_request_first(struct request *, int, int); extern int end_that_request_chunk(struct request *, int, int); extern void end_that_request_last(struct request *, int); I
Re: [PATCH 00/28] blk_end_request: full I/O completion handler (take 3)
On Tue, Dec 04 2007 at 14:16 +0200, Jens Axboe [EMAIL PROTECTED] wrote: On Fri, Nov 30 2007, Kiyoshi Ueda wrote: Hello Jens, The following is the updated patch-set for blk_end_request(). Changes since the last version are only minor updates to catch up with the base kernel changes. Do you agree the implementation of blk_end_request()? If there's no problem, could you merge it to your tree? Or does it have to be merged to -mm tree first? Boaz, Could you review the newly added PATCH 27 which converts the bidi part, and give me your comments? It uses blk_end_request_callback() in PATCH 25, which was only for the tricky ide-cd driver. If bidi added a 'resid' member to struct request instead of reusing 'data_len' for the other purpose, it could use the standard blk_end_request() instead. -- Changes from the previous post - Changes between take2 and take3: o Rebased on top of 2.6.24-rc3-mm2 OK, so this means that I can't apply it unfortunately. It depends on other patches in -mm (bidi). SCSI sits on block, so the best approach imho is to base this patchset on mainline so I can include the block bits. I was wishing that the bidi work can go into 2.6.25, I guess that's James to say. If so than it is not important what order they go in. Or the patchset can be submitted in two parts, with scsi and remove-of- old-API in a later stage. Or *rant* Boaz can just rebase his work again *rant*. Kiyoshi, It's OK, if you have a maintainer that is willing to accept your work then go head, My code can wait, no problem. Just resolve the resid issue in scsi_io_completion() (See my other mail) Thanks for doing this Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 27/28] blk_end_request: changing scsi mid-layer for bidi (take 3)
On Thu, Dec 06 2007 at 2:26 +0200, Kiyoshi Ueda [EMAIL PROTECTED] wrote: Hi Boaz, On Tue, 04 Dec 2007 15:39:12 +0200, Boaz Harrosh [EMAIL PROTECTED] wrote: On Sat, Dec 01 2007 at 1:35 +0200, Kiyoshi Ueda [EMAIL PROTECTED] wrote: This patch converts bidi of scsi mid-layer to use blk_end_request(). rq-next_rq represents a pair of bidi requests. (There are no other use of 'next_rq' of struct request.) For both requests in the pair, end_that_request_chunk() should be called before end_that_request_last() is called for one of them. Since the calls to end_that_request_first()/chunk() and end_that_request_last() are packaged into blk_end_request(), the handling of next_rq completion has to be moved into blk_end_request(), too. Bidi sets its specific value to rq-data_len before the request is completed so that upper-layer can read it. This setting must be between end_that_request_chunk() and end_that_request_last(), because rq-data_len may be used in end_that_request_chunk() by blk_trace and so on. To satisfy the requirement, use blk_end_request_callback() which is added in PATCH 25 only for the tricky drivers. If bidi didn't reuse rq-data_len and added new members to request for the specific value, it could set before end_that_request_chunk() and use the standard blk_end_request() like below. void scsi_end_bidi_request(struct scsi_cmnd *cmd) { struct request *req = cmd-request; rq-resid = scsi_out(cmd)-resid; rq-next_rq-resid = scsi_in(cmd)-resid; if (blk_end_request(req, 1, req-data_len)) BUG(); scsi_release_buffers(cmd); scsi_next_command(cmd); } ... snip ... rq-data_len = scsi_out(cmd)-resid is Not Just a problem of bidi it is a General problem of scsi residual handling, and user code. Even today before any bidi. at scsi_lib.c at scsi_io_completion() we do req-data_len = scsi_get_resid(cmd); ( or: req-data_len = cmd-resid; depends which version you look) And then call scsi_end_request() which calls __end_that_request_first/last So it is assumed even today that req-data_len is not touched by __end_that_request_first/last unless __end_that_request_first returned that there is more work to do and the command is resubmitted in which case the resid information is discarded. So if the regular resid handling is acceptable - Set req-data_len before the call to __end_that_request_first/last, or blk_end_request() in your case, then here goes your second client of the _callback and it can be removed. But if it is found that req-data_len is touched and the resid information gets lost, than it should be fixed for the common uni-io case, by - for example - pass resid to the blk_end_request() function. (So in any way the _callback can go) Thank you for the explanation of scsi's rq-data_len usage. I see that scsi usually uses rq-data_len for cmd-resid. I have investigated the possibility of setting data_len before the call to blk_end_request. But no matter whether data_len is touched or not, we need a callback for bidi. So I would like to go with the current patch. I explained the reason and some details below. As far as I can see, rq-data_len is just referenced by blk_add_trace_rq() in __end_that_request_first(), not modified. And I don't change any logic around there in the block-layer. So there shouldn't be any critical problem for scsi residual handing. (although I'm not sure that scsi expectes cmd-resid to be traced by blk_trace.) Anyway, I see that it is no critical problem for bidi to set cmd-resid to rq-data_len before blk_end_request() call. But if I do that, blk_end_request() can't get the next_rq's size to complete in its code below. +/* Bidi request must be completed as a whole */ +if (blk_bidi_rq(rq) +__end_that_request_first(rq-next_rq, uptodate, + blk_rq_bytes(rq-next_rq))) +return 1; So I will have to move next_rq completion to bidi and use _callback() anyway like the following. - static int dummy_cb(struct request *rq) { return 1; } void scsi_end_bidi_request(struct scsi_cmnd *cmd) { struct request *req = cmd-request; unsigned int dlen = req-data_len; unsigned int next_dlen = req-next_rq-data_len; req-data_len = scsi_out(cmd)-resid; req-next_rq-data_len = scsi_in(cmd)-resid; /* Complete only DATA of next_rq using _callback and dummy function */ if (!blk_end_request_callback(req-next_rq, 1, next_dlen, dummy_cb)) BUG(); if (blk_end_request(req, 1, dlen)) BUG(); scsi_release_buffers(cmd); scsi_next_command(cmd); } - I prefer the current patch rather than the code like above, since the code calls
Re: 2.6.24-rc3-mm2
On Wed, Nov 28 2007 at 13:41 +0200, Andrew Morton [EMAIL PROTECTED] wrote: ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc3/2.6.24-rc3-mm2/ - All patches against subsystem trees were recently sent to the relevant maintainers. Many (probably most) were ignored. I don't know why this happens. - First bug report: after ten minutes happily compiling kernels my 2.6.24-rc3-mm2 x86_64 box spontaneously rebooted. - s390 won't build due to a large clash with the driver tree which I didn't fix. - the unprivileged mounts and revoke patchsets were dropped - they were getting in the way of other work and were somewhat out of date. - I won't be paying much attention to feature patches for the rest of the 2.6.24 development cycle. We need to get the existing queue stabilised and I left this too late in 2.6.23. This only affects the patches which are only in -mm: expect ongoing mayhem in the various subsystem trees. snip Boilerplate: Changes since 2.6.24-rc3-mm1: snip git-scsi-misc.patch git-scsi-rc-fixes.patch snip git trees snip All patches: ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc3/2.6.24-rc3-mm2/patch-list from above url bidi-support-scsi_data_buffer-broke-qla1280.patch bidi-support-scsi_data_buffer-broke-qla1280 bidi-support-scsi_data_buffer-broke-lots-of-stuff.patch bidi-support-scsi_data_buffer-broke-lots-of-stuff Andrew hi. Above 2 patches are no longer needed as they are fixed. qla1280 - is in git-scsi-rc-fixes.patch and ppa imm - are in git-scsi-misc.patch on the other hand arm-scsi is broken and its patch is in scsi-pending. I have attached a scsi-pending-arm-convert-to-accessors.patch for your convenience. Thanks Boaz From 8f21118911efce786fc1707644a4e6323521a92f Mon Sep 17 00:00:00 2001 From: Boaz Harrosh [EMAIL PROTECTED] Date: Sun, 9 Sep 2007 21:31:21 +0300 Subject: [PATCH] [SCSI] arm: convert to accessors and !use_sg cleanup - convert to accessors and !use_sg cleanup Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] Cc: Russell King [EMAIL PROTECTED] Signed-off-by: James Bottomley [EMAIL PROTECTED] --- drivers/scsi/arm/acornscsi.c | 14 +++--- drivers/scsi/arm/scsi.h | 34 +++--- 2 files changed, 18 insertions(+), 30 deletions(-) diff --git a/drivers/scsi/arm/acornscsi.c b/drivers/scsi/arm/acornscsi.c index eceacf6..3bedf24 100644 --- a/drivers/scsi/arm/acornscsi.c +++ b/drivers/scsi/arm/acornscsi.c @@ -1790,7 +1790,7 @@ int acornscsi_starttransfer(AS_Host *host) return 0; } -residual = host-SCpnt-request_bufflen - host-scsi.SCp.scsi_xferred; +residual = scsi_bufflen(host-SCpnt) - host-scsi.SCp.scsi_xferred; sbic_arm_write(host-scsi.io_port, SBIC_SYNCHTRANSFER, host-device[host-SCpnt-device-id].sync_xfer); sbic_arm_writenext(host-scsi.io_port, residual 16); @@ -2270,7 +2270,7 @@ intr_ret_t acornscsi_sbicintr(AS_Host *host, int in_irq) case 0x4b: /* - PHASE_STATUSIN */ case 0x8b: /* - PHASE_STATUSIN */ /* DATA IN - STATUS */ - host-scsi.SCp.scsi_xferred = host-SCpnt-request_bufflen - + host-scsi.SCp.scsi_xferred = scsi_bufflen(host-SCpnt) - acornscsi_sbic_xfcount(host); acornscsi_dma_stop(host); acornscsi_readstatusbyte(host); @@ -2281,7 +2281,7 @@ intr_ret_t acornscsi_sbicintr(AS_Host *host, int in_irq) case 0x4e: /* - PHASE_MSGOUT */ case 0x8e: /* - PHASE_MSGOUT */ /* DATA IN - MESSAGE OUT */ - host-scsi.SCp.scsi_xferred = host-SCpnt-request_bufflen - + host-scsi.SCp.scsi_xferred = scsi_bufflen(host-SCpnt) - acornscsi_sbic_xfcount(host); acornscsi_dma_stop(host); acornscsi_sendmessage(host); @@ -2291,7 +2291,7 @@ intr_ret_t acornscsi_sbicintr(AS_Host *host, int in_irq) case 0x4f: /* message in */ case 0x8f: /* message in */ /* DATA IN - MESSAGE IN */ - host-scsi.SCp.scsi_xferred = host-SCpnt-request_bufflen - + host-scsi.SCp.scsi_xferred = scsi_bufflen(host-SCpnt) - acornscsi_sbic_xfcount(host); acornscsi_dma_stop(host); acornscsi_message(host);/* - PHASE_MSGIN, PHASE_DISCONNECT */ @@ -2319,7 +2319,7 @@ intr_ret_t acornscsi_sbicintr(AS_Host *host, int in_irq) case 0x4b: /* - PHASE_STATUSIN */ case 0x8b: /* - PHASE_STATUSIN
Re: [PATCH 27/28] blk_end_request: changing scsi mid-layer for bidi (take 3)
On Thu, Dec 06 2007 at 21:44 +0200, Kiyoshi Ueda [EMAIL PROTECTED] wrote: Hi Boaz, Jens, On Thu, 06 Dec 2007 11:24:44 +0200, Boaz Harrosh [EMAIL PROTECTED] wrote: Index: 2.6.24-rc3-mm2/drivers/scsi/scsi_lib.c snip No I don't like it. The only client left for blk_end_request_callback() is bidi, ide-cd (cdrom_newpc_intr) is another client. So I can't drop blk_end_request_callback() even if bidi doesn't use it. It looks like all is needed for the ide-cd case is a flag that says don't complete the request. And the call-back is not actually used. (Inspecting the last: [PATCH 26/28] blk_end_request: changing ide-cd (take 3)) The same exact flag could also help the bidi case. Perhaps have an API for that? snip Index: 2.6.24-rc3-mm2/drivers/scsi/scsi_lib.c === --- 2.6.24-rc3-mm2.orig/drivers/scsi/scsi_lib.c +++ 2.6.24-rc3-mm2/drivers/scsi/scsi_lib.c @@ -629,28 +629,6 @@ void scsi_run_host_queues(struct Scsi_Ho scsi_run_queue(sdev-request_queue); } -static void scsi_finalize_request(struct scsi_cmnd *cmd, int uptodate) -{ - struct request_queue *q = cmd-device-request_queue; - struct request *req = cmd-request; - unsigned long flags; - - add_disk_randomness(req-rq_disk); - - spin_lock_irqsave(q-queue_lock, flags); - if (blk_rq_tagged(req)) - blk_queue_end_tag(q, req); - - end_that_request_last(req, uptodate); - spin_unlock_irqrestore(q-queue_lock, flags); - - /* - * This will goose the queue request function at the end, so we don't - * need to worry about launching another command. - */ - scsi_next_command(cmd); -} - /* * Function:scsi_end_request() * @@ -930,23 +908,25 @@ EXPORT_SYMBOL(scsi_release_buffers); void scsi_end_bidi_request(struct scsi_cmnd *cmd) { struct request *req = cmd-request; + unsigned int dlen = req-data_len; + unsigned int next_dlen = req-next_rq-data_len; - end_that_request_chunk(req, 1, req-data_len); req-data_len = scsi_out(cmd)-resid; - - end_that_request_chunk(req-next_rq, 1, req-next_rq-data_len); req-next_rq-data_len = scsi_in(cmd)-resid; - scsi_release_buffers(cmd); - /* *FIXME: If ll_rw_blk.c is changed to also put_request(req-next_rq) - * in end_that_request_last() then this WARN_ON must be removed. + * in blk_end_bidi_request() then this WARN_ON must be removed. * for now, upper-driver must have registered an end_io. */ WARN_ON(!req-end_io); - scsi_finalize_request(cmd, 1); + if (blk_end_bidi_request(req, 1, dlen, next_dlen)) + /* req has not been completed */ + BUG(); + + scsi_release_buffers(cmd); + scsi_next_command(cmd); } /* Index: 2.6.24-rc3-mm2/block/ll_rw_blk.c === --- 2.6.24-rc3-mm2.orig/block/ll_rw_blk.c +++ 2.6.24-rc3-mm2/block/ll_rw_blk.c @@ -3792,24 +3792,17 @@ static void complete_request(struct requ if (!list_empty(rq-queuelist)) blkdev_dequeue_request(rq); + if (blk_bidi_rq(rq) !rq-end_io) + __blk_put_request(rq-next_rq-q, rq-next_rq); + end_that_request_last(rq, uptodate); } -/** - * blk_end_request - Helper function for drivers to complete the request. - * @rq: the request being processed - * @uptodate: 1 for success, 0 for I/O error, 0 for specific error - * @nr_bytes: number of bytes to complete - * - * Description: - * Ends I/O on a number of bytes attached to @rq. - * If @rq has leftover, sets it up for the next range of segments. - * - * Return: - * 0 - we are done with this request - * 1 - still buffers pending for this request - **/ -int blk_end_request(struct request *rq, int uptodate, int nr_bytes) +/* + * Internal function + */ +static int blk_end_io(struct request *rq, int uptodate, int nr_bytes, + int bidi_bytes, int (drv_callback)(struct request *)) { struct request_queue *q = rq-q; unsigned long flags = 0UL; @@ -3817,8 +3810,17 @@ int blk_end_request(struct request *rq, if (blk_fs_request(rq) || blk_pc_request(rq)) { if (__end_that_request_first(rq, uptodate, nr_bytes)) return 1; + + /* Bidi request must be completed as a whole */ + if (blk_bidi_rq(rq) + __end_that_request_first(rq-next_rq, uptodate, bidi_bytes)) + return 1; } + /* Special feature for tricky drivers */ + if (drv_callback drv_callback(rq)) + return 1; + add_disk_randomness(rq-rq_disk); spin_lock_irqsave(q-queue_lock, flags); @@ -3827,6 +3829,32 @@ int blk_end_request(struct request *rq, return 0; } + +int blk_end_bidi_request
Re: [PATCH 20/30] blk_end_request: changing xsysace (take 4)
Kiyoshi Ueda wrote: This patch converts xsysace to use blk_end_request interfaces. Related 'uptodate' arguments are converted to 'error'. xsysace is a little bit different from normal drivers. xsysace driver has a state machine in it. It calls end_that_request_first() and end_that_request_last() from different states. (ACE_FSM_STATE_REQ_TRANSFER and ACE_FSM_STATE_REQ_COMPLETE, respectively.) However, those states are consecutive and without any interruption inbetween. So we can just follow the standard conversion rule (b) mentioned in the patch subject [PATCH 01/30] blk_end_request: add new request completion interface. Cc: Grant Likely [EMAIL PROTECTED] Signed-off-by: Kiyoshi Ueda [EMAIL PROTECTED] Signed-off-by: Jun'ichi Nomura [EMAIL PROTECTED] --- drivers/block/xsysace.c |5 + 1 files changed, 1 insertion(+), 4 deletions(-) Index: 2.6.24-rc4/drivers/block/xsysace.c === --- 2.6.24-rc4.orig/drivers/block/xsysace.c +++ 2.6.24-rc4/drivers/block/xsysace.c @@ -703,7 +703,7 @@ static void ace_fsm_dostate(struct ace_d /* bio finished; is there another one? */ i = ace-req-current_nr_sectors; - if (end_that_request_first(ace-req, 1, i)) { + if (__blk_end_request(ace-req, 0, i)) { end_that_request_first() took sectors __blk_end_request() now takes bytes /* dev_dbg(ace-dev, next block; h=%li c=%i\n, * ace-req-hard_nr_sectors, * ace-req-current_nr_sectors); @@ -718,9 +718,6 @@ static void ace_fsm_dostate(struct ace_d break; case ACE_FSM_STATE_REQ_COMPLETE: - /* Complete the block request */ - blkdev_dequeue_request(ace-req); - end_that_request_last(ace-req, 1); ace-req = NULL; /* Finished request; go to idle state */ - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-rc3-mm1
On Tue, Dec 11 2007 at 18:33 +0200, James Bottomley [EMAIL PROTECTED] wrote: On Mon, 2007-11-26 at 22:15 -0800, Andrew Morton wrote: OK, thanks. I'll assume that James and Hannes have this in hand (or will have, by mid-week) and I won't do anything here. Just to confirm what I think I'm going to be doing: rebasing the scsi-misc tree to remove this commit: commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0 Author: Hannes Reinecke [EMAIL PROTECTED] Date: Tue Nov 6 09:23:40 2007 +0100 [SCSI] Do not requeue requests if REQ_FAILFAST is set And its allied fix ups: commit 983289045faa96fba8841d3c51b98bb8623d9504 Author: James Bottomley [EMAIL PROTECTED] Date: Sat Nov 24 19:47:25 2007 +0200 [SCSI] fix up REQ_FASTFAIL not to fail when state is QUIESCE commit 9dd15a13b332e9f5c8ee752b1ccd9b84cb5bdf17 Author: James Bottomley [EMAIL PROTECTED] Date: Sat Nov 24 19:55:53 2007 +0200 [SCSI] fix domain validation to work again James The problems caused by this patch where nagging me at the back of my head from the begging. Why should we fail on a check of FAIL_FAST in all kind of weird places like boots, when the only place that should ever set the flag should be one of the multi-path drivers. finally it struck me: It might be a bug in ll_rw_blk at blk_rq_bio_prep() there is this: static void blk_rq_bio_prep(struct request_queue *q, struct request *rq, struct bio *bio) { /* first two bits are identical in rq-cmd_flags and bio-bi_rw */ rq-cmd_flags |= (bio-bi_rw 3); ... Now this is no longer true and is a bug. Second bit of bio-bi_rw defined in bio.h is: #define BIO_RW_AHEAD1 but Second bit of rq-cmd_flags is __REQ_FAILFAST so maybe we are getting FAILFAST in the wrong places? (I will look for an old patch I sent a year ago that fixes this bug) Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] REQ-flags to/from BIO-flags bugfix
- BIO flags bio-bi_rw and REQ flags req-cmd_flags no longer match. Remove comments and do a proper translation between the 2 systems. (Please look in ll_rw_blk.c/blk_rq_bio_prep() below if we need more flags) Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] --- block/ll_rw_blk.c| 23 +-- include/linux/blktrace_api.h |8 +++- 2 files changed, 24 insertions(+), 7 deletions(-) diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c index 8b91994..c6a84bb 100644 --- a/block/ll_rw_blk.c +++ b/block/ll_rw_blk.c @@ -1990,10 +1990,6 @@ blk_alloc_request(struct request_queue *q, int rw, int priv, gfp_t gfp_mask) if (!rq) return NULL; - /* -* first three bits are identical in rq-cmd_flags and bio-bi_rw, -* see bio.h and blkdev.h -*/ rq-cmd_flags = rw | REQ_ALLOCED; if (priv) { @@ -3772,8 +3768,23 @@ EXPORT_SYMBOL(end_request); static void blk_rq_bio_prep(struct request_queue *q, struct request *rq, struct bio *bio) { - /* first two bits are identical in rq-cmd_flags and bio-bi_rw */ - rq-cmd_flags |= (bio-bi_rw 3); + if (bio_data_dir(bio)) + rq-cmd_flags |= REQ_RW; + else + rq-cmd_flags = ~REQ_RW; + + if (bio-bi_rw (1BIO_RW_SYNC)) + rq-cmd_flags |= REQ_RW_SYNC; + else + rq-cmd_flags = ~REQ_RW_SYNC; + /* FIXME: what about other flags, should we sync these too? */ + /* + BIO_RW_AHEAD== ?? + BIO_RW_BARRIER == REQ_SOFTBARRIER/REQ_HARDBARRIER + BIO_RW_FAILFAST == REQ_FAILFAST + BIO_RW_SYNC == REQ_RW_SYNC + BIO_RW_META == REQ_RW_META + */ rq-nr_phys_segments = bio_phys_segments(q, bio); rq-nr_hw_segments = bio_hw_segments(q, bio); diff --git a/include/linux/blktrace_api.h b/include/linux/blktrace_api.h index 7e11d23..9e7ce65 100644 --- a/include/linux/blktrace_api.h +++ b/include/linux/blktrace_api.h @@ -165,7 +165,13 @@ static inline void blk_add_trace_rq(struct request_queue *q, struct request *rq, u32 what) { struct blk_trace *bt = q-blk_trace; - int rw = rq-cmd_flags 0x03; + /* blktrace.c prints them according to bio flags */ + int rw = (((rq_rw_dir(rq) == WRITE) BIO_RW) | + (((rq-cmd_flags (REQ_SOFTBARRIER|REQ_HARDBARRIER)) != 0) + BIO_RW_BARRIER) | + (((rq-cmd_flags REQ_FAILFAST) != 0) BIO_RW_FAILFAST) | + (((rq-cmd_flags REQ_RW_SYNC) != 0) BIO_RW_SYNC) | + (((rq-cmd_flags REQ_RW_META) != 0) BIO_RW_META)); if (likely(!bt)) return; -- 1.5.3.3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] REQ-flags to/from BIO-flags bugfix
On Wed, Dec 12 2007 at 17:18 +0200, Matthew Wilcox [EMAIL PROTECTED] wrote: On Wed, Dec 12, 2007 at 01:03:10PM +0200, Boaz Harrosh wrote: - BIO flags bio-bi_rw and REQ flags req-cmd_flags no longer match. Remove comments and do a proper translation between the 2 systems. I'd rather see them resynchronised ... in a way that makes it obvious that they should be desynced again: I don't know whether BIO_RW_BARRIER is __REQ_SOFTBARRIER or __REQ_HARDBARRIER, so I didn't include that in this patch. There also doesn't seem to be a __REQ equivalent to BIO_RW_AHEAD, but we can do the other four bits (and leave gaps for those two). diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index d18ee67..6aef34b 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -167,11 +167,13 @@ enum { }; /* - * request type modified bits. first three bits match BIO_RW* bits, important + * request type modified bits. Don't change without looking at bi_rw flags */ enum rq_flag_bits { - __REQ_RW, /* not set, read. set, write */ - __REQ_FAILFAST, /* no low level driver retries */ + __REQ_RW = BIO_RW, /* not set, read. set, write */ + __REQ_FAILFAST = BIO_RW_FAILFAST, /* no low level driver retries */ + __REQ_RW_SYNC = BIO_RW_SYNC,/* request is sync (O_DIRECT) */ + __REQ_RW_META = BIO_RW_META,/* metadata io request */ __REQ_SORTED, /* elevator knows about this request */ __REQ_SOFTBARRIER, /* may not be passed by ioscheduler */ __REQ_HARDBARRIER, /* may not be passed by drive either */ @@ -185,9 +187,7 @@ enum rq_flag_bits { __REQ_QUIET,/* don't worry about errors */ __REQ_PREEMPT, /* set for ide_preempt requests */ __REQ_ORDERED_COLOR,/* is before or after barrier */ - __REQ_RW_SYNC, /* request is sync (O_DIRECT) */ __REQ_ALLOCED, /* request came from our alloc pool */ - __REQ_RW_META, /* metadata io request */ __REQ_NR_BITS, /* stops here */ }; Thats not enough You still need to fix code in ll_rw_blk(), I would define a rq_flags_bio_match_mask = 0xf for that. (and also add what Jens called needed with the BIO_RW_AHEAD selects REQ_FAILFAST.) And I still don't understand why, for example, Domain Validation fails with the original patch. What sets BIO_RW_FAILFAST and than panics on Errors? (All I see is this flag set in dm/multipath.c dm-mpath.c) Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 20/30] blk_end_request: changing xsysace (take 4)
On Wed, Dec 12 2007 at 19:06 +0200, Kiyoshi Ueda [EMAIL PROTECTED] wrote: Hi, On Wed, 12 Dec 2007 11:09:12 +0200, Boaz Harrosh [EMAIL PROTECTED] wrote: Index: 2.6.24-rc4/drivers/block/xsysace.c === --- 2.6.24-rc4.orig/drivers/block/xsysace.c +++ 2.6.24-rc4/drivers/block/xsysace.c @@ -703,7 +703,7 @@ static void ace_fsm_dostate(struct ace_d /* bio finished; is there another one? */ i = ace-req-current_nr_sectors; - if (end_that_request_first(ace-req, 1, i)) { + if (__blk_end_request(ace-req, 0, i)) { end_that_request_first() took sectors __blk_end_request() now takes bytes Thank you for pointing it out! And I'm very sorry for the bug. I have checked all conversions between sectors and bytes through all patches again, and I found no other miss conversions. Below is the revised patch for xsysace. Thanks, Kiyoshi Ueda NP, I know how it is, after you rebased your work for the who's counting times, you become blind. You need fresh eyes to look at it. Thanks for doing all this. Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [ANNOUNCE] qgit-2.1 and qgit-1.5.8
On Mon, Dec 31 2007 at 13:58 +0200, Marco Costalba [EMAIL PROTECTED] wrote: On Dec 31, 2007 11:59 AM, Filippo Zangheri [EMAIL PROTECTED] wrote: Hi, I git-cloned qgit-2.1 from your repository, then ran `qmake qgit.pro`, but `make` gave me errors. Yes, you need qmake of Qt4 not the Qt3 one. snip A Theoretical question. Is it possible to compile Qt4 app all statically linked and run it on a Qt3 based KDE machine. Some thing like the windows installation, where every thing is self-contained? Thanks Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [ANNOUNCE] qgit-2.1 and qgit-1.5.8
On Mon, Dec 31 2007 at 20:07 +0200, Marco Costalba [EMAIL PROTECTED] wrote: On Dec 31, 2007 6:47 PM, Boaz Harrosh [EMAIL PROTECTED] wrote: On Mon, Dec 31 2007 at 13:58 +0200, Marco Costalba [EMAIL PROTECTED] wrote: On Dec 31, 2007 11:59 AM, Filippo Zangheri [EMAIL PROTECTED] wrote: Hi, I git-cloned qgit-2.1 from your repository, then ran `qmake qgit.pro`, but `make` gave me errors. Yes, you need qmake of Qt4 not the Qt3 one. snip A Theoretical question. Is it possible to compile Qt4 app all statically linked and run it on a Qt3 based KDE machine. Some thing like the windows installation, where every thing is self-contained? It's also very practical...I have _only_ KDE 3 installed, not KDE 4 ;-) I have both Qt4 and Qt3 development (shared) libraries installed and there is absolutely no compatibility problem, the only thing you have to remember is when running qmake qgit.pro the first time, you need to be sure is the Qt4 qmake, not the Qt3. Because I have Qt3 qmake in path, not the Qt4 one, I need to explicitly give the whole path the first time I configure the sources, something like /usr/lib/qt4/bin/qmake qgit.pro Then no other settings are needed, when you call make, the Makefiles are already built by qmake to search for the correct libraries. Hope this helps... Marco Thanks because of your help I was brave enough to install qt4 and compile qgit. It works. We use it a lot here. When the guys make a mess, and you need to figure what happened than qgit is your only friend. I intend to hack some extra stuff that we need often. Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
WARNING: do not add new typedefs - is that for real?
I have this code: c_code /* * osd-r10 4.12.5 Data-In and Data-Out buffer offsets * byte offset = mantissa * (2^(exponent+8)) */ typedef __be32 osd_cdb_offset; osd_cdb_offset __osd_encode_offset(u64 offset, unsigned *padding, int min_shift, int max_shift); struct osd_attributes_list_mode { __be32 get_attr_desc_bytes; osd_cdb_offset get_attr_desc_offset; __be32 get_attr_alloc_length; osd_cdb_offset get_attr_offset; __be32 set_attr_bytes; osd_cdb_offset set_attr_offset; __be32 not_used; }; /c_code the osd_cdb_offset above is this special OSD-standard floating-point-like special type. It is of size 32 bit in special network order. What should I do then: __be32 __osd_encode_offset(u64 offset, unsigned *padding, int min_shift, int max_shift); But it is not a __be32. It is this special floating-point-like thingy!!!? How was __be32 defined with a #define???!! Come on guys, it is not checkpatch.pl place to complain about good language constructs that can be misused. This is the maintainers and reviewers job to say that a: typedef struct foo Foo; is bad practice and we don't like it, but it can not be left to a script. typedefs should be used where they should be used. Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: WARNING: do not add new typedefs - is that for real?
On Wed, Jan 02 2008 at 12:08 +0200, Christoph Hellwig [EMAIL PROTECTED] wrote: On Tue, Jan 01, 2008 at 06:15:46PM +0200, Boaz Harrosh wrote: I have this code: c_code /* * osd-r10 4.12.5 Data-In and Data-Out buffer offsets * byte offset = mantissa * (2^(exponent+8)) */ typedef __be32 osd_cdb_offset; Given that you can't do normal arithmetic on this type it should't really be a __be32 but it's own __bitwise type with proper accessors. There are all the proper accessors, and arithmetic is certainly not possible. But yes, this is one of the rare cases where a typedef makes sense, but �'d call it osd_off_t or something like that. You mean osd_cdb_offset_t. I thought of dropping that _t, I hate it, just a personal preference. Point taken about the typedef + __bitwise. Because with __bitwise we tell the compiler that the new type is assembly equivalent to some type but otherwise un-mixable and unique type. Checkpatch actually allows it. Code fixed! Thanks Christoph. Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 2/3] CONFIG_HIGHPTE vs. sub-page page tables.
On Thu, Jan 03 2008 at 15:12 +0200, Andi Kleen [EMAIL PROTECTED] wrote: Can we please just nuke CONFIG_HIGHPTE? There's only been a small amount of 32bit machines It's unfortunately a larger amount :/ And for unknown reasons a lot of people still install 32bit kernels on new perfectly capable 64bit systems even if they have a lot of memory. Yes I've seen that too many times. When I comment about it, people say: What this 'core 2' supports AMD64 ??!! I think this is the distros fault. 1. Most call it AMD64 and not, i don't know what it should be called. 2. They put the first and default installation as i386 and not the x64, the later is only farther down or on a different link. 3. My Fedora x86_64 installation has a full blown 32bit installation inside. In fact, every time I yum install this or that, I get installed by default both the 64bit and 32bit libraries. I don't even know how to turn the 32bit off. This means that Distros can supply one x86 installation for both flavors and decide at setup time what to install. By this, follow the Kernel in observing that it is the same-ARCH same-INSTALL. 4. Some binary modules like Flash, media-codecs acrobat-reader etc... think (That snow ball effect here) that 32bit is much more common and only provide that. So people think that for best compatibility they should stick with 32bit. But this is not true. They all work perfectly here in 64bit land. Distros should both confirm on that publicly. And also make extensive tests for 32bit compatibility on 64bit machines, even for these binary only bad guys. 5. Distros should jump on the 64bit bang-wagon, to stand it-self apart from Window 32bit-ness. It has bin proven more than once that a Huge Linux 64bit machine, with lots of CPUs and memory, can be the best Windows 32bit performer in existence. Both under KVM/VMWARE or Wine. 64bit Vista can never get close, not to speak of the none-existent 64bit XP. Just my $0.02 Boaz I don't think removing CONFIG_HIGHPTE will be an option any time soon. -Andi -- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6 patch] scsi/seagate.c: remove dead code
On Sun, Oct 28 2007 at 17:51 +0200, Adrian Bunk [EMAIL PROTECTED] wrote: This patch removes obviously dead code. Signed-off-by: Adrian Bunk [EMAIL PROTECTED] --- drivers/scsi/seagate.c |2 -- 1 file changed, 2 deletions(-) Actually I have sent a patch to completely remove this driver. http://www.spinics.net/lists/linux-scsi/msg20308.html Does any one objects? Boaz - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] SCSI/gdth: kill unneeded 'irq' argument
On Fri, Oct 26 2007 at 11:40 +0200, Jeff Garzik [EMAIL PROTECTED] wrote: Neither gdth_get_status() nor __gdth_interrupt() need their 'irq' argument, so remove it. Signed-off-by: Jeff Garzik [EMAIL PROTECTED] --- drivers/scsi/gdth.c | 21 + 1 files changed, 9 insertions(+), 12 deletions(-) diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c index b253b8c..65c21b1 100644 --- a/drivers/scsi/gdth.c +++ b/drivers/scsi/gdth.c @@ -141,7 +141,7 @@ static void gdth_delay(int milliseconds); static void gdth_eval_mapping(ulong32 size, ulong32 *cyls, int *heads, int *secs); static irqreturn_t gdth_interrupt(int irq, void *dev_id); -static irqreturn_t __gdth_interrupt(gdth_ha_str *ha, int irq, +static irqreturn_t __gdth_interrupt(gdth_ha_str *ha, int gdth_from_wait, int* pIndex); static int gdth_sync_event(gdth_ha_str *ha, int service, unchar index, Scsi_Cmnd *scp); @@ -165,7 +165,6 @@ static int gdth_internal_cache_cmd(gdth_ha_str *ha, Scsi_Cmnd *scp); static int gdth_fill_cache_cmd(gdth_ha_str *ha, Scsi_Cmnd *scp, ushort hdrive); static void gdth_enable_int(gdth_ha_str *ha); -static unchar gdth_get_status(gdth_ha_str *ha, int irq); static int gdth_test_busy(gdth_ha_str *ha); static int gdth_get_cmd_index(gdth_ha_str *ha); static void gdth_release_event(gdth_ha_str *ha); @@ -1334,14 +1333,12 @@ static void __init gdth_enable_int(gdth_ha_str *ha) } /* return IStatus if interrupt was from this card else 0 */ -static unchar gdth_get_status(gdth_ha_str *ha, int irq) +static unchar gdth_get_status(gdth_ha_str *ha) { unchar IStatus = 0; -TRACE((gdth_get_status() irq %d ctr_count %d\n, irq, gdth_ctr_count)); +TRACE((gdth_get_status() irq %d ctr_count %d\n, ha-irq, gdth_ctr_count)); -if (ha-irq != (unchar)irq) /* check IRQ */ -return false; if (ha-type == GDT_EISA) IStatus = inb((ushort)ha-bmic + EDOORREG); else if (ha-type == GDT_ISA) @@ -1523,7 +1520,7 @@ static int gdth_wait(gdth_ha_str *ha, int index, ulong32 time) return 1; /* no wait required */ do { -__gdth_interrupt(ha, (int)ha-irq, true, wait_index); +__gdth_interrupt(ha, true, wait_index); if (wait_index == index) { answer_found = TRUE; break; @@ -3036,7 +3033,7 @@ static void gdth_clear_events(void) /* SCSI interface functions */ -static irqreturn_t __gdth_interrupt(gdth_ha_str *ha, int irq, +static irqreturn_t __gdth_interrupt(gdth_ha_str *ha, int gdth_from_wait, int* pIndex) { gdt6m_dpram_str __iomem *dp6m_ptr = NULL; @@ -3054,7 +3051,7 @@ static irqreturn_t __gdth_interrupt(gdth_ha_str *ha, int irq, int act_int_coal = 0; #endif -TRACE((gdth_interrupt() IRQ %d\n,irq)); +TRACE((gdth_interrupt() IRQ %d\n, ha-irq)); /* if polling and not from gdth_wait() - return */ if (gdth_polling) { @@ -3067,7 +3064,7 @@ static irqreturn_t __gdth_interrupt(gdth_ha_str *ha, int irq, spin_lock_irqsave(ha-smp_lock, flags); /* search controller */ -if (0 == (IStatus = gdth_get_status(ha, irq))) { +if (0 == (IStatus = gdth_get_status(ha))) { /* spurious interrupt */ if (!gdth_polling) spin_unlock_irqrestore(ha-smp_lock, flags); @@ -3294,9 +3291,9 @@ static irqreturn_t __gdth_interrupt(gdth_ha_str *ha, int irq, static irqreturn_t gdth_interrupt(int irq, void *dev_id) { - gdth_ha_str *ha = (gdth_ha_str *)dev_id; + gdth_ha_str *ha = dev_id; - return __gdth_interrupt(ha, irq, false, NULL); + return __gdth_interrupt(ha, false, NULL); } static int gdth_sync_event(gdth_ha_str *ha, int service, unchar index, ACK I've done the last heavy lifting of gdth_interrupt. And I second this patch. the irq was just redundant information to the ha pointer. Good riddance. Boaz - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] [CRYPTO] tcrypt: Move sg_init_table out of timing loops
On Mon, Oct 29 2007 at 22:16 +0200, Jens Axboe [EMAIL PROTECTED] wrote: On Fri, Oct 26 2007, Herbert Xu wrote: [CRYPTO] tcrypt: Move sg_init_table out of timing loops This patch moves the sg_init_table out of the timing loops for hash algorithms so that it doesn't impact on the speed test results. Wouldn't it be better to just make sg_init_one() call sg_init_table? diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h index 4571231..ccc55a6 100644 --- a/include/linux/scatterlist.h +++ b/include/linux/scatterlist.h @@ -202,28 +202,6 @@ static inline void __sg_mark_end(struct scatterlist *sg) } /** - * sg_init_one - Initialize a single entry sg list - * @sg: SG entry - * @buf: Virtual address for IO - * @buflen: IO length - * - * Notes: - * This should not be used on a single entry that is part of a larger - * table. Use sg_init_table() for that. - * - **/ -static inline void sg_init_one(struct scatterlist *sg, const void *buf, -unsigned int buflen) -{ - memset(sg, 0, sizeof(*sg)); -#ifdef CONFIG_DEBUG_SG - sg-sg_magic = SG_MAGIC; -#endif - sg_mark_end(sg, 1); - sg_set_buf(sg, buf, buflen); -} - -/** * sg_init_table - Initialize SG table * @sgl:The SG table * @nents: Number of entries in table @@ -247,6 +225,24 @@ static inline void sg_init_table(struct scatterlist *sgl, unsigned int nents) } /** + * sg_init_one - Initialize a single entry sg list + * @sg: SG entry + * @buf: Virtual address for IO + * @buflen: IO length + * + * Notes: + * This should not be used on a single entry that is part of a larger + * table. Use sg_init_table() for that. + * + **/ +static inline void sg_init_one(struct scatterlist *sg, const void *buf, +unsigned int buflen) +{ + sg_init_table(sg, 1); + sg_set_buf(sg, buf, buflen); +} + +/** * sg_phys - Return physical address of an sg entry * @sg: SG entry * Yes please submit this patch. scsi-ml is full of sg_init_one, specially on the error recovery path. Thanks Boaz - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] killing sg_last(), and discussion
On Wed, Oct 31 2007 at 10:49 +0200, Jeff Garzik [EMAIL PROTECTED] wrote: I looked into killing sg_last(), but really, this is the best its gonna get (moving sg_last to libata-core.c). You could maybe kill one use with caching, but in the other sg_last() callsites there isn't another s/g loop we can stick a last_sg = sg; into. libata is stuck because we undertake the highly unusual operation of fiddling with the final S/G element, to enforce 32-bit alignment. Of course we could eliminate all that nasty fiddling/padding completely, including sg_last(), if other areas of the kernel would guarantee ahead of time that buffer lengths are always a multiple of 4 Jeff OK Now I'm confused. I thought that ULD's can give you SG's that are actually longer than bufflen and that, at the end, the bufflen should govern the transfer length. Now FS_PC commands are sector aligned so you do not have problems with that. The BLOCK_PC commands have 2 main sources that I know of one is sg bsg from user mode that can easily enforce 4 bytes alignment. The second is kernel services which 80% of these are done by scsi_execute(). All These can be found and fixed. Starting with scsi_execute(). Another place can be blk_rq_map_sg(), since all IO's are bio based. It can enforce alignment too. I would start by sticking a WARN_ON(qc-pad_len) and see if it triggers, what are the sources of that. Please note that the code already has a 4 bytes alignment assumption about the start of the transfer, other wise the first SG can also have a none aligned length, which is not checked for. Boaz - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] killing sg_last(), and discussion
On Wed, Oct 31 2007 at 12:29 +0200, Jeff Garzik [EMAIL PROTECTED] wrote: Boaz Harrosh wrote: On Wed, Oct 31 2007 at 10:49 +0200, Jeff Garzik [EMAIL PROTECTED] wrote: I looked into killing sg_last(), but really, this is the best its gonna get (moving sg_last to libata-core.c). You could maybe kill one use with caching, but in the other sg_last() callsites there isn't another s/g loop we can stick a last_sg = sg; into. libata is stuck because we undertake the highly unusual operation of fiddling with the final S/G element, to enforce 32-bit alignment. Of course we could eliminate all that nasty fiddling/padding completely, including sg_last(), if other areas of the kernel would guarantee ahead of time that buffer lengths are always a multiple of 4 Jeff OK Now I'm confused. I thought that ULD's can give you SG's that are actually longer than bufflen and that, at the end, the bufflen should govern the transfer length. Now FS_PC commands are sector aligned so you do not have problems with that. The BLOCK_PC commands have 2 main sources that I know of one is sg bsg from user mode that can easily enforce 4 bytes alignment. The second is kernel services which 80% of these are done by scsi_execute(). All These can be found and fixed. Starting with scsi_execute(). Another place can be blk_rq_map_sg(), since all IO's are bio based. It can enforce alignment too. I would start by sticking a WARN_ON(qc-pad_len) and see if it triggers, what are the sources of that. The whole qc-pad_len etc. machinery was added because it solved problems in the field with ATAPI devices. So sr or some userland application is sending lengths that are not padded to 32-bit boundary, probably because plenty of trivial commands can send or return odd amounts of data. This used to be irrelevant, but now with SATA, even PIO data xfer (normally what is used for non-READ/WRITE CDBs) must be 32-bit aligned because both SATA DMA and SATA PIO are converted into dword-based SATA FIS's on the wire. Jeff 2 things 1. Than why not fix blk_rq_map_sg() to enforce the alignment. Also I bet that these problems in the field are from pre 2.6.18 kernels, and this is no longer the case. Why not put that WARN_ON(qc-pad_len) and prove me wrong. 2. Just checking bufflen is enough. Since you are already assuming that first SG's offset is aligned, than if last SG's length is odd than so is bufflen. (You are already assuming that SG's total length matches bufflen) Boaz - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Use relative symbolic links for bzImge in x86 ARCHs
OK One more problem with a patch this time --- From: Benny Halevy [EMAIL PROTECTED] use relative paths in the symlink to bzImage to make it NFS export safe. Signed-off-by: Benny Halevy [EMAIL PROTECTED] diff --git a/arch/i386/Makefile b/arch/i386/Makefile index f036d2d..396146e 100644 --- a/arch/i386/Makefile +++ b/arch/i386/Makefile @@ -132,7 +132,7 @@ zImage zlilo zdisk: KBUILD_IMAGE := arch/x86/boot/zImage zImage bzImage: vmlinux $(Q)mkdir -p $(objtree)/arch/i386/boot - $(Q)ln -fsn $(objtree)/arch/x86/boot/bzImage $(objtree)/arch/i386/boot/bzImage + $(Q)ln -fsn ../../x86/boot/bzImage $(objtree)/arch/i386/boot/bzImage $(Q)$(MAKE) $(build)=$(boot) $(KBUILD_IMAGE) compressed: zImage diff --git a/arch/x86_64/Makefile b/arch/x86_64/Makefile index 9daa32d..8048f4f 100644 --- a/arch/x86_64/Makefile +++ b/arch/x86_64/Makefile @@ -98,7 +98,7 @@ KBUILD_IMAGE := $(BOOTIMAGE) bzImage: vmlinux $(Q)mkdir -p $(objtree)/arch/x86_64/boot - $(Q)ln -fsn $(objtree)/arch/x86/boot/bzImage $(objtree)/arch/x86_64/boot/bzImage + $(Q)ln -fsn ../../x86/boot/bzImage $(objtree)/arch/x86_64/boot/bzImage $(Q)$(MAKE) $(build)=$(boot) $(BOOTIMAGE) bzlilo: vmlinux - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Use relative symbolic links for bzImge in x86 ARCHs
On Thu, Oct 18 2007 at 17:54 +0200, Sam Ravnborg [EMAIL PROTECTED] wrote: Hi Benny. use relative paths in the symlink to bzImage to make it NFS export safe. Signed-off-by: Benny Halevy [EMAIL PROTECTED] Your patch looks perfect - but you are a few hours too late. See: http://git.kernel.org/?p=linux/kernel/git/tglx/linux-2.6-x86.git;a=commit;h=a0075a509bd955ff6fc6e071efabb926e14bf19c This is in the mm branch awaiting some more testing before Thomas push it to Linus. Sam NP, as long as it's fixed I'm happy Tested-by: Boaz Harrosh [EMAIL PROTECTED] Thanks - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Query about Bio submission / Direct IO
On Fri, Oct 19 2007 at 8:17 +0200, Shreyansh Jain [EMAIL PROTECTED] wrote: Hi List, I have tried this on kernelnewbies http://article.gmane.org/gmane.linux.kernel.kernelnewbies/23166 but it seems I am not good at explaining well. trying here again. Question: Is there a limitation (some kind of caveat) when direct IO pages are added to a bio (multiple pages per bio) and if the total length or offset of some of the vectors is not aligned (PAGE_SIZE or sector size)? My scenario: 1. I have a filesystem in which I create bio for submission to a SCSI disk connected in a SAN via a FC switch. 2. This is a direct IO request where in multiple pages are being added in a bio (bio_map_user). Also, as the user buffer I get is unaligned, I am changing bio's first vector offset (as well as len, bi_size accordingly). What I observe is: 1. IO is performed perfectly (as expected) in case there are no SCSI failure. end_bio gets called as expected and the user app is happy. 2. In case I disconnect the SCSI device from switch, my end_bio function keeps on getting sub-completion bio (bi_size 0 and uptodate cleared, error set to -5). 3. For capturing and ignoring sub-completion returns from block layer, using other kernel end_bio implementation as example I had a code like if(bio-bi_size 0) return 1; as the starting lines of the end_bio routine. Thus, my user space goes off to disk sleep and kernel log keeps on spweing messages like ... Oct 17 21:56:07 mgs02 kernel: end_request: I/O error, dev sdl, sector 33639 Oct 17 21:56:07 mgs02 kernel: sd 2:0:0:8: SCSI error: return code = 0x1 Oct 17 21:56:07 mgs02 kernel: end_request: I/O error, dev sdl, sector 33639 Oct 17 21:56:07 mgs02 kernel: sd 2:0:0:8: SCSI error: return code = 0x1 Oct 17 21:56:07 mgs02 kernel: end_request: I/O error, dev sdl, sector 33639 ... till i connect the switch back - even if that is one hour after disconnecting it. Best part - it works amazingly well in case I keep the offset of the first page aligned (somehow, at expense of corrupted data copy) - failure of SCSI disk (end_bio returns, bi_size=0, my code captures error) or no failure. I have tried this on 2.6.5 and 2.6.16.27 (I agree they are old, but that is what I have to work on :( ). It is a qlogic SCSI HBA. I have created the bio just like other kernel subsystems do. Only thing I am not sure is whether there is something special that needs to be done in case of direct IO which I am missing. biodoc.txt states that: Note: Right now the only user of bios with more than one page is ll_rw_kio, which in turn means that only raw I/O uses it (direct i/o may not work right now). Any idea why it doesn't work (though I am expecting that this doc is 2.6 starting era and things might have changed a lot now). Any help would be highly appreciated. Apologies if my channel of approach is wrong and if this post is huge (couldn't keep it shorter than this). Shreyansh [PS: Is this situation something to do with request flags REQ_PC / REQ_BLOCK_PC?? While searching code, I found these somewhere connected to direct IO (bio_add_pc_page) and could find more about them] - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ There was a related fix to a similar problem by Mike Christie, in resent Kernels, If there is no chance for you working on newer kernels, than you will have to back port it. it is: commit bd441deaf341c524b28fd72831ebf6fef88f1c41 Author: Mike Christie [EMAIL PROTECTED] Date: Tue Mar 13 12:52:29 2007 -0500 [SCSI] fix write buffer length in scsi_req_map_sg() look it up the git trees on kernel.org Boaz - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 09/10] Change table chaining layout
On Mon, Oct 22 2007 at 23:47 +0200, Linus Torvalds [EMAIL PROTECTED] wrote: On Mon, 22 Oct 2007, Alan Cox wrote: For structures, not array elements or stack objects. Does gcc now get aligned correct as an attribute on a stack object ? I think m68k stack layout still guarantees 4-byte-alignment, no? Still doesn't answer the rather more important question - why not just stick a NULL on the end instead of all the nutty hacks ? You still do need one bit for the discontiguous case, so it's not like you can avoid the hacks anyway (unless you just blow up the structure entirely) and make it a separate member). So once you have that bit+pointer, using a separate NULL entry isn't exactly prettier. Especially as we actally want to see the difference between end-of-allocation and not yet filled in, so you shouldn't use NULL anyway, you should probably use something like all-ones. Linus - Every one is so hysterical about this sg-chaining problem. And massive patches produced, that when a simple none intrusive solution is proposed it is totally ignored because every one thinks, I can not be that stupid. Well Einstein said: Simplicity is the ultimate sophistication. So no one need to feel bad. I'm talking about Benney's Proposition of a while back. (I'm including it below, cause I can't bother with the stupid Archives broken search) What Benny was proposing is that the scatterlist pointer might not have the complete information about sizes and allocations, but the surrounding code always has, So why not just pass this information to the decision maker - sg_next() - so it can make the right choice, before a suicide. So sg_next() becomes: static inline struct scatterlist *sg_next(struct scatterlist *sg, int next, int nr) { return next nr ? sg_next_unsafe(sg) : NULL; } Where sg_next_unsafe(sg) is what the original sg_next() used to be. and a user code like for_each_sg() becomes: /* * Loop over each sg element, following the pointer to a new list if necessary */ #define for_each_sg(sglist, sg, nr, __i) \ for (__i = 0, sg = (sglist); sg; sg = sg_next(sg, ++__i, nr)) In his patch he shows examples of other uses of sg_next they all fit. The sg_next usage is new and in few places. Not like the sg-page all over the kernel. I know he has a patch for the complete kernel, (I know I helped a bit) and it is a fraction of the size of all the patches that where submitted after that. And it does not have all the problems that we still have now with slob allocators, stack, and such. OK Just my $0.2 Boaz Harrosh - diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h index 2dc7464..3a27e03 100644 --- a/include/linux/scatterlist.h +++ b/include/linux/scatterlist.h @@ -30,7 +30,7 @@ static inline void sg_init_one(struct scatterlist *sg, const void *buf, ((struct scatterlist *) ((unsigned long) (sg)-page ~0x01)) /** - * sg_next - return the next scatterlist entry in a list + * sg_next_unsafe - return the next scatterlist entry in a list * @sg:The current sg entry * * Usually the next entry will be @sg@ + 1, but if this sg element is part @@ -41,7 +41,7 @@ static inline void sg_init_one(struct scatterlist *sg, const void *buf, * the current entry, this function will NOT return NULL for an end-of-list. * */ -static inline struct scatterlist *sg_next(struct scatterlist *sg) +static inline struct scatterlist *sg_next_unsafe(struct scatterlist *sg) { sg++; @@ -51,11 +51,27 @@ static inline struct scatterlist *sg_next(struct scatterlist *sg) return sg; } +/** + * sg_next - return the next scatterlist entry in a list + * @sg:The current sg entry + * @next: Index of next sg entry + * @nr:Number of sg entries in the list + * + * Note that the caller must ensure that there are further entries after + * the current entry, this function will NOT return NULL for an end-of-list. + * + */ +static inline struct scatterlist *sg_next(struct scatterlist *sg, + int next, int nr) +{ + return next nr ? sg_next_unsafe(sg) : NULL; +} + /* * Loop over each sg element, following the pointer to a new list if necessary */ #define for_each_sg(sglist, sg, nr, __i) \ - for (__i = 0, sg = (sglist); __i (nr); __i++, sg = sg_next(sg)) + for (__i = 0, sg = (sglist); sg; sg = sg_next(sg, ++__i, nr)) /** * sg_last - return the last scatterlist entry in a list diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 7238b2d..57cc1dd 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -1165,7 +1165,7 @@ sg_vma_nopage(struct vm_area_struct *vma, unsigned long addr, int *type) sg = rsv_schp-buffer; sa = vma-vm_start; for (k = 0; (k rsv_schp-k_use_sg) (sa vma-vm_end); -++k, sg = sg_next(sg
Re: [PATCH 09/10] Change table chaining layout
On Tue, Oct 23 2007 at 11:41 +0200, Jens Axboe [EMAIL PROTECTED] wrote: On Tue, Oct 23 2007, Boaz Harrosh wrote: On Mon, Oct 22 2007 at 23:47 +0200, Linus Torvalds [EMAIL PROTECTED] wrote: On Mon, 22 Oct 2007, Alan Cox wrote: For structures, not array elements or stack objects. Does gcc now get aligned correct as an attribute on a stack object ? I think m68k stack layout still guarantees 4-byte-alignment, no? Still doesn't answer the rather more important question - why not just stick a NULL on the end instead of all the nutty hacks ? You still do need one bit for the discontiguous case, so it's not like you can avoid the hacks anyway (unless you just blow up the structure entirely) and make it a separate member). So once you have that bit+pointer, using a separate NULL entry isn't exactly prettier. Especially as we actally want to see the difference between end-of-allocation and not yet filled in, so you shouldn't use NULL anyway, you should probably use something like all-ones. Linus - Every one is so hysterical about this sg-chaining problem. And massive patches produced, that when a simple none intrusive solution is proposed it is totally ignored because every one thinks, I can not be that stupid. Well Einstein said: Simplicity is the ultimate sophistication. So no one need to feel bad. It's all about the end goal - having maintainable and resilient code. And I think the sg code will be better once we get past the next day or so, and it'll be more robust. That is what matters to me, not the simplicity of the patch itself. But that is exactly what his patch is. Much more robust. Because you do not relay on sglist content but on outside information, that you already have. Have you had an hard look at his solution? It just simply falls into place. Please try it out for yourself. I did, and it works. Boaz - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 09/10] Change table chaining layout
On Tue, Oct 23 2007 at 11:55 +0200, Jens Axboe [EMAIL PROTECTED] wrote: On Tue, Oct 23 2007, Boaz Harrosh wrote: On Tue, Oct 23 2007 at 11:41 +0200, Jens Axboe [EMAIL PROTECTED] wrote: On Tue, Oct 23 2007, Boaz Harrosh wrote: On Mon, Oct 22 2007 at 23:47 +0200, Linus Torvalds [EMAIL PROTECTED] wrote: On Mon, 22 Oct 2007, Alan Cox wrote: For structures, not array elements or stack objects. Does gcc now get aligned correct as an attribute on a stack object ? I think m68k stack layout still guarantees 4-byte-alignment, no? Still doesn't answer the rather more important question - why not just stick a NULL on the end instead of all the nutty hacks ? You still do need one bit for the discontiguous case, so it's not like you can avoid the hacks anyway (unless you just blow up the structure entirely) and make it a separate member). So once you have that bit+pointer, using a separate NULL entry isn't exactly prettier. Especially as we actally want to see the difference between end-of-allocation and not yet filled in, so you shouldn't use NULL anyway, you should probably use something like all-ones. Linus - Every one is so hysterical about this sg-chaining problem. And massive patches produced, that when a simple none intrusive solution is proposed it is totally ignored because every one thinks, I can not be that stupid. Well Einstein said: Simplicity is the ultimate sophistication. So no one need to feel bad. It's all about the end goal - having maintainable and resilient code. And I think the sg code will be better once we get past the next day or so, and it'll be more robust. That is what matters to me, not the simplicity of the patch itself. But that is exactly what his patch is. Much more robust. Because you do not relay on sglist content but on outside information, that you already have. Have you had an hard look at his solution? It just simply falls into place. Please try it out for yourself. I did, and it works. Sure, I looked at it, it's not exactly rocket science, I do understand what it achieves. I don't think the patch is bad as such, I'm merely trying to state that I think the end code AND interface will be much nicer with the current direction that the sg helpers are moving. It does rely on outside context, because you need to pass in the sglist number. In my opinion, this patch would be a bandaid for the original chain code until we got around to fixing the PAGEALLOC crash. Which we did, it's now merged. The patch doesn't make the code cleaner, it makes it uglier. It'll work, but that still doesn't mean I have to agree it's a nice design. A nice design is to have an struct like BIO. That holds a pointer to the array of scatterlists, size, ..., and a next and prev pointers to the next chunks. Than have all kernel code that now accepts scatterlist* and size accept a pointer to such structure. And all is clear and defined. But since we do not do that, and every single API in the kernel that receives a scatterlist pointer also receives an sg_count parameter, than I do not see what is so hacky about giving that sg_count parameter to the one that needs it the most. sg_next(); OK I guess this is all a matter of taste so there is no point arguing about it any more. I can see your view, and the work has been done so I guess there is no point going back. If it all works than it's for the best. Thanks Jens for doing all this, The performance gain is substantial and we will all enjoy it. Boaz - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 09/10] Change table chaining layout
On Mon, Oct 22 2007 at 20:11 +0200, Jens Axboe [EMAIL PROTECTED] wrote: Change the page member of the scatterlist structure to be an unsigned long, and encode more stuff in the lower bits: - Bits 0 and 1 zero: this is a normal sg entry. Next sg entry is located at sg + 1. - Bit 0 set: this is a chain entry, the next real entry is at -page_link with the two low bits masked off. - Bit 1 set: this is the final entry in the sg entry. sg_next() will return NULL when passed such an entry. It's thus important that sg table users use the proper accessors to get and set the page member. Signed-off-by: Jens Axboe [EMAIL PROTECTED] --- snip /** * sg_set_page - Set sg entry to point at given page @@ -20,11 +37,20 @@ **/ static inline void sg_set_page(struct scatterlist *sg, struct page *page) { - sg-page = page; + unsigned long page_link = sg-page_link 0x3; + You might want to put a BUG_ON(page 0x3); Make sure you're not loosing information. (The m68k problem) + sg-page_link = page_link | (unsigned long) page; } -#define sg_page(sg) ((sg)-page) +#define sg_page(sg) ((struct page *) ((sg)-page_link ~0x3)) snip Boaz - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: HIGHMEM64G Kernel (2.6.23.1) makes system crawl
On Wed, Oct 24 2007 at 12:26 +0200, Rajkumar S [EMAIL PROTECTED] wrote: Hello, I am using a Core 2 Duo E6750 CPU on an intel DG33FB mother board with 4GB Ram, running Debian Lenny. Since the box has 4 GB ram I compiled a big mem kernel, but the machine is very slow while running big mem kernel. It takes about 37 minutes to compile the intel e1000 driver (e1000-7.6.5.tar.gz) from intel site. But it's performing normally when using a non big mem kernel. The diff of the .config between working and non working is as follows. --- config-nobigmem 2007-10-24 12:51:42.0 +0530 +++ config-bigmem 2007-10-24 12:48:56.0 +0530 @@ -1,7 +1,7 @@ # # Automatically generated make config: don't edit # Linux kernel version: 2.6.23.1 -# Wed Oct 24 12:50:21 2007 +# Tue Oct 23 18:08:30 2007 # CONFIG_X86_32=y CONFIG_GENERIC_TIME=y @@ -189,10 +189,11 @@ CONFIG_DELL_RBU=m CONFIG_DCDBAS=m CONFIG_DMIID=y -CONFIG_NOHIGHMEM=y +# CONFIG_NOHIGHMEM is not set # CONFIG_HIGHMEM4G is not set -# CONFIG_HIGHMEM64G is not set +CONFIG_HIGHMEM64G=y CONFIG_PAGE_OFFSET=0xC000 +CONFIG_HIGHMEM=y CONFIG_X86_PAE=y CONFIG_ARCH_FLATMEM_ENABLE=y CONFIG_ARCH_SPARSEMEM_ENABLE=y @@ -211,6 +212,7 @@ CONFIG_BOUNCE=y CONFIG_NR_QUICK=1 CONFIG_VIRT_TO_BUS=y +# CONFIG_HIGHPTE is not set # CONFIG_MATH_EMULATION is not set CONFIG_MTRR=y CONFIG_EFI=y @@ -223,11 +225,13 @@ # CONFIG_HZ_1000 is not set CONFIG_HZ=250 CONFIG_KEXEC=y +# CONFIG_CRASH_DUMP is not set CONFIG_PHYSICAL_START=0x10 # CONFIG_RELOCATABLE is not set CONFIG_PHYSICAL_ALIGN=0x10 CONFIG_HOTPLUG_CPU=y CONFIG_COMPAT_VDSO=y +CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y # # Power management options (ACPI, APM) @@ -1283,6 +1287,7 @@ CONFIG_I2O=m CONFIG_I2O_LCT_NOTIFY_ON_CHANGES=y CONFIG_I2O_EXT_ADAPTEC=y +CONFIG_I2O_EXT_ADAPTEC_DMA64=y CONFIG_I2O_CONFIG=m CONFIG_I2O_CONFIG_OLD_IOCTL=y CONFIG_I2O_BUS=m @@ -3323,6 +3328,7 @@ # CONFIG_DEBUG_SPINLOCK_SLEEP is not set # CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set # CONFIG_DEBUG_KOBJECT is not set +# CONFIG_DEBUG_HIGHMEM is not set CONFIG_DEBUG_BUGVERBOSE=y # CONFIG_DEBUG_INFO is not set # CONFIG_DEBUG_VM is not set I was having this same issue with debian kernel also (2.6.22-2-686), ie stock kernel will work, but big mem is slow. Which is why I tried latest kernel. I am posting the full .config at http://pastebin.ca/747758 dmesg while booting big mem kernel is at http://pastebin.ca/747766 raj I thought that for 4GB of memory the CONFIG_HIGHMEM4G=y CONFIG_HIGHMEM64G is not set would be enough. From looking in source code the CONFIG_HIGHMEM64G is broken in many respects, in my opinion. It's really a 64-bit with 32-bit quirks enabled. Does it even run on None 64-bit machines? Not many! You have a magnificent 64-bit Machine, why? Boaz - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: HIGHMEM64G Kernel (2.6.23.1) makes system crawl
On Wed, Oct 24 2007 at 18:35 +0200, [EMAIL PROTECTED] (Lennart Sorensen) wrote: On Wed, Oct 24, 2007 at 04:41:16PM +0200, Boaz Harrosh wrote: I thought that for 4GB of memory the CONFIG_HIGHMEM4G=y CONFIG_HIGHMEM64G is not set would be enough. From looking in source code the CONFIG_HIGHMEM64G is broken in many respects, in my opinion. It's really a 64-bit with 32-bit quirks enabled. Does it even run on None 64-bit machines? Not many! You have a magnificent 64-bit Machine, why? CONFIG_HIGHMEM4G means 4GB address space. Since PCI and BIOS use some of the first 4GB address space, a machine with 4GB of actual RAM must map some of that RAM above the 4GB address space, so you access it all you need to enable PAE (address space extensions intel invented to give 36bit addressing on 32bit machines), which lets the OS map memory above 4GB (up to 64GB), to let applications use it. Applications are still limited to a 32bit address space. 64bit machines of course just use all memory directly without any silly extensions. The common intel BIOS bug relating to configuring the MTRR and such for caching of memory makes the top portion of memory uncached and hence very slow, which means any code placed in the top part of memory gets slow. The kernel likes to place itself at the top of memory which means the kernel gets slow and hence the whole system gets slow. Holefully intel will learn to make proper BIOSes real soon so that people with intel boards can stop having such slow machines when they have lots of ram. -- Len Sorensen So one thing I don't understand is the difference between CONFIG_HIGHMEM4G and regular 32-bit. I always thought that 32 bit means 4GB address space and that the HIGHMEM4G is using your above trick but with 32-bit dma_addr_t since there is only actual 4GB of real memory and the rest is mapping tricks. But I guess I was wrong. The other point I was making is why use 36 bit trickery when you have a machine that is capable of the full 64-bit, and runs much faster at that? Just a waste of a good machine, wont you say? Perhaps we learn to use x86_64 ARCHs before Intel fixes their BIOS, and they where right all along? Boaz - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: scsi/arm/fas216.c compile error
On Sat, Feb 09 2008 at 2:04 +0200, Adrian Bunk [EMAIL PROTECTED] wrote: Commit 30b0c37b27485a9cb897bfe3824f6f517b8c80d6 causes the following compile error: -- snip -- ... CC drivers/scsi/arm/fas216.o /home/bunk/linux/kernel-2.6/git/linux-2.6/drivers/scsi/arm/fas216.c: In function 'fas216_std_done': /home/bunk/linux/kernel-2.6/git/linux-2.6/drivers/scsi/arm/fas216.c:2120: error: 'struct scsi_cmnd' has no member named 'request_bufflen' /home/bunk/linux/kernel-2.6/git/linux-2.6/drivers/scsi/arm/fas216.c:2122: error: 'struct scsi_cmnd' has no member named 'use_sg' make[4]: *** [drivers/scsi/arm/fas216.o] Error 1 -- snip -- cu Adrian It's in mainline 84ac86ca8c6787f9efff28bc04b1b65fe0a5c310 [SCSI] arm: convert to accessors and !use_sg cleanup Thanks for checking. This patch was in scsi-pending tree since forever, And we were unable to get a responsive maintainer to ACK on them. until the breakage cause went into mainline we finally managed a Tested-by:. I guess sometimes people are so busy, you need a bulldozer to shove 20 minutes into they're schedule. Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: scsi/arm/fas216.c compile error
On Sun, Feb 10 2008 at 15:58 +0200, Russell King [EMAIL PROTECTED] wrote: On Sun, Feb 10, 2008 at 03:07:09PM +0200, Boaz Harrosh wrote: It's in mainline 84ac86ca8c6787f9efff28bc04b1b65fe0a5c310 [SCSI] arm: convert to accessors and !use_sg cleanup Thanks for checking. This patch was in scsi-pending tree since forever, And we were unable to get a responsive maintainer to ACK on them. until the breakage cause went into mainline we finally managed a Tested-by:. I guess sometimes people are so busy, you need a bulldozer to shove 20 minutes into they're schedule. Oh, I was ill for most of December, particularly at the time that you sent the patch, and by the time I recovered, it was buried in my mailbox. Suggest you have some consideration for others who might not be able to do your beg and call at the immediate moment that you want it, and consider that their email management skills may not be as l33t as yours. Dear Russell. You are right. I apologize. I was too trigger happy. I should have resend the request. The patches were in scsi-pending and in -mm. And I assumed it was all good. But it was not accepted into scsi-misc and was somewhat forgotten. I assure you, my email management skills are just as laking as yours. Just that my responsibility's are few. Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: scsi/arm/fas216.c compile error
On Mon, Feb 11 2008 at 12:02 +0200, Russell King [EMAIL PROTECTED] wrote: On Mon, Feb 11, 2008 at 11:47:57AM +0200, Boaz Harrosh wrote: On Mon, Feb 11 2008 at 0:44 +0200, James Bottomley [EMAIL PROTECTED] wrote: Andrew this patch was in -mm for two month or so. I was under the impression that you have an arm cross compiler that tries to build every -mm kernel. Is it possible that for some reason this portion did not get compiled? Is there a place that one can inspect the output of -mm compilations, Specially for cross compiled ARCHs? Having a patch sit in -mm for ARM doesn't mean a lot since there's no guarantee that it'll get built, and that is because the ARM architecture is very diverse; it's not possible to build a single kernel to support everything. So, when akpm builds a kernel for ARM, it's normally centered around one particular ARM defconfig (maybe with allyconfig or allmodconfig afterwards) but even that won't build all ARM specific code. This is why we now have kautobuild - http://armlinux.simtec.co.uk/kautobuild/ That's the only way we can get decent compilation coverage. That system isn't publically accessible (it's not even accessible to me) and it only builds the mainline kernels. Adding -mm to it might be possible, but as I understand the situation, even though it uses things like ccache, it can take about 10 or so hours to complete a set of builds. Thanks Russell. That explains it. I wish they would include an -mm kernel once in a while. like 2-3 every kernel cycle. It is much nicer to find the problems before they are already in mainline. I would certainly sleep better. You have my vote. Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: scsi/arm/fas216.c compile error
On Mon, Feb 11 2008 at 0:44 +0200, James Bottomley [EMAIL PROTECTED] wrote: On Sun, 2008-02-10 at 22:02 +, Russell King wrote: On Sun, Feb 10, 2008 at 08:20:24AM -0600, James Bottomley wrote: On Sun, 2008-02-10 at 13:58 +, Russell King wrote: On Sun, Feb 10, 2008 at 03:07:09PM +0200, Boaz Harrosh wrote: It's in mainline 84ac86ca8c6787f9efff28bc04b1b65fe0a5c310 [SCSI] arm: convert to accessors and !use_sg cleanup Thanks for checking. This patch was in scsi-pending tree since forever, And we were unable to get a responsive maintainer to ACK on them. until the breakage cause went into mainline we finally managed a Tested-by:. I guess sometimes people are so busy, you need a bulldozer to shove 20 minutes into they're schedule. Oh, I was ill for most of December, particularly at the time that you sent the patch, and by the time I recovered, it was buried in my mailbox. Suggest you have some consideration for others who might not be able to do your beg and call at the immediate moment that you want it, and consider that their email management skills may not be as l33t as yours. OK, sorry about this, it's a bit of a cockup all around. The patch that fixes this problem is still in SCSI pending largely because it's patch description: [SCSI] fas216: Use scsi_eh API for REQUEST_SENSE invocation Doesn't lead one to think it might be build critical, so I concentrated on getting the other arm patch out. Russell, could you give it a quick test, and I'll put it in with a tested-by tag? It's not looking good: CC drivers/scsi/arm/fas216.o drivers/scsi/arm/fas216.c: In function `fas216_rq_sns_done': drivers/scsi/arm/fas216.c:2021: warning: passing arg 2 of `scsi_eh_restore_cmnd' from incompatible pointer type drivers/scsi/arm/fas216.c: In function `fas216_std_done': drivers/scsi/arm/fas216.c:2107: warning: passing arg 2 of `scsi_eh_prep_cmnd' from incompatible pointer type Since the second argument of scsi_eh_prep_cmnd is 'struct scsi_eh_save *ses' this patch is most definitely bad. Not even booted it. Yes, there looks to be a fatal screw up in the definition in FAS216_Info. Could you try this ... I think I've corrected it. James --- From 35be7297dc581b42c05ea7751ee595b0ce78f669 Mon Sep 17 00:00:00 2001 From: Boaz Harrosh [EMAIL PROTECTED] Date: Mon, 10 Sep 2007 22:39:11 +0300 Subject: [SCSI] fas216: Use scsi_eh API for REQUEST_SENSE invocation - Use new scsi_eh_prep/restor_cmnd() for synchronous REQUEST_SENSE invocation. Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] Cc: Russell King [EMAIL PROTECTED] Signed-off-by: James Bottomley [EMAIL PROTECTED] --- drivers/scsi/arm/fas216.c | 16 +++- drivers/scsi/arm/fas216.h |3 +++ 2 files changed, 6 insertions(+), 13 deletions(-) diff --git a/drivers/scsi/arm/fas216.c b/drivers/scsi/arm/fas216.c index fb5f202..a715632 100644 --- a/drivers/scsi/arm/fas216.c +++ b/drivers/scsi/arm/fas216.c @@ -2018,6 +2018,7 @@ static void fas216_rq_sns_done(FAS216_Info *info, struct scsi_cmnd *SCpnt, * the upper layers to process. This would have been set * correctly by fas216_std_done. */ + scsi_eh_restore_cmnd(SCpnt, info-ses); SCpnt-scsi_done(SCpnt); } @@ -2103,23 +2104,12 @@ request_sense: if (SCpnt-cmnd[0] == REQUEST_SENSE) goto done; + scsi_eh_prep_cmnd(SCpnt, info-ses, NULL, 0, ~0); fas216_log_target(info, LOG_CONNECT, SCpnt-device-id, requesting sense); - memset(SCpnt-cmnd, 0, sizeof (SCpnt-cmnd)); - SCpnt-cmnd[0] = REQUEST_SENSE; - SCpnt-cmnd[1] = SCpnt-device-lun 5; - SCpnt-cmnd[4] = sizeof(SCpnt-sense_buffer); - SCpnt-cmd_len = COMMAND_SIZE(SCpnt-cmnd[0]); - SCpnt-SCp.buffer = NULL; - SCpnt-SCp.buffers_residual = 0; - SCpnt-SCp.ptr = (char *)SCpnt-sense_buffer; - SCpnt-SCp.this_residual = sizeof(SCpnt-sense_buffer); - SCpnt-SCp.phase = sizeof(SCpnt-sense_buffer); + init_SCp(SCpnt); SCpnt-SCp.Message = 0; SCpnt-SCp.Status = 0; - SCpnt-request_bufflen = sizeof(SCpnt-sense_buffer); - SCpnt-sc_data_direction = DMA_FROM_DEVICE; - SCpnt-use_sg = 0; SCpnt-tag = 0; SCpnt-host_scribble = (void *)fas216_rq_sns_done; diff --git a/drivers/scsi/arm/fas216.h b/drivers/scsi/arm/fas216.h index 00e5f05..b65f4cf 100644 --- a/drivers/scsi/arm/fas216.h +++ b/drivers/scsi/arm/fas216.h @@ -16,6 +16,8 @@ #define NO_IRQ 255 #endif +#include scsi/scsi_eh.h + #include queue.h #include msgqueue.h @@ -311,6 +313,7 @@ typedef struct { /* miscellaneous */ int internal_done; /* flag to indicate request done */ + struct scsi_eh_save ses;/* holds request sense restore info */ unsigned long magic_end; } FAS216_Info; Yes the pointer thing, Thanks James. Andrew this patch
[BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
gdth _exit would first remove all cards then stop the timer and would not sync with the timer function. This caused a crash in gdth_timer() when module was unloaded. del_timer_sync the timer before we delete the cards. NOT YET TESTED Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] --- drivers/scsi/gdth.c | 15 --- 1 files changed, 8 insertions(+), 7 deletions(-) diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c index 8eb78be..103280e 100644 --- a/drivers/scsi/gdth.c +++ b/drivers/scsi/gdth.c @@ -3793,6 +3793,8 @@ static void gdth_timeout(ulong data) gdth_ha_str *ha; ulong flags; +BUG_ON(list_empty(gdth_instances)); + ha = list_first_entry(gdth_instances, gdth_ha_str, list); spin_lock_irqsave(ha-smp_lock, flags); @@ -5146,8 +5148,6 @@ static void gdth_remove_one(gdth_ha_str *ha) ha-sdev = NULL; } - gdth_flush(ha); - if (shp-irq) free_irq(shp-irq,ha); @@ -5245,14 +5245,15 @@ static void __exit gdth_exit(void) { gdth_ha_str *ha; - list_for_each_entry(ha, gdth_instances, list) - gdth_remove_one(ha); + unregister_chrdev(major,gdth); + unregister_reboot_notifier(gdth_notifier); #ifdef GDTH_STATISTICS - del_timer(gdth_timer); + del_timer_sync(gdth_timer); #endif - unregister_chrdev(major,gdth); - unregister_reboot_notifier(gdth_notifier); + + list_for_each_entry(ha, gdth_instances, list) + gdth_remove_one(ha); } module_init(gdth_init); -- 1.5.3.3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[BUGFIX 1/2] gdth: scan for scsi devices
The patch: gdth: switch to modern scsi host registration missed one simple fact when moving a way from scsi_module.c. That is to call scsi_scan_host() on the probed host. With this the gdth driver from 2.6.24 is again able to see drives and boot. Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] Tested-by: Joerg Dorchain: [EMAIL PROTECTED] Tested-by: Stefan Priebe [EMAIL PROTECTED] Tested-by: Jon Chelton [EMAIL PROTECTED] --- drivers/scsi/gdth.c |9 + 1 files changed, 9 insertions(+), 0 deletions(-) diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c index b253b8c..8eb78be 100644 --- a/drivers/scsi/gdth.c +++ b/drivers/scsi/gdth.c @@ -4838,6 +4838,9 @@ static int __init gdth_isa_probe_one(ulong32 isa_bios) if (error) goto out_free_coal_stat; list_add_tail(ha-list, gdth_instances); + + scsi_scan_host(shp); + return 0; out_free_coal_stat: @@ -4965,6 +4968,9 @@ static int __init gdth_eisa_probe_one(ushort eisa_slot) if (error) goto out_free_coal_stat; list_add_tail(ha-list, gdth_instances); + + scsi_scan_host(shp); + return 0; out_free_ccb_phys: @@ -5102,6 +5108,9 @@ static int __init gdth_pci_probe_one(gdth_pci_str *pcistr, int ctr) if (error) goto out_free_coal_stat; list_add_tail(ha-list, gdth_instances); + + scsi_scan_host(shp); + return 0; out_free_coal_stat: -- 1.5.3.3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[BUGFIXES 0/2] gdth: fix 2.6.24 driver breakage
On Thu, Jan 31 2008 at 12:08 +0200, Boaz Harrosh [EMAIL PROTECTED] wrote: On Wed, Jan 30 2008 at 21:47 +0200, Sven Köhler [EMAIL PROTECTED] wrote: Hi, so i have upgraded a system to kernel 2.6.24. After that, it failed to boot with the usual message telling, that the rootfs on device /dev/sda1 cannot be mounted (a raid1 run by the controller below). With 2.6.23.12, everything is working fine. # lspci -v: 03:01.0 RAID bus controller: Intel Corporation RAID Controller Subsystem: Intel Corporation Unknown device 01db Flags: bus master, 66MHz, slow devsel, latency 64, IRQ 17 Memory at ddffc000 (32-bit, prefetchable) [size=16K] [virtual] Expansion ROM at deef [disabled] [size=32K] Capabilities: [80] Power Management version 2 # GDT-related dmesg output (2.6.23.12): GDT-HA: Storage RAID Controller Driver. Version: 3.05 ACPI: PCI Interrupt :03:01.0[A] - GSI 24 (level, low) - IRQ 17 GDT-HA: Found 1 PCI Storage RAID Controllers Configuring GDT-PCI HA at 3/1 IRQ 17 GDT-HA 0: Name: SRCU42L scsi0 : SRCU42L scsi 0:0:0:0: Direct-Access IntelHost Drive #00 PQ: 0 ANSI: 2 scsi 0:2:6:0: Processor ESG-SHV SCA HSBP M29 1.06 PQ: 0 ANSI: 2 sd 0:0:0:0: [sda] 143299800 512-byte hardware sectors (73369 MB) sd 0:0:0:0: [sda] Assuming Write Enabled sd 0:0:0:0: [sda] Assuming drive cache: write through sd 0:0:0:0: [sda] 143299800 512-byte hardware sectors (73369 MB) sd 0:0:0:0: [sda] Assuming Write Enabled sd 0:0:0:0: [sda] Assuming drive cache: write through sda: sda1 sda2 sda5 sd 0:0:0:0: [sda] Attached SCSI disk # cat /boot/config-2.6.24 |grep GDT CONFIG_SCSI_GDTH=y Any ideas? http://www.kernel.org/diff/diffview.cgi?file=%2Fpub%2Flinux%2Fkernel%2Fv2.6%2Fpatch-2.6.24.bz2 show huge drivers/scsi/gdth* related changes. Can't test at the moment. System went production. Regards, Sven Hi Sven! snip With the kind help of: Joerg Dorchain: [EMAIL PROTECTED] Jon Chelton [EMAIL PROTECTED] Stefan Priebe [EMAIL PROTECTED] Which let me take up their machines their effort and their time I hope I'm able to fix the gdth driver for the 2.6.24 kernel and forward. Actually it was a simple miss by Christoph, but with my inexperience it took me a bisection and a while to get it. Both Joerg, and Stefan where able to boot with these patches and work on their machine. Jon Chelton's disks array should also work, we're testing. Submitted 2 patches. They should also be included after some testing into the 2.6.24.x stable releases. (Will be posted after some more testing) [PATCH 1/2] gdth: scan for scsi devices simple but must fatal. [PATCH 2/2] gdth: bugfix for the Timer at exit crash James please inspect and comment on this patch. It was not yet tested by the original bug submitter. In the original gdth series Christoph has forgotten to add the call to scsi_scan_host(). Jeff alternative patches did do the scan. After everything was probed, the code would loop on all found cards and scan. However I like to individually scan at each probe, because I think this way it is more ready for the hot-plug API where the discovery is done outside of the driver, and the probe is called on single host at a time. Is that right? please comment. Test away. Meany thanks to Joerg, Jon Stefan, Cheers. Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] 2.6.25-rc1-git2: GDT SCSI: change drivers/scsi/gdth.c into using pci_get device
On Tue, Feb 12 2008 at 18:22 +0200, James Bottomley [EMAIL PROTECTED] wrote: On Tue, 2008-02-12 at 11:31 -0300, Sergio Luis wrote: Fix compilation warning in drivers/scsi/gdth.c, using deprecated pci_find_device. Change it into using pci_get_device instead: drivers/scsi/gdth.c:645: warning: 'pci_find_device' is deprecated (declared at include/linux/pci.h:495) Signed-off-by: Sergio Luis [EMAIL PROTECTED] gdth.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff -urN linux-2.6.25-rc1-git2.orig/drivers/scsi/gdth.c linux-2.6.25-rc1-git2/drivers/scsi/gdth.c --- linux-2.6.25-rc1-git2.orig/drivers/scsi/gdth.c 2008-02-12 09:26:14.0 -0300 +++ linux-2.6.25-rc1-git2/drivers/scsi/gdth.c2008-02-12 10:33:08.0 -0300 @@ -642,7 +642,7 @@ *cnt, vendor, device)); pdev = NULL; -while ((pdev = pci_find_device(vendor, device, pdev)) +while ((pdev = pci_get_device(vendor, device, pdev)) != NULL) { if (pci_enable_device(pdev)) continue; This can't be correct without a matching put in the error leg. The difference between the two APIs is that pci_get_device returns a pci_device with a reference taken and pci_find_device doesn't. However, pci_get_device does drop the reference again so as long as you never exit the loop until it returns NULL, it is OK ... it's the exits before pci_get_device() returns NULL that need the put. James Yes you are right I have just realized that. Reinspecting pci_get_device() Sergio will you do it? or should I have a try? Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] 2.6.25-rc1-git2: GDT SCSI: change drivers/scsi/gdth.c into using pci_get device
On Wed, Feb 13 2008 at 2:17 +0200, James Bottomley [EMAIL PROTECTED] wrote: On Tue, 2008-02-12 at 20:48 -0300, Sergio Luis wrote: reposting an updated version of it. Please check if it's ok. Looks fine, thanks! You added an extra space at the end of while ((pdev = pci_get_device(vendor, device, pdev)) Which I fixed. Unfortunately checkpatch isn't very helpful for this driver since it uses spaces not tabs everywhere, which checkpatch really hates. James James hi This patch is now obsolete. After Jeff's patch to convert it to hot plug API. Jeffs patch looks very good to me. I will try to push it through the testers. I still don't have a card for testing myself. Again anyone wants to send me a card. Intel people anybody home? I will add the missing Kconfig hunk to jeff's patch and will push it after it is tested. Thanks Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
On Wed, Feb 13 2008 at 9:06 +0200, Stefan Priebe - allied internet ag [EMAIL PROTECTED] wrote: Hello! I've tested this patch now - and it works fine. Now rmmod, halt and reboot also works. Stefan Priebe This is grate news Stefan. Thank you very much for all your time and effort, with out we could not have fixed all this. Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
On Tue, Feb 12 2008 at 19:40 +0200, Boaz Harrosh [EMAIL PROTECTED] wrote: gdth _exit would first remove all cards then stop the timer and would not sync with the timer function. This caused a crash in gdth_timer() when module was unloaded. del_timer_sync the timer before we delete the cards. NOT YET TESTED Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] Tested-by: Stefan Priebe [EMAIL PROTECTED] --- drivers/scsi/gdth.c | 15 --- 1 files changed, 8 insertions(+), 7 deletions(-) diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c index 8eb78be..103280e 100644 --- a/drivers/scsi/gdth.c +++ b/drivers/scsi/gdth.c @@ -3793,6 +3793,8 @@ static void gdth_timeout(ulong data) gdth_ha_str *ha; ulong flags; +BUG_ON(list_empty(gdth_instances)); + ha = list_first_entry(gdth_instances, gdth_ha_str, list); spin_lock_irqsave(ha-smp_lock, flags); @@ -5146,8 +5148,6 @@ static void gdth_remove_one(gdth_ha_str *ha) ha-sdev = NULL; } - gdth_flush(ha); - if (shp-irq) free_irq(shp-irq,ha); @@ -5245,14 +5245,15 @@ static void __exit gdth_exit(void) { gdth_ha_str *ha; - list_for_each_entry(ha, gdth_instances, list) - gdth_remove_one(ha); + unregister_chrdev(major,gdth); + unregister_reboot_notifier(gdth_notifier); #ifdef GDTH_STATISTICS - del_timer(gdth_timer); + del_timer_sync(gdth_timer); #endif - unregister_chrdev(major,gdth); - unregister_reboot_notifier(gdth_notifier); + + list_for_each_entry(ha, gdth_instances, list) + gdth_remove_one(ha); } module_init(gdth_init); James please put this patch in rc-fixes also. It has now been tested by few people, and it solves a reproducible problem in the unloading of the driver. It was not yet confirmed by Andrew's reporter with the: +if (list_empty(gdth_instances)) + return; at gdth_timer() In -mm tree. In my patch I have converted the if() to a BUG_ON because now it should not happen. But I figure it is not worse then what there is now, which is nothing. With your recommendation I will push both patches to the stable branches People have emailed me requesting it. Thanks Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] gdth: convert to PCI hotplug API
On Wed, Feb 13 2008 at 1:49 +0200, Jeff Garzik [EMAIL PROTECTED] wrote: Signed-off-by: Jeff Garzik [EMAIL PROTECTED] --- drivers/scsi/gdth.c | 143 +++- 1 file changed, 86 insertions(+), 57 deletions(-) snip below is the same exact patch rebased on my last two bugfixes. (Already in scsi-rc-fixes) do you intend this to be pushed into 2.6.25-rcx or this is already for 2.6.26? Should we put this in -mm tree for testing? Boaz THIS IS NOT YET TESTED --- From: Jeff Garzik [EMAIL PROTECTED] Date: Wed, 13 Feb 2008 13:01:16 +0200 Subject: [PATCH] gdth: convert to PCI hotplug API Signed-off-by: Jeff Garzik [EMAIL PROTECTED] Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] --- drivers/scsi/gdth.c | 143 ++ 1 files changed, 86 insertions(+), 57 deletions(-) diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c index 103280e..107d2c7 100644 --- a/drivers/scsi/gdth.c +++ b/drivers/scsi/gdth.c @@ -596,85 +596,107 @@ static int __init gdth_search_isa(ulong32 bios_adr) #endif /* CONFIG_ISA */ #ifdef CONFIG_PCI -static void gdth_search_dev(gdth_pci_str *pcistr, ushort *cnt, -ushort vendor, ushort dev); +static gdth_pci_str gdth_pcistr[MAXHA]; +static int gdth_pci_cnt; +static bool gdth_pci_registered; -static int __init gdth_search_pci(gdth_pci_str *pcistr) +static bool __init gdth_search_vortex(ushort device) { -ushort device, cnt; - -TRACE((gdth_search_pci()\n)); - -cnt = 0; -for (device = 0; device = PCI_DEVICE_ID_VORTEX_GDT6555; ++device) -gdth_search_dev(pcistr, cnt, PCI_VENDOR_ID_VORTEX, device); -for (device = PCI_DEVICE_ID_VORTEX_GDT6x17RP; - device = PCI_DEVICE_ID_VORTEX_GDTMAXRP; ++device) -gdth_search_dev(pcistr, cnt, PCI_VENDOR_ID_VORTEX, device); -gdth_search_dev(pcistr, cnt, PCI_VENDOR_ID_VORTEX, -PCI_DEVICE_ID_VORTEX_GDTNEWRX); -gdth_search_dev(pcistr, cnt, PCI_VENDOR_ID_VORTEX, -PCI_DEVICE_ID_VORTEX_GDTNEWRX2); -gdth_search_dev(pcistr, cnt, PCI_VENDOR_ID_INTEL, -PCI_DEVICE_ID_INTEL_SRC); -gdth_search_dev(pcistr, cnt, PCI_VENDOR_ID_INTEL, -PCI_DEVICE_ID_INTEL_SRC_XSCALE); -return cnt; + if (device = PCI_DEVICE_ID_VORTEX_GDT6555) + return true; + if (device = PCI_DEVICE_ID_VORTEX_GDT6x17RP + device = PCI_DEVICE_ID_VORTEX_GDTMAXRP) + return true; + if (device == PCI_DEVICE_ID_VORTEX_GDTNEWRX || + device == PCI_DEVICE_ID_VORTEX_GDTNEWRX2) + return true; + return false; } +static int gdth_pci_init_one(struct pci_dev *pdev, +const struct pci_device_id *ent); +static void gdth_pci_remove_one(struct pci_dev *pdev); +static void gdth_remove_one(gdth_ha_str *ha); + /* Vortex only makes RAID controllers. * We do not really want to specify all 550 ids here, so wildcard match. */ -static struct pci_device_id gdthtable[] __maybe_unused = { -{PCI_VENDOR_ID_VORTEX,PCI_ANY_ID,PCI_ANY_ID, PCI_ANY_ID}, -{PCI_VENDOR_ID_INTEL,PCI_DEVICE_ID_INTEL_SRC,PCI_ANY_ID,PCI_ANY_ID}, - {PCI_VENDOR_ID_INTEL,PCI_DEVICE_ID_INTEL_SRC_XSCALE,PCI_ANY_ID,PCI_ANY_ID}, -{0} +static struct pci_device_id gdthtable[] __devinitdata = { + { PCI_VDEVICE(VORTEX, PCI_ANY_ID) }, + { PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_SRC) }, + { PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_SRC_XSCALE) }, + { } /* terminate list */ +}; +MODULE_DEVICE_TABLE(pci, gdthtable); + +static struct pci_driver gdth_pci_driver = { + .name = gdth, + .id_table = gdthtable, + .probe = gdth_pci_init_one, + .remove = gdth_pci_remove_one, }; -MODULE_DEVICE_TABLE(pci,gdthtable); -static void __init gdth_search_dev(gdth_pci_str *pcistr, ushort *cnt, - ushort vendor, ushort device) +static void gdth_pci_remove_one(struct pci_dev *pdev) +{ + gdth_ha_str *ha = pci_get_drvdata(pdev); + + pci_set_drvdata(pdev, NULL); + + list_del(ha-list); + gdth_remove_one(ha); + + pci_disable_device(pdev); +} + +static int __devinit gdth_pci_init_one(struct pci_dev *pdev, + const struct pci_device_id *ent) { -ulong base0, base1, base2; -struct pci_dev *pdev; + ushort vendor = pdev-vendor; + ushort device = pdev-device; + ulong base0, base1, base2; + int rc; -TRACE((gdth_search_dev() cnt %d vendor %x device %x\n, - *cnt, vendor, device)); + TRACE((gdth_search_dev() cnt %d vendor %x device %x\n, + gdth_pci_cnt, vendor, device)); + + if (vendor == PCI_VENDOR_ID_VORTEX !gdth_search_vortex(device)) + return -ENODEV; + + rc = pci_enable_device(pdev); + if (rc) + return rc
Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley [EMAIL PROTECTED] wrote: On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote: -gdth_flush(ha); - This piece doesn't look right. gdth_flush() forces the internal cache to disk backing. If you remove it, you're taking the chance that the machine will be powered off without a writeback which can cause data corruption. James Yes. I have more problems reported, with exit, and am just sending one more patch that puts this back in. Which was tested. So I will resend this one plus one new one. Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH ver2] gdth: bugfix for the at-exit problems
gdth_exit would first remove all cards then stop the timer and would not sync with the timer function. This caused a crash in gdth_timer() when module was unloaded. So del_timer_sync the timer before we delete the cards. also a reboot notifier function was registered but is not needed anymore. And would crash. So remove gdth_halt and the reboot notifier registration. Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] --- drivers/scsi/gdth.c | 64 +++--- 1 files changed, 9 insertions(+), 55 deletions(-) diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c index 8eb78be..8469fe6 100644 --- a/drivers/scsi/gdth.c +++ b/drivers/scsi/gdth.c @@ -183,7 +183,6 @@ static int gdth_ioctl(struct inode *inode, struct file *filep, unsigned int cmd, unsigned long arg); static void gdth_flush(gdth_ha_str *ha); -static int gdth_halt(struct notifier_block *nb, ulong event, void *buf); static int gdth_queuecommand(Scsi_Cmnd *scp,void (*done)(Scsi_Cmnd *)); static int __gdth_queuecommand(gdth_ha_str *ha, struct scsi_cmnd *scp, struct gdth_cmndinfo *cmndinfo); @@ -418,12 +417,6 @@ static inline void gdth_set_sglist(struct scsi_cmnd *cmd, #include gdth_proc.h #include gdth_proc.c -/* notifier block to get a notify on system shutdown/halt/reboot */ -static struct notifier_block gdth_notifier = { -gdth_halt, NULL, 0 -}; -static int notifier_disabled = 0; - static gdth_ha_str *gdth_find_ha(int hanum) { gdth_ha_str *ha; @@ -3793,6 +3786,8 @@ static void gdth_timeout(ulong data) gdth_ha_str *ha; ulong flags; +BUG_ON(list_empty(gdth_instances)); + ha = list_first_entry(gdth_instances, gdth_ha_str, list); spin_lock_irqsave(ha-smp_lock, flags); @@ -4668,45 +4663,6 @@ static void gdth_flush(gdth_ha_str *ha) } } -/* shutdown routine */ -static int gdth_halt(struct notifier_block *nb, ulong event, void *buf) -{ -gdth_ha_str *ha; -#ifndef __alpha__ -gdth_cmd_strgdtcmd; -charcmnd[MAX_COMMAND_SIZE]; -#endif - -if (notifier_disabled) -return NOTIFY_OK; - -TRACE2((gdth_halt() event %d\n,(int)event)); -if (event != SYS_RESTART event != SYS_HALT event != SYS_POWER_OFF) -return NOTIFY_DONE; - -notifier_disabled = 1; -printk(GDT-HA: Flushing all host drives .. ); -list_for_each_entry(ha, gdth_instances, list) { -gdth_flush(ha); - -#ifndef __alpha__ -/* controller reset */ -memset(cmnd, 0xff, MAX_COMMAND_SIZE); -gdtcmd.BoardNode = LOCALBOARD; -gdtcmd.Service = CACHESERVICE; -gdtcmd.OpCode = GDT_RESET; -TRACE2((gdth_halt(): reset controller %d\n, ha-hanum)); -gdth_execute(ha-shost, gdtcmd, cmnd, 10, NULL); -#endif -} -printk(Done.\n); - -#ifdef GDTH_STATISTICS -del_timer(gdth_timer); -#endif -return NOTIFY_OK; -} - /* configure lun */ static int gdth_slave_configure(struct scsi_device *sdev) { @@ -5141,13 +5097,13 @@ static void gdth_remove_one(gdth_ha_str *ha) scsi_remove_host(shp); + gdth_flush(ha); + if (ha-sdev) { scsi_free_host_dev(ha-sdev); ha-sdev = NULL; } - gdth_flush(ha); - if (shp-irq) free_irq(shp-irq,ha); @@ -5235,8 +5191,6 @@ static int __init gdth_init(void) add_timer(gdth_timer); #endif major = register_chrdev(0,gdth, gdth_fops); - notifier_disabled = 0; - register_reboot_notifier(gdth_notifier); gdth_polling = FALSE; return 0; } @@ -5245,14 +5199,14 @@ static void __exit gdth_exit(void) { gdth_ha_str *ha; - list_for_each_entry(ha, gdth_instances, list) - gdth_remove_one(ha); + unregister_chrdev(major,gdth); #ifdef GDTH_STATISTICS - del_timer(gdth_timer); + del_timer_sync(gdth_timer); #endif - unregister_chrdev(major,gdth); - unregister_reboot_notifier(gdth_notifier); + + list_for_each_entry(ha, gdth_instances, list) + gdth_remove_one(ha); } module_init(gdth_init); -- 1.5.3.3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
On Wed, Feb 13 2008 at 17:54 +0200, Boaz Harrosh [EMAIL PROTECTED] wrote: On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley [EMAIL PROTECTED] wrote: On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote: - gdth_flush(ha); - This piece doesn't look right. gdth_flush() forces the internal cache to disk backing. If you remove it, you're taking the chance that the machine will be powered off without a writeback which can cause data corruption. James Yes. I have more problems reported, with exit, and am just sending one more patch that puts this back in. Which was tested. So I will resend this one plus one new one. Boaz The gdth driver would do a register_reboot_notifier(gdth_notifier); to a gdth_halt() function, which would then redo half of what gdth_exit does, and wrongly so, and crash. Are we guaranteed in todays kernel that modules .exit function be called on an halt or reboot? If so then there is no need for duplications and the gdth_halt() should go. Submitted a patch that replaces the previous one I submitted with a deeper fix. [PATCH] gdth: bugfix for the at-exit problems If you ask me this all gdth_flush() is a crackup. sd and scsi-ml are doing scsi FLUSH commands when ever is needed. The controller as no business caching data in memory longer then what is stated in standard. Raid controller or no raid controller. Virtual or not virtual device. Data on Plate means data on plate. What if there is a power outage? what the driver can do then? Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
On Wed, Feb 13 2008 at 18:45 +0200, James Bottomley [EMAIL PROTECTED] wrote: On Wed, 2008-02-13 at 18:33 +0200, Boaz Harrosh wrote: On Wed, Feb 13 2008 at 17:54 +0200, Boaz Harrosh [EMAIL PROTECTED] wrote: On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley [EMAIL PROTECTED] wrote: On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote: - gdth_flush(ha); - This piece doesn't look right. gdth_flush() forces the internal cache to disk backing. If you remove it, you're taking the chance that the machine will be powered off without a writeback which can cause data corruption. James Yes. I have more problems reported, with exit, and am just sending one more patch that puts this back in. Which was tested. So I will resend this one plus one new one. Boaz The gdth driver would do a register_reboot_notifier(gdth_notifier); to a gdth_halt() function, which would then redo half of what gdth_exit does, and wrongly so, and crash. Are we guaranteed in todays kernel that modules .exit function be called on an halt or reboot? If so then there is no need for duplications and the gdth_halt() should go. No. The __exit section is actually discardable if you promise never to remove the module. I don't understand please explain. What does a driver need to do if it needs a consistent shutdown retine? module or built in? unload or shutdown? James -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
On Wed, Feb 13 2008 at 19:03 +0200, James Bottomley [EMAIL PROTECTED] wrote: On Wed, 2008-02-13 at 18:50 +0200, Boaz Harrosh wrote: On Wed, Feb 13 2008 at 18:45 +0200, James Bottomley [EMAIL PROTECTED] wrote: On Wed, 2008-02-13 at 18:33 +0200, Boaz Harrosh wrote: On Wed, Feb 13 2008 at 17:54 +0200, Boaz Harrosh [EMAIL PROTECTED] wrote: On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley [EMAIL PROTECTED] wrote: On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote: - gdth_flush(ha); - This piece doesn't look right. gdth_flush() forces the internal cache to disk backing. If you remove it, you're taking the chance that the machine will be powered off without a writeback which can cause data corruption. James Yes. I have more problems reported, with exit, and am just sending one more patch that puts this back in. Which was tested. So I will resend this one plus one new one. Boaz The gdth driver would do a register_reboot_notifier(gdth_notifier); to a gdth_halt() function, which would then redo half of what gdth_exit does, and wrongly so, and crash. Are we guaranteed in todays kernel that modules .exit function be called on an halt or reboot? If so then there is no need for duplications and the gdth_halt() should go. No. The __exit section is actually discardable if you promise never to remove the module. I don't understand please explain. What does a driver need to do if it needs a consistent shutdown retine? module or built in? unload or shutdown? It needs to register a reboot notifier, which gdth does. However, the notifier is only called on reboot, so it also needs to clean up correctly on module exit as well. The alternative for GDTH would be to process the SCSI SYNCHRONIZE CACHE Why would we think that the controller does not support this command is it not in the mandatory section of the standard? command. That's done by a shutdown notifier from sd, so the correct thing would always get done; however it does mean the driver has to be in a condition to process the last sync cache command. Why would it not be ready? what do other drivers do? The drivers is ready until the very last module's .exit. Is that good enough? For the quick fix, just keep the current infrastructure and put back the gdth_flush() command where it can be effective. Just did. But if needed I would prefer to emulate the SCSI SYNCHRONIZE CACHE command and not that boot notifier thing. Please advise. James Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
On Wed, Feb 13 2008 at 19:03 +0200, James Bottomley [EMAIL PROTECTED] wrote: On Wed, 2008-02-13 at 18:50 +0200, Boaz Harrosh wrote: On Wed, Feb 13 2008 at 18:45 +0200, James Bottomley [EMAIL PROTECTED] wrote: On Wed, 2008-02-13 at 18:33 +0200, Boaz Harrosh wrote: On Wed, Feb 13 2008 at 17:54 +0200, Boaz Harrosh [EMAIL PROTECTED] wrote: On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley [EMAIL PROTECTED] wrote: On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote: - gdth_flush(ha); - This piece doesn't look right. gdth_flush() forces the internal cache to disk backing. If you remove it, you're taking the chance that the machine will be powered off without a writeback which can cause data corruption. James Yes. I have more problems reported, with exit, and am just sending one more patch that puts this back in. Which was tested. So I will resend this one plus one new one. Boaz The gdth driver would do a register_reboot_notifier(gdth_notifier); to a gdth_halt() function, which would then redo half of what gdth_exit does, and wrongly so, and crash. Are we guaranteed in todays kernel that modules .exit function be called on an halt or reboot? If so then there is no need for duplications and the gdth_halt() should go. No. The __exit section is actually discardable if you promise never to remove the module. I don't understand please explain. What does a driver need to do if it needs a consistent shutdown retine? module or built in? unload or shutdown? It needs to register a reboot notifier, which gdth does. However, the notifier is only called on reboot, so it also needs to clean up correctly on module exit as well. The alternative for GDTH would be to process the SCSI SYNCHRONIZE CACHE command. That's done by a shutdown notifier from sd, so the correct thing would always get done; however it does mean the driver has to be in a condition to process the last sync cache command. For the quick fix, just keep the current infrastructure and put back the gdth_flush() command where it can be effective. James Totally untested. --- From: Boaz Harrosh [EMAIL PROTECTED] Subject: [PATCH] gdth: bugfix for the at-exit problems gdth_exit would first remove all cards then stop the timer and would not sync with the timer function. This caused a crash in gdth_timer() when module was unloaded. So del_timer_sync the timer before we delete the cards. also the reboot notifier function would crash. So unify the exit and halt functions with a gdth_shutdown() that's called by both. Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] --- drivers/scsi/gdth.c | 99 -- 1 files changed, 40 insertions(+), 59 deletions(-) diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c index 8eb78be..7bb9b45 100644 --- a/drivers/scsi/gdth.c +++ b/drivers/scsi/gdth.c @@ -183,7 +183,6 @@ static int gdth_ioctl(struct inode *inode, struct file *filep, unsigned int cmd, unsigned long arg); static void gdth_flush(gdth_ha_str *ha); -static int gdth_halt(struct notifier_block *nb, ulong event, void *buf); static int gdth_queuecommand(Scsi_Cmnd *scp,void (*done)(Scsi_Cmnd *)); static int __gdth_queuecommand(gdth_ha_str *ha, struct scsi_cmnd *scp, struct gdth_cmndinfo *cmndinfo); @@ -418,12 +417,6 @@ static inline void gdth_set_sglist(struct scsi_cmnd *cmd, #include gdth_proc.h #include gdth_proc.c -/* notifier block to get a notify on system shutdown/halt/reboot */ -static struct notifier_block gdth_notifier = { -gdth_halt, NULL, 0 -}; -static int notifier_disabled = 0; - static gdth_ha_str *gdth_find_ha(int hanum) { gdth_ha_str *ha; @@ -3793,6 +3786,8 @@ static void gdth_timeout(ulong data) gdth_ha_str *ha; ulong flags; +BUG_ON(list_empty(gdth_instances)); + ha = list_first_entry(gdth_instances, gdth_ha_str, list); spin_lock_irqsave(ha-smp_lock, flags); @@ -4668,45 +4663,6 @@ static void gdth_flush(gdth_ha_str *ha) } } -/* shutdown routine */ -static int gdth_halt(struct notifier_block *nb, ulong event, void *buf) -{ -gdth_ha_str *ha; -#ifndef __alpha__ -gdth_cmd_strgdtcmd; -charcmnd[MAX_COMMAND_SIZE]; -#endif - -if (notifier_disabled) -return NOTIFY_OK; - -TRACE2((gdth_halt() event %d\n,(int)event)); -if (event != SYS_RESTART event != SYS_HALT event != SYS_POWER_OFF) -return NOTIFY_DONE; - -notifier_disabled = 1; -printk(GDT-HA: Flushing all host drives .. ); -list_for_each_entry(ha, gdth_instances, list) { -gdth_flush(ha); - -#ifndef __alpha__ -/* controller reset */ -memset(cmnd, 0xff, MAX_COMMAND_SIZE); -gdtcmd.BoardNode = LOCALBOARD; -gdtcmd.Service = CACHESERVICE; -gdtcmd.OpCode = GDT_RESET; -TRACE2((gdth_halt(): reset controller %d\n, ha-hanum)); -gdth_execute(ha-shost, gdtcmd
Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
On Wed, Feb 13 2008 at 19:36 +0200, James Bottomley [EMAIL PROTECTED] wrote: snip --- From: Boaz Harrosh [EMAIL PROTECTED] Subject: [PATCH] gdth: bugfix for the at-exit problems gdth_exit would first remove all cards then stop the timer and would not sync with the timer function. This caused a crash in gdth_timer() when module was unloaded. So del_timer_sync the timer before we delete the cards. also the reboot notifier function would crash. So unify the exit and halt functions with a gdth_shutdown() that's called by both. Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] --- snip +static struct notifier_block gdth_notifier = { +gdth_halt, NULL, 0 +}; + +bool gdth_shutdown_done; right forgot the static. But I use it in gdth_init(), so it must be external. Unless you promise me that gdth_init() will never ever be called after a call to shutdown. Any way the hot-plug patch changes all that. This is only for 2.6.24 bugfixs. Static police alert! Just make it static and move it into gdth_shutdown() +static void gdth_shutdown() +{ +gdth_ha_str *ha; +if (gdth_shutdown_done) +return; + +gdth_shutdown_done = true; +unregister_chrdev(major,gdth); +unregister_reboot_notifier(gdth_notifier); I'm not sure you can do this, aren't reboot notifiers called with the rwsem held? In which case the unregister which also takes the rwsem will hang the system. humm, can't remove a notifier from within the notifier. Thanks James for the catch, it's what happens when you don't test your own patches. I have moved unregister_reboot_notifier to gdth_exit. James Will send a new version for review. Please note that this is a bugfix patch on top of 2.6.24. It is not needed for Jeff's hot-plug path. There will be one more bugfix patch for a crash at the user-mode ioctl code. Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] gdth: bugfix for the at-exit problems
This is a bugfix for the 2.6.24.x stable releases. gdth_exit would first remove all cards then stop the timer and would not sync with the timer function. This caused a crash in gdth_timer() when module was unloaded. So del_timer_sync the timer before we delete the cards. also the reboot notifier function would crash. So unify the exit and halt functions with a gdth_shutdown() that's called by both. Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] --- drivers/scsi/gdth.c | 90 -- 1 files changed, 36 insertions(+), 54 deletions(-) diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c index 8eb78be..3828b23 100644 --- a/drivers/scsi/gdth.c +++ b/drivers/scsi/gdth.c @@ -183,7 +183,6 @@ static int gdth_ioctl(struct inode *inode, struct file *filep, unsigned int cmd, unsigned long arg); static void gdth_flush(gdth_ha_str *ha); -static int gdth_halt(struct notifier_block *nb, ulong event, void *buf); static int gdth_queuecommand(Scsi_Cmnd *scp,void (*done)(Scsi_Cmnd *)); static int __gdth_queuecommand(gdth_ha_str *ha, struct scsi_cmnd *scp, struct gdth_cmndinfo *cmndinfo); @@ -418,12 +417,6 @@ static inline void gdth_set_sglist(struct scsi_cmnd *cmd, #include gdth_proc.h #include gdth_proc.c -/* notifier block to get a notify on system shutdown/halt/reboot */ -static struct notifier_block gdth_notifier = { -gdth_halt, NULL, 0 -}; -static int notifier_disabled = 0; - static gdth_ha_str *gdth_find_ha(int hanum) { gdth_ha_str *ha; @@ -3793,6 +3786,8 @@ static void gdth_timeout(ulong data) gdth_ha_str *ha; ulong flags; +BUG_ON(list_empty(gdth_instances)); + ha = list_first_entry(gdth_instances, gdth_ha_str, list); spin_lock_irqsave(ha-smp_lock, flags); @@ -4668,45 +4663,6 @@ static void gdth_flush(gdth_ha_str *ha) } } -/* shutdown routine */ -static int gdth_halt(struct notifier_block *nb, ulong event, void *buf) -{ -gdth_ha_str *ha; -#ifndef __alpha__ -gdth_cmd_strgdtcmd; -charcmnd[MAX_COMMAND_SIZE]; -#endif - -if (notifier_disabled) -return NOTIFY_OK; - -TRACE2((gdth_halt() event %d\n,(int)event)); -if (event != SYS_RESTART event != SYS_HALT event != SYS_POWER_OFF) -return NOTIFY_DONE; - -notifier_disabled = 1; -printk(GDT-HA: Flushing all host drives .. ); -list_for_each_entry(ha, gdth_instances, list) { -gdth_flush(ha); - -#ifndef __alpha__ -/* controller reset */ -memset(cmnd, 0xff, MAX_COMMAND_SIZE); -gdtcmd.BoardNode = LOCALBOARD; -gdtcmd.Service = CACHESERVICE; -gdtcmd.OpCode = GDT_RESET; -TRACE2((gdth_halt(): reset controller %d\n, ha-hanum)); -gdth_execute(ha-shost, gdtcmd, cmnd, 10, NULL); -#endif -} -printk(Done.\n); - -#ifdef GDTH_STATISTICS -del_timer(gdth_timer); -#endif -return NOTIFY_OK; -} - /* configure lun */ static int gdth_slave_configure(struct scsi_device *sdev) { @@ -5141,13 +5097,13 @@ static void gdth_remove_one(gdth_ha_str *ha) scsi_remove_host(shp); + gdth_flush(ha); + if (ha-sdev) { scsi_free_host_dev(ha-sdev); ha-sdev = NULL; } - gdth_flush(ha); - if (shp-irq) free_irq(shp-irq,ha); @@ -5173,6 +5129,22 @@ static void gdth_remove_one(gdth_ha_str *ha) scsi_host_put(shp); } +static void gdth_shutdown(void); +static int gdth_halt(struct notifier_block *nb, ulong event, void *buf) +{ + TRACE2((gdth_halt() event %d\n, (int)event)); + if (event != SYS_RESTART event != SYS_HALT event != SYS_POWER_OFF) + return NOTIFY_DONE; + + gdth_shutdown(); + return NOTIFY_OK; +} + +static struct notifier_block gdth_notifier = { +gdth_halt, NULL, 0 +}; +static bool gdth_shutdown_done; + static int __init gdth_init(void) { if (disable) { @@ -5185,6 +5157,7 @@ static int __init gdth_init(void) GDTH_VERSION_STR); /* initializations */ + gdth_shutdown_done = false; gdth_polling = TRUE; gdth_clear_events(); @@ -5235,23 +5208,32 @@ static int __init gdth_init(void) add_timer(gdth_timer); #endif major = register_chrdev(0,gdth, gdth_fops); - notifier_disabled = 0; register_reboot_notifier(gdth_notifier); gdth_polling = FALSE; return 0; } -static void __exit gdth_exit(void) +static void gdth_shutdown() { gdth_ha_str *ha; - list_for_each_entry(ha, gdth_instances, list) - gdth_remove_one(ha); + if (gdth_shutdown_done) + return; + + gdth_shutdown_done = true; + unregister_chrdev(major, gdth); #ifdef GDTH_STATISTICS - del_timer(gdth_timer); + del_timer_sync(gdth_timer); #endif - unregister_chrdev(major,gdth); + + list_for_each_entry(ha, gdth_instances
Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
On Wed, Feb 13 2008 at 21:38 +0200, Jan Engelhardt [EMAIL PROTECTED] wrote: On Feb 13 2008 11:03, Boaz Harrosh wrote: I've tested this patch now - and it works fine. Now rmmod, halt and reboot also works. Stefan Priebe This is grate news Stefan. Thank you very much for all your time and effort, with out we could not have fixed all this. Do you have a git tree with the latest pieces? No, scsi-misc I guess ;) I could put it here: git://git.bhalevy.com/open-osd gdth branch give me an hours Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] gdth: bugfix for the at-exit problems
On Thu, Feb 14 2008 at 18:10 +0200, James Bottomley [EMAIL PROTECTED] wrote: On Thu, 2008-02-14 at 13:58 +0200, Boaz Harrosh wrote: This is a bugfix for the 2.6.24.x stable releases. gdth_exit would first remove all cards then stop the timer and would not sync with the timer function. This caused a crash in gdth_timer() when module was unloaded. So del_timer_sync the timer before we delete the cards. also the reboot notifier function would crash. So unify the exit and halt functions with a gdth_shutdown() that's called by both. The patch looks fine now, thanks. Can we actually get a tester just to make sure there's nothing I missed. James Yes, and the tester reported, a breakage. We are on it. Apparently, you cannot do a full deallocation of resources at reboot notifier, nor would you want to I guess. But you can do the flush. The exit call is never called on a reboot and the card access is valid to the end. Please comment? So I pretty much reverted that patch, but did leave some cleanups. Also we found the other problems reported with user-mode tools and cat /proc/sys/gdth/0 so 2 patches on the way above reverted. Give us a few ours to test every thing. Thanks Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] bdi: Track users that require stable page writes
On 11/01/2012 12:58 AM, Darrick J. Wong wrote: This creates a per-backing-device counter that tracks the number of users which require pages to be held immutable during writeout. Eventually it will be used to waive wait_for_page_writeback() if nobody requires stable pages. There is two things I do not like: 1. Please remind me why we need the users counter? If the device needs it, it will always be needed. The below queue_unrequire_stable_pages call at blk_integrity_unregister only happens at device destruction time, no? It was also said that maybe individual filesystems would need stable pages where other FSs using the same BDI do not. But since the FS is at the driving seat in any case, it can just do wait_for_writeback() regardless and can care less about the bdi flag and the other FSs. Actually all those FSs already do this this, and do not need any help. So this reason is mute. 2. I hate the atomic_read for every mkwrite. I think we can do better, since as you noted it can never be turned off, only on, at init time. And because of 1. above it is not dynamic. I think I like your previous simple bool better. But I do like a lot the new request_queue API you added here. Thanks Boaz Signed-off-by: Darrick J. Wong darrick.w...@oracle.com --- Documentation/ABI/testing/sysfs-class-bdi |7 + block/blk-integrity.c |4 +++ include/linux/backing-dev.h | 16 include/linux/blkdev.h| 10 mm/backing-dev.c | 38 + 5 files changed, 75 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-class-bdi b/Documentation/ABI/testing/sysfs-class-bdi index 5f50097..218a618 100644 --- a/Documentation/ABI/testing/sysfs-class-bdi +++ b/Documentation/ABI/testing/sysfs-class-bdi @@ -48,3 +48,10 @@ max_ratio (read-write) most of the write-back cache. For example in case of an NFS mount that is prone to get stuck, or a FUSE mount which cannot be trusted to play fair. + +stable_pages_required (read-write) + + If set, the backing device requires that all pages comprising a write + request must not be changed until writeout is complete. The system + administrator can turn this on if the hardware does not do so already. + However, once enabled, this flag cannot be disabled. diff --git a/block/blk-integrity.c b/block/blk-integrity.c index da2a818..cf2dd95 100644 --- a/block/blk-integrity.c +++ b/block/blk-integrity.c @@ -420,6 +420,8 @@ int blk_integrity_register(struct gendisk *disk, struct blk_integrity *template) } else bi-name = bi_unsupported_name; + queue_require_stable_pages(disk-queue); + return 0; } EXPORT_SYMBOL(blk_integrity_register); @@ -438,6 +440,8 @@ void blk_integrity_unregister(struct gendisk *disk) if (!disk || !disk-integrity) return; + queue_unrequire_stable_pages(disk-queue); + bi = disk-integrity; kobject_uevent(bi-kobj, KOBJ_REMOVE); diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h index 2a9a9ab..0554f5d 100644 --- a/include/linux/backing-dev.h +++ b/include/linux/backing-dev.h @@ -109,6 +109,7 @@ struct backing_dev_info { struct dentry *debug_dir; struct dentry *debug_stats; #endif + atomic_t stable_page_users; }; int bdi_init(struct backing_dev_info *bdi); @@ -307,6 +308,21 @@ long wait_iff_congested(struct zone *zone, int sync, long timeout); int pdflush_proc_obsolete(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos); +static inline void bdi_require_stable_pages(struct backing_dev_info *bdi) +{ + atomic_inc(bdi-stable_page_users); +} + +static inline void bdi_unrequire_stable_pages(struct backing_dev_info *bdi) +{ + atomic_dec(bdi-stable_page_users); +} + +static inline bool bdi_cap_stable_pages_required(struct backing_dev_info *bdi) +{ + return atomic_read(bdi-stable_page_users) 0; +} + static inline bool bdi_cap_writeback_dirty(struct backing_dev_info *bdi) { return !(bdi-capabilities BDI_CAP_NO_WRITEBACK); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 1756001..bf927c0 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -458,6 +458,16 @@ struct request_queue { (1 QUEUE_FLAG_SAME_COMP)| \ (1 QUEUE_FLAG_ADD_RANDOM)) +static inline void queue_require_stable_pages(struct request_queue *q) +{ + bdi_require_stable_pages(q-backing_dev_info); +} + +static inline void queue_unrequire_stable_pages(struct request_queue *q) +{ + bdi_unrequire_stable_pages(q-backing_dev_info); +} + static inline void queue_lockdep_assert_held(struct request_queue *q) { if
Re: [PATCH 3/3] fs: Fix remaining filesystems to wait for stable page writeback
On 11/01/2012 12:58 AM, Darrick J. Wong wrote: Fix up the filesystems that provide their own -page_mkwrite handlers to provide stable page writes if necessary. Signed-off-by: Darrick J. Wong darrick.w...@oracle.com --- fs/9p/vfs_file.c |1 + fs/afs/write.c |4 ++-- fs/ceph/addr.c |1 + fs/cifs/file.c |1 + fs/ocfs2/mmap.c |1 + fs/ubifs/file.c |4 ++-- 6 files changed, 8 insertions(+), 4 deletions(-) diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c index c2483e9..aa253f0 100644 --- a/fs/9p/vfs_file.c +++ b/fs/9p/vfs_file.c @@ -620,6 +620,7 @@ v9fs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf) lock_page(page); if (page-mapping != inode-i_mapping) goto out_unlock; + wait_on_stable_page_write(page); Good god thanks, yes please ;-) return VM_FAULT_LOCKED; out_unlock: diff --git a/fs/afs/write.c b/fs/afs/write.c index 9aa52d9..39eb2a4 100644 --- a/fs/afs/write.c +++ b/fs/afs/write.c @@ -758,7 +758,7 @@ int afs_page_mkwrite(struct vm_area_struct *vma, struct page *page) afs, is it not a network filesystem? which means that it has it's own emulated none-block-device BDI, registered internally. So if you do need stable pages someone should call bdi_require_stable_pages() But again since it is a network filesystem I don't see how it is needed, and/or it might be taken care of already. #ifdef CONFIG_AFS_FSCACHE fscache_wait_on_page_write(vnode-cache, page); #endif - + wait_on_stable_page_write(page); _leave( = 0); - return 0; + return VM_FAULT_LOCKED; } diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c CEPH for sure has it's own emulated none-block-device BDI. This one is also a pure networking filesystem. And it already does what it needs to do with wait_on_writeback(). So i do not think you should touch CEPH index 6690269..e9734bf 100644 --- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c @@ -1208,6 +1208,7 @@ static int ceph_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf) set_page_dirty(page); up_read(mdsc-snap_rwsem); ret = VM_FAULT_LOCKED; + wait_on_stable_page_write(page); } else { if (ret == -ENOMEM) ret = VM_FAULT_OOM; diff --git a/fs/cifs/file.c b/fs/cifs/file.c Cifs also self-BDI network filesystem, but index edb25b4..a8770bf 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -2997,6 +2997,7 @@ cifs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf) struct page *page = vmf-page; lock_page(page); It waits by locking the page, that's cifs naive way of waiting for writeback + wait_on_stable_page_write(page); Instead it could do better and not override page_mkwrite at all, and all it needs to do is call bdi_require_stable_pages() at it's own registered BDI return VM_FAULT_LOCKED; } diff --git a/fs/ocfs2/mmap.c b/fs/ocfs2/mmap.c index 47a87dd..a0027b1 100644 --- a/fs/ocfs2/mmap.c +++ b/fs/ocfs2/mmap.c @@ -124,6 +124,7 @@ static int __ocfs2_page_mkwrite(struct file *file, struct buffer_head *di_bh, fsdata); BUG_ON(ret != len); ret = VM_FAULT_LOCKED; + wait_on_stable_page_write(page); out: return ret; } diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c index 5bc7781..cb0d3aa 100644 --- a/fs/ubifs/file.c +++ b/fs/ubifs/file.c @@ -1522,8 +1522,8 @@ static int ubifs_vm_page_mkwrite(struct vm_area_struct *vma, ubifs_release_dirty_inode_budget(c, ui); } - unlock_page(page); - return 0; + wait_on_stable_page_write(page); ubifs has it's special ubi block device. So someone needs to call bdi_require_stable_pages() for this to work. I think that here too. The existing code, like cifs, calls page_lock, as a way of waiting for writeback. So this is certainly not finished. + return VM_FAULT_LOCKED; out_unlock: unlock_page(page); Cheers Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/3] fs: Fix remaining filesystems to wait for stable page writeback
On 11/01/2012 01:22 PM, Jeff Layton wrote: Hmm...I don't know... I've never been crazy about using the page lock for this, but in the absence of a better way to guarantee stable pages, it was what I ended up with at the time. cifs_writepages will hold the page lock until kernel_sendmsg returns. At that point the TCP layer will have copied off the page data so it's safe to release it. With this change though, we're going to end up blocking until the writeback flag clears, right? And I think that will happen when the reply comes in? So, we'll end up blocking for much longer than is really necessary in page_mkwrite with this change. Hmm OK, that is a very good point. In that case it is just a simple nack on Darrick's hunk to cifs. cifs is fine and should not be touched Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] bdi: Track users that require stable page writes
On 11/01/2012 11:57 AM, Darrick J. Wong wrote: On Thu, Nov 01, 2012 at 11:21:22AM -0700, Boaz Harrosh wrote: On 11/01/2012 12:58 AM, Darrick J. Wong wrote: This creates a per-backing-device counter that tracks the number of users which require pages to be held immutable during writeout. Eventually it will be used to waive wait_for_page_writeback() if nobody requires stable pages. There is two things I do not like: 1. Please remind me why we need the users counter? If the device needs it, it will always be needed. The below queue_unrequire_stable_pages call at blk_integrity_unregister only happens at device destruction time, no? It was also said that maybe individual filesystems would need stable pages where other FSs using the same BDI do not. But since the FS is at the driving seat in any case, it can just do wait_for_writeback() regardless and can care less about the bdi flag and the other FSs. Actually all those FSs already do this this, and do not need any help. So this reason is mute. The counter exists so that a filesystem can forcibly enable stable page writes even if the underlying device doesn't require it, because the generic fs/mm waiting only happens if stable_pages_required=1. The idea here was to allow a filesystem that needs stable page writes for its own purposes (i.e. data block checksumming) to be able to enforce the requirement even if the disk doesn't care (or doesn't exist). But the filesystem does not need BDI flag to do that, It can just call wait_on_page_writeback() directly and or any other waiting like cifs does, and this way will not affect any other partitions of the same BDI. So this flag is never needed by the FS, it is always to service the device. But maybe there are no such filesystems? Exactly, all the FSs that do care, already take care of it. shrug I don't really like the idea that you can dirty a page during writeout, which means that the disk could write the before page, the after page, or some weird mix of the two, and all you get is a promise that the after page will get rewritten if the power doesn't go out. Not to mention that it could happen again. :) I guess that's life. But what is the difference between a modification of a page that comes 1 nanosecond before the end_write_back, and another one that came 1 nano after end_write_back. To the disk it looks like the same load pattern. I do have in mind a way that we can minimize these redirty-while-writeback by 90% but I don't care enough (And am not paid) to fix it, so the contention is currently worse then what it could be. 2. I hate the atomic_read for every mkwrite. I think we can do better, since as you noted it can never be turned off, only on, at init time. And because of 1. above it is not dynamic. I think I like your previous simple bool better. I doubt the counter would change much; I could probably change it to something less heavyweight if it's really a problem. I hope you are convinced that a counter is not needed, and a simple bool like before is enough Thanks Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 02/16] f2fs: add on-disk layout
On 10/05/2012 10:54 AM, Dan Luedtke wrote: On Fri, 2012-10-05 at 20:56 +0900, 김재극 wrote: +__le32 i_atime; /* Access time */ +__le32 i_ctime; /* inode Change time */ +__le32 i_mtime; /* Modification time */ +__le32 i_btime; /* file creation time*/ It's probably to late, but have you considered using 64 bit timestamps? In LanyFS we use 64 bit and I'm very interested in reasoning for 32 bit timestamps. Yes and please don't make them millisecond resolution. This is a real headache. From the like of make to NFS to reboot/corruption recovery... the list is endless Please make it nanoseconds resolution. Regards Dan Thanks Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/8] exofs: drop lock/unlock super
On 10/06/2012 03:38 AM, Marco Stornelli wrote: Removed lock/unlock super. Hi Marco I was sure you guys where pushing this patch through some vfs tree. (Hence my Acked-by below). I have just sent Linus a pull request for the 3.7 Kernel. I could perhaps append this one and resend. I do want this patch, and it is completely independent and can go through my tree. Please tell me what do you want to do with this patch, should I push it? Or are you pushing it through Al Thanks Boaz Acked-by: Artem Bityutskiy artem.bityuts...@linux.intel.com Acked-by: Boaz Harrosh bharr...@panasas.com Signed-off-by: Marco Stornelli marco.storne...@gmail.com --- fs/exofs/super.c |4 1 files changed, 0 insertions(+), 4 deletions(-) diff --git a/fs/exofs/super.c b/fs/exofs/super.c index 59e3bbf..5e59280 100644 --- a/fs/exofs/super.c +++ b/fs/exofs/super.c @@ -389,8 +389,6 @@ static int exofs_sync_fs(struct super_block *sb, int wait) if (unlikely(ret)) goto out; - lock_super(sb); - ios-length = offsetof(struct exofs_fscb, s_dev_table_oid); memset(fscb, 0, ios-length); fscb-s_nextid = cpu_to_le64(sbi-s_nextid); @@ -406,8 +404,6 @@ static int exofs_sync_fs(struct super_block *sb, int wait) if (unlikely(ret)) EXOFS_ERR(%s: ore_write failed.\n, __func__); - - unlock_super(sb); out: EXOFS_DBGMSG(s_nextid=0x%llx ret=%d\n, _LLU(sbi-s_nextid), ret); ore_put_io_state(ios); -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] VFS: add config options to enable link restrictions
On 10/26/2012 01:23 PM, Kees Cook wrote: Every distro will ship with this enabled (except perhaps Damn Vulnerable Linux), so why make it harder? So please remind me why can't it be on by default in code. And the normal sysctl to turn it off for these who want to experiment with filesystem corruption. So the basic premise is that you must not have any filesystem corruption at the parts used by boot up until the init portion that turns filesystem corruption on -Kees Cheers Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] SUNRPC: Prevent kernel stack corruption on long values of flush
On 07/18/2012 01:55 PM, Jim Rees wrote: Dave Jones wrote: Unsigned long isn't necessarily 32 bits. On 64-bit systems %lu can be up to 18446744073709551615 Thanks. You caught me thinking Intel. How embarrassing. What? why even on Intel-64 long is 64bit. long is always the same or bigger then a pointer (A pointer must always fit in a long) On the other hand int is 32bit in Intel-64 unlike some other CPUs where int(s) may get to be 64bit as well. Cheers Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: howto boost write(2) performance?
On Tue, Oct 09 2007 at 16:56 +0200, Andi Kleen [EMAIL PROTECTED] wrote: Michael Stiller [EMAIL PROTECTED] writes: The write(2) performance is not good enough, the writer threads take to much time, and i ask you for ideas, howto to boost the write performance. You could use an O_DIRECT write if the data is suitably aligned and your IO sizes are big enough (O_DIRECT is usually a loss on small IOs). It will also be synchronous, but if you do it from a separate thread anyways that should be fine. -Andi If your target is a SCSI target you can gain up to 15% by using sg. Search on the net for the sg utils package. The source code of sg_dd and others are a grate example of how to do it. Other wise O_DIRECT is your friend. Also look for asynchronous I/O so you have a few pending IO buffers to keep the pipes full. Boaz - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/7] blk_end_request: changing normal drivers
On Sat, Sep 01 2007 at 1:42 +0300, Kiyoshi Ueda [EMAIL PROTECTED] wrote: This patch converts normal drivers, which complete request in a standard way shown below, to use blk_end_request(). a) end_that_request_{chunk/first} spin_lock_irqsave() (add_disk_randomness(), blk_queue_end_tag(), blkdev_dequeue_request()) end_that_request_last() spin_unlock_irqrestore() = blk_end_request() b) spin_lock_irqsave() end_that_request_{chunk/first} (add_disk_randomness(), blk_queue_end_tag(), blkdev_dequeue_request()) end_that_request_last() spin_unlock_irqrestore() = spin_lock_irqsave() __blk_end_request() spin_unlock_irqsave() c) end_that_request_last() = __blk_end_request() Signed-off-by: Kiyoshi Ueda [EMAIL PROTECTED] Signed-off-by: Jun'ichi Nomura [EMAIL PROTECTED] --- arch/arm/plat-omap/mailbox.c|9 ++--- arch/um/drivers/ubd_kern.c | 10 +- block/elevator.c|4 ++-- block/ll_rw_blk.c | 15 +-- drivers/block/DAC960.c |6 ++ drivers/block/floppy.c |8 +++- drivers/block/lguest_blk.c |5 + drivers/block/nbd.c |4 +--- drivers/block/ps3disk.c |6 +- drivers/block/sunvdc.c |5 + drivers/block/sx8.c |4 +--- drivers/block/ub.c |4 ++-- drivers/block/viodasd.c |5 + drivers/block/xen-blkfront.c|5 ++--- drivers/cdrom/viocd.c |5 + drivers/ide/ide-cd.c|6 +++--- drivers/ide/ide-io.c| 22 +++--- drivers/message/i2o/i2o_block.c |8 ++-- drivers/mmc/card/block.c| 24 +--- drivers/mmc/card/queue.c|4 ++-- drivers/s390/block/dasd.c |4 +--- drivers/s390/char/tape_block.c |3 +-- drivers/scsi/ide-scsi.c |8 drivers/scsi/scsi_lib.c | 13 ++--- 24 files changed, 57 insertions(+), 130 deletions(-) diff -rupN 02-sect2byte-macro/arch/arm/plat-omap/mailbox.c 03-normal-caller-change/arch/arm/plat-omap/mailbox.c --- 02-sect2byte-macro/arch/arm/plat-omap/mailbox.c 2007-08-13 00:25:24.0 -0400 +++ 03-normal-caller-change/arch/arm/plat-omap/mailbox.c 2007-08-23 17:51:33.0 -0400 @@ -117,7 +117,8 @@ static void mbox_tx_work(struct work_str ... Please Separate this patch to at least three patches. 1. Block layer - block/elevator.c block/ll_rw_blk.c 2. SCSI-ML - drivers/scsi/scsi_lib.c 3. Normal drivers, can also be separated into logical groups like scsi/block etc.. This is because these patches introduce conflicts to lots of queued work I have, and if you separate them it will help me with my giting. Also I think that this is logical. Block-layer and scsi_lib are not drivers, the Subject of the patch does not match. Thanks Boaz - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: fix crash in gdth_timeout()
On Mon, Oct 15 2007 at 19:57 +0200, Jeff Garzik [EMAIL PROTECTED] wrote: Linus Torvalds wrote: On Mon, 15 Oct 2007, Ingo Molnar wrote: A further problem is probably that the GDTH timer is not stopped by a failed GDTH probe? Indeed. Maybe this is a better fix? That driver is pretty messy, and this should have been found ealier. James? Boaz? FWIW, the gdth driver was super-messy. With this latest SCSI pull, that severity has been successfully downgraded to messy :) IMO some easy-to-fix breakage was inevitable with such a large volume of fundamental changes. Honestly, the driver is probably rarely run by people that lack the hardware, I bet... Jeff It was all flight by instruments only. I called for HW testers and none came forward. All these changes, apart from successful downgrade to messy where also needed in order to push important changes to scsi. But a little bird said that QEMU might simulate this HW. SO I guess it is QEMU time for me. Boaz - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PATCH] SCSI updates for 2.6.24
On Tue, Oct 16 2007 at 8:49 +0200, Geert Uytterhoeven [EMAIL PROTECTED] wrote: On Mon, 15 Oct 2007, James Bottomley wrote: This is the accumulated updates queued for 2.6.24. It contains the usual slew of driver updates, plus some gdth and advansys rewrites. We still have some outstanding bugs in gdth and fc4 for which I'm hoping to sweep fixes into the next update. Boaz Harrosh (12): NCR5380: Use scsi_eh API for REQUEST_SENSE invocation This change broke compilation of the mac_scsi.c driver: | linux/drivers/scsi/NCR5380.c: In function 'NCR5380_information_transfer': | linux/drivers/scsi/NCR5380.c:2282: error: 'struct NCR5380_hostdata' has no member named 'ses' | linux/drivers/scsi/NCR5380.c:2283: error: 'struct NCR5380_hostdata' has no member named 'ses' | linux/drivers/scsi/NCR5380.c:2284: error: 'struct NCR5380_hostdata' has no member named 'ses' | linux/drivers/scsi/NCR5380.c:2288: error: 'struct NCR5380_hostdata' has no member named 'ses' Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED] In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say programmer or something like that. -- Linus Torvalds fix below, Sorry. From 1e10cf1a39ff4909696a5c7950367261b38b3345 Mon Sep 17 00:00:00 2001 From: Boaz Harrosh [EMAIL PROTECTED] Date: Tue, 16 Oct 2007 10:25:01 +0200 Subject: [PATCH] scsi_mac.h: Define AUTOSENSE before include of NCR5380.h - Previese patch to NCR5380 broke scsi_mac because AUTOSENSE was defined after the inclusion of NCR5380.h. Fix it Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] --- drivers/scsi/mac_scsi.c |9 + 1 files changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/scsi/mac_scsi.c b/drivers/scsi/mac_scsi.c index cdbcaa5..abe2bda 100644 --- a/drivers/scsi/mac_scsi.c +++ b/drivers/scsi/mac_scsi.c @@ -53,6 +53,11 @@ #include scsi.h #include scsi/scsi_host.h #include mac_scsi.h + +/* These control the behaviour of the generic 5380 core */ +#define AUTOSENSE +#define PSEUDO_DMA + #include NCR5380.h #if 0 @@ -571,10 +576,6 @@ static int macscsi_pwrite (struct Scsi_Host *instance, } -/* These control the behaviour of the generic 5380 core */ -#define AUTOSENSE -#define PSEUDO_DMA - #include NCR5380.c static struct scsi_host_template driver_template = { -- 1.5.3.1 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23-git8 missing include file - [Resending the patch]
On Tue, Oct 16 2007 at 8:27 +0200, Kamalesh Babulal [EMAIL PROTECTED] wrote: Kamalesh Babulal wrote: Hi, The build fails with following error CC drivers/usb/storage/scsiglue.o CC drivers/usb/storage/protocol.o CC drivers/usb/storage/transport.o In file included from drivers/usb/storage/transport.c:53: include/scsi/scsi_eh.h:79: error: field 'sense_sgl' has incomplete type make[3]: *** [drivers/usb/storage/transport.o] Error 1 make[2]: *** [drivers/usb/storage] Error 2 make[1]: *** [drivers/usb] Error 2 make: *** [drivers] Error 2 snip Signed-off-by: Kamalesh Babulal [EMAIL PROTECTED] -- --- linux-2.6.23/include/scsi/scsi_cmnd.h 2007-10-16 09:58:30.0 +0530 +++ linux-2.6.23/include/scsi/~scsi_cmnd.h 2007-10-16 10:20:32.0 +0530 @@ -5,6 +5,7 @@ #include linux/list.h #include linux/types.h #include linux/timer.h +#include linux/scatterlist.h struct request; struct scatterlist; This was already fixed here: http://www.spinics.net/lists/linux-scsi/msg20177.html Thanks - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
slight annoyance with new x86 arch
Make system is a bit broken in the sense that it recompiles and links right after a build even if nothing changed. I have # CONFIG_LOCALVERSION_AUTO is not set make output after a fully built tree: { // these are expected off course GEN /usr0/export/dev/bharrosh/git/pub/linux-2.6-block/.build_i386/Makefile CHK include/linux/version.h CHK include/linux/utsrelease.h Using /usr0/export/dev/bharrosh/git/pub/linux-2.6-block as source for kernel CALL /usr0/export/dev/bharrosh/git/pub/linux-2.6-block/scripts/checksyscalls.sh CHK include/linux/compile.h } { // these cause a recompilation and relink AS arch/x86/kernel/vsyscall-int80_32.o AS arch/x86/kernel/vsyscall-sysenter_32.o SYSCALL arch/x86/kernel/vsyscall-syms.o SYSCALL arch/x86/kernel/vsyscall-int80_32.so SYSCALL arch/x86/kernel/vsyscall-sysenter_32.so AS arch/x86/kernel/vsyscall_32.o } LD arch/x86/kernel/built-in.o GEN .version ... CC init/version.o LD init/built-in.o LD .tmp_vmlinux1 KSYM .tmp_kallsyms1.S AS .tmp_kallsyms1.o LD .tmp_vmlinux2 KSYM .tmp_kallsyms2.S AS .tmp_kallsyms2.o LD .tmp_vmlinux3 KSYM .tmp_kallsyms3.S AS .tmp_kallsyms3.o LD vmlinux.o MODPOST vmlinux.o I'm cross compiling on an x86_64 fedora7 machine. with make ARCH=i386 KBUILD_OUTPUT=.build_i386 allmodconfig and make ARCH=i386 KBUILD_OUTPUT=.build_i386 Other wise after an mrproper all is well and booting and the dir structure looks much better Thanks Boaz - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: failed to fetch the osd tree
That old server has finally crapped out on me. It is being replaced by a new machine. It will take few days to set up and run. So expect failure for the next few days Thanks Boaz On 11/04/13 02:43, Stephen Rothwell wrote: HI Boaz, On Tue, 26 Mar 2013 10:17:17 +1100 Stephen Rothwell s...@canb.auug.org.au wrote: Fetching the osd tree (git://git.open-osd.org/linux-open-osd.git#linux-next) produces a time out and this error: fatal: read error: Connection reset by peer I will continue to use the last osd tree I fetched. This is still happening :-( -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 6/6] ubifs: Wait for page writeback to provide stable pages
On 02/21/2013 05:48 AM, Darrick J. Wong wrote: On Wed, Jan 23, 2013 at 01:43:12PM -0800, Andrew Morton wrote: On Fri, 18 Jan 2013 17:13:16 -0800 Darrick J. Wong darrick.w...@oracle.com wrote: When stable pages are required, we have to wait if the page is just going to disk and we want to modify it. Add proper callback to ubifs_vm_page_mkwrite(). CC: Artem Bityutskiy dedeki...@gmail.com CC: Adrian Hunter adrian.hun...@intel.com CC: linux-...@lists.infradead.org From: Jan Kara j...@suse.cz Signed-off-by: Jan Kara j...@suse.cz Signed-off-by: Darrick J. Wong darrick.w...@oracle.com A couple of these patches had this From:Jan strangely embedded in the signoff area. I have assumed that they were indeed authored by Jan. Please note that authorship is indicated by putting the From: line right at the start of the chagnelog. I grabbed the patches. They should appear in linux-next tomorrow if I can get the current pooppile to build. Well... these patches have been banging around in -next for a month or so now. As far as I know there haven't been any complaints. Can we push these for 3.9? Yes, please I'm waiting for these patches as well. Lets push them this merge window. I was sure they would get in at 3.8, but they didn't. What's the delay? [Using this I can fix a theoretical raid corruption in exofs local access, which no one really cared because exofs is always accessed via pnfs, which does not have that bug] Thanks Boaz --D -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC 3/6] bidi support: bidirectional request
- Instantiate another request_io_part in request for bidi_read. - Define Implement new API for accessing bidi parts. - API to Build bidi requests and map to sglists. - Define new end_that_request_block() function to end a complete request. Signed-off-by: Benny Halevy [EMAIL PROTECTED] Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] --- block/elevator.c |7 +--- block/ll_rw_blk.c | 113 ++-- include/linux/blkdev.h | 49 - 3 files changed, 149 insertions(+), 20 deletions(-) diff --git a/block/elevator.c b/block/elevator.c index 0e9ea69..53b552e 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -741,14 +741,9 @@ struct request *elv_next_request(request rq = NULL; break; } else if (ret == BLKPREP_KILL) { - int nr_bytes = rq_uni(rq)-hard_nr_sectors 9; - - if (!nr_bytes) - nr_bytes = rq_uni(rq)-data_len; - blkdev_dequeue_request(rq); rq-cmd_flags |= REQ_QUIET; - end_that_request_chunk(rq, 0, nr_bytes); + end_that_request_block(rq, 0); end_that_request_last(rq, 0); } else { printk(KERN_ERR %s: bad return=%d\n, __FUNCTION__, diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c index 7d46f6a..1358b35 100644 --- a/block/ll_rw_blk.c +++ b/block/ll_rw_blk.c @@ -261,6 +261,7 @@ static void rq_init(request_queue_t *q, rq-end_io_data = NULL; rq-completion_data = NULL; rq_init_io_part(rq-uni); + rq_init_io_part(rq-bidi_read); } /** @@ -1312,15 +1313,16 @@ static int blk_hw_contig_segment(request } /* - * map a request to scatterlist, return number of sg entries setup. Caller - * must make sure sg can hold rq-nr_phys_segments entries + * map a request_io_part to scatterlist, return number of sg entries setup. + * Caller must make sure sg can hold rq_io(rq, dir)-nr_phys_segments entries */ -int blk_rq_map_sg(request_queue_t *q, struct request *rq, struct scatterlist *sg) +int blk_rq_map_sg_bidi(request_queue_t *q, struct request *rq, + struct scatterlist *sg, enum dma_data_direction dir) { struct bio_vec *bvec, *bvprv; struct bio *bio; int nsegs, i, cluster; - struct request_io_part* req_io = rq_uni(rq); + struct request_io_part* req_io = rq_io(rq, dir); nsegs = 0; cluster = q-queue_flags (1 QUEUE_FLAG_CLUSTER); @@ -1362,6 +1364,17 @@ new_segment: return nsegs; } +EXPORT_SYMBOL(blk_rq_map_sg_bidi); + +/* + * map a request to scatterlist, return number of sg entries setup. Caller + * must make sure sg can hold rq-nr_phys_segments entries + */ +int blk_rq_map_sg(request_queue_t *q, struct request *rq, struct scatterlist *sg) +{ + return blk_rq_map_sg_bidi(q, rq, sg, rq-data_dir); +} + EXPORT_SYMBOL(blk_rq_map_sg); /* @@ -2561,15 +2574,18 @@ int blk_rq_unmap_user(struct bio *bio) EXPORT_SYMBOL(blk_rq_unmap_user); /** - * blk_rq_map_kern - map kernel data to a request, for REQ_BLOCK_PC usage + * blk_rq_map_kern_bidi - maps kernel data to a request_io_part, for BIDI usage * @q: request queue where request should be inserted * @rq:request to fill * @kbuf: the kernel buffer * @len: length of user data * @gfp_mask: memory allocation flags + * @dir:if it is a BIDIRECTIONAL request than DMA_TO_DEVICE to prepare + * the bidi_write side or DMA_FROM_DEVICE to prepare the bidi_read + * side, else it should be same as req-data_dir */ -int blk_rq_map_kern(request_queue_t *q, struct request *rq, void *kbuf, - unsigned int len, gfp_t gfp_mask) +int blk_rq_map_kern_bidi(request_queue_t *q, struct request *rq, void *kbuf, + unsigned int len, gfp_t gfp_mask, enum dma_data_direction dir) { struct bio *bio; @@ -2582,14 +2598,30 @@ int blk_rq_map_kern(request_queue_t *q, if (IS_ERR(bio)) return PTR_ERR(bio); - if (rq_uni_write_dir(rq) == WRITE) + if (dma_write_dir(dir)) bio-bi_rw |= (1 BIO_RW); - blk_rq_bio_prep(q, rq, bio); + blk_rq_bio_prep_bidi(q, rq, bio ,dir); rq-buffer = rq-data = NULL; return 0; } +EXPORT_SYMBOL(blk_rq_map_kern_bidi); + +/** + * blk_rq_map_kern - map kernel data to a request, for REQ_BLOCK_PC usage + * @q: request queue where request should be inserted + * @rq:request to fill + * @kbuf: the kernel buffer + * @len: length of user data + * @gfp_mask: memory allocation flags + */ +int blk_rq_map_kern(request_queue_t *q, struct request *rq, void *kbuf, + unsigned int len, gfp_t gfp_mask) +{ + return blk_rq_map_kern_bidi( q, rq, kbuf, len, gfp_mask, rq-data_dir
[RFC 4/6] bidi support: bidirectional SCSI layer
- Add bidi members to struct scsi_cmnd. - Add API at the SCSI level for bidirectional commands. - Implement support for BIDI requests in scsi_setup_blk_pc_cmnd() and scsi_io_completion(). Signed-off-by: Benny Halevy [EMAIL PROTECTED] Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] --- drivers/scsi/scsi_lib.c| 324 +++- include/scsi/scsi_cmnd.h | 55 +++- include/scsi/scsi_device.h | 16 ++ 3 files changed, 330 insertions(+), 65 deletions(-) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 9c21025..92d3d44 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -246,27 +246,28 @@ static void scsi_end_async(struct reques { struct scsi_io_context *sioc = req-end_io_data; - if (sioc-done) - sioc-done(sioc-data, sioc-sense, req-errors, rq_uni(req)-data_len); + if (sioc-done) /* FIXME: API of done needs to change for bidi requests*/ + sioc-done(sioc-data, sioc-sense, req-errors, req-uni.data_len); kmem_cache_free(scsi_io_context_cache, sioc); __blk_put_request(req-q, req); } -static int scsi_merge_bio(struct request *rq, struct bio *bio) +static int scsi_merge_bio(struct request *rq, struct bio *bio, + enum dma_data_direction dir) { struct request_queue *q = rq-q; - struct request_io_part* req_io = rq_uni(rq); + struct request_io_part* req_io = rq_io(rq, dir); bio-bi_flags = ~(1 BIO_SEG_VALID); - if (rq_uni_write_dir(rq)) + if (dir == DMA_TO_DEVICE) bio-bi_rw |= (1 BIO_RW); else bio-bi_rw = ~(1 BIO_RW); blk_queue_bounce(q, bio); - if (!rq_uni(rq)-bio) - blk_rq_bio_prep(q, rq, bio); + if (!req_io-bio) + blk_rq_bio_prep_bidi(q, rq, bio, dir); else if (!ll_back_merge_fn(q, rq, bio)) return -EINVAL; else { @@ -299,7 +300,7 @@ static int scsi_bi_endio(struct bio *bio * sent to use, as some ULDs use that struct to only organize the pages. */ static int scsi_req_map_sg(struct request *rq, struct scatterlist *sgl, - int nsegs, unsigned bufflen, gfp_t gfp) + int nsegs, unsigned bufflen, gfp_t gfp, enum dma_data_direction dir) { struct request_queue *q = rq-q; int nr_pages = (bufflen + sgl[0].offset + PAGE_SIZE - 1) PAGE_SHIFT; @@ -337,7 +338,7 @@ static int scsi_req_map_sg(struct reques } if (bio-bi_vcnt = nr_vecs) { - err = scsi_merge_bio(rq, bio); + err = scsi_merge_bio(rq, bio ,dir); if (err) { bio_endio(bio, bio-bi_size, 0); goto free_bios; @@ -352,12 +353,12 @@ static int scsi_req_map_sg(struct reques } rq-buffer = rq-data = NULL; - rq_uni(rq)-data_len = data_len; + rq_io(rq,dir)-data_len = data_len; return 0; free_bios: - while ((bio = rq_uni(rq)-bio) != NULL) { - rq_uni(rq)-bio = bio-bi_next; + while ((bio = rq_io(rq,dir)-bio) != NULL) { + rq_io(rq,dir)-bio = bio-bi_next; /* * call endio instead of bio_put incase it was bounced */ @@ -385,31 +386,89 @@ int scsi_execute_async(struct scsi_devic int use_sg, int timeout, int retries, void *privdata, void (*done)(void *, char *, int, int), gfp_t gfp) { + struct scsi_cmnd_buff buff; + + buff.use_sg = use_sg; + buff.buffer = buffer; + buff.bufflen = bufflen; + + return scsi_execute_bidi_async( + sdev, (unsigned char *)cmd, cmd_len, data_direction, + buff, NULL, + timeout, retries, privdata, done, gfp ); +} +EXPORT_SYMBOL_GPL(scsi_execute_async); + +/** + * scsi_execute_bidi_async - insert bidi request, don't wait + * @sdev: scsi device + * @cmd: scsi command + * @cmd_len: length of scsi cdb + * @data_direction: data direction + * @bidi_write_buff.buffer:data buffer (this can be a kernel buffer or scatterlist) + * @bidi_write_buff.bufflen: len of buffer + * @bidi_write_buff.use_sg:if buffer is a scatterlist this is the number of elements + * @bidi_read_buff: same as above bidi_write_buff but for the bidi read part + * @timeout: request timeout in jiffies + * @retries: number of times to retry request + * @privdata: user data passed to done function + * @done: pointer to done function called at io completion. + * signature: void done(void *user_data, char *sence, int errors, int data_bytes_advanced) + * @gfp: gfp allocation flags + **/ +int scsi_execute_bidi_async(struct scsi_device *sdev, + unsigned char *cmd, int cmd_len, int data_direction
[RFC 5/6] bidi support: varlen + OSD support
- Add support for varlen CDBs or large vendor specific commands in struct request. - Add support for above at SCSI level API's and devices. - Add the OSD device type. Signed-off-by: Benny Halevy [EMAIL PROTECTED] Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] --- block/ll_rw_blk.c |2 ++ drivers/scsi/scsi.c | 11 --- drivers/scsi/scsi_debug.c |6 +- drivers/scsi/scsi_lib.c | 24 +--- drivers/scsi/scsi_scan.c |1 + include/linux/blkdev.h|6 ++ include/scsi/scsi.h |2 ++ include/scsi/scsi_cmnd.h | 20 +++- include/scsi/scsi_host.h |8 +++- 9 files changed, 67 insertions(+), 13 deletions(-) diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c index 1358b35..04bc43e 100644 --- a/block/ll_rw_blk.c +++ b/block/ll_rw_blk.c @@ -256,6 +256,8 @@ static void rq_init(request_queue_t *q, rq-q = q; rq-special = NULL; rq-data = NULL; + rq-varlen_cdb_len = 0; + rq-varlen_cdb = NULL; rq-sense = NULL; rq-end_io = NULL; rq-end_io_data = NULL; diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c index 24cffd9..f835496 100644 --- a/drivers/scsi/scsi.c +++ b/drivers/scsi/scsi.c @@ -485,6 +485,7 @@ int scsi_dispatch_cmd(struct scsi_cmnd * unsigned long flags = 0; unsigned long timeout; int rtn = 0; + int cdb_size; /* check if the device is still usable */ if (unlikely(cmd-device-sdev_state == SDEV_DEL)) { @@ -566,9 +567,13 @@ int scsi_dispatch_cmd(struct scsi_cmnd * * Before we queue this command, check if the command * length exceeds what the host adapter can handle. */ - if (CDB_SIZE(cmd) cmd-device-host-max_cmd_len) { - SCSI_LOG_MLQUEUE(3, - printk(queuecommand : command too long.\n)); + cdb_size = cmd-request-varlen_cdb ? + cmd-request-varlen_cdb_len : COMMAND_SIZE(cmd-cmnd[0]); + if (cdb_size cmd-device-host-max_cmd_len) { + SCSI_LOG_MLQUEUE(0, + printk(queuecommand : command too long. + cdb_size(%d) host-max_cmd_len(%d)\n, + cdb_size, cmd-device-host-max_cmd_len)); cmd-result = (DID_ABORT 16); scsi_done(cmd); diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c index 30ee3d7..8520873 100644 --- a/drivers/scsi/scsi_debug.c +++ b/drivers/scsi/scsi_debug.c @@ -1993,8 +1993,12 @@ static int scsi_debug_slave_configure(st if (SCSI_DEBUG_OPT_NOISE scsi_debug_opts) printk(KERN_INFO scsi_debug: slave_configure %u %u %u %u\n, sdp-host-host_no, sdp-channel, sdp-id, sdp-lun); - if (sdp-host-max_cmd_len != SCSI_DEBUG_MAX_CMD_LEN) + if (sdp-host-max_cmd_len SCSI_DEBUG_MAX_CMD_LEN) { + printk(KERN_INFO + scsi_debug: max_cmd_len(%d) SCSI_DEBUG_MAX_CMD_LEN\n, + sdp-host-max_cmd_len); sdp-host-max_cmd_len = SCSI_DEBUG_MAX_CMD_LEN; + } devip = devInfoReg(sdp); sdp-hostdata = devip; if (sdp-host-cmd_per_lun) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 92d3d44..d672ade 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -189,8 +189,17 @@ int scsi_execute(struct scsi_device *sde buffer, bufflen, __GFP_WAIT)) goto out; - req-cmd_len = COMMAND_SIZE(cmd[0]); + if (cmd[0] == VARLEN_CDB) { + req-varlen_cdb_len = scsi_varlen_cdb_length(cmd); + req-varlen_cdb = (unsigned char *)cmd; /* varlen cmd is pointed to */ + } + req-cmd_len = min( + req-varlen_cdb_len ? req-varlen_cdb_len : COMMAND_SIZE(cmd[0]), + MAX_COMMAND_SIZE); memcpy(req-cmd, cmd, req-cmd_len); + if (req-cmd_len MAX_COMMAND_SIZE) + memset(req-cmd[req-cmd_len], 0, MAX_COMMAND_SIZE-req-cmd_len); + req-sense = sense; req-sense_len = 0; req-retries = retries; @@ -452,8 +461,6 @@ int scsi_execute_bidi_async(struct scsi_ if (err) goto free_req; - req-cmd_len = cmd_len; - memset(req-cmd, 0, BLK_MAX_CDB); /* ATAPI hates garbage after CDB */ if (is_bidi) { if (bidi_read_buff-use_sg) err = scsi_req_map_sg( @@ -469,7 +476,18 @@ int scsi_execute_bidi_async(struct scsi_ } } + BUG_ON( (cmd[0]==VARLEN_CDB) (scsi_varlen_cdb_length(cmd) cmd_len) ); + /* Unlike scsi_excute, scsi_execute_bidi_xxx supports big commands that are not VARLEN_CDB */ + if ((cmd[0] == VARLEN_CDB) || (cmd_len MAX_COMMAND_SIZE)) { + req-varlen_cdb_len = cmd_len; + req-varlen_cdb = cmd; + } + req-cmd_len
[RFC 6/6] bidi support: iSCSI bidi varlen CDBs
This patch is intended to provide a working example of a use case for bidi and varlen CDBs. The actual patches will be sent via the open-iscsi project. - Use proposed SCSI implementation for iSCSI bidirectional commands. - Use proposed block layer implementation for iSCSI extended CDBs. - Dynamically build AHSs for extended cdbs and bidirectional requests. - Follow iscsi rfc-3720 concerning datasn and r2tsn with bidirectional commands, these must be the same counter. [- Remove check for first-burst bigger than max-burst so iSCSI regression tests can pass. this is the wrong fix and will be removed in actual patches] Signed-off-by: Benny Halevy [EMAIL PROTECTED] Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] --- drivers/infiniband/ulp/iser/iscsi_iser.c |4 +- drivers/scsi/iscsi_tcp.c | 190 +++--- drivers/scsi/iscsi_tcp.h | 10 +- drivers/scsi/libiscsi.c | 33 +++-- include/scsi/iscsi_proto.h |8 ++ include/scsi/libiscsi.h | 18 +++- 6 files changed, 200 insertions(+), 63 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c index dd221ed..a0eae0c 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.c +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c @@ -140,10 +140,10 @@ iscsi_iser_cmd_init(struct iscsi_cmd_tas iser_ctask-iser_conn= iser_conn; if (sc-sc_data_direction == DMA_TO_DEVICE) { - BUG_ON(ctask-total_length == 0); + BUG_ON(iscsi_out_total_length(ctask) == 0); debug_scsi(cmd [itt %x total %d imm %d unsol_data %d\n, - ctask-itt, ctask-total_length, ctask-imm_count, + ctask-itt, iscsi_out_total_length(ctask), ctask-imm_count, ctask-unsol_count); } diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c index d0b139c..2bc57a5 100644 --- a/drivers/scsi/iscsi_tcp.c +++ b/drivers/scsi/iscsi_tcp.c @@ -109,7 +109,9 @@ iscsi_hdr_digest(struct iscsi_conn *conn struct iscsi_tcp_conn *tcp_conn = conn-dd_data; crypto_hash_digest(tcp_conn-tx_hash, buf-sg, buf-sg.length, crc); - buf-sg.length = tcp_conn-hdr_size; + buf-sg.length += sizeof(__u32); + debug_tcp(iscsi_hdr_digest: crc %p crc 0x%02x%02x%02x%02x\n, + crc, crc[0], crc[1], crc[2], crc[3]); } static inline int @@ -229,14 +231,21 @@ iscsi_data_rsp(struct iscsi_conn *conn, if (tcp_conn-in.datalen == 0) return 0; - if (ctask-datasn != datasn) + if (tcp_ctask-exp_datasn != datasn) { + debug_tcp(%s: ctask-datasn(%d) != rhdr-datasn(%d)\n, + __FUNCTION__, tcp_ctask-exp_datasn, datasn); return ISCSI_ERR_DATASN; + } - ctask-datasn++; + tcp_ctask-exp_datasn++; tcp_ctask-data_offset = be32_to_cpu(rhdr-offset); - if (tcp_ctask-data_offset + tcp_conn-in.datalen ctask-total_length) + if (tcp_ctask-data_offset + tcp_conn-in.datalen iscsi_in_total_length(ctask)) { + debug_tcp(%s: data_offset(%d) + data_len(%d) total_length_in(%d)\n, + __FUNCTION__, tcp_ctask-data_offset, + tcp_conn-in.datalen, iscsi_in_total_length(ctask)); return ISCSI_ERR_DATA_OFFSET; + } if (rhdr-flags ISCSI_FLAG_DATA_STATUS) { struct scsi_cmnd *sc = ctask-sc; @@ -246,7 +255,7 @@ iscsi_data_rsp(struct iscsi_conn *conn, int res_count = be32_to_cpu(rhdr-residual_count); if (res_count 0 - res_count = sc-request_bufflen) { + res_count = iscsi_in_total_length(ctask)) { sc-resid = res_count; sc-result = (DID_OK 16) | rhdr-cmd_status; } else @@ -281,6 +290,7 @@ iscsi_solicit_data_init(struct iscsi_con { struct iscsi_data *hdr; struct scsi_cmnd *sc = ctask-sc; + struct scsi_cmnd_buff scb; hdr = r2t-dtask.hdr; memset(hdr, 0, sizeof(struct iscsi_data)); @@ -308,12 +318,13 @@ iscsi_solicit_data_init(struct iscsi_con iscsi_buf_init_iov(r2t-headbuf, (char*)hdr, sizeof(struct iscsi_hdr)); - if (sc-use_sg) { + scsi_get_out_buff(sc, scb); + if (scb.use_sg) { int i, sg_count = 0; - struct scatterlist *sg = sc-request_buffer; + struct scatterlist *sg = scb.buffer; r2t-sg = NULL; - for (i = 0; i sc-use_sg; i++, sg += 1) { + for (i = 0; i scb.use_sg; i++, sg += 1) { /* FIXME: prefetch ? */ if (sg_count + sg-length r2t-data_offset) { int
[RFC 0/6] bidi support: block, SCSI, and iSCSI layers
Following are 6 (large) patches that introduce support for bidirectional requests in the kernel. Since all this is going to interfere with everyone's work, let us all comment on the implementation, naming, and future directions. (or forever hold your peace). The submitted work is against linux-2.6-block tree of 2007/01/15. It compiles with make allmodconfig in both i386 and x86_64, and it boots and runs on my x86_64 test machines. The target kernel is able to compile a Linux-kernel, burn and read cd, and pass iSCSI and OSD tests. OSD tests are the ones who exercise BIDI I/O. Patches summary: 1. data direction - add support for enum dma_data_direction in struct request in preparation to full bidi support. - removed rq_data_dir() and added other APIs for querying request's direction. - fix usage of rq_data_dir() and peeking at req-cmd_flags REQ_RW to using new api. - clean-up bad usage of DMA_BIDIRECTIONAL and bzero of black requests, to use the new blk_rq_init_unqueued_req() 2. request_io_part - extract io related fields in struct request into struct request_io_part in preparation to full bidi support. - use new API for accessing the new sub-structure 3. bidi request - add one more request_io_part member for bidi support in struct request. - add block layer API functions for mapping and accessing bidi data buffers and for ending a block request as a whole (end_that_request_block()) 4. bidi-scsi - add bidi support at the scsi layer - new scsi api functions to generate bidi capable requests (scsi_execute_bidi{,_async}) - new scsi api for accessing scsi_cmnd_buff (similar to request_io_part but not yet implemented as a sub-structure) 5. varlen cdb - add support for variable length (or just longer than 16 bytes) SCSI CDBs - add scsi device type for OSD devices (will be separated in actual patchset) 6. iscsi bidi varlen - add support for bidi and variable length commands at the iscsi layer for the tcp transport. (will also be separated and sent through the open-iscsi project) Developer comments: (long, can be skipped) 1. This part borrows from struct scsi_cmnd use of an enum dma_data_direction member, to keep everything the same. 1.A. It could be done in a more backward compatible way but it was a good chance for some clean-up (see the mess with BIO flags that used to be the same but no longer are). I also believe that two ways to describe the same thing will always come to haunt you later. 1.B. Speaking of which, the PCI_DMA_XXX constants are a pain being the same constants but #defined and untyped. It confused me in my bidi-bug hunting and it seems I'm not the only one. I would suggest boldly inverting enum dma_data_direction from its active-low DMA logic and change its name to just enum data_direction. Then, define the PCI_DMA_XXX constants as a new enum, and accept that new enum at all the dma_XXX mapping APIs, instead of the current u32. Note that this is cardinal now because bidirectional means __different_things__ in struct request and in DMA (or memory) mapping. the first means bidirectional access to __same__ buffer, the later means __two_sets__ of buffers at one request. (each buffer mapped for uni-directional access). 2. Separation of IO members into two sub structures: There are 2 possible approaches here and each approach can have a few implementations. 2.A. First approach is to keep everything backward compatible and have a bidi sub-structure on the side. This approach can be seen in the attached patch for bidi support in SCSI layer. It keeps the patch to a minimum but is hell on readability maintenance. It also puts a performance penalty on users like iscsi who wants to use a generic bidi-api. 2.B. Second approach is what is seen here. Separate all I/O members to a sub-structure, craft an API to access each part and a UNI api that wants to access buffers knowing they are uni- directional. 3.C. The second approach can be implemented as I did it: one member for uni/bidi-write and second for bidi-read or alternatively, using one sub-structure for read and another for write. I hope I have built it in such a way that this choice is merely an implementation detail only concerning the block layer and hidden behind proper access functions. 3.D. I have currently decided on the first approach for performance reasons, since 99.99% of kernel is uni-directional (only exception is proposed iscsi). What will be in the future? I'm not sure, If most bus protocols are like iSCSI (and SCSI) where there is a clear separation, and concurrency, between write and read states. then second approach will be the logical choice. Or maybe it is all exact
[RFC 2/6] bidi support: request_io_part
The patch was probably too big and did not go through the mailing list. I have compressed and attached it, is that OK or must I put it online somewhere? - Extract all I/O members of struct request into a request_io_part member. - Define API to access the I/O part - Adjust block layer accordingly. - Change all users to new API. At this stage it is all still uni-directional but the intention is clear. Signed-off-by: Benny Halevy [EMAIL PROTECTED] Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] --- arch/um/drivers/ubd_kern.c | 12 +- block/as-iosched.c | 16 ++-- block/cfq-iosched.c | 14 ++-- block/deadline-iosched.c|6 +- block/elevator.c| 26 +++--- block/ll_rw_blk.c | 207 --- block/scsi_ioctl.c |4 +- drivers/acorn/block/fd1772.c| 18 ++-- drivers/acorn/block/mfmhd.c | 23 +++-- drivers/block/DAC960.c |6 +- drivers/block/amiflop.c | 10 +- drivers/block/ataflop.c | 18 ++-- drivers/block/cciss.c | 26 +++--- drivers/block/cpqarray.c| 12 +- drivers/block/floppy.c | 50 +- drivers/block/nbd.c | 14 ++-- drivers/block/paride/pcd.c |4 +- drivers/block/paride/pd.c |8 +- drivers/block/paride/pf.c |6 +- drivers/block/ps2esdi.c | 10 +- drivers/block/swim3.c | 32 +++--- drivers/block/sx8.c |8 +- drivers/block/ub.c | 18 ++-- drivers/block/viodasd.c |8 +- drivers/block/xd.c |4 +- drivers/block/z2ram.c |6 +- drivers/cdrom/aztcd.c | 34 +++--- drivers/cdrom/cdrom.c |2 +- drivers/cdrom/cdu31a.c |4 +- drivers/cdrom/cm206.c |6 +- drivers/cdrom/gscd.c| 20 ++-- drivers/cdrom/mcdx.c| 12 +- drivers/cdrom/optcd.c | 32 +++--- drivers/cdrom/sbpcd.c | 26 +++--- drivers/cdrom/sjcd.c| 32 +++--- drivers/cdrom/sonycd535.c |4 +- drivers/cdrom/viocd.c |4 +- drivers/ide/ide-cd.c| 189 ++- drivers/ide/ide-disk.c |6 +- drivers/ide/ide-dma.c |6 +- drivers/ide/ide-floppy.c| 16 ++-- drivers/ide/ide-io.c| 22 ++-- drivers/ide/ide-lib.c |2 +- drivers/ide/ide-tape.c | 32 +++--- drivers/ide/ide-taskfile.c | 21 +++-- drivers/ide/legacy/hd.c | 26 +++--- drivers/ide/pci/pdc202xx_old.c |2 +- drivers/md/dm-emc.c |3 +- drivers/message/i2o/i2o_block.c | 22 ++-- drivers/mmc/mmc_block.c |6 +- drivers/mtd/mtd_blkdevs.c |4 +- drivers/s390/block/dasd.c |2 +- drivers/s390/block/dasd_diag.c |4 +- drivers/s390/block/dasd_eckd.c |6 +- drivers/s390/block/dasd_fba.c |6 +- drivers/s390/char/tape_34xx.c |2 +- drivers/s390/char/tape_3590.c |2 +- drivers/s390/char/tape_block.c |4 +- drivers/sbus/char/jsflash.c |4 +- drivers/scsi/eata.c | 22 ++-- drivers/scsi/ide-scsi.c |2 +- drivers/scsi/scsi_lib.c | 43 drivers/scsi/scsi_tgt_lib.c | 16 ++-- drivers/scsi/sd.c | 16 ++-- drivers/scsi/sr.c | 14 ++-- drivers/scsi/u14-34f.c | 16 ++-- include/linux/blkdev.h | 99 --- include/linux/blktrace_api.h|7 +- include/linux/elevator.h|2 +- 69 files changed, 707 insertions(+), 659 deletions(-) Total size :148.5 KB Compressed Size: 36.6 KB - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC 2/6] bidi support: request_io_part
The patch was probably too big and did not go through the mailing list. I have compressed and attached it, is that OK or must I put it online somewhere? - Extract all I/O members of struct request into a request_io_part member. - Define API to access the I/O part - Adjust block layer accordingly. - Change all users to new API. At this stage it is all still uni-directional but the intention is clear. Signed-off-by: Benny Halevy [EMAIL PROTECTED] Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] --- arch/um/drivers/ubd_kern.c | 12 +- block/as-iosched.c | 16 ++-- block/cfq-iosched.c | 14 ++-- block/deadline-iosched.c|6 +- block/elevator.c| 26 +++--- block/ll_rw_blk.c | 207 --- block/scsi_ioctl.c |4 +- drivers/acorn/block/fd1772.c| 18 ++-- drivers/acorn/block/mfmhd.c | 23 +++-- drivers/block/DAC960.c |6 +- drivers/block/amiflop.c | 10 +- drivers/block/ataflop.c | 18 ++-- drivers/block/cciss.c | 26 +++--- drivers/block/cpqarray.c| 12 +- drivers/block/floppy.c | 50 +- drivers/block/nbd.c | 14 ++-- drivers/block/paride/pcd.c |4 +- drivers/block/paride/pd.c |8 +- drivers/block/paride/pf.c |6 +- drivers/block/ps2esdi.c | 10 +- drivers/block/swim3.c | 32 +++--- drivers/block/sx8.c |8 +- drivers/block/ub.c | 18 ++-- drivers/block/viodasd.c |8 +- drivers/block/xd.c |4 +- drivers/block/z2ram.c |6 +- drivers/cdrom/aztcd.c | 34 +++--- drivers/cdrom/cdrom.c |2 +- drivers/cdrom/cdu31a.c |4 +- drivers/cdrom/cm206.c |6 +- drivers/cdrom/gscd.c| 20 ++-- drivers/cdrom/mcdx.c| 12 +- drivers/cdrom/optcd.c | 32 +++--- drivers/cdrom/sbpcd.c | 26 +++--- drivers/cdrom/sjcd.c| 32 +++--- drivers/cdrom/sonycd535.c |4 +- drivers/cdrom/viocd.c |4 +- drivers/ide/ide-cd.c| 189 ++- drivers/ide/ide-disk.c |6 +- drivers/ide/ide-dma.c |6 +- drivers/ide/ide-floppy.c| 16 ++-- drivers/ide/ide-io.c| 22 ++-- drivers/ide/ide-lib.c |2 +- drivers/ide/ide-tape.c | 32 +++--- drivers/ide/ide-taskfile.c | 21 +++-- drivers/ide/legacy/hd.c | 26 +++--- drivers/ide/pci/pdc202xx_old.c |2 +- drivers/md/dm-emc.c |3 +- drivers/message/i2o/i2o_block.c | 22 ++-- drivers/mmc/mmc_block.c |6 +- drivers/mtd/mtd_blkdevs.c |4 +- drivers/s390/block/dasd.c |2 +- drivers/s390/block/dasd_diag.c |4 +- drivers/s390/block/dasd_eckd.c |6 +- drivers/s390/block/dasd_fba.c |6 +- drivers/s390/char/tape_34xx.c |2 +- drivers/s390/char/tape_3590.c |2 +- drivers/s390/char/tape_block.c |4 +- drivers/sbus/char/jsflash.c |4 +- drivers/scsi/eata.c | 22 ++-- drivers/scsi/ide-scsi.c |2 +- drivers/scsi/scsi_lib.c | 43 drivers/scsi/scsi_tgt_lib.c | 16 ++-- drivers/scsi/sd.c | 16 ++-- drivers/scsi/sr.c | 14 ++-- drivers/scsi/u14-34f.c | 16 ++-- include/linux/blkdev.h | 99 --- include/linux/blktrace_api.h|7 +- include/linux/elevator.h|2 +- 69 files changed, 707 insertions(+), 659 deletions(-) Total size :148.5 KB Compressed Size: 36.6 KB 2of6-request_io_part.patch.tar.gz Description: GNU Zip compressed data
Re: [ 16/46] NFSv4.1: Remove a bogus BUG_ON() in nfs4_layoutreturn_done
On 09/17/2012 04:05 PM, Myklebust, Trond wrote: -Original Message- From: Greg Kroah-Hartman [mailto:gre...@linuxfoundation.org] Sent: Sunday, September 16, 2012 12:37 PM To: Ben Hutchings Cc: Myklebust, Trond; linux-kernel@vger.kernel.org; sta...@vger.kernel.org; Boaz Harrosh; Tigran Mkrtchyan Subject: Re: [ 16/46] NFSv4.1: Remove a bogus BUG_ON() in nfs4_layoutreturn_done On Sun, Sep 16, 2012 at 05:33:03PM +0100, Ben Hutchings wrote: On Wed, 2012-09-12 at 16:39 -0700, Greg Kroah-Hartman wrote: From: Greg KH gre...@linuxfoundation.org 3.0-stable review patch. If anyone has any objections, please let me know. -- From: Trond Myklebust trond.mykleb...@netapp.com commit 47fbf7976e0b7d9dcdd799e2a1baba19064d9631 upstream. Ever since commit 0a57cdac3f (NFSv4.1 send layoutreturn to fence disconnected data server) we've been sending layoutreturn calls while there is potentially still outstanding I/O to the data servers. The reason we do this is to avoid races between replayed writes to the MDS and the original writes to the DS. When this happens, the BUG_ON() in nfs4_layoutreturn_done can be triggered because it assumes that we would never call layoutreturn without knowing that all I/O to the DS is finished. The fix is to remove the BUG_ON() now that the assumptions behind the test are obsolete. Reported-by: Boaz Harrosh bharr...@panasas.com Reported-by: Tigran Mkrtchyan tigran.mkrtch...@desy.de Signed-off-by: Trond Myklebust trond.mykleb...@netapp.com Signed-off-by: Greg Kroah-Hartman gre...@linuxfoundation.org [...] The upstream commit has: Cc: sta...@vger.kernel.org [=3.5] and so I ignored it for 3.2. Is it actually needed for the earlier stable series? Crud, I missed that somehow :( Trond, should I revert this in 3.0 and 3.4 stable kernels? Hi Greg, Applying it to those kernels should be unnecessary but harmless, so if you've already applied them then I'd say just keep them. Cheers Trond Trond hi I do hit this with objects layout also in 3.2. I know that in files-layout it only hits post 3.5 But we've been using layout-return since 3.0 Thanks Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 12/25] userns: Convert exofs to use kuid/kgid where appropriate
On 09/20/2012 02:41 PM, Eric W. Biederman wrote: From: Eric W. Biederman ebied...@xmission.com Cc: Boaz Harrosh bharr...@panasas.com Cc: Benny Halevy bhal...@tonian.com Acked-by: Serge Hallyn serge.hal...@canonical.com Acked-by: Boaz Harrosh bharr...@panasas.com Signed-off-by: Eric W. Biederman ebied...@xmission.com --- fs/exofs/inode.c |8 init/Kconfig |1 - 2 files changed, 4 insertions(+), 5 deletions(-) diff --git a/fs/exofs/inode.c b/fs/exofs/inode.c index 5badb0c..190c3d6 100644 --- a/fs/exofs/inode.c +++ b/fs/exofs/inode.c @@ -1163,8 +1163,8 @@ struct inode *exofs_iget(struct super_block *sb, unsigned long ino) /* copy stuff from on-disk struct to in-memory struct */ inode-i_mode = le16_to_cpu(fcb.i_mode); - inode-i_uid = le32_to_cpu(fcb.i_uid); - inode-i_gid = le32_to_cpu(fcb.i_gid); + i_uid_write(inode, le32_to_cpu(fcb.i_uid)); + i_gid_write(inode, le32_to_cpu(fcb.i_gid)); set_nlink(inode, le16_to_cpu(fcb.i_links_count)); inode-i_ctime.tv_sec = (signed)le32_to_cpu(fcb.i_ctime); inode-i_atime.tv_sec = (signed)le32_to_cpu(fcb.i_atime); @@ -1376,8 +1376,8 @@ static int exofs_update_inode(struct inode *inode, int do_sync) fcb = args-fcb; fcb-i_mode = cpu_to_le16(inode-i_mode); - fcb-i_uid = cpu_to_le32(inode-i_uid); - fcb-i_gid = cpu_to_le32(inode-i_gid); + fcb-i_uid = cpu_to_le32(i_uid_read(inode)); + fcb-i_gid = cpu_to_le32(i_gid_read(inode)); fcb-i_links_count = cpu_to_le16(inode-i_nlink); fcb-i_ctime = cpu_to_le32(inode-i_ctime.tv_sec); fcb-i_atime = cpu_to_le32(inode-i_atime.tv_sec); diff --git a/init/Kconfig b/init/Kconfig index 51084b0..39e55d6 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -941,7 +941,6 @@ config UIDGID_CONVERTED depends on CEPH_FS = n depends on CIFS = n depends on CODA_FS = n - depends on EXOFS_FS = n depends on FUSE_FS = n depends on GFS2_FS = n depends on HFS_FS = n -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: virtio(-scsi) vs. chained sg_lists (was Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list)
On 07/25/2012 08:43 PM, Paolo Bonzini wrote: Il 25/07/2012 17:28, Boaz Harrosh ha scritto: 1) what I get is a scsi_cmnd which contains an N-element scatterlist. 2) virtio-scsi has to build the packet that is passed to the hardware (it does not matter that the hardware is virtual). This packet (per virtio-scsi spec) has an N+1-element scatterlist, where the first element is a request descriptor (struct virtio_scsi_cmd_req), and the others describe the written data. Then virtio-scsi spec is crap. It overloads the meaning of struct scatterlist of the first element in an array. to be a struct virtio_scsi_cmd_req. What the holy fuck? The first element simply _points_ to the struct virtio_scsi_cmd_req, just like subsequent elements point to the data. And the protocol of the device is _not_ a struct scatterlist[]. The virtio _API_ takes that array and converts to a series of physical address + offset pairs. Since you need to change the standard to support chaining then it is a good time to fix this. Perhaps it is a good time for you to read the virtio spec. You are making a huge confusion between the LLD-virtio interface and the virtio-hardware interface. I'm talking only of the former. 3) virtio takes care of converting the packet from a scatterlist (which currently must be a flat one) to the hardware representation. Here a walk is inevitable, so we don't care about this walk. hardware representation you mean aio or biovec, what ever the IO submission path uses at the host end? No, I mean the way the virtio spec encodes the physical address + offset pairs. I stopped reading here. The picture confused me. It looked like the first element is the virtio_scsi_cmd_req not an sgilist-element that points to the struct's buffer. In that case then yes your plan of making a two-elements fragment that points to the original scsi-sglist is perfect. All you have to do is that, and all you have to do at virtio is use the sg_for_each macro and you are done. You don't need any sglist allocation or reshaping. And you can easily support chaining. Looks like order of magnitude more simple then what you do now So what is the problem? And BTW you won't need that new __sg_set_page API anymore. Paolo Cheers Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: virtio(-scsi) vs. chained sg_lists (was Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list)
On 07/25/2012 11:06 PM, Paolo Bonzini wrote: Il 25/07/2012 21:16, Boaz Harrosh ha scritto: The picture confused me. It looked like the first element is the virtio_scsi_cmd_req not an sgilist-element that points to the struct's buffer. In that case then yes your plan of making a two-elements fragment that points to the original scsi-sglist is perfect. All you have to do is that, and all you have to do at virtio is use the sg_for_each macro and you are done. You don't need any sglist allocation or reshaping. And you can easily support chaining. Looks like order of magnitude more simple then what you do now It is. So what is the problem? That not all architectures have ARCH_HAS_SG_CHAIN (though all those I care about do). So I need to go through all architectures and make sure they use for_each_sg, or at least to change ARCH_HAS_SG_CHAIN to a Kconfig define so that dependencies can be expressed properly. What is actually preventing ARCH_HAS_SG_CHAIN from all these ARCHES? is that the DMA drivers not using for_each_sg(). Sounds like easy to fix. But yes a deep change would convert ARCH_HAS_SG_CHAIN to a Kconfig. If you want to be lazy, like me, You might just put a BUILD_BUG_ON in code, requesting the user to disable the driver for this ARCH. I bet there is more things to do at ARCH to enable virtualization then just support ARCH_HAS_SG_CHAIN. Be it just another requirement. If you Document it and make sure current ARCHs are fine, it should not ever trigger. And BTW you won't need that new __sg_set_page API anymore. Kind of. sg_init_table(sg, 2); sg_set_buf(sg[0], req, sizeof(req)); sg_chain(sg[1], scsi_out(sc)); is still a little bit worse than __sg_set_buf(sg[0], req, sizeof(req)); __sg_chain(sg[1], scsi_out(sc)); I believe they are the same, specially for the on the stack 2 elements array. Actually I think In both cases you need to at least call sg_init_table() once after allocation, No? Your old code with big array copy and re-shaping was a better example of the need for your new API. Which I agree. But please for my sake do not call it __sg_chain. Call it something like sg_chain_not_end(). I hate those __ which for god sack means what? (A good name is when I don't have to read the code, an __ means fuck you go read the code) Paolo Thanks Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: virtio(-scsi) vs. chained sg_lists (was Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list)
On 07/26/2012 10:23 AM, Paolo Bonzini wrote: In the meanwhile, we still have a bug to fix, and we need to choose between Sen Wang's v1 (sg_set_page) or v2 (value assignment). I'm still leaning more towards v2, if only because I already tested that one myself. It's your call, you know what I think ;-) I Agree none of them are bugs in current code, they will both work just the same. Paolo Please CC me on the convert to sg copy-less patches, It looks interesting Thanks Boaz -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/