Re: IRQ (routing ?) problem [was Re: epic100 in current -ac kernels]
I noticed that there have been updates to epic100 again and just wanted to note that the problem remains: 2.4.2-ac3 still crashes, but it works fine when I use the epic100.c from 2.4.0-test9, which was the last working version for me.

Arnd

On Thu, 15 Feb 2001, ARND BERGMANN wrote:
Sorry for the delay, I could not get physical access to the machine for the last few days. I was able to do some more testing today and found this:
- The problem is not the IRQ /sharing/: after getting rid of all the other PCI cards, the problem was still there.
- The only thing that seems to have any effect on the symptoms is the presence of the USB driver, either usb-uhci or uhci. I am not using USB at all. As described before, the system behaves in one of these ways:
  * epic100 driver without DMA mapping (e.g. 2.4.0-ac9): normal operation
  * driver with DMA mapping + USB driver loaded: lots of interrupts - slow
  * driver with DMA mapping, USB driver not loaded: hang after ~2 seconds
- I sometimes get 'spurious interrupt: IRQ7', even though no device is connected there. Probably not important.

On Sat, 10 Feb 2001, Francois Romieu wrote:
The following information may help:
- motherboard type
Asus A7V, onboard USB hub and Promise ATA/100 chip
- bios revision
Can't see right now, system was bought in October 2000. I think it was 1.004, but I am not sure.
- lspci -x
see attachment, this was when I ripped out sound, tv and scsi
- 2.4.2pre3 + whatever recent ac epic100 = ?
Still no improvement until latest -ac (2.4.1-ac13)

Arnd
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
test11: lockup when reading /proc/ide/hde/identify
Hi!

I think I found a bug in the IDE subsystem. When I do 'cat /proc/ide/hde/identify', the system locks up completely; not even Alt+SysRq+B helps. Everything else under /proc/ide works. hdparm can cause the same symptoms, but I have not checked exactly when it does so.

I have an Asus A7V mainboard with a VIA 82C686A as the first IDE controller and an onboard Promise PDC20265 as the second IDE controller. Both have a Fujitsu MPF3204AT as their primary master drive, but the problem occurs only on the Promise adapter.

I have tried kernels 2.4.0-test11-pre6, test11-ac2 and ide.2.4.0-t11.1120, all with the same result, but I did not try any older kernels, because I installed the machine just two days ago.

Arnd
epic100 in current -ac kernels
There seems to be some movement in the driver and the latest one is not working for me (again), so I'm giving a subjective status report for the versions I have tried lately:

Working epic100 drivers:
- 2.4.0
- 2.4.0-ac9

Broken epic100 drivers:
- 2.4.0-ac4
- 2.4.1-ac2
- 2.4.1-ac4

I have not yet looked at the source to find the problem, but the other kernels in between each seem to contain one of the versions above. The symptom is always that after 'ifconfig eth0 up', the system slows down to the point where I can hardly type on the keyboard and even 'ifconfig eth0 down' takes several seconds (on an Athlon-800)!

The boot message is:

eth0: SMSC EPIC/100 83c170 at 0xd091e000, IRQ 11, 00:e0:29:6c:36:6f.
eth0: MII transceiver #3 control 3000 status 7849.
eth0: Autonegotiation advertising 01e1 link partner 0001.
epic100.c:v1.11 1/7/2001 Written by Donald Becker [EMAIL PROTECTED]
  http://www.scyld.com/network/epic100.html
  (unofficial 2.4.x kernel port, version 1.1.6, January 11, 2001)
PCI: Found IRQ 11 for device 00:0d.0
PCI: The same IRQ used for device 00:04.2
PCI: The same IRQ used for device 00:04.3
PCI: The same IRQ used for device 00:09.0

The device on 00:04:[23] is a VT82C586B USB and on 00:09:0 an Ensoniq 5880 AudioPCI (rev 02). I cannot change the IRQ settings right now without physical access to the machine (it is locked).

At least with some broken versions, I also got these messages in syslog (every 4 seconds):

Feb 7 21:10:06 project kernel: NETDEV WATCHDOG: eth0: transmit timed out
Feb 7 21:10:06 project kernel: eth0: Transmit timeout using MII device, Tx status 000b.
Feb 7 21:10:10 project kernel: NETDEV WATCHDOG: eth0: transmit timed out
Feb 7 21:10:10 project kernel: eth0: Transmit timeout using MII device, Tx status 000b.
Feb 7 21:10:14 project kernel: NETDEV WATCHDOG: eth0: transmit timed out
Feb 7 21:10:14 project kernel: eth0: Transmit timeout using MII device, Tx status 000b.
...

According to lspci, the card is:

00:0d.0 Ethernet controller: Standard Microsystems Corp [SMC] 83C170QF (rev 08)
	Subsystem: Standard Microsystems Corp [SMC]: Unknown device a020
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32 (2000ns min, 7000ns max)
	Interrupt: pin A routed to IRQ 11
	Region 0: I/O ports at 9800 [size=256]
	Region 1: Memory at df80 (32-bit, non-prefetchable) [size=4K]
	Expansion ROM at unassigned [disabled] [size=64K]
	Capabilities: [dc] Power Management version 1
		Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

Arnd
Re: epic100 in current -ac kernels
On Thu, 8 Feb 2001, Francois Romieu wrote:
Working epic100 drivers:
- 2.4.0
- 2.4.0-ac9
Could you give a look at ac12 (fine here) ?

No, does not work, same problem.

Arnd
Re: epic100 in current -ac kernels
On Fri, 9 Feb 2001, Francois Romieu wrote:
ARND BERGMANN [EMAIL PROTECTED] écrit :
On Thu, 8 Feb 2001, Francois Romieu wrote:
Working epic100 drivers:
- 2.4.0
- 2.4.0-ac9
Could you give a look at ac12 (fine here) ?

No, does not work, same problem.

The modifications between ac9 and ac12 come from the new DMA mapping.

What about 2.4.0-ac5? That had the same problem as -ac12. Did it also have the new DMA mapping?

They added a bug for the (already buggy ?) big-endian machines. I would be surprised that something has *always* been missing in the driver and your hardware triggers it*. IMHO the culprit is to be found elsewhere.

Yes, I'm pretty sure the problem is not only the epic100 driver, now that I have done some more investigation. With the broken drivers (I tried 2.4.0-ac12 and 2.4.1-ac5), something generates an enormous number of interrupts as soon as I run 'ifconfig eth0 up'. Within 10 seconds, I got roughly 95 interrupts on IRQ11, instead of 30! After disabling the usb-uhci (I was using the JE driver) in the BIOS setup, the system reproducibly locked up hard a few seconds after 'ifconfig eth0 up' instead of just getting slow. Unfortunately, I have no way to also disable the sound card, but at least it makes no difference whether the sound driver is loaded or not.

I'd like to know what it's worth to share an irq with a pio audio card.

On Monday I can ask the system administrator for the keys so I can open the machine and put the card into another slot. Right now, USB, sound and network are hardwired to the same IRQ, that's how the system arrived here.

Arnd
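One way to put numbers on an interrupt storm like the one described in this message is to sample the IRQ's counters in /proc/interrupts twice and divide the difference by the interval. The following is only a user-space sketch, not something from the thread: the helper names irq_total_from_line() and irq_rate() are made up for illustration, and the assumed line shape (IRQ number, colon, one counter per CPU, then controller and device names) should be checked against the kernel in use.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/*
 * Sum the per-CPU counters on one /proc/interrupts line if the line
 * belongs to `irq`; return -1 if it is for a different interrupt.
 * Assumed line shape: " 11:   12345   678   XT-PIC  eth0, usb-uhci".
 */
static long irq_total_from_line(const char *line, int irq)
{
	char *end;
	long total = 0;

	assert(line != NULL);

	/* The line must start with "<number>:" matching the wanted IRQ. */
	long n = strtol(line, &end, 10);
	if (end == line || *end != ':' || n != irq)
		return -1;
	end++;

	/* Add up every purely numeric column that follows the colon. */
	for (;;) {
		const char *p = end;
		char *next;

		while (isspace((unsigned char)*p))
			p++;
		if (!isdigit((unsigned char)*p))
			break;
		long count = strtol(p, &next, 10);
		if (*next != '\0' && !isspace((unsigned char)*next))
			break;	/* a token such as "8259A" is a name, not a count */
		total += count;
		end = next;
	}
	return total;
}

/* Interrupts per second between two samples taken `seconds` apart. */
static double irq_rate(long before, long after, double seconds)
{
	return (double)(after - before) / seconds;
}
```

A caller would read /proc/interrupts, pass each line to irq_total_from_line() until it returns a non-negative value, wait a few seconds, repeat, and feed both totals to irq_rate().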
Re: IRQ (routing ?) problem [was Re: epic100 in current -ac kernels]
Sorry for the delay, I could not get physical access to the machine for the last few days. I was able to do some more testing today and found this:
- The problem is not the IRQ /sharing/: after getting rid of all the other PCI cards, the problem was still there.
- The only thing that seems to have any effect on the symptoms is the presence of the USB driver, either usb-uhci or uhci. I am not using USB at all. As described before, the system behaves in one of these ways:
  * epic100 driver without DMA mapping (e.g. 2.4.0-ac9): normal operation
  * driver with DMA mapping + USB driver loaded: lots of interrupts - slow
  * driver with DMA mapping, USB driver not loaded: hang after ~2 seconds
- I sometimes get 'spurious interrupt: IRQ7', even though no device is connected there. Probably not important.

On Sat, 10 Feb 2001, Francois Romieu wrote:
The following information may help:
- motherboard type
Asus A7V, onboard USB hub and Promise ATA/100 chip
- bios revision
Can't see right now, system was bought in October 2000. I think it was 1.004, but I am not sure.
- lspci -x
see attachment, this was when I ripped out sound, tv and scsi
- 2.4.2pre3 + whatever recent ac epic100 = ?
Still no improvement until latest -ac (2.4.1-ac13) Arnd 00:00.0 Host bridge: VIA Technologies, Inc.: Unknown device 0305 (rev 02) Subsystem: Asustek Computer, Inc.: Unknown device 8033 Flags: bus master, medium devsel, latency 0 Memory at e700 (32-bit, prefetchable) [size=16M] Capabilities: [a0] AGP version 2.0 Capabilities: [c0] Power Management version 2 00: 06 11 05 03 06 00 10 22 02 00 00 06 00 00 00 00 10: 08 00 00 e7 00 00 00 00 00 00 00 00 00 00 00 00 20: 00 00 00 00 00 00 00 00 00 00 00 00 43 10 33 80 30: 00 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 00 00:01.0 PCI bridge: VIA Technologies, Inc.: Unknown device 8305 (prog-if 00 [Normal decode]) Flags: bus master, 66Mhz, medium devsel, latency 0 Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 Memory behind bridge: e280-e3df Prefetchable memory behind bridge: e3f0-e6ff Capabilities: [80] Power Management version 2 00: 06 11 05 83 07 00 30 22 00 00 04 06 00 00 01 00 10: 00 00 00 00 00 00 00 00 00 01 01 00 e0 d0 00 00 20: 80 e2 d0 e3 f0 e3 f0 e6 00 00 00 00 00 00 00 00 30: 00 00 00 00 80 00 00 00 00 00 00 00 00 00 08 00 00:04.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 22) Subsystem: Asustek Computer, Inc.: Unknown device 8033 Flags: bus master, stepping, medium devsel, latency 0 00: 06 11 86 06 87 00 10 02 22 00 01 06 00 00 80 00 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20: 00 00 00 00 00 00 00 00 00 00 00 00 43 10 33 80 30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00:04.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 10) (prog-if 8a [Master SecP PriP]) Flags: bus master, medium devsel, latency 32 I/O ports at d800 [size=16] Capabilities: [c0] Power Management version 2 00: 06 11 71 05 07 00 90 02 10 8a 01 01 00 20 00 00 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20: 01 d8 00 00 00 00 00 00 00 00 00 00 00 00 00 00 30: 00 00 00 00 c0 00 00 00 00 00 00 00 ff 00 00 00 00:04.2 USB Controller: VIA Technologies, Inc. 
VT82C586B USB (rev 10) (prog-if 00 [UHCI]) Subsystem: Unknown device 0925:1234 Flags: bus master, medium devsel, latency 32, IRQ 11 I/O ports at d400 [size=32] Capabilities: [80] Power Management version 2 00: 06 11 38 30 17 00 10 02 10 00 03 0c 08 20 00 00 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20: 01 d4 00 00 00 00 00 00 00 00 00 00 25 09 34 12 30: 00 00 00 00 80 00 00 00 00 00 00 00 0b 04 00 00 00:04.3 USB Controller: VIA Technologies, Inc. VT82C586B USB (rev 10) (prog-if 00 [UHCI]) Subsystem: Unknown device 0925:1234 Flags: bus master, medium devsel, latency 32, IRQ 11 I/O ports at d000 [size=32] Capabilities: [80] Power Management version 2 00: 06 11 38 30 17 00 10 02 10 00 03 0c 08 20 00 00 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20: 01 d0 00 00 00 00 00 00 00 00 00 00 25 09 34 12 30: 00 00 00 00 80 00 00 00 00 00 00 00 0b 04 00 00 00:04.4 Host bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 30) Flags: medium devsel, IRQ 9 Capabilities: [68] Power Management version 2 00: 06 11 57 30 00 00 90 02 30 00 00 06 00 00 00 00 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 30: 00 00 00 00 68 00 00 00 00 00 00 00 00 00 00 00 00:0c.0 Ethernet controller: Standard Microsystems Corp [SMC] 83C170QF (rev 08) Subsystem: Standard Microsystems Corp [SMC]: Unknown device a020 Flags: bus master, fast devsel, latency 32, IRQ 10 I/O ports at a400 [size=256]
Re: epic100 aka smc etherpower II
Daniel Nofftz [EMAIL PROTECTED] wrote:
i can`t get my smc etherpower ii working with the 2.4.3 kernel. now i have downgraded to 2.4.2 and it works again ... does anyone have a suggestion, what the problem is ?

Looks to me like the problem I had in February, see the thread "epic100 in current -ac kernels" at http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg28523.html

After I had upgraded my BIOS, the problems were gone and I stopped looking into it. The DMA mapping code first introduced in 2.4.0-ac2 (smallest diff here) originally triggered the bug, which had different symptoms depending on the configuration of the chipset. Note that I have a VIA VT8363 (KT133) chipset while this is a VT82C595 (VP2) chipset, so it is apparently not limited to one very specific configuration.

Arnd
Re: [RFC] MTD driver for MMC cards
This is a new version of the driver I posted back in January. I now have hardware to test it and fixed a number of bugs, most of which are the ones that Pierre told me about in the first place. It now seems to work fine with the mtdblock driver, which of course is entirely pointless. I've tried using it with jffs2 once, but got an immediate oops, which still needs some investigation.

I'm also not sure what to do about SDHC media, but probably the easiest solution is to disallow them with this driver -- the mtd layer doesn't deal with media larger than 4GB anyway at this point.

There is also still some need for performance testing. Jörn brought up the point that if a specific card can't have multiple open erase blocks simultaneously, it's rather pointless for logfs. It might still be useful to use jffs2 on those cards, because AFAIK that only writes to one erase block at any time.

Signed-off-by: Arnd Bergmann [EMAIL PROTECTED]
---
Index: olpc-2.6/drivers/mmc/mmc.c
===
--- olpc-2.6.orig/drivers/mmc/mmc.c
+++ olpc-2.6/drivers/mmc/mmc.c
@@ -621,6 +621,7 @@ static void mmc_decode_csd(struct mmc_ca
 			csd->r2w_factor = UNSTUFF_BITS(resp, 26, 3);
 			csd->write_blkbits = UNSTUFF_BITS(resp, 22, 4);
 			csd->write_partial = UNSTUFF_BITS(resp, 21, 1);
+			csd->erase_blksize = UNSTUFF_BITS(resp, 39, 7);
 			break;
 		case 1:
 			/*
@@ -649,6 +650,8 @@ static void mmc_decode_csd(struct mmc_ca
 			csd->r2w_factor = 4; /* Unused */
 			csd->write_blkbits = 9;
 			csd->write_partial = 0;
+#warning need to read au_size for sdhc
+			csd->erase_blksize = 0; // 8192 << au_size;
 			break;
 		default:
 			printk("%s: unrecognised CSD structure version %d\n",
@@ -691,6 +694,8 @@ static void mmc_decode_csd(struct mmc_ca
 		csd->r2w_factor = UNSTUFF_BITS(resp, 26, 3);
 		csd->write_blkbits = UNSTUFF_BITS(resp, 22, 4);
 		csd->write_partial = UNSTUFF_BITS(resp, 21, 1);
+		csd->erase_blksize = (UNSTUFF_BITS(resp, 37, 5) + 1) *
+				     (UNSTUFF_BITS(resp, 42, 5) + 1);
 	}
 }
Index: olpc-2.6/include/linux/mmc/card.h
===
--- olpc-2.6.orig/include/linux/mmc/card.h
+++ olpc-2.6/include/linux/mmc/card.h
@@ -32,6 +32,7 @@ struct mmc_csd {
 	unsigned int		max_dtr;
 	unsigned int		read_blkbits;
 	unsigned int		write_blkbits;
+	unsigned int		erase_blksize;
 	unsigned int		capacity;
 	unsigned int		read_partial:1,
 				read_misalign:1,
Index: olpc-2.6/drivers/mmc/mmc_mtd.c
===
--- /dev/null
+++ olpc-2.6/drivers/mmc/mmc_mtd.c
@@ -0,0 +1,366 @@
+/*
+ * MTD driver for MMC cards
+ */
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/mmc/card.h>
+#include <linux/mmc/protocol.h>
+#include <linux/mmc/host.h>
+#include <linux/scatterlist.h>
+#include <linux/mtd/mtd.h>
+
+/*
+ * check if a write command was completed correctly, must be called
+ * with host claimed.
+ */
+static int mmc_mtd_get_status(struct mmc_card *card)
+{
+	int err;
+	struct mmc_command cmd;
+
+	do {
+		cmd = (struct mmc_command) {
+			.opcode = MMC_SEND_STATUS,
+			.arg = card->rca << 16,
+			.flags = MMC_RSP_R1 | MMC_CMD_AC,
+		};
+
+		err = mmc_wait_for_cmd(card->host, &cmd, 5);
+		if (err) {
+			dev_err(&card->dev, "error %d requesting status\n", err);
+			break;
+		}
+	} while (!(cmd.resp[0] & R1_READY_FOR_DATA));
+
+	return err;
+}
+
+/*
+ * erase a range of erase groups aligned to mtd->erase_size
+ */
+static int mmc_mtd_erase(struct mtd_info *mtd, struct erase_info *instr)
+{
+	struct mmc_card *card = mtd->priv;
+	struct mmc_command cmd[3] = { {
+			.opcode = MMC_ERASE_GROUP_START,
+			.arg = instr->addr,
+			.flags = MMC_RSP_R1 | MMC_CMD_AC,
+		}, {
+			.opcode = MMC_ERASE_GROUP_END,
+			.arg = instr->addr + instr->len,
+			.flags = MMC_RSP_R1 | MMC_CMD_AC,
+		}, {
+			.opcode = MMC_ERASE,
+			.flags = MMC_RSP_R1B | MMC_CMD_AC,
+		},
+	};
+	int err, i;
+
+	dev_dbg(&card->dev, "%s: from %d len %d\n", __FUNCTION__,
+		instr->addr, instr->len);
+
+	instr->state = MTD_ERASING;
+	err = 0
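The CSD decoding in the patch above reads small bit fields out of a 128-bit card response and, for MMC cards, multiplies two of them (each plus one) to get the erase block size. Below is a user-space sketch of that bit extraction. It is not the kernel's macro: unstuff_bits() and mmc_erase_blksize() are illustrative names, and the word layout (resp[0] holding bits 127..96, resp[3] holding bits 31..0) is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Extract `size` bits starting at bit `start` from a 128-bit response
 * stored as four 32-bit words. Assumed layout for this sketch:
 * resp[0] holds bits 127..96 and resp[3] holds bits 31..0.
 */
static uint32_t unstuff_bits(const uint32_t resp[4], int start, int size)
{
	assert(start >= 0 && size > 0 && size <= 32 && start + size <= 128);

	const uint32_t mask = (size < 32) ? ((1u << size) - 1u) : 0xffffffffu;
	const int word = 3 - start / 32;	/* word containing bit `start` */
	const int shift = start % 32;
	uint32_t value = resp[word] >> shift;

	/* The field may straddle a word boundary; pull in the upper part. */
	if (shift + size > 32)
		value |= resp[word - 1] << (32 - shift);

	return value & mask;
}

/* Same product as in the patch: (field at bit 37 + 1) * (field at bit 42 + 1). */
static uint32_t mmc_erase_blksize(const uint32_t resp[4])
{
	return (unstuff_bits(resp, 37, 5) + 1) * (unstuff_bits(resp, 42, 5) + 1);
}
```

With the two 5-bit fields set to 3 and 7, for example, the product is (3 + 1) * (7 + 1) = 32, in whatever unit the CSD defines for the erase group.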
Re: [mmc] alternative TI FM MMC/SD driver for 2.6.21-rc7
On Thursday 19 April 2007, Sergey Yanovich wrote:
The device is present in many notebooks. Notebooks depend heavily on suspend/resume functionality. tifm_core/7xx1/sd family is an ambitious, but uncompleted project. It used to crash on resuming, or hang up on suspending. A less common failure used to be triggered by a fast card insert/removal sequence. Finally, tifm_sd module needs to be manually inserted.

As very general comments, you should have the maintainer of the subsystem (Pierre in this case) on Cc when posting a driver, and you should include the patch inline in your mail, see Documentation/SubmittingPatches.

More specific to your patch:

You should include the Makefile and Kconfig changes in the same patch/mail, no point splitting these out.

Don't define your own DBG macro, instead use the predefined dev_dbg() that has a similar definition.

Your mmc_tifm_irq_chip() function does a _very_ long delay of 100 milliseconds. This is normally not acceptable, since it is a noticeable time in which the system is completely unresponsive. Maybe you can convert the tasklet to a workqueue, which lets you call msleep instead of mdelay.

Your use of pci_map_sg() looks wrong, you simply can't assume that the return value is '1' in general. I've stumbled over that same problem in the sdhci driver, so it may be inherent to the mmc layer and not be driver specific.

Other than that, your driver looks pretty good to me.

Arnd
Re: [PATCH 1/8] Kconfig: refine depends statements.
On Friday 20 April 2007, Martin Schwidefsky wrote:
diff -urpN linux-2.6/drivers/auxdisplay/Kconfig linux-2.6-patched/drivers/auxdisplay/Kconfig
--- linux-2.6/drivers/auxdisplay/Kconfig	2007-04-19 15:23:55.0 +0200
+++ linux-2.6-patched/drivers/auxdisplay/Kconfig	2007-04-19 15:49:17.0 +0200
@@ -6,6 +6,7 @@
 #
 
 menu "Auxiliary Display support"
+	depends on PARPORT_PC
 
 config KS0108
 	tristate "KS0108 LCD Controller"

I would guess that this actually depends on PARPORT, not PARPORT_PC. The rest of this patch looks good.

Arnd
Re: [PATCH 2/8] Kconfig: unwanted menus for s390.
On Friday 20 April 2007, Martin Schwidefsky wrote:
diff -urpN linux-2.6/drivers/char/ipmi/Kconfig linux-2.6-patched/drivers/char/ipmi/Kconfig
--- linux-2.6/drivers/char/ipmi/Kconfig	2007-02-04 19:44:54.0 +0100
+++ linux-2.6-patched/drivers/char/ipmi/Kconfig	2007-04-19 15:49:55.0 +0200
@@ -3,6 +3,8 @@
 #
 
 menu "IPMI"
+	depends on !S390
+
 config IPMI_HANDLER
 	tristate 'IPMI top-level message handler'
 	help

I think I made this comment the last time we discussed this topic, but don't remember the exact outcome. I would prefer to not have 'depends on !S390' but rather 'depends on MMIO', because that is what really drives stuff like IPMI: they expect the device to be reachable through the use of ioremap or inX/outX instructions, which don't exist on s390.

While it's unlikely that another architecture has the same restriction, it expresses much more clearly what you mean. In drivers/Kconfig, you can then simply add a

config MMIO
	def_bool !S390

There are a few exceptions though, that I think should not depend on MMIO:

--- linux-2.6/drivers/dma/Kconfig	2007-04-19 15:24:33.0 +0200
+++ linux-2.6-patched/drivers/dma/Kconfig	2007-04-19 15:49:55.0 +0200
@@ -3,6 +3,7 @@
 #
 
 menu "DMA Engine support"
+	depends on !S390
 
 config DMA_ENGINE
 	bool "Support for DMA engines"

I'd leave the menu enabled. If the DMA engine infrastructure becomes more widely used, you may want to add an implementation for s390 using millicoded instructions like xor-string or copy-page.

diff -urpN linux-2.6/drivers/input/Kconfig linux-2.6-patched/drivers/input/Kconfig
--- linux-2.6/drivers/input/Kconfig	2007-02-04 19:44:54.0 +0100
+++ linux-2.6-patched/drivers/input/Kconfig	2007-04-19 15:49:55.0 +0200
@@ -3,6 +3,7 @@
 #
 
 menu "Input device support"
+	depends on !S390
 
 config INPUT
 	tristate "Generic input layer (needed for keyboard, mouse, ...)" if EMBEDDED

Probably leave this as !S390. One could imagine channel-attached input devices or the idea of interpreting a terminal as an input device, but no driver currently does and probably none ever will.

diff -urpN linux-2.6/drivers/isdn/Kconfig linux-2.6-patched/drivers/isdn/Kconfig
--- linux-2.6/drivers/isdn/Kconfig	2007-02-04 19:44:54.0 +0100
+++ linux-2.6-patched/drivers/isdn/Kconfig	2007-04-19 15:49:55.0 +0200
@@ -3,6 +3,7 @@
 #
 
 menu "ISDN subsystem"
+	depends on !S390
 
 config ISDN
 	tristate "ISDN support"

Same here, actually there was an IBM 2216 ISDN adapter with channel attachment, but I don't think anybody wants to add a driver for that one.

diff -urpN linux-2.6/drivers/misc/Kconfig linux-2.6-patched/drivers/misc/Kconfig
--- linux-2.6/drivers/misc/Kconfig	2007-04-19 15:24:35.0 +0200
+++ linux-2.6-patched/drivers/misc/Kconfig	2007-04-19 15:49:55.0 +0200
@@ -3,6 +3,7 @@
 #
 
 menu "Misc devices"
+	depends on !S390
 
 config IBM_ASM
 	tristate "Device driver for IBM RSA service processor"

Maybe just leave the menu open, all drivers in it are already depending on PCI or similar and someone might add a driver that does work on s390 here.

diff -urpN linux-2.6/drivers/net/phy/Kconfig linux-2.6-patched/drivers/net/phy/Kconfig
--- linux-2.6/drivers/net/phy/Kconfig	2007-02-04 19:44:54.0 +0100
+++ linux-2.6-patched/drivers/net/phy/Kconfig	2007-04-19 15:49:55.0 +0200
@@ -3,6 +3,7 @@
 #
 
 menu "PHY device support"
+	depends on !S390
 
 config PHYLIB
 	tristate "PHY Device support and infrastructure"

Also depends on !S390, not MMIO. A future network adapter might give you access to the phy device through other means than MMIO.

diff -urpN linux-2.6/drivers/rtc/Kconfig linux-2.6-patched/drivers/rtc/Kconfig
--- linux-2.6/drivers/rtc/Kconfig	2007-04-19 15:24:39.0 +0200
+++ linux-2.6-patched/drivers/rtc/Kconfig	2007-04-19 15:49:55.0 +0200
@@ -3,6 +3,7 @@
 #
 
 menu "Real Time Clock"
+	depends on !S390
 
 config RTC_LIB
 	tristate

Applications might actually want to use the RTC interface to access the system time or get accurate timers, but the rtc drivers are all very dependent on either MMIO or I2C. Not sure what would be best here.

Arnd
Re: [PATCH 3/8] Kconfig: unwanted config options for s390.
On Friday 20 April 2007, Martin Schwidefsky wrote:
diff -urpN linux-2.6/drivers/char/Kconfig linux-2.6-patched/drivers/char/Kconfig
--- linux-2.6/drivers/char/Kconfig	2007-04-19 15:49:51.0 +0200
+++ linux-2.6-patched/drivers/char/Kconfig	2007-04-19 15:50:50.0 +0200
@@ -6,6 +6,7 @@ menu "Character devices"
 
 config VT
 	bool "Virtual terminal" if EMBEDDED
+	depends on !S390
 	select INPUT
 	default y if !VIOCONS
 	---help---

ok

@@ -81,6 +82,7 @@ config VT_HW_CONSOLE_BINDING
 
 config SERIAL_NONSTANDARD
 	bool "Non-standard serial port support"
+	depends on !S390
 	---help---
 	  Say Y here if you have any non-standard serial boards -- boards
 	  which aren't supported using the standard dumb serial driver.

depends on MMIO

@@ -774,7 +776,7 @@ config NVRAM
 
 config RTC
 	tristate "Enhanced Real Time Clock Support"
-	depends on !PPC && !PARISC && !IA64 && !M68K && (!SPARC || PCI) && !FRV && !ARM && !SUPERH
+	depends on !PPC && !PARISC && !IA64 && !M68K && (!SPARC || PCI) && !FRV && !ARM && !SUPERH && !S390
 	---help---
 	  If you say Y here and create a character special file /dev/rtc with
 	  major number 10 and minor number 135 using mknod (man mknod), you
@@ -822,7 +824,7 @@ config SGI_IP27_RTC
 
 config GEN_RTC
 	tristate "Generic /dev/rtc emulation"
-	depends on RTC!=y && !IA64 && !ARM && !M32R && !SPARC && !FRV
+	depends on RTC!=y && !IA64 && !ARM && !M32R && !SPARC && !FRV && !S390
 	---help---
 	  If you say Y here and create a character special file /dev/rtc with
 	  major number 10 and minor number 135 using mknod (man mknod), you

ok. This one is bad in general and should probably be a select from the architecture, but that should not stop you from adding another architecture...

@@ -878,6 +880,7 @@ config DTLK
 
 config R3964
 	tristate "Siemens R3964 line discipline"
+	depends on !S390
 	---help---
 	  This driver allows synchronous communication with devices using the
 	  Siemens R3964 packet protocol. Unless you are dealing with special

Does it build? I don't see a point in disabling this one just because there are no users. Most architectures also don't have users for this one, but it doesn't hurt to be able to build it using allyesconfig.

Arnd
Re: [PATCH 7/8] Kconfig: silicon backplane dependency.
On Friday 20 April 2007, Martin Schwidefsky wrote:
From: Martin Schwidefsky [EMAIL PROTECTED]

Make the Sonics Silicon Backplane menu dependent on the two buses it can be found on. Goes on top of git-wireless.patch.

Cc: Michael Buesch [EMAIL PROTECTED]
Cc: John W. Linville [EMAIL PROTECTED]
Signed-off-by: Martin Schwidefsky [EMAIL PROTECTED]
---
 drivers/ssb/Kconfig | 1 +
 1 files changed, 1 insertion(+)

diff -urpN linux-2.6/drivers/ssb/Kconfig linux-2.6-patched/drivers/ssb/Kconfig
--- linux-2.6/drivers/ssb/Kconfig	2007-04-19 15:24:40.0 +0200
+++ linux-2.6-patched/drivers/ssb/Kconfig	2007-04-19 15:55:44.0 +0200
@@ -1,4 +1,5 @@
 menu "Sonics Silicon Backplane"
+	depends on PCI || PCMCIA

No, this doesn't look right. There are other devices that come with SiliconBackplane but are not PCI or PCMCIA style devices. I'd make this 'depends on MMIO' as well if you add that option.

Arnd
Re: [PATCH 2/8] Kconfig: unwanted menus for s390.
On Sunday 22 April 2007, Arnd Bergmann wrote:
I would prefer to not have 'depends on !S390' but rather 'depends on MMIO', because that is what really drives stuff like IPMI: they expect the device to be reachable through the use of ioremap or inX/outX instructions, which don't exist on s390. While it's unlikely that another architecture has the same restriction, it expresses much clearer what you mean. In drivers/Kconfig, you can then simply add a

config MMIO
	def_bool !S390

I just saw that we already have an option like that, with a slightly different name. arch/s390/Kconfig contains

config NO_IOMEM
	def_bool y

and lib/Kconfig contains

config HAS_IOMEM
	boolean
	depends on !NO_IOMEM
	default y

You should probably just use one of these two to disable any driver that uses ioremap or similar.

Arnd
Re: Fw: [PATCH][RFC] PCMCIA support for 8xx using platform devices
On Sunday 22 April 2007, Vitaly Bordug wrote:
This utilizes PCMCIA on mpc885ads and mpc866ads from arch/powerpc. In the new approach, direct IMMR accesses from within drivers/ were totally eliminated, that requires hardware_enable, hardware_disable, voltage_set board-specific functions to be moved over to BSP code section (arch/powerpc/platforms/8xx in 885 case). There is just no way to have both arch/ppc and arch/powerpc approaches to work simultaneously because of that.

Maybe I'm missing a key issue here, but what's the point of adding more platform_devices for stuff that is already in the device tree? Shouldn't this be made an of_platform_driver instead so you can use the existing of_device directly?

Arnd
Re: [PATCH 7/8] Kconfig: silicon backplane dependency.
On Monday 23 April 2007, Martin Schwidefsky wrote:
The current Kconfig code does not check all select statements if they can be enabled before allowing the config option that does the select. So the rule for using select statements is that the depends line of the config option that selects another config option needs to be at least as restrictive as the depends line of the selected option. Hence I'll add the HAS_IOMEM depends to B44 as well. Okay ?

Isn't B44 already behind a WIRELESS or IEEE80211 or similar option that can't be selected on s390?

Arnd
Re: [PATCH 7/8] Kconfig: silicon backplane dependency.
On Monday 23 April 2007, Martin Schwidefsky wrote:
Isn't B44 already behind a WIRELESS or IEEE80211 or similar option that can't be selected on s390?
No, the option can be found in drivers/net/Kconfig under menu Ethernet (10 or 100Mbit).

Ah, I was confusing it with b43. Depends on HAS_IOMEM sounds good then. I'd prefer to make it 'depends on SSB' instead of 'select SSB', but I don't want to get into that argument ;-)

Arnd
Re: [PATCH/RESEND] ehea: fix for dlpar and sysfs entries
On Monday 23 April 2007, Jan-Bernd Themann wrote:
- dlpar fix: certain resources may only be allocated when first logical port is available, and must be removed when last logical port has been removed
- sysfs entries: create symbolic link from each logical port to ehea driver

I can't see anything wrong with the patch contents, but if you know that there are two changes, you really should make it two separate patches.

Arnd
Re: [PATCH 0/9] Kconfig: cleanup s390 v2.
On Monday 23 April 2007, Martin Schwidefsky wrote:
I've added the results of the review to the Kconfig cleanup patches for s390. Patch #2 has been split, one half has all the HAS_IOMEM depends lines the other the remaining !S390 depends lines.

They all look good to me now.
Re: [PATCH 7/7] revoke: wire up s390 system calls
On Friday 09 March 2007, Pekka J Enberg wrote:
> From: Serge E. Hallyn [EMAIL PROTECTED]
>
> Make revokeat and frevoke system calls available to user-space on s390.
>
> Signed-off-by: Serge E. Hallyn [EMAIL PROTECTED]
> Signed-off-by: Pekka Enberg [EMAIL PROTECTED]

Looks good to me, but you really should send this through Martin, since he has an overview of which syscall numbers may already be assigned by some other patch he has queued up.

Arnd
Re: [PATCH x86 for review III] [1/29] i386: avoid gcc extension
On Monday 12 February 2007 17:51, Andi Kleen wrote:
> setcc() in math-emu is written as a gcc extension statement expression macro that returns a value. However, it's not used that way and it's not needed like that, so just make it a do-while non-extension macro so that we don't use an extension when it's not needed.

The patch looks good but it doesn't match the description any more, since you now use a function...
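The two non-extension forms discussed in this message can be compared side by side. The following is only a minimal userspace sketch: `status_word`, `CC_MASK`, `SETCC_MACRO` and `setcc_func` are hypothetical names, not the actual math-emu definitions, and the GNU statement-expression form is shown in a comment only.

```c
#include <assert.h>

/* Hypothetical status word and condition-code mask, for illustration only. */
#define CC_MASK 0x4700u
static unsigned int status_word;

/*
 * GNU extension form (not compiled here), which yields a value:
 *   #define SETCC_EXPR(cc) ({ status_word &= ~CC_MASK; \
 *                             status_word |= (cc) & CC_MASK; })
 */

/* Standard C: a do-while(0) macro behaves as one statement and yields no value. */
#define SETCC_MACRO(cc)					\
	do {						\
		status_word &= ~CC_MASK;		\
		status_word |= (cc) & CC_MASK;		\
	} while (0)

/* Standard C: a function, which also type-checks its argument. */
static inline void setcc_func(unsigned int cc)
{
	status_word &= ~CC_MASK;
	status_word |= cc & CC_MASK;
}
```

Both forms clear the masked bits and then set the requested ones. The function form is usually easier to review because the argument is evaluated exactly once.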
[PATCH] Open Firmware serial port driver
This can be used for serial ports that are connected to an OF platform bus but are not autodetected by the legacy serial support. It will automatically take over devices that come from the legacy serial detection, which usually is only one device.

In some cases, rtas may be set up to use the serial port in the firmware, which allows easier debugging before probing the serial ports. In this case, the used-by-rtas property must be set by the firmware. This patch also adds code to the legacy serial driver to check for this.

Signed-off-by: Arnd Bergmann [EMAIL PROTECTED]
---
Who will handle this driver? It is powerpc specific and hooks into powerpc code at one place, but it's also a new driver for the (orphaned) serial layer. Could either Paul or Andrew merge this, or whoever else feels responsible?

 arch/powerpc/kernel/legacy_serial.c |   15 +
 drivers/serial/Kconfig              |   10 +
 drivers/serial/Makefile             |    1
 drivers/serial/of_serial.c          |  143
 4 files changed, 169 insertions(+)

Index: linux-2.6/drivers/serial/Makefile
===================================================================
--- linux-2.6.orig/drivers/serial/Makefile
+++ linux-2.6/drivers/serial/Makefile
@@ -58,3 +58,4 @@ obj-$(CONFIG_SERIAL_SGI_IOC3) += ioc3_se
 obj-$(CONFIG_SERIAL_ATMEL) += atmel_serial.o
 obj-$(CONFIG_SERIAL_UARTLITE) += uartlite.o
 obj-$(CONFIG_SERIAL_NETX) += netx-serial.o
+obj-$(CONFIG_SERIAL_OF_PLATFORM) += of_serial.o
Index: linux-2.6/drivers/serial/of_serial.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/serial/of_serial.c
@@ -0,0 +1,143 @@
+/*
+ *  Serial Port driver for Open Firmware platform devices
+ *
+ *    Copyright (C) 2006 Arnd Bergmann [EMAIL PROTECTED], IBM Corp.
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ *
+ */
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/serial_core.h>
+#include <linux/serial_8250.h>
+
+#include <asm/of_platform.h>
+#include <asm/prom.h>
+
+/*
+ * Fill a struct uart_port for a given device node
+ */
+static int __devinit of_platform_serial_setup(struct of_device *ofdev,
+					int type, struct uart_port *port)
+{
+	struct resource resource;
+	struct device_node *np = ofdev->node;
+	const unsigned int *clk, *spd;
+	int ret;
+
+	memset(port, 0, sizeof *port);
+	spd = get_property(np, "current-speed", NULL);
+	clk = get_property(np, "clock-frequency", NULL);
+	if (!clk) {
+		dev_warn(&ofdev->dev, "no clock-frequency property set\n");
+		return -ENODEV;
+	}
+
+	ret = of_address_to_resource(np, 0, &resource);
+	if (ret) {
+		dev_warn(&ofdev->dev, "invalid address\n");
+		return ret;
+	}
+
+	spin_lock_init(&port->lock);
+	port->mapbase = resource.start;
+	port->irq = irq_of_parse_and_map(np, 0);
+	port->iotype = UPIO_MEM;
+	port->type = type;
+	port->uartclk = *clk;
+	port->flags = UPF_SHARE_IRQ | UPF_BOOT_AUTOCONF | UPF_IOREMAP;
+	port->dev = &ofdev->dev;
+	port->custom_divisor = *clk / (16 * (*spd));
+
+	return 0;
+}
+
+/*
+ * Try to register a serial port
+ */
+static int __devinit of_platform_serial_probe(struct of_device *ofdev,
+						const struct of_device_id *id)
+{
+	struct uart_port port;
+	int port_type;
+	int ret;
+
+	if (of_find_property(ofdev->node, "used-by-rtas", NULL))
+		return -EBUSY;
+
+	port_type = (unsigned long)id->data;
+	ret = of_platform_serial_setup(ofdev, port_type, &port);
+	if (ret)
+		goto out;
+
+	switch (port_type) {
+	case PORT_UNKNOWN:
+		dev_info(&ofdev->dev, "Unknown serial port found, "
+			"attempting to use 8250 driver\n");
+		/* fallthrough */
+	case PORT_8250 ... PORT_MAX_8250:
+		ret = serial8250_register_port(&port);
+		break;
+	default:
+		/* need to add code for these */
+		ret = -ENODEV;
+		break;
+	}
+	if (ret < 0)
+		goto out;
+
+	ofdev->dev.driver_data = (void *)(unsigned long)ret;
+	return 0;
+out:
+	irq_dispose_mapping(port.irq);
+	return ret;
+}
+
+/*
+ * Release a line
+ */
+static int of_platform_serial_remove(struct of_device *ofdev)
+{
+	int line = (unsigned long)ofdev->dev.driver_data;
+	serial8250_unregister_port(line);
+	return 0;
+}
+
+/*
+ * A few common types, add more as needed.
+ */
+static struct of_device_id __devinitdata of_platform_serial_table[] = {
+	{ .type = "serial", .compatible = "ns8250", .data
Re: export of_find_property
On Wednesday 14 February 2007 22:54, Dave Jones wrote:
> Without this, building drivers/serial/of_serial.c as a module fails.
>
> WARNING: ".of_find_property" [drivers/serial/of_serial.ko] undefined!
>
> Signed-off-by: Dave Jones [EMAIL PROTECTED]

Acked-by: Arnd Bergmann [EMAIL PROTECTED]

Sorry about that one. This was introduced by a last-minute change, and I didn't retest building as a module with it.
Re: [Cbe-oss-dev] [RFC, PATCH] CELL Oprofile SPU profiling updated patch
On Thursday 15 February 2007 00:52, Carl Love wrote:
> --- linux-2.6.20-rc1.orig/arch/powerpc/oprofile/Kconfig 2007-01-18 16:43:14.0 -0600
> +++ linux-2.6.20-rc1/arch/powerpc/oprofile/Kconfig 2007-02-13 19:04:46.271028904 -0600
> @@ -7,7 +7,8 @@
>  config OPROFILE
>  	tristate "OProfile system profiling (EXPERIMENTAL)"
> -	depends on PROFILING
> +	default m
> +	depends on SPU_FS && PROFILING
>  	help
>  	  OProfile is a profiling system capable of profiling the
>  	  whole system, include the kernel, kernel modules, libraries,

Milton already commented on this being wrong. I think what you want is

	depends on PROFILING && (SPU_FS = n || SPU_FS)

That should make sure that when SPU_FS=m, OPROFILE cannot be 'y'.

> @@ -15,3 +16,10 @@
>  	  If unsure, say N.
>
> +config OPROFILE_CELL
> +	bool "OProfile for Cell Broadband Engine"
> +	depends on SPU_FS && OPROFILE
> +	default y
> +	help
> +	  OProfile for Cell BE requires special support enabled
> +	  by this option.

You should at least mention that this allows profiling the spus.

> +#define EFWCALL ENOSYS /* Use an existing error number that is as
> +			 * close as possible for a FW call that failed.
> +			 * The probability of the call failing is
> +			 * very low. Passing up the error number
> +			 * ensures that the user will see an error
> +			 * message saying OProfile did not start.
> +			 * Dmesg will contain an accurate message
> +			 * about the failure.
> +			 */

ENOSYS looks wrong though. It would appear to the user as if the oprofile function in the kernel was not present. I'd suggest EIO, and not use an extra define for that.

>  static int
>  rtas_ibm_cbe_perftools(int subfunc, int passthru,
>  		       void *address, unsigned long length)
>  {
>  	u64 paddr = __pa(address);
>
> -	return rtas_call(pm_rtas_token, 5, 1, NULL, subfunc, passthru,
> -			 paddr >> 32, paddr & 0xffffffff, length);
> +	pm_rtas_token = rtas_token("ibm,cbe-perftools");
> +
> +	if (unlikely(pm_rtas_token == RTAS_UNKNOWN_SERVICE)) {
> +		printk(KERN_ERR
> +		       "%s: rtas token ibm,cbe-perftools unknown\n",
> +		       __FUNCTION__);
> +		return -EFWCALL;
> +	} else {
> +
> +		return rtas_call(pm_rtas_token, 5, 1, NULL, subfunc,
> +				 passthru, paddr >> 32, paddr & 0xffffffff, length);
> +	}
>  }

Are you now reading the rtas token every time you call rtas? That seems like a waste of time.

> +#define size 24
> +#define ENTRIES (0x1<<8) /* 256 */
> +#define MAXLFSR 0xFF
> +
> +int initial_lfsr[] =
> +{16777215, 3797240, 13519805, 11602690, 6497030, 7614675, 2328937, 2889445,
> + 12364575, 8723156, 2450594, 16280864, 14742496, 10904589, 6434212, 4996256,
> + 5814270, 13014041, 9825245, 410260, 904096, 15151047, 15487695, 3061843,
> + 16482682, 7938572, 4893279, 9390321, 4320879, 5686402, 1711063, 10176714,
> + 4512270, 1057359, 16700434, 5731602, 2070114, 16030890, 1208230, 15603106,
> + 11857845, 6470172, 1362790, 7316876, 8534496, 1629197, 10003072, 1714539,
> + 1814669, 7106700, 5427154, 3395151, 3683327, 12950450, 16620273, 12122372,
> + 7194999, 9952750, 3608260, 13604295, 2266835, 14943567, 7079230, 777380,
> + 4516801, 1737661, 8730333, 13796927, 3247181, 9950017, 3481896, 16527555,
> + 13116123, 14505033, 9781119, 4860212, 7403253, 13264219, 12269980, 100120,
> + 664506, 607795, 8274553, 13133688, 6215305, 13208866, 16439693, 3320753,
> + 8773582, 13874619, 1784784, 4513501, 11002978, 9318515, 3038856, 14254582,
> + 15484958, 15967857, 13504461, 13657322, 14724513, 13955736, 5695315, 7330509,
> + 12630101, 6826854, 439712, 4609055, 13288878, 1309632, 4996398, 11392266,
> + 793740, 7653789, 2472670, 14641200, 5164364, 5482529, 10415855, 1629108,
> + 2012376, 13661123, 14655718, 9534083, 16637925, 2537745, 9787923, 12750103,
> + 4660370, 3283461, 14862772, 7034955,
6679872, 8918232, 6506913, 103649, + 6085577, 13324033, 14251613, 11058220, 11998181, 3100233, 468898, 7104918, + 12498413, 14408165, 1208514, 15712321, 3088687, 14778333, 3632503, 11151952, + 98896, 9159367, 8866146, 4780737, 4925758, 12362320, 4122783, 8543358, + 7056879, 10876914, 6282881, 1686625, 5100373, 4573666, 9265515, 13593840, + 5853060, 110, 4237111, 1576, 14344137, 4608332, 6590210, 13745050, + 10916568, 12340402, 7145275, 4417153, 2300360, 12079643, 7608534, 15238251, + 4947424, 7014722, 3984546, 7168073, 10759589, 16293080, 3757181, 4577717, + 5163790, 2488841, 4650617, 3650022, 5440654, 1814617, 6939232, 15540909, + 501788, 1060986, 5058235, 5078222, 3734500, 10762065, 390862, 5172712, + 1070780, 7904429, 1669757, 3439997, 2956788, 14944927, 12496638, 994152, + 8901173, 11827497, 4268056, 15725859, 1694506,
Re: [Cbe-oss-dev] [RFC, PATCH] CELL Oprofile SPU profiling updated patch
On Thursday 15 February 2007 17:15, Maynard Johnson wrote:
> > > +void spu_set_profile_private(struct spu_context * ctx, void * profile_info,
> > > +			       struct kref * prof_info_kref,
> > > +			       void (* prof_info_release) (struct kref * kref))
> > > +{
> > > +	ctx->profile_private = profile_info;
> > > +	ctx->prof_priv_kref = prof_info_kref;
> > > +	ctx->prof_priv_release = prof_info_release;
> > > +}
> > > +EXPORT_SYMBOL_GPL(spu_set_profile_private);
> >
> > I think you don't need the profile_private member here, if you just use container_of with ctx->prof_priv_kref in all users.
>
> Sorry, I don't follow. We want the profile_private to be stored in the spu_context, don't we? How else would I be able to do that? And besides, wouldn't container_of need the struct name of profile_private? SPUFS doesn't have access to the type.

The idea was to have spu_get_profile_private return the kref pointer, and then change the user of that to do

+	if (!spu_info[spu_num] && the_spu) {
+		spu_info[spu_num] = container_of(
+			spu_get_profile_private(the_spu->ctx),
+			struct cached_info, cache_kref);
+		if (spu_info[spu_num])
+			kref_get(&spu_info[spu_num]->cache_ref);
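The container_of idea from this message can be tried outside the kernel. What follows is a minimal userspace sketch, not the spufs or oprofile code: `struct kref` and `container_of` are simplified stand-ins for the kernel definitions, and `spu_context_sketch`, `get_profile_private` and `ctx_to_cached_info` are hypothetical names. It shows that the generic side only has to store a pointer to the embedded kref, while the profiler, which knows the outer type, recovers its own structure from it.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel's struct kref and container_of(). */
struct kref {
	int refcount;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical profiler-private data; only the profiler knows this type. */
struct cached_info {
	int spu_num;
	struct kref cache_ref;
};

/* What the generic side would keep: just the kref pointer, no void pointer. */
struct spu_context_sketch {
	struct kref *prof_priv_kref;
};

static struct kref *get_profile_private(struct spu_context_sketch *ctx)
{
	return ctx->prof_priv_kref;
}

/* The profiler recovers its own structure from the embedded kref. */
static struct cached_info *ctx_to_cached_info(struct spu_context_sketch *ctx)
{
	struct kref *ref = get_profile_private(ctx);

	/* Check for NULL first: container_of() on NULL yields a bogus pointer. */
	return ref ? container_of(ref, struct cached_info, cache_ref) : NULL;
}
```

One design point worth noting: because the kref is not the first member here, applying container_of to a NULL pointer would produce a non-NULL garbage address, so the NULL check has to happen before the conversion.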
Re: [Cbe-oss-dev] [RFC, PATCH] CELL Oprofile SPU profiling updated patch
On Thursday 15 February 2007 21:21, Carl Love wrote:
> I have done some quick measurements. The above method limits the loop to at most 2^16 iterations. Based on running the algorithm in user space, it takes about 3ms of computation time to do the loop 2^16 times.
>
> At the very least, we need to put the resched in say every 10,000 iterations which would be about every 0.5ms. Should we do a resched more often?

Yes, just to be on the safe side, I'd suggest doing it every 1000 iterations.

> Additionally we could up the size of the table to 512 which would reduce the maximum time to about 1.5ms. What do people think about increasing the table size?

No, that won't help too much. I'd say 256 or 128 entries is the most we should have.

> As for using a logarithmic spacing of the precomputed values, this approach means that the space between the precomputed values at the high end would be much larger than 2^14, assuming 256 precomputed values. That means it could take much longer than 3ms to get the needed LFSR value for a large N. By evenly spacing the precomputed values, we can ensure that for all N it will take less than 3ms to get the value. Personally, I am more comfortable with a hard limit on the compute time than a variable time that could get much bigger than the 1ms threshold that Arnd wants for resched. Any thoughts?

When using precomputed values on a logarithmic scale, I'd recommend just rounding to the closest value and accepting the relative inaccuracy, instead of using the precomputed value as the base and then calculating from there.

Arnd
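The evenly spaced variant discussed in this message (256 table entries, at most 2^16 steps from the nearest entry, a reschedule point every 1000 iterations) can be sketched in userspace as follows. Everything here is an assumption made for illustration: the feedback taps are not claimed to match the Cell hardware, the table is computed at startup rather than taken from the patch, and `cond_resched_stub` merely counts calls where kernel code would call cond_resched().

```c
#include <assert.h>
#include <stdint.h>

#define LFSR_MASK	0xFFFFFFu	/* 24-bit state */
#define ENTRIES		256u
#define SPACING		(1u << 16)	/* 2^24 / 256 steps between entries */
#define RESCHED_EVERY	1000u

static uint32_t table[ENTRIES];
static unsigned long resched_calls;

/* Stand-in for the kernel's cond_resched(); here it only counts calls. */
static void cond_resched_stub(void)
{
	resched_calls++;
}

/* One step of a 24-bit LFSR. The taps (24, 23, 22, 17) are an assumption. */
static uint32_t lfsr_step(uint32_t s)
{
	uint32_t bit = ((s >> 23) ^ (s >> 22) ^ (s >> 21) ^ (s >> 16)) & 1u;

	return ((s << 1) | bit) & LFSR_MASK;
}

/* Precompute the state after every SPACING steps, starting from all ones. */
static void build_table(void)
{
	uint32_t s = LFSR_MASK;

	for (uint32_t i = 0; i < ENTRIES; i++) {
		table[i] = s;
		for (uint32_t j = 0; j < SPACING; j++)
			s = lfsr_step(s);
	}
}

/* State after n steps: start at the nearest lower entry, walk the remainder. */
static uint32_t lfsr_at(uint32_t n)
{
	uint32_t s, left;

	n &= LFSR_MASK;			/* keeps the table index in range */
	s = table[n / SPACING];
	left = n % SPACING;		/* always below 2^16 */

	for (uint32_t i = 1; i <= left; i++) {
		s = lfsr_step(s);
		if (i % RESCHED_EVERY == 0)
			cond_resched_stub();
	}
	return s;
}
```

With this layout the inner loop never exceeds 65535 iterations, which is the hard bound on compute time argued for in the quoted text, and the reschedule point fires at most 65 times per lookup.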
Re: [Cbe-oss-dev] [RFC, PATCH] CELL Oprofile SPU profiling updated patch
On Thursday 15 February 2007 22:50, Paul E. McKenney wrote:
> Is this 1.5ms with interrupts disabled? This time period is problematic from a realtime perspective if so -- need to be able to preempt.

No, interrupts should be enabled here. Still, 1.5ms is probably a little too long without a cond_resched() in case kernel preemption is disabled.

Arnd
Re: [Cbe-oss-dev] [RFC, PATCH] CELL Oprofile SPU profiling updated patch
On Friday 16 February 2007 01:32, Maynard Johnson wrote:
> config OPROFILE_CELL
> 	bool "OProfile for Cell Broadband Engine"
> 	depends on OPROFILE && SPU_FS
> 	default y if ((SPU_FS = y && OPROFILE = y) || (SPU_FS = m && OPROFILE = m))
> 	help
> 	  Profiling of Cell BE SPUs requires special support enabled
> 	  by this option. Both SPU_FS and OPROFILE options must be
> 	  set 'y' or both be set 'm'.
>
> Can anyone see a problem with any of this . . . or perhaps a suggestion of a better way?

The text suggests it doesn't allow SPU_FS=y with OPROFILE=m, which I think should be allowed. I also don't see any place in the code where you actually use CONFIG_OPROFILE_CELL.

Ideally, you should be able to have an oprofile_spu module that can be loaded after spufs.ko and oprofile.ko. In that case you only need

config OPROFILE_SPU
	depends on OPROFILE && SPU_FS
	default y

and it will automatically build oprofile_spu as a module if one of the two is a module and won't build it if one of them is disabled.

Arnd
Re: [RFC] killing the NR_IRQS arrays.
On Friday 16 February 2007 13:10, Eric W. Biederman wrote:
> To do this I believe will require a s/unsigned int irq/struct irq_desc *irq/ throughout the entire kernel. Getting the arch specific code and the generic kernel infrastructure fixed and ready for that change looks like a pain but pretty doable.

We did something like this a few years back on the s390 architecture, which happens to be lucky enough not to share any interrupt based drivers with any of the other architectures.

It helped a lot on s390, and I think the change will be beneficial on others as well, e.g. powerpc already uses 'virtual' interrupt numbers to collapse the large (sparse) range of interrupt numbers into 512 unique numbers. This could easily be avoided if there was simply an array of irq_desc structures per interrupt controller.

However, I also think we should maintain the old interface, and introduce a new one to deal only with those cases that benefit from it (MSI, Xen, powerpc VIO, ...). This means one subsystem can be converted at a time.

I don't think there is a point converting the legacy ISA interrupts to a different interface, as the concept of IRQ numbers is part of the subsystem itself (if you want to call ISA a subsystem...).

For PCI, it makes a lot more sense to use something else, considering that PCI interrupts are defined as 'pins' instead of 'lines'. An interrupt pin is defined per slot while the line is per bus, and in a system with multiple PCI buses the line is still not necessarily unique.

One interface I could imagine for PCI devices would be

/* generic functions */
int request_irq_desc(struct irq_desc *desc, irq_handler_t handler,
		     unsigned long irqflags, const char *devname, void *dev_id);
int free_irq_desc(struct irq_desc *desc, void *dev_id);

/* legacy functions */
int request_irq(int irq, irq_handler_t handler,
		unsigned long irqflags, const char *devname, void *dev_id)
{
	return request_irq_desc(lookup_irq_desc(irq), handler, irqflags,
				devname, dev_id);
}
int free_irq(int irq, void *dev_id)
{
	return free_irq_desc(lookup_irq_desc(irq), dev_id);
}

/* pci specific */
struct irq_desc *pci_request_irq(struct pci_device *dev, int pin,
				 irq_handler_t handler)
{
	struct irq_desc *desc = pci_lookup_irq(dev, pin);
	int ret;

	if (!desc)
		return NULL;

	ret = request_irq_desc(desc, handler, IRQF_SHARED,
			       dev->dev.bus_id, dev);
	if (ret < 0)
		return NULL;
	return desc;
}
int pci_free_irq(struct pci_device *dev, int pin)
{
	return free_irq_desc(pci_lookup_irq(dev, pin), dev);
}

Now I don't know enough about MSI yet, but I could imagine that something along these lines would work as well, and we could simply require all drivers that want to support MSI to use the new interfaces.

Arnd
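The layering proposed in this message, a descriptor-based core with the number-based calls kept as thin wrappers, can be exercised in a small userspace mock. This is a sketch under simplifying assumptions and not kernel code: the signatures drop the `irqflags` and `devname` arguments, there is no interrupt sharing, and `NR_LEGACY_IRQS` and `demo_handler` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*irq_handler_t)(int irq, void *dev_id);

/* Simplified descriptor: one handler per interrupt, no sharing. */
struct irq_desc {
	irq_handler_t handler;
	void *dev_id;
};

#define NR_LEGACY_IRQS 16	/* hypothetical size of the legacy number space */
static struct irq_desc legacy_irqs[NR_LEGACY_IRQS];

/* Only the legacy path ever translates a number into a descriptor. */
static struct irq_desc *lookup_irq_desc(unsigned int irq)
{
	return irq < NR_LEGACY_IRQS ? &legacy_irqs[irq] : NULL;
}

/* Generic, descriptor-based interface. */
static int request_irq_desc(struct irq_desc *desc, irq_handler_t handler,
			    void *dev_id)
{
	if (!desc || !handler)
		return -EINVAL;
	if (desc->handler)
		return -EBUSY;
	desc->handler = handler;
	desc->dev_id = dev_id;
	return 0;
}

static int free_irq_desc(struct irq_desc *desc, void *dev_id)
{
	if (!desc || !desc->handler || desc->dev_id != dev_id)
		return -EINVAL;
	desc->handler = NULL;
	desc->dev_id = NULL;
	return 0;
}

/* Legacy, number-based interface kept as thin wrappers. */
static int request_irq(unsigned int irq, irq_handler_t handler, void *dev_id)
{
	return request_irq_desc(lookup_irq_desc(irq), handler, dev_id);
}

static int free_irq(unsigned int irq, void *dev_id)
{
	return free_irq_desc(lookup_irq_desc(irq), dev_id);
}

/* Trivial handler used only to exercise the sketch. */
static int demo_handler(int irq, void *dev_id)
{
	(void)irq;
	*(int *)dev_id += 1;
	return 1;
}
```

A bus-specific helper in the style of the `pci_request_irq()` outlined above would sit next to the legacy wrappers and obtain its descriptor from a per-bus lookup instead of from the global number table.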
Re: [RFC] killing the NR_IRQS arrays.
On Friday 16 February 2007 20:52, Russell King wrote:
> On Fri, Feb 16, 2007 at 08:45:58PM +0100, Arnd Bergmann wrote:
> > We did something like this a few years back on the s390 architecture, which happens to be lucky enough not to share any interrupt based drivers with any of the other architectures.
>
> What you're proposing is looking similar to a proposal I put forward some 4 years ago, but was rejected. Maybe times have changed and there's a need for it now.

Yes, I think times have changed, with the increased popularity of MSI and paravirtualized devices.

A few points on your old proposal though:

- Doing it per architecture no longer sounds feasible. I think it would need to be done per subsystem so that the drivers can be adapted to a new interface, and most drivers are used across multiple architectures.

- struct irq sounds much more fitting than struct irq_desc.

- Creating new irq_foo() functions to replace foo_irq() also sounds right.

- I don't see the point in splitting request_irq into irq_request and irq_register.

- Doing subsystem specific abstractions ideally allows the drivers to not even need to worry about the irq pointer, significantly simplifying the interface for register/unregister.

Arnd
Re: [RFC] killing the NR_IRQS arrays.
On Friday 16 February 2007 23:37, Benjamin Herrenschmidt wrote:
> You might want to have a look at the powerpc API with its remapping capabilities. It's very nice for handling multiple domain spaces. It might be of some use for you.

I don't consider the powerpc virtual IRQs a solution for the problem. While I believe you did the right thing for powerpc by generalizing this over all its platforms, it really isn't more than a workaround for the problem that we can't deal well with the static irq_desc array.

Now that the problem is getting worse on other architectures, we should try to get it right on all of them, rather than spreading the workaround further.

Arnd
Re: [PATCH 12/44 take 2] [UBI] allocation unit implementation
On Saturday 17 February 2007 17:55, Artem Bityutskiy wrote:
> diff -auNrp tmp-from/drivers/mtd/ubi/alloc.c tmp-to/drivers/mtd/ubi/alloc.c
> +#include "ubi.h"
> +#include "alloc.h"
> +#include "io.h"
> +#include "background.h"
> +#include "wl.h"
> +#include "debug.h"
> +#include "eba.h"
> +#include "scan.h"

I don't see much point in having one local header for each of these, you could simply put all of the declarations into one header in the ubi directory.

> +
> +#define BGT_WORK_SLAB_NAME        "ubi_bgt_work_slab"
> +#define WL_ERASE_WORK_SLAB_NAME   "ubi_wl_erase_work_slab"
> +#define WL_ENTRY_SLAB_NAME        "ubi_wl_entry_slab"
> +#define WL_PROT_ENTRY_SLAB_NAME   "ubi_wl_prow_entry_slab"
> +#define EBA_LTREE_ENTRY_SLAB_NAME "ubi_eba_ltree_entry_slab"
> +#define SCAN_EB_SLAB_NAME         "ubi_scan_leb"
> +#define SCAN_VOLUME_SLAB_NAME     "ubi_scan_volume"

These macros seem rather pointless, each of them is only used once, and the macro name directly corresponds to the contents.

> +static struct kmem_cache *bgt_work_slab;
> +static struct kmem_cache *wl_erase_work_slab;
> +static struct kmem_cache *wl_entries_slab;
> +static struct kmem_cache *wl_prot_entry_slab;
> +static struct kmem_cache *eba_ltree_entry_slab;
> +static struct kmem_cache *scan_eb_slab;
> +static struct kmem_cache *scan_volume_slab;

Do you really need all these slab caches? If a cache only contains a small number of objects, e.g. one per volume, then you're much better off using a regular kmalloc.

> +void *ubi_kzalloc(size_t size)
> +{
> +	void *ret;
> +
> +	ret = kzalloc(size, GFP_KERNEL);
> +	if (unlikely(!ret)) {
> +		ubi_err("cannot allocate %zd bytes", size);
> +		dump_stack();
> +		return NULL;
> +	}
> +
> +	return ret;
> +}
> +
> +void *ubi_kmalloc(size_t size)
> +{
> +	void *ret;
> +
> +	ret = kmalloc(size, GFP_KERNEL);
> +	if (unlikely(!ret)) {
> +		ubi_err("cannot allocate %zd bytes", size);
> +		dump_stack();
> +		return NULL;
> +	}
> +
> +	return ret;
> +}
> +
> +void ubi_kfree(const void *obj)
> +{
> +	if (unlikely(!obj))
> +		return;
> +	kfree(obj);
> +}

These look somewhat too complex. Don't introduce your own generic infrastructure if you can help it. IIRC, when kmalloc fails, you already get the full stack trace from the buddy allocator, so this is just duplication. Better use the regular kzalloc/kfree calls directly.

> +struct ubi_ec_hdr *ubi_zalloc_ec_hdr(const struct ubi_info *ubi)
> +{
> +	struct ubi_ec_hdr *ec_hdr;
> +	const struct ubi_io_info *io = ubi->io;
> +
> +	ec_hdr = kzalloc(io->ec_hdr_alsize, GFP_KERNEL);
> +	if (unlikely(!ec_hdr)) {
> +		ubi_err("cannot allocate %d bytes", io->ec_hdr_alsize);
> +		dump_stack();
> +		return NULL;
> +	}
> +
> +	return ec_hdr;
> +}
> +
> +void ubi_free_ec_hdr(const struct ubi_info *ubi, struct ubi_ec_hdr *ec_hdr)
> +{
> +	if (unlikely(!ec_hdr))
> +		return;
> +	kfree(ec_hdr);
> +}

Same for this and the others. Unless the allocation is done in many places in the code from a single slab cache, just call kmem_cache_alloc or kmalloc directly.

Arnd
Re: [PATCH 10/44 take 2] [UBI] debug unit implementation
On Saturday 17 February 2007 17:55, Artem Bityutskiy wrote:
> diff -auNrp tmp-from/drivers/mtd/ubi/debug.c tmp-to/drivers/mtd/ubi/debug.c
> --- tmp-from/drivers/mtd/ubi/debug.c 1970-01-01 02:00:00.0 +0200
> +++ tmp-to/drivers/mtd/ubi/debug.c 2007-02-17 18:07:26.0 +0200

This whole file looks like it can be removed, as nothing in here is really relevant for regular operation. I'm sure that much of it was a good help in developing the code and finding the bugs in here, but why would you want to merge it into the mainline kernel?

Arnd
Re: [PATCH 05/44 take 2] [UBI] internal common header
On Saturday 17 February 2007 17:54, Artem Bityutskiy wrote:
> +/* Maximum number of supported UBI devices */
> +#define UBI_MAX_INSTANCES 32

Does this need to be limited?

> +/* UBI messages printk level */
> +#define UBI_MSG_LEVEL  KERN_INFO
> +#define UBI_WARN_LEVEL KERN_WARNING
> +#define UBI_ERR_LEVEL  KERN_ERR
> +
> +/* Prefixes of UBI messages */
> +#define UBI_MSG_PREF  "UBI:"
> +#define UBI_WARN_PREF "UBI warning:"
> +#define UBI_ERR_PREF  "UBI error:"
> +
> +/* Normal UBI messages */
> +#define ubi_msg(fmt, ...) \
> +	printk(UBI_MSG_LEVEL UBI_MSG_PREF " " fmt "\n", ##__VA_ARGS__)
> +/* UBI warning messages */
> +#define ubi_warn(fmt, ...) \
> +	printk(UBI_WARN_LEVEL UBI_WARN_PREF " %s: " fmt "\n", __FUNCTION__, \
> +	       ##__VA_ARGS__)
> +/* UBI error messages */
> +#define ubi_err(fmt, ...) \
> +	printk(UBI_ERR_LEVEL UBI_ERR_PREF " %s " fmt "\n", __FUNCTION__, \
> +	       ##__VA_ARGS__)

You shouldn't need these helpers, just use the regular dev_dbg, dev_info and related macros.

> +/**
> + * struct ubi_info - UBI device description structure
> + *
> + * @ubi_num: number of the UBI device
> + * @io: input/output unit information
> + * @bgt: background thread unit information
> + * @wl: wear-leveling unit information
> + * @beb: bad eraseblock handling unit information
> + * @vmt: volume management unit information
> + * @ivol: internal volume management unit information
> + * @vtbl: volume table unit information
> + * @acc: accounting unit information
> + * @upd: update unit information
> + * @eba: EBA unit information
> + * @uif: user interface unit information
> + */
> +struct ubi_info {
> +	int ubi_num;
> +	struct ubi_io_info *io;
> +	struct ubi_bgt_info *bgt;
> +	struct ubi_wl_info *wl;
> +	struct ubi_beb_info *beb;
> +	struct ubi_vmt_info *vmt;
> +	struct ubi_ivol_info *ivol;
> +	struct ubi_vtbl_info *vtbl;
> +	struct ubi_acc_info *acc;
> +	struct ubi_upd_info *upd;
> +	struct ubi_eba_info *eba;
> +	struct ubi_uif_info *uif;
> +};

I don't know what went wrong here, but this does not at all look ok. The members in here probably should all be part of the ubi_info structure itself.

Arnd
Re: [PATCH 41/44 take 2] [UBI] gluebi unit header
On Saturday 17 February 2007 17:57, Artem Bityutskiy wrote:
> + * This unit is responsible for emulating MTD devices on top of UBI devices.
> + * This sounds strange, but it is in fact quite useful to make legacy software
> + * work on top of UBI. New software should use native UBI API instead.
> + *
> + * Gluebi emulated MTD devices of MTD_UBIVOLUME type. Their minimal I/O unit
> + * size (mtd->writesize) is equivalent to the underlying flash minimal I/O
> + * unit. The eraseblock size is equivalent to the logical UBI volume eraseblock
> + * size.

This approach doesn't seem to make sense at all. If the MTD device interface is flawed, the right approach should be to fix that instead. After all, there are not many users of the MTD interface, so you should be able to adapt them.

In fact, I would expect that there is much more reason to merge the existing MTD interface with the block interface in the kernel, but you now introduce a third interface that is unrelated to the first two, and make another conversion to convert it back?

Let's assume I want to use the wear levelling capabilities of UBI on top of an SD card, and use the ext3 file system on top of it. I get a stack of

1. MMC
2. block2mtd
3. UBI
4. gluebi
5. mtdblock
6. VFS

when in an ideal world, it should just be

1. MMC
2. UBI
3. VFS

Arnd
Re: [PATCH 09/44 take 2] [UBI] debug unit header
On Saturday 17 February 2007 17:55, Artem Bityutskiy wrote:
> +
> +/**
> + * UBI debugging unit.
> + *
> + * UBI provides rich debugging capabilities which are implemented in
> + * this unit.

Stop right here. You should be doing one thing and do it right. Since the point of your patches is to do volume management for MTD, it should do just that.

If you feel that Linux needs rich debugging capabilities, then submit a patch for that independent of UBI.

Arnd
Re: [PATCH 03/44 take 2] [UBI] user-space API header
On Saturday 17 February 2007 17:54, Artem Bityutskiy wrote:
> +struct ubi_mkvol_req {
> +	int32_t vol_id;
> +	int32_t alignment;
> +	int64_t bytes;
> +	int8_t vol_type;
> +	int8_t padding[9];
> +	int16_t name_len;
> +	__user const char *name;
> +} __attribute__ ((packed));

This structure is not suitable for an ioctl call, because it has incompatible layout between 32 and 64 bit processes. The easiest fix for this would be to change the 'name' field to an array instead of a pointer.

Arnd
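The layout problem comes from the pointer member: it occupies 4 bytes in a 32-bit process and 8 bytes in a 64-bit one, so the structure size differs between the two. A minimal sketch of the array-based alternative suggested above follows. The struct name, the `UBI_SKETCH_MAX_NAME` limit and the exact padding split are assumptions made for illustration, not the layout the UBI authors chose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UBI_SKETCH_MAX_NAME 127	/* hypothetical name length limit */

/*
 * Only fixed-width members and explicit padding, so every member sits at
 * its natural alignment and the layout does not depend on pointer size.
 */
struct mkvol_req_sketch {
	int32_t vol_id;
	int32_t alignment;
	int64_t bytes;
	int8_t  vol_type;
	int8_t  padding1;
	int16_t name_len;
	int8_t  padding2[4];
	char    name[UBI_SKETCH_MAX_NAME + 1];
};

/* Compile-time checks that hold for both 32-bit and 64-bit builds. */
_Static_assert(offsetof(struct mkvol_req_sketch, bytes) == 8,
	       "bytes must start at offset 8");
_Static_assert(offsetof(struct mkvol_req_sketch, name) == 24,
	       "name must start at offset 24");
_Static_assert(sizeof(struct mkvol_req_sketch) == 152,
	       "structure size must not depend on the ABI");
```

Because the padding is spelled out, the packed attribute is not needed for this sketch, and a 32-bit process talking to a 64-bit kernel would not require a compat translation for this structure.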
Re: [PATCH 41/44 take 2] [UBI] gluebi unit header
On Sunday 18 February 2007 03:04, Josh Boyer wrote:
> No, the MTD interface isn't flawed. gluebi is present to make things like JFFS2 work on top of UBI volumes with very little adaptations. If you go changing _every_ MTD user to now use either an MTD device or a native UBI device, then the code for those users just gets bloated.

Right, that was my point. If the MTD API in the kernel is not flawed, why do we need the 'native' UBI interface? Just merge gluebi into UBI and get rid of the extra abstraction.

> Assuming your SD card isn't doing wear-leveling itself within the device, yes that is what you would get.

While probably all modern SD cards have some amount of wear leveling built in, I wouldn't want to rely on that for anything but the simple large-file-on-fatfs (jpeg or mp3) case. Using UBI on top of the native wear-leveling sounds like the right solution.

> Or you could do something slightly more sane and use:
>
> 1. MMC
> 2. block2mtd
> 3. JFFS2

Not on a 4GB SD medium, with the current jffs2 version. The problem is that jffs2 doesn't scale that well, so you want a different fs. Since logfs isn't stable yet, you end up with something like ext3, which in turn means that you need a UBI-like concept to avoid wearing out the blocks that store your metadata.

Arnd
Re: [PATCH 41/44 take 2] [UBI] gluebi unit header
On Sunday 18 February 2007 04:02:17 Josh Boyer wrote:
> On Sun, Feb 18, 2007 at 03:15:23AM +0100, Arnd Bergmann wrote:
> > On Sunday 18 February 2007 03:04, Josh Boyer wrote:
> > > No, the MTD interface isn't flawed. gluebi is present to make things like JFFS2 work on top of UBI volumes with very few adaptations. If you go changing _every_ MTD user to now use either an MTD device or a native UBI device, then the code for those users just gets bloated.
> >
> > Right, that was my point. If the MTD API in the kernel is not flawed, why do we need the 'native' UBI interface? Just merge gluebi into UBI and get rid of the extra abstraction.
>
> That suggestion came up several times. gluebi represents a compromise between the two groups. IIRC, the issue was that representing UBI volumes as MTD devices only makes sense in the dynamic volume case. Static UBI volumes require special write/update handling and so there was a need for a native interface anyway.

Which brings me back to my original point ;-)

I'm sure this has been discussed before, but I'd still like to understand what is so special about 'static UBI volumes' that they can't be used with a slightly extended MTD interface.

Arnd
Re: [patch 1/13] signal/timer/event fds v6 - anonymous inode source ...
On Friday 16 March 2007 01:22:15 Davide Libenzi wrote:
> +
> +static int ainofs_delete_dentry(struct dentry *dentry);
> +static struct inode *aino_getinode(void);
> +static struct inode *aino_mkinode(void);
> +static int ainofs_get_sb(struct file_system_type *fs_type, int flags,
> +			 const char *dev_name, void *data, struct vfsmount *mnt);
> +

In general, it would be good if you could just reorder your functions so that you don't need any forward declarations like these. It makes reviewing from bottom to top a little easier and it becomes obvious that there are no recursions in the code.

> +static struct vfsmount *aino_mnt __read_mostly;
> +static struct inode *aino_inode;
> +static struct file_operations aino_fops = { };

Iirc, file_operations should be const.

> +int aino_getfd(int *pfd, struct inode **pinode, struct file **pfile,
> +	       char const *name, const struct file_operations *fops, void *priv)
> +{

Since this is meant to be a generic interface that can be used from other subsystems, a kerneldoc style comment would be nice.

> +static int __init aino_init(void)
> +{
> +
> +	if (register_filesystem(&aino_fs_type))
> +		goto epanic;
> +
> +	aino_mnt = kern_mount(&aino_fs_type);
> +	if (IS_ERR(aino_mnt))
> +		goto epanic;
> +
> +	aino_inode = aino_mkinode();
> +	if (IS_ERR(aino_inode))
> +		goto epanic;
> +
> +	return 0;
> +
> +epanic:
> +	panic("aino_init() failed\n");
> +}

panic() is a little harsh from a loadable module. If you mean the aino support to be used as a module, this should probably just return an error.

> +static void __exit aino_exit(void)
> +{
> +	iput(aino_inode);
> +	unregister_filesystem(&aino_fs_type);
> +	mntput(aino_mnt);
> +}

But since the Makefile always has it as built-in, maybe you should instead just kill the exit function and use fs_initcall instead of init_module().

Arnd
Re: [patch 2/13] signal/timer/event fds v6 - signalfd core ...
On Friday 16 March 2007 01:22:15 Davide Libenzi wrote:
> +
> +static struct sighand_struct *signalfd_get_sighand(struct signalfd_ctx *ctx,
> +						   unsigned long *flags);
> +static void signalfd_put_sighand(struct signalfd_ctx *ctx,
> +				 struct sighand_struct *sighand,
> +				 unsigned long *flags);
> +static void signalfd_cleanup(struct signalfd_ctx *ctx);
> +static int signalfd_close(struct inode *inode, struct file *file);
> +static unsigned int signalfd_poll(struct file *file, poll_table *wait);
> +static int signalfd_copyinfo(struct signalfd_siginfo __user *uinfo,
> +			     siginfo_t const *kinfo);
> +static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count,
> +			     loff_t *ppos);
> +

See my comment about forward declarations in the previous mail.

> +asmlinkage long sys_signalfd(int ufd, sigset_t __user *user_mask, size_t sizemask)
> +{
> +	int error;
> +	unsigned long flags;
> +	sigset_t sigmask;
> +	struct signalfd_ctx *ctx;
> +	struct sighand_struct *sighand;
> +	struct file *file;
> +	struct inode *inode;
> +
> +	error = -EINVAL;
> +	if (sizemask != sizeof(sigset_t) ||
> +	    copy_from_user(&sigmask, user_mask, sizeof(sigmask)))
> +		goto err_exit;

sizeof(sigset_t) may be different for native and 32-bit compat code. It would be good if you could handle sizemask==4 && sizeof(sigset_t)==8 in this code, so that there is no need for an extra compat_sys_signalfd function.

> +	if ((sighand = signalfd_get_sighand(ctx, &flags)) != NULL) {
> +		if (next_signal(&ctx->tsk->pending, &ctx->sigmask) > 0 ||
> +		    next_signal(&ctx->tsk->signal->shared_pending,
> +				&ctx->sigmask) > 0)
> +			events |= POLLIN;
> +		signalfd_put_sighand(ctx, sighand, &flags);
> +	} else
> +		events |= POLLIN;
> +
> +	return events;
> +}

I never really understood the events mask, but other subsystems often use (POLLIN | POLLRDNORM) instead of just POLLIN. Is there a reason for not returning POLLRDNORM here?

> +static int signalfd_copyinfo(struct signalfd_siginfo __user *uinfo,
> +			     siginfo_t const *kinfo)
> +{
> +	long err;
> +
> +	err = __clear_user(uinfo, sizeof(*uinfo));
> +
> +	/*
> +	 * If you change siginfo_t structure, please be sure
> +	 * this code is fixed accordingly.
> +	 */
> +	err |= __put_user(kinfo->si_signo, &uinfo->signo);
> +	err |= __put_user(kinfo->si_errno, &uinfo->err);
> +	err |= __put_user((short)kinfo->si_code, &uinfo->code);
> +	switch (kinfo->si_code & __SI_MASK) {
> +	case __SI_KILL:
> +		err |= __put_user(kinfo->si_pid, &uinfo->pid);
> +		err |= __put_user(kinfo->si_uid, &uinfo->uid);
> +		break;
> +	case __SI_TIMER:
> +		err |= __put_user(kinfo->si_tid, &uinfo->tid);
> +		err |= __put_user(kinfo->si_overrun, &uinfo->overrun);
> +		err |= __put_user(kinfo->si_ptr, &uinfo->svptr);
> +		break;
> +	case __SI_POLL:
> +		err |= __put_user(kinfo->si_band, &uinfo->band);
> +		err |= __put_user(kinfo->si_fd, &uinfo->fd);
> +		break;
> +	case __SI_FAULT:
> +		err |= __put_user(kinfo->si_addr, &uinfo->addr);
> +#ifdef __ARCH_SI_TRAPNO
> +		err |= __put_user(kinfo->si_trapno, &uinfo->trapno);
> +#endif
> +		break;
> +	case __SI_CHLD:
> +		err |= __put_user(kinfo->si_pid, &uinfo->pid);
> +		err |= __put_user(kinfo->si_uid, &uinfo->uid);
> +		err |= __put_user(kinfo->si_status, &uinfo->status);
> +		err |= __put_user(kinfo->si_utime, &uinfo->utime);
> +		err |= __put_user(kinfo->si_stime, &uinfo->stime);
> +		break;
> +	case __SI_RT: /* This is not generated by the kernel as of now. */
> +	case __SI_MESGQ: /* But this is */
> +		err |= __put_user(kinfo->si_pid, &uinfo->pid);
> +		err |= __put_user(kinfo->si_uid, &uinfo->uid);
> +		err |= __put_user(kinfo->si_ptr, &uinfo->svptr);
> +		break;
> +	default: /* this is just in case for now ... */
> +		err |= __put_user(kinfo->si_pid, &uinfo->pid);
> +		err |= __put_user(kinfo->si_uid, &uinfo->uid);
> +		break;
> +	}
> +
> +	return err ? -EFAULT: sizeof(*uinfo);
> +}

Doing it this way looks rather inefficient to me. I think it's better to just prepare the signalfd_siginfo on the stack and do a single copy_to_user.

Also, what's the reasoning behind defining a new structure instead of just returning siginfo_t? Sure, siginfo_t is ugly, but it is a well-defined structure and users already deal with the problems it causes.

> +static void __exit signalfd_exit(void)
> +{
> +	kmem_cache_destroy(signalfd_ctx_cachep);
> +}
> +
> +module_init(signalfd_init);
> +module_exit(signalfd_exit);
> +
> +MODULE_LICENSE("GPL");

Since this file defines a syscall, it can't
Re: [patch 6/13] signal/timer/event fds v6 - timerfd core ...
On Friday 16 March 2007 01:22:15 Davide Libenzi wrote:
> This patch introduces a new system call for timer events delivered through file descriptors. This allows timer events to be used with standard POSIX poll(2), select(2) and read(2). As a consequence of supporting the Linux f_op->poll subsystem, they can be used with epoll(2) too.

Half of my comments about signalfd also apply to the code in here.

Arnd
Re: [patch 2/13] signal/timer/event fds v6 - signalfd core ...
On Saturday 17 March 2007 22:35:08 Arnd Bergmann wrote:
> Also, what's the reasoning behind defining a new structure instead of just returning siginfo_t? Sure siginfo_t is ugly but it is a well-defined structure and users already deal with the problems it causes.

Ok, found the answer myself: fops->read() must not do the conversion to compat_siginfo_t on a 64 bit kernel, that would just be too ugly for words.

Arnd
Re: [patch 2/13] signal/timer/event fds v6 - signalfd core ...
On Sunday 18 March 2007, Davide Libenzi wrote:
> bah, __put_user is basically a move, so I don't think that efficiency would be that different (assuming that it'd matter in this case). The only thing many __put_user do, is increase the exception table sizes.

The cost of user access functions varies a lot depending on the architecture. Platforms with a 4G/4G split, for example, need to do more than a simple move, and for s390 it may even come down to an indirect function call, which incurs significant register pressure.

Arnd
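A minimal userspace sketch of the "prepare the structure locally, then copy once" approach suggested earlier in this thread. Everything here is an assumption made for illustration: `fake_copy_to_user()` is a hypothetical stand-in for `copy_to_user()` that only counts calls, and `struct sketch_siginfo` is a trimmed-down invention, not the `signalfd_siginfo` layout from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down record; not the real signalfd_siginfo. */
struct sketch_siginfo {
	int32_t signo;
	int32_t err;
	int32_t code;
	uint32_t pid;
	uint32_t uid;
};

/* Counts how often the (simulated) user access path is taken. */
static unsigned int user_access_calls;

/*
 * Stand-in for copy_to_user(): like the kernel function it returns the
 * number of bytes that could NOT be copied, which is always 0 here.
 */
static size_t fake_copy_to_user(void *dst, const void *src, size_t len)
{
	user_access_calls++;
	memcpy(dst, src, len);
	return 0;
}

/*
 * Fill the record on the stack, then cross the user/kernel boundary
 * exactly once, however many fields the record has.
 */
static long sketch_copyinfo(struct sketch_siginfo *uinfo, int signo,
			    int code, uint32_t pid, uint32_t uid)
{
	struct sketch_siginfo tmp;

	memset(&tmp, 0, sizeof(tmp));	/* replaces the separate clear step */
	tmp.signo = signo;
	tmp.code = code;
	tmp.pid = pid;
	tmp.uid = uid;

	if (fake_copy_to_user(uinfo, &tmp, sizeof(tmp)))
		return -EFAULT;
	return (long)sizeof(tmp);
}
```

With this shape, the per-call overhead of the user access primitive is paid once per record instead of once per field, which is the part that matters on architectures where that primitive is expensive.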
Re: [PATCH -mm 1/4] Blackfin: architecture update patch
On Wednesday 21 March 2007, Wu, Bryan wrote:
> @@ -97,6 +97,11 @@ static inline void leds_switch(int flag)
>  /*
>   * The idle loop on BFIN
>   */
> +#ifdef CONFIG_IDLE_L1
> +static inline void default_idle(void)__attribute__((l1_text));
> +void cpu_idle(void)__attribute__((l1_text));
> +#endif
> +

A forward declaration for an inline function seems rather pointless. Moreover, marking default_idle both l1_text and inline seems contradictory, right?

> diff -purN linux-2.6-orig/include/asm-blackfin/asm-offsets.h linux-2.6/include/asm-blackfin/asm-offsets.h
> --- linux-2.6-orig/include/asm-blackfin/asm-offsets.h 1970-01-01 08:00:00.0 +0800
> +++ linux-2.6/include/asm-blackfin/asm-offsets.h 2007-03-21 15:21:10.0 +0800
> @@ -0,0 +1,89 @@
> +#ifndef __ASM_OFFSETS_H__
> +#define __ASM_OFFSETS_H__
> +/*
> + * DO NOT MODIFY.
> + *
> + * This file was generated by Kbuild

This file should be in the exclude list for your diff, it is generally not shipped with the kernel sources.

> +#ifndef __ASSEMBLY__
> +
> +static inline unsigned char readb(volatile unsigned char *addr)
> +{

The prototype for this should normally contain an __iomem. This kind of error is normally caught by running 'make C=1' to use the 'sparse' tool. If you have not run that yet, you should start to, as it finds a number of common bugs.

> +/*
> + * Map some physical address range into the kernel address space.
> + */
> +static inline void *__ioremap(unsigned long physaddr, unsigned long size,
> +			      int cacheflag)
> +{
> +	return (void *)physaddr;
> +}

Likewise, this should return an __iomem pointer.

The rest of the patch looks good to me.

Arnd
Re: [PATCH -mm 1/4] Blackfin: architecture update patch
On Wednesday 21 March 2007, Wu, Bryan wrote:
> I sent 4 mails to LKML, but this one got lost. Arnd, can you receive this email from LKML?

The mail was around 400kb, while the limit for lkml is 100kb.

Arnd
Re: [PATCH -mm 1/4] Blackfin: architecture update patch
On Wednesday 21 March 2007, Wu, Bryan wrote:
> 1) Some issues are fixed according to LKML patch review.
> 2) Remove not supported BF535 code
> 3) Fixed some bugs from blackfin.uclinux.org SVN update
>
> Here is the updated patch for 2.6.21-rc4-mm1

One rather general but important comment: You need to get used to providing smaller patches, one fix per mail. As long as the full tree is waiting in -mm, this is not as important, as I assume that Andrew will fold the whole architecture support into a big patch before submitting to Linus, but as soon as it's in, such big patches will not be acceptable any more.

If you're not already doing it, look into how 'quilt' or similar tools help you with this.

Arnd
Re: remove-unused-header-file-include-linux-elfnoteh.patch
On Wednesday 21 March 2007 22:36:42 Jeremy Fitzhardinge wrote:
> Please don't. We need it.
>
> BTW, I didn't see this one go by, and I couldn't see it searching around. Did it get posted to lkml?

I think it was only on the janitor list. It was considered obviously correct since it does not get installed by 'make headers_install' and did not seem to be used anywhere in the kernel.

Could you explain how this file is used in the kernel? Robert probably wants to update his script to handle this correctly.

Arnd
Re: BLK_DEV_MD with CONFIG_NET
On Wednesday 21 March 2007 13:02:46 Sam Ravnborg wrote:
> > Anything which is ever exported to modules, which ought to be the situation in this case, should be obj-y not lib-y right?
>
> That is also my understanding of lib-y - I should update makefiles.txt to reflect this..

Strictly speaking, it could well be obj-m instead of obj-y if it is _only_ used by modules. OTOH, it makes the Makefile a lot simpler to not optimize for this case.

Arnd
Re: [PATCH -mm] Blackfin arch: add kdebug header file
I can see nothing wrong with your patches, but you should make the patch descriptions a little clearer: On Monday 26 March 2007, Wu, Bryan wrote: Hi folks, No need for this line, if it's there, Andrew just needs to remove it from the changelog. This patch adds kdebug.h header file to blackfin architecture. This line is completely redundant, as it states the same information as the subject. You should give some background information here, like: kdebug.h is needed for kprobes. For trivial patches where the subject already tells the whole story (e.g. 'remove redundant declaration of foo'), just leave out the description entirely except for the Signed-off-by. Arnd you can even leave out the description - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Questions about porting perfmon2 to powerpc
On Thursday 05 April 2007, Kevin Corry wrote:
> First, the stock 2.6.20 kernel has a prototype in include/linux/smp.h for a function called smp_call_function_single(). However, this routine is only implemented on i386, x86_64, ia64, and mips. Perfmon2 apparently needs to call this to run a function on a specific CPU. Powerpc provides an smp_call_function() routine to run a function on all active CPUs, so I used that as a basis to add an smp_call_function_single() routine. I've included the patch below and was wondering if it looked like a sane approach.

The function itself looks good, but since it's very similar to the existing smp_call_function(), you should probably try to share some of the code, e.g. by making a helper function that gets an argument to decide whether to run on a specific CPU or on all CPUs.

> Next, we ran into a problem related to Perfmon2 initialization and sysfs. The problem turned out to be that the powerpc version of topology_init() is defined as an __initcall() routine, but Perfmon2's initialization is done as a subsys_initcall() routine. Thus, Perfmon2 tries to initialize its sysfs information before some of the powerpc cpu information has been initialized. However, on all other architectures, topology_init() is defined as a subsys_initcall() routine, so this problem was not seen on any other platforms. Changing the powerpc version of topology_init() to a subsys_initcall() seems to have fixed the bug. However, I'm not sure if that is going to cause problems elsewhere in the powerpc code. I've included the patch below (after the smp-call-function-single patch). Does anyone know if this change is safe, or if there was a specific reason that topology_init() was left as an __initcall() on powerpc?

In general, it's better to do initcalls as late as possible, so __initcall() is preferred over subsys_initcall() if both work. Have you tried doing it the other way and starting perfmon2 from a regular __initcall()?

Arnd
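The code sharing suggested above could look roughly like the following toy model. It is only a sketch under stated assumptions: all `sketch_*` names are hypothetical, the "CPUs" are just loop indices, and none of the IPI, locking or waiting logic that a real powerpc implementation needs is modelled.

```c
#include <assert.h>

#define SKETCH_NR_CPUS	4
#define SKETCH_ALL_CPUS	(-1)

typedef void (*sketch_call_fn)(int cpu, void *info);

/*
 * Common helper: 'target' selects either one CPU or, with
 * SKETCH_ALL_CPUS, every CPU. Returns 0 on success, -1 for a bad target.
 */
static int sketch_call_function_map(int target, sketch_call_fn fn, void *info)
{
	int cpu;

	if (target != SKETCH_ALL_CPUS &&
	    (target < 0 || target >= SKETCH_NR_CPUS))
		return -1;

	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		if (target == SKETCH_ALL_CPUS || target == cpu)
			fn(cpu, info);
	return 0;
}

/* Thin wrapper: run on all CPUs of the model. */
static int sketch_call_function(sketch_call_fn fn, void *info)
{
	return sketch_call_function_map(SKETCH_ALL_CPUS, fn, info);
}

/* Thin wrapper: run on exactly one CPU. */
static int sketch_call_function_single(int cpu, sketch_call_fn fn, void *info)
{
	if (cpu < 0)	/* do not let a caller smuggle in SKETCH_ALL_CPUS */
		return -1;
	return sketch_call_function_map(cpu, fn, info);
}

/* Example callback: counts how many times it was invoked. */
static void sketch_count_call(int cpu, void *info)
{
	(void)cpu;
	(*(int *)info)++;
}
```

The two public entry points stay separate, so existing callers do not change, while the part that is easy to get wrong lives in one place.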
Re: Questions about porting perfmon2 to powerpc
On Thursday 05 April 2007, Kevin Corry wrote:
> For the moment, I made the change to topology_init() since it was the simplest fix to get things working. I have considered switching the perfmon2 initialization to __initcall(), but there are apparently some timing issues with ensuring that the perfmon2 core code is initialized before any of its sub-modules. Since they could all be compiled statically in the kernel, I'm not sure if there's a way to ensure the ordering of calls within a single initcall level. I'll need to ask Stephane if there were any other reasons why subsys_initcall() was used for perfmon2.

If they all come from the same directory, you can simply order them in the Makefile. If a module in arch/ needs to be initialized after one in drivers/, that's not possible though, and changing topology_init() should be the best option.

Arnd
Re: [PATCH 01/01] New FBDev driver for Intel Vermilion Range
On Thursday 05 April 2007, Alan Hourihane wrote:
> As for the above, I've noticed that drivers/video/epson1355fb.c also has this wording and is under the GPL.

Yes, many files have it, but that doesn't make it right ;-)

Arnd
Re: [PATCH] merge compat_ioctl.h into compat_ioctl.c
On Sunday 08 April 2007, Christoph Hellwig wrote:
> Now that there is no arch-specific compat ioctl handling left there is no point in having a separate compat_ioctl.h, so merge it into compat_ioctl.c

Yes, definitely a good idea.

> Signed-off-by: Christoph Hellwig [EMAIL PROTECTED]

Acked-by: Arnd Bergmann [EMAIL PROTECTED]

On a similar subject, how about merging include/linux/ioctl32.h and the ioctl bits of fs/compat.c into fs/compat_ioctl.c as well to make it completely self-contained?

Arnd
Re: [PATCH 03/44 take 2] [UBI] user-space API header
On Tuesday 20 February 2007 14:07, Artem Bityutskiy wrote:
> > This structure is not suitable for an ioctl call, because it has incompatible layout between 32 and 64 bit processes. The easiest fix for this would be to change the 'name' field to an array instead of a pointer.
>
> Will be fixed thanks. Just out of curiosity, could you please provide an example when this may be a problem.

On a 64 bit kernel with a 32 bit user app calling this ioctl, the kernel would read the pointer value from the 8 bytes at the end, which means that it will read four bytes after the end of the structure and interpret whatever it finds as a pointer, instead of using only the first four bytes as the lower half.

Arnd
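The mismatch described above can be made concrete with a small sketch. The structs below model how the quoted `ubi_mkvol_req` would be laid out for a 32-bit and for a 64-bit process by standing in a `uint32_t` and a `uint64_t` for the `name` pointer. The struct names, `SKETCH_NAME_MAX` and the helper function are assumptions made for illustration and are not part of the UBI patch; `__attribute__((packed))` assumes GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Layout as seen by a 32-bit process: the pointer occupies 4 bytes. */
struct mkvol_req_32 {
	int32_t vol_id;
	int32_t alignment;
	int64_t bytes;
	int8_t vol_type;
	int8_t padding[9];
	int16_t name_len;
	uint32_t name;		/* stand-in for a 32-bit user pointer */
} __attribute__((packed));

/* Layout as seen by a 64-bit kernel: the pointer occupies 8 bytes. */
struct mkvol_req_64 {
	int32_t vol_id;
	int32_t alignment;
	int64_t bytes;
	int8_t vol_type;
	int8_t padding[9];
	int16_t name_len;
	uint64_t name;		/* stand-in for a 64-bit user pointer */
} __attribute__((packed));

/* Hypothetical maximum name length, chosen only for this sketch. */
#define SKETCH_NAME_MAX 128

/* The suggested fix: an embedded array has the same size on every ABI. */
struct mkvol_req_fixed {
	int32_t vol_id;
	int32_t alignment;
	int64_t bytes;
	int8_t vol_type;
	int8_t padding[9];
	int16_t name_len;
	char name[SKETCH_NAME_MAX];
} __attribute__((packed));

/* Print where 'name' starts and how large each variant is. */
static void print_mkvol_layouts(void)
{
	printf("32-bit: name at %zu, size %zu\n",
	       offsetof(struct mkvol_req_32, name),
	       sizeof(struct mkvol_req_32));
	printf("64-bit: name at %zu, size %zu\n",
	       offsetof(struct mkvol_req_64, name),
	       sizeof(struct mkvol_req_64));
	printf("fixed:  name at %zu, size %zu\n",
	       offsetof(struct mkvol_req_fixed, name),
	       sizeof(struct mkvol_req_fixed));
}
```

In both pointer variants `name` starts at byte 28, but the 32-bit structure ends at byte 32 while the 64-bit one ends at byte 36, so a 64-bit reader of a 32-bit request picks up four bytes that the caller never provided.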
Re: [Cbe-oss-dev] [RFC, PATCH] CELL Oprofile SPU profiling updated patch
On Thursday 22 February 2007, Carl Love wrote: This patch updates the existing arch/powerpc/oprofile/op_model_cell.c to add in the SPU profiling capabilities. In addition, a 'cell' subdirectory was added to arch/powerpc/oprofile to hold Cell-specific SPU profiling code. There was a significant amount of whitespace breakage in this patch, which I cleaned up. The patch below consists of the other things I changed as a further cleanup. Note that I changed the format of the context switch record, which I found too complicated, as I described on IRC last week. Arnd -- Subject: cleanup spu oprofile code From: Arnd Bergmann [EMAIL PROTECTED] This cleans up some of the new oprofile code. It's mostly cosmetic changes, like way multi-line comments are formatted. The most significant change is a simplification of the context-switch record format. It does mean the oprofile report tool needs to be adapted, but I'm sure that it pays off in the end. Signed-off-by: Arnd Bergmann [EMAIL PROTECTED] Index: linux-2.6/arch/powerpc/oprofile/cell/spu_task_sync.c === --- linux-2.6.orig/arch/powerpc/oprofile/cell/spu_task_sync.c +++ linux-2.6/arch/powerpc/oprofile/cell/spu_task_sync.c @@ -61,11 +61,12 @@ static void destroy_cached_info(struct k static struct cached_info * get_cached_info(struct spu * the_spu, int spu_num) { struct kref * ref; - struct cached_info * ret_info = NULL; + struct cached_info * ret_info; if (spu_num = num_spu_nodes) { printk(KERN_ERR SPU_PROF: %s, line %d: Invalid index %d into spu info cache\n, __FUNCTION__, __LINE__, spu_num); + ret_info = NULL; goto out; } if (!spu_info[spu_num] the_spu) { @@ -89,9 +90,9 @@ static struct cached_info * get_cached_i static int prepare_cached_spu_info(struct spu * spu, unsigned int objectId) { - unsigned long flags = 0; + unsigned long flags; struct vma_to_fileoffset_map * new_map; - int retval = 0; + int retval; struct cached_info * info; /* We won't bother getting cache_lock here since @@ -112,6 +113,7 @@ 
prepare_cached_spu_info(struct spu * spu printk(KERN_ERR SPU_PROF: %s, line %d: create vma_map failed\n, __FUNCTION__, __LINE__); + retval = -ENOMEM; goto err_alloc; } new_map = create_vma_map(spu, objectId); @@ -119,6 +121,7 @@ prepare_cached_spu_info(struct spu * spu printk(KERN_ERR SPU_PROF: %s, line %d: create vma_map failed\n, __FUNCTION__, __LINE__); + retval = -ENOMEM; goto err_alloc; } @@ -144,7 +147,7 @@ prepare_cached_spu_info(struct spu * spu goto out; err_alloc: - retval = -1; + kfree(info); out: return retval; } @@ -215,11 +218,9 @@ static inline unsigned long fast_get_dco static unsigned long get_exec_dcookie_and_offset(struct spu * spu, unsigned int * offsetp, unsigned long * spu_bin_dcookie, - unsigned long * shlib_dcookie, unsigned int spu_ref) { unsigned long app_cookie = 0; - unsigned long * image_cookie = NULL; unsigned int my_offset = 0; struct file * app = NULL; struct vm_area_struct * vma; @@ -252,24 +253,17 @@ get_exec_dcookie_and_offset(struct spu * my_offset, spu_ref, vma-vm_file-f_dentry-d_name.name); *offsetp = my_offset; - if (my_offset == 0) - image_cookie = spu_bin_dcookie; - else if (vma-vm_file != app) - image_cookie = shlib_dcookie; break; } - if (image_cookie) { - *image_cookie = fast_get_dcookie(vma-vm_file-f_dentry, + *spu_bin_dcookie = fast_get_dcookie(vma-vm_file-f_dentry, vma-vm_file-f_vfsmnt); - pr_debug(got dcookie for %s\n, -vma-vm_file-f_dentry-d_name.name); - } + pr_debug(got dcookie for %s\n, vma-vm_file-f_dentry-d_name.name); - out: +out: return app_cookie; - fail_no_image_cookie: +fail_no_image_cookie: printk(KERN_ERR SPU_PROF: %s, line %d: Cannot find dcookie for SPU binary\n, __FUNCTION__, __LINE__); @@ -285,18 +279,18 @@ get_exec_dcookie_and_offset(struct spu * static int process_context_switch(struct spu * spu, unsigned int objectId) { unsigned long flags; - int retval = 0; - unsigned int offset = 0; - unsigned long spu_cookie = 0, app_dcookie = 0, shlib_cookie = 0; + int retval; + unsigned int offset
Re: [Cbe-oss-dev] [RFC, PATCH] CELL Oprofile SPU profiling updated patch
On Tuesday 27 February 2007, Maynard Johnson wrote:
> I have applied the cleanup patch that Arnd sent, but had to fix up a few things:
> - Bug fix: Initialize retval in spu_task_sync.c, line 95, otherwise this function returns non-zero and OProfile fails.
> - Remove unused codes in include/linux/oprofile.h
> - Compile warnings: Initialize offset and spu_cookie at lines 283 and 284 in spu_task_sync.c
>
> With these changes and some userspace changes that were necessary to correspond with Arnd's changes, our testing was successful.
>
> A fixup patch is attached.

The patch does not contain any of the metadata I need to apply it (subject, description, signed-off-by).

> @@ -280,8 +280,8 @@ static int process_context_switch(struct
>  {
>  	unsigned long flags;
>  	int retval;
> -	unsigned int offset;
> -	unsigned long spu_cookie, app_dcookie;
> +	unsigned int offset = 0;
> +	unsigned long spu_cookie = 0, app_dcookie;
>  	retval = prepare_cached_spu_info(spu, objectId);
>  	if (retval)
>  		goto out;

No, this is wrong. Leaving the variables uninitialized at least warns you about the bug you have in this function: when there is anything wrong, you just continue writing the record with zero offset and dcookie values in it. Instead, you should handle the error condition somewhere down the code.

It's harmless most of the time, but you really should not be painting over your bugs by blindly initializing variables.

> diff -paur linux-orig/include/linux/oprofile.h linux-new/include/linux/oprofile.h
> --- linux-orig/include/linux/oprofile.h 2007-02-27 14:41:29.0 -0600
> +++ linux-new/include/linux/oprofile.h 2007-02-27 14:43:18.0 -0600
> @@ -36,9 +36,6 @@
>  #define XEN_ENTER_SWITCH_CODE 10
>  #define SPU_PROFILING_CODE 11
>  #define SPU_CTX_SWITCH_CODE 12
> -#define SPU_OFFSET_CODE 13
> -#define SPU_COOKIE_CODE 14
> -#define SPU_SHLIB_COOKIE_CODE 15
>  struct super_block;
>  struct dentry;

Right, I forgot about this.

Arnd
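The difference between initializing the variables and handling the error can be sketched in plain C. The names below (`sketch_lookup()`, `struct sketch_record`, the cookie value) are hypothetical and only model the shape of the problem; this is not the oprofile code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the context-switch record being written. */
struct sketch_record {
	unsigned long cookie;
	unsigned int offset;
	int written;
};

/*
 * Hypothetical lookup: on failure it returns a negative errno and
 * leaves *cookie and *offset untouched.
 */
static int sketch_lookup(int have_image, unsigned long *cookie,
			 unsigned int *offset)
{
	if (!have_image)
		return -ENOENT;
	*cookie = 0x1234UL;
	*offset = 64;
	return 0;
}

static int sketch_process_switch(int have_image, struct sketch_record *rec)
{
	unsigned long cookie;	/* deliberately left uninitialized */
	unsigned int offset;	/* so a missed error path stays visible */
	int retval;

	retval = sketch_lookup(have_image, &cookie, &offset);
	if (retval)
		return retval;	/* bail out before writing a bogus record */

	rec->cookie = cookie;
	rec->offset = offset;
	rec->written = 1;
	return 0;
}
```

Because the values are only read on the success path, there is nothing to initialize, and a failed lookup never produces a record with zero offset and cookie in it.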
Re: [RFC] killing the NR_IRQS arrays.
On Tuesday 27 February 2007, Eric W. Biederman wrote:
> * Add a variation of the API in interrupt.h that uses struct irq *irq instead of unsigned int irq. Probably replacing request_irq with irq_request or something trivial like that. This will need to touch all of the different irq implementation back ends, but only very lightly.
> * Convert the generic irq code to use struct irq * everywhere it currently uses unsigned int irq.
> * Start on the conversions of drivers and subsystems, picking on the easy ones first :)

Introducing the irq_request() etc. functions that take a struct irq* instead of an int sounds good, but I'd hope we can avoid using those in device drivers and do a separate abstraction for each bus_type that deals with interrupts. I'm not sure if that's possible for each bus_type, but the ones I have worked with in the past should allow that:

pci: each device/function has a unique irq, drivers need not know about it afaics.
isa/pnp: numbers from 1 to 15 are the right abstraction here, that's how isa has worked for ages.
s390: got rid of irq numbers already
ofw: an open firmware device can have a number of interrupts, but like PCI, the driver only needs to know things like 'first irq of this device', not how it's connected
ps3: irqs are requested from the firmware for each device, this can happen under the covers.
mmc, usb, phy, ieee1394: these already have a high-level abstraction for interrupt events
platform: dunno, probably these really should use the struct irq directly
eisa, mca, pcmcia, zorro, ...: no idea, but possibly similar to PCI.

Note that we can even start converting device drivers first, before moving away from irq numbers. A typical PCI driver should get somewhat simpler by the conversion, and when they are all converted, we can replace pci_dev->irq with a struct irq* under the covers.

Arnd
Re: [RFC] killing the NR_IRQS arrays.
On Wednesday 28 February 2007, Eric W. Biederman wrote:
> Arnd Bergmann [EMAIL PROTECTED] writes:
> > Introducing the irq_request() etc. functions that take a struct irq* instead of an int sounds good, but I'd hope we can avoid using those in device drivers and do a separate abstraction for each bus_type that deals with interrupts. I'm not sure if that's possible for each bus_type, but the ones I have worked with in the past should allow that:
> >
> > pci: each device/function has a unique irq, drivers need not know about it afaics.
>
> Then there is msi and with msi-x you can have up to 4K irqs.

I have to admit I still don't really understand how this works at all. Can a driver that uses msi-x have different handlers for each of those interrupts registered simultaneously? I would expect that instead there should be only one 'struct irq' for the device, with the handler getting a 12 bit number argument.

> > s390: got rid of irq numbers already
>
> Yes. I should really look at that more and see if I could bring s390 into the generic irq code with my planned changes.

I don't think there is much point in changing the s390 code, but the way it is solved there may be interesting for other buses as well. The interrupt handler there is not being registered explicitly, but is part of the driver (in case of subchannel) or of the device (in case of ccw_device) data structure.

Similarly, in a pci device, one could imagine that the struct pci_driver contains an irq_handler_t member that is registered from the pci_device_probe() function if present.

> > Note that we can even start converting device drivers first, before moving away from irq numbers. A typical PCI driver should get somewhat simpler by the conversion, and when they are all converted, we can replace pci_dev->irq with a struct irq* under the covers.
>
> Reasonable if it is easy and straight forward. Something like pci_request_irq(dev,) and the helper looks at dev->irq under the covers and calls request_irq or whatever makes sense.
>
> Is this what you are thinking? Examples would help me here.

Ok, I had an example in one of my previous posts, but based on the discussion since then, it has become significantly simpler, basically reducing the work to

	struct irq *pci_irq_request(struct pci_dev *dev, irq_handler_t handler)
	{
		if (!dev->irq)
			return ERR_PTR(-ENODEV);
		return irq_request(dev->irq, handler, IRQF_SHARED,
				   dev->driver->name, dev);
	}

	int pci_irq_free(struct pci_dev *dev)
	{
		return irq_free(dev->irq, dev);
	}

The most significant change of this to the current code would be that we can pass arguments down to irq_request automatically, e.g. the irq handler can always get the pci_dev as its dev_id.

> For talking to user space I expect we will have numbers for a long time to come yet.

I was wondering about that. Do you only mean /proc/interrupts or are there other user interfaces we need to worry about? For /proc/interrupts, what could break if we have interrupt numbers only local to each controller and potentially duplicate numbers in the list?

It's good to be paranoid about changes to proc files, but I can definitely see value in having meaningful interrupt numbers in there instead of making up a more or less random mapping to a flat number space.

Arnd
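The idea of letting the bus code register a handler that lives in the driver structure, as floated above for struct pci_driver, can be modelled in a few lines of ordinary C. This is a sketch under loud assumptions: every `sketch_*` type and function is invented for illustration, there is no real interrupt delivery, and nothing here corresponds to an existing kernel interface.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_dev;

/* The handler always receives the device it belongs to. */
typedef int (*sketch_irq_handler_t)(struct sketch_dev *dev);

struct sketch_driver {
	const char *name;
	int (*probe)(struct sketch_dev *dev);
	sketch_irq_handler_t irq_handler;	/* optional, may be NULL */
};

struct sketch_dev {
	const struct sketch_driver *driver;
	int has_irq;				/* device has an interrupt line */
	sketch_irq_handler_t registered;	/* set by the bus code only */
	unsigned int irq_count;
};

/*
 * Bus-level probe: registers the driver's handler on behalf of the
 * driver, so the driver never deals with interrupt numbers itself.
 */
static int sketch_bus_probe(struct sketch_dev *dev,
			    const struct sketch_driver *drv)
{
	int ret;

	dev->driver = drv;
	if (drv->irq_handler) {
		if (!dev->has_irq)
			return -1;
		dev->registered = drv->irq_handler;
	}

	ret = drv->probe ? drv->probe(dev) : 0;
	if (ret)
		dev->registered = NULL;	/* undo on probe failure */
	return ret;
}

/* Simulated interrupt delivery: the device is passed in automatically. */
static int sketch_raise_irq(struct sketch_dev *dev)
{
	return dev->registered ? dev->registered(dev) : 0;
}

/* Example driver: its handler only counts interrupts. */
static int sketch_example_irq(struct sketch_dev *dev)
{
	dev->irq_count++;
	return 1;	/* "handled" */
}

static const struct sketch_driver sketch_example_driver = {
	.name = "sketch-example",
	.probe = NULL,
	.irq_handler = sketch_example_irq,
};
```

The driver in this model contains no registration call at all, which is the simplification the thread is after; whether a real bus can hide interrupts this completely (MSI-X being the obvious complication raised above) is left open here.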
Re: [Cbe-oss-dev] [PATCH 14/22] spufs: use SPU master control to prevent wild SPU execution
On Thursday 01 March 2007, Michael Ellerman wrote:
> On Mon, 2006-11-20 at 18:45 +0100, Arnd Bergmann wrote:
> > plain text document attachment (spufs-master-control.diff)
> > When the user changes the runcontrol register, an SPU might be running without a process being attached to it and waiting for events. In order to prevent this, make sure we always disable the priv1 master control when we're not inside of spu_run.
>
> Hi Arnd, Sorry I didn't comment on this when you sent it, I wasn't paying enough attention. This patch confuses me, you say we should make sure we always disable the master control when we're not inside spu_run, but I see several exit paths where we leave the master run bit enabled - or maybe I'm reading it wrong.

I think you're right, there is at least one path that I now saw getting out of spufs_run_spu incorrectly. In particular, when spu_reacquire_runnable() fails, we never call the master stop, which is a bug, but should happen very infrequently in practice.

Do you see another case where we end up with the same problem? If not, I'll prepare a patch to fix this one case.

Arnd
Re: [PATCH -mm 1/5] Blackfin: blackfin architecture patch update
On Thursday 01 March 2007 05:14:40 Wu, Bryan wrote:
> The whole patch is located at URL:
> https://blackfin.uclinux.org/gf/download/frsrelease/39/2583/blackfin-arch.patch
> The incremental patch is located at URL:
> https://blackfin.uclinux.org/gf/download/frsrelease/39/2584/blackfin-arch-mm2-update.patch

I'm not sure if that was intentional, but the second patch does not apply on top of the -mm kernel but rather patches the old patch itself. This basically makes it impossible to review just that part, so better provide the diff between the kernel with the old patch and the kernel with the new patch next time.

OTOH, from what I could see from the contents, the changes themselves look pretty good, I'd probably add my 'Acked-by' if I could read that patch more easily ;-)

Arnd
Re: [PATCH -mm 1/5] Blackfin: blackfin architecture patch update
On Thursday 01 March 2007 05:14:40 Wu, Bryan wrote: Here is the update version of blackfin-arch.patch in -mm tree. simply add support to utrace and it was tested on blackfin STAMP board as well as other following patches. Wow, this has come a long way since I looked at the patches last year, good work! I've gone through the complete patch again now, and these are the issues I've found in it. None of these are show-stoppers and I'd like to see it all go in during the next merge window. There should be enough time until then to address these points: +EXPORT_SYMBOL(__ioremap); +EXPORT_SYMBOL(strcmp); +EXPORT_SYMBOL(strncmp); +EXPORT_SYMBOL(dump_thread); + +EXPORT_SYMBOL(ip_fast_csum); + +EXPORT_SYMBOL(kernel_thread); + +EXPORT_SYMBOL(__up); +EXPORT_SYMBOL(__down); +EXPORT_SYMBOL(__down_trylock); +EXPORT_SYMBOL(__down_interruptible); + +EXPORT_SYMBOL(is_in_rom); In general, please put EXPORT_SYMBOL lines below the definition of the symbol itself. This list of exports should only be used for symbols that come from assembly files. You should probably also think about whether some of them are better done as EXPORT_SYMBOL_GPL. + pending = bfin_read_IPEND() ~0x8000; + other_ints = pending (pending - 1); + if (other_ints == 0) + lower_to_irq14(); + irq_exit(); + The last line here has trailing whitespace. While this gets automatically removed by akpm's scripts, you're normally better off not adding it in the first place, because it may cause your follow-on patches not to apply, aside from being wrong to start with. +void machine_halt(void) +{ + for (;;) + /* nothing */ ; +} + +void machine_power_off(void) +{ + for (;;) + /* nothing */ ; +} It might be nicer to make this for (;;) asm volatile (idle); Otherwise you end up burning CPU cycles after a halt without any particular need. 
+#if defined(CONFIG_MTD_UCLINUX) + /* generic memory mapped MTD driver */ + memory_mtd_end = memory_end; + + mtd_phys = _ramstart; + mtd_size = PAGE_ALIGN(*((unsigned long *)(mtd_phys + 8))); + +# if defined(CONFIG_EXT2_FS) || defined(CONFIG_EXT3_FS) + if (*((unsigned short *)(mtd_phys + 0x438)) == EXT2_SUPER_MAGIC) + mtd_size = + PAGE_ALIGN(*((unsigned long *)(mtd_phys + 0x404)) 10); +# endif + +# if defined(CONFIG_CRAMFS) + if (*((unsigned long *)(mtd_phys)) == CRAMFS_MAGIC) + mtd_size = PAGE_ALIGN(*((unsigned long *)(mtd_phys + 0x4))); +# endif + +# if defined(CONFIG_ROMFS_FS) + if (((unsigned long *)mtd_phys)[0] == ROMSB_WORD0 + ((unsigned long *)mtd_phys)[1] == ROMSB_WORD1) + mtd_size = + PAGE_ALIGN(be32_to_cpu(((unsigned long *)mtd_phys)[2])); This detection seems to me like a strange thing to do in setup_arch(). It should be possible to do this much later, at a point where the system is much less fragile and e.g. printk works. It could even be moved into some place in the mtd code itself, since other architectures might want to do the same thing. +#if defined(CONFIG_BF561) +static struct cpu cpu[2]; +#else +static struct cpu cpu[1]; +#endif +static int __init topology_init(void) +{ +#if defined (CONFIG_BF561) + register_cpu(cpu[0], 0); + register_cpu(cpu[1], 1); + return 0; +#else + return register_cpu(cpu, 0); +#endif +} I think you should try to avoid the special-case stuff here. You can have CONFIG_NR_CPUS in Kconfig set dependent on CONFIG_BF561 and change the code here (and similarly in other places) to static struct cpu cpu[NR_CPUS]; static int __init topology_init(void) { int i; for (i=0; i NR_CPUS; i++) { register_cpu(cpu[i], i); return 0; } + for (i = ZERO_P; i = L2_MEM; i++) { + + if (cplb_data[i].valid) { + + as_1m = cplb_data[i].start % SIZE_1M; + + /* We need to make sure all sections are properly 1M aligned + * However between Kernel Memory and the Kernel mtd section, depending on the + * rootfs size, there can be overlapping memory areas. 
+ */ + + if (as_1m) { +#ifdef CONFIG_MTD_UCLINUX + if (i == SDRAM_RAM_MTD) { + if ((cplb_data[SDRAM_KERN].end + 1) cplb_data[SDRAM_RAM_MTD].start) + cplb_data[SDRAM_RAM_MTD].start = (cplb_data[i].start (-2*SIZE_1M)) + SIZE_1M; I count 6 levels of indentation, which severely limits readability, especially when you have terms this complex in the last level. Please try to split up functions like this into smaller units. +/* + * ++roman (07/09/96): implemented signal stacks (specially for tosemu on + * Atari :-) Current limitation: Only one sigstack can be active at one time.
Re: [PATCH -mm 1/5] Blackfin: blackfin architecture patch update
On Saturday 03 March 2007 23:50:02 bert hubert wrote:
> > for (;;) asm volatile ("idle");
>
> This looks remarkably like relax_cpu()

Actually not: cpu_relax() is defined as barrier(), it can't call idle because that might make it sleep for an indefinite amount of time (until the next interrupt, but only if they are enabled).

Some nice architectures provide a hardware mechanism to do cpu_relax, like going to low-power mode for a few microseconds, but this one doesn't seem to have it.

Arnd
Re: [RFC] Heads up on sys_fallocate()
On Friday 02 March 2007 00:38:19 Christoph Hellwig wrote:
> > Forgive me if I haven't put enough thought into it, but would it be useful to create a generic_fallocate() that writes zeroed pages for any non-existent pages in the range? I don't know how glibc currently implements posix_fallocate(), but maybe the kernel could do it more efficiently, even in generic code. Maybe we don't care, since the major file systems can probably do something better in their own code.
>
> I'd be more happy to have the write out zeroes loop in glibc. And glibc needs to have it anyway, for older kernels.

A generic_fallocate makes sense to me iff we can do it in the kernel significantly more efficiently than in glibc, e.g. by using only a single page in page cache instead of one for each page to be preallocated.

If glibc is smart enough to do an optimal implementation, I fully agree with you.

Arnd
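As a rough illustration of the "write out zeroes loop" being debated here, the following user-space sketch extends a file by writing zeroes from one reusable buffer. It is not glibc's implementation and not a proposed kernel interface; `zero_fill_fallocate()` and `ZERO_CHUNK` are names made up for this sketch, and as a simplification it only fills the range beyond the current end of file, leaving holes inside the existing file untouched.

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define ZERO_CHUNK 4096	/* assumed chunk size, one page on many systems */

/* Returns 0 on success or an errno value, in the style of posix_fallocate(). */
int zero_fill_fallocate(int fd, off_t offset, off_t len)
{
	static const char zeroes[ZERO_CHUNK];	/* one buffer reused for every write */
	struct stat st;
	off_t pos, end;

	if (offset < 0 || len <= 0)
		return EINVAL;
	if (fstat(fd, &st) < 0)
		return errno;

	end = offset + len;
	/* Never overwrite existing bytes: start at the current end of file. */
	pos = st.st_size > offset ? st.st_size : offset;

	while (pos < end) {
		size_t want = (end - pos) < ZERO_CHUNK ? (size_t)(end - pos) : ZERO_CHUNK;
		ssize_t done = pwrite(fd, zeroes, want, pos);

		if (done < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry the same chunk */
			return errno;
		}
		assert(done > 0);
		pos += done;
	}
	return 0;
}
```

The loop still pushes every block through the page cache, which is exactly the cost a kernel-side generic implementation would have to beat to be worth having.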
Re: [PATCH] configfs: add missing mutex_unlock()
On Sunday 04 March 2007 14:38:12 Akinobu Mita wrote:
> @@ -1168,8 +1168,10 @@ int configfs_register_subsystem(struct c
>  	err = -ENOMEM;
>  	dentry = d_alloc(configfs_sb->s_root, &name);
> -	if (!dentry)
> +	if (!dentry) {
> +		mutex_unlock(&configfs_sb->s_root->d_inode->i_mutex);
>  		goto out_release;
> +	}
>
>  	d_add(dentry, NULL);

This should be changed to jump to a new exit point, before the mutex_unlock at the end of the function. Having multiple places in the function that release the same lock easily leads to the kind of bug you are fixing here.

Arnd
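The "single exit point" suggestion in this reply can be sketched in user space as follows. This is not the configfs code: `register_entry()`, `registry_lock` and `registry_count` are hypothetical names, a pthread mutex stands in for i_mutex, and the `fail_alloc` flag simulates d_alloc() returning NULL.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;
int registry_count;

/* Hypothetical registration routine with exactly one unlock site. */
int register_entry(int fail_alloc)
{
	int err;
	void *entry;

	pthread_mutex_lock(&registry_lock);

	err = -ENOMEM;
	entry = fail_alloc ? NULL : malloc(16);
	if (!entry)
		goto out_unlock;	/* error path joins the common unlock */

	registry_count++;
	free(entry);			/* the sketch does not keep the entry around */
	err = 0;

out_unlock:
	pthread_mutex_unlock(&registry_lock);	/* the only unlock in the function */
	return err;
}
```

Because every path funnels through `out_unlock`, adding another failure case later cannot forget the unlock; it only needs the right `goto`.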
Re: [PATCH 8/9] mtd: Allow mtd block device drivers to have a custom ioctl function
On Friday 02 March 2007 16:55:02 Richard Purdie wrote:
> Allow mtd block drivers to customise their ioctl functions. Also allow the drivers to obtain the gendisk struct since ioctl functions can need this.

Are you sure that this is a good idea? I'd rather not open up this method of letting the individual drivers do bad things.

> This also moves the mtd ioctl functions from locked to unlocked. As far as I can see, nothing in the mtd code has locking problems.

This part looks fine to me.

Arnd
Re: [RFC] Heads up on sys_fallocate()
On Sunday 04 March 2007, Anton Altaparmakov wrote:
> > A generic_fallocate makes sense to me iff we can do it in the kernel significantly more efficiently than in glibc, e.g. by using only a single page in page cache instead of one for each page to be preallocated. If glibc is smart enough to do an optimal implementation, I fully agree with you.
>
> glibc cannot ever be smart enough because a file system driver will always know better and be able to do things in a much more optimized way.

Ok, that's not what I meant. It's obvious that the file system itself can do better than both VFS and glibc. The question is whether VFS can be better than glibc on file systems that don't offer their own implementation of the fallocate operation.

Arnd
[RFC PATCH 0/3] RFC: using hrtimers for in-kernel timeouts
I've played around with the new timer statistics to see which timers might benefit from being moved from traditional timers to hrtimers. Since my understanding is that timer_list timers are not really meant to expire, this seems to include a lot of what comes in through schedule_timeout, in particular select() and futex wait.

I have no idea if what I was attempting is even the right approach to start with, but I want to share the patches in case it is ;-). Maybe someone is interested in running some low-level benchmarks on this or pointing out any bugs in the code.

Arnd
[RFC PATCH 2/3] use hrtimer in select and pselect
This changes the select and pselect system calls to use the new schedule_timeout_hr function. Since many applications use the select function instead of nanosleep, this provides a higher resolution sleep to them. BUG: the same needs to be done for the compat syscalls, the current patch breaks building on 64 bit machines. Signed-off-by: Arnd Bergmann [EMAIL PROTECTED] Index: linux-cg/fs/select.c === --- linux-cg.orig/fs/select.c +++ linux-cg/fs/select.c @@ -189,7 +189,7 @@ get_max: #define POLLOUT_SET (POLLWRBAND | POLLWRNORM | POLLOUT | POLLERR) #define POLLEX_SET (POLLPRI) -int do_select(int n, fd_set_bits *fds, s64 *timeout) +int do_select(int n, fd_set_bits *fds, ktime_t *timeout) { struct poll_wqueues table; poll_table *wait; @@ -205,12 +205,11 @@ int do_select(int n, fd_set_bits *fds, s poll_initwait(table); wait = table.pt; - if (!*timeout) + if (timeout !timeout-tv64) wait = NULL; retval = 0; for (;;) { unsigned long *rinp, *routp, *rexp, *inp, *outp, *exp; - long __timeout; set_current_state(TASK_INTERRUPTIBLE); @@ -266,27 +265,19 @@ int do_select(int n, fd_set_bits *fds, s *rexp = res_ex; } wait = NULL; - if (retval || !*timeout || signal_pending(current)) + if (retval || (timeout !timeout-tv64) + || signal_pending(current)) break; if(table.error) { retval = table.error; break; } - if (*timeout 0) { + if (!timeout || timeout-tv64 0) /* Wait indefinitely */ - __timeout = MAX_SCHEDULE_TIMEOUT; - } else if (unlikely(*timeout = (s64)MAX_SCHEDULE_TIMEOUT - 1)) { - /* Wait for longer than MAX_SCHEDULE_TIMEOUT. 
Do it in a loop */ - __timeout = MAX_SCHEDULE_TIMEOUT - 1; - *timeout -= __timeout; - } else { - __timeout = *timeout; - *timeout = 0; - } - __timeout = schedule_timeout(__timeout); - if (*timeout = 0) - *timeout += __timeout; + schedule(); + else + *timeout = schedule_timeout_hr(*timeout); } __set_current_state(TASK_RUNNING); @@ -307,7 +298,7 @@ int do_select(int n, fd_set_bits *fds, s ((unsigned long) (MAX_SCHEDULE_TIMEOUT / HZ)-1) static int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp, - fd_set __user *exp, s64 *timeout) + fd_set __user *exp, ktime_t *timeout) { fd_set_bits fds; void *bits; @@ -384,7 +375,7 @@ out_nofds: asmlinkage long sys_select(int n, fd_set __user *inp, fd_set __user *outp, fd_set __user *exp, struct timeval __user *tvp) { - s64 timeout = -1; + ktime_t timeout, *timeoutp = NULL; struct timeval tv; int ret; @@ -395,24 +386,20 @@ asmlinkage long sys_select(int n, fd_set if (tv.tv_sec 0 || tv.tv_usec 0) return -EINVAL; + timeout = timeval_to_ktime(tv); /* Cast to u64 to make GCC stop complaining */ - if ((u64)tv.tv_sec = (u64)MAX_INT64_SECONDS) - timeout = -1; /* infinite */ - else { - timeout = ROUND_UP(tv.tv_usec, USEC_PER_SEC/HZ); - timeout += tv.tv_sec * HZ; - } + if ((u64)tv.tv_sec (u64)MAX_INT64_SECONDS) + timeoutp = timeout; } - ret = core_sys_select(n, inp, outp, exp, timeout); + ret = core_sys_select(n, inp, outp, exp, timeoutp); if (tvp) { struct timeval rtv; if (current-personality STICKY_TIMEOUTS) goto sticky; - rtv.tv_usec = jiffies_to_usecs(do_div((*(u64*)timeout), HZ)); - rtv.tv_sec = timeout; + rtv = ktime_to_timeval(timeout); if (timeval_compare(rtv, tv) = 0) rtv = tv; if (copy_to_user(tvp, rtv, sizeof(rtv))) { @@ -438,7 +425,7 @@ asmlinkage long sys_pselect7(int n, fd_s fd_set __user *exp, struct timespec __user *tsp, const sigset_t __user *sigmask, size_t sigsetsize) { - s64 timeout = MAX_SCHEDULE_TIMEOUT; + ktime_t timeout, *timeoutp = NULL; sigset_t ksigmask, sigsaved; struct timespec ts; int ret; @@ 
-450,13 +437,11 @@ asmlinkage long sys_pselect7(int n, fd_s
[RFC PATCH 1/3] introduce schedule_timeout_hr
The new schedule_timeout_hr function is a variant of schedule_timeout that uses hrtimers internally. Consequently, its argument and return value are ktime_t. Signed-off-by: Arnd Bergmann [EMAIL PROTECTED] Index: linux-cg/include/linux/sched.h === --- linux-cg.orig/include/linux/sched.h +++ linux-cg/include/linux/sched.h @@ -246,6 +246,8 @@ extern int in_sched_functions(unsigned l #defineMAX_SCHEDULE_TIMEOUTLONG_MAX extern signed long FASTCALL(schedule_timeout(signed long timeout)); +extern ktime_t FASTCALL(schedule_timeout_hr(ktime_t timeout)); + extern signed long schedule_timeout_interruptible(signed long timeout); extern signed long schedule_timeout_uninterruptible(signed long timeout); asmlinkage void schedule(void); Index: linux-cg/kernel/hrtimer.c === --- linux-cg.orig/kernel/hrtimer.c +++ linux-cg/kernel/hrtimer.c @@ -1206,6 +1206,54 @@ void hrtimer_init_sleeper(struct hrtimer #endif } +/** + * schedule_timeout_hr - sleep until timeout + * @timeout: timeout value + * + * Make the current task sleep until @timeout has elapsed. + * The routine will return immediately unless the current task + * state has been set (see set_current_state()). + * + * You can set the task state as follows - + * + * %TASK_UNINTERRUPTIBLE - at least @timeout is guaranteed to + * pass before the routine returns. The routine will return 0 + * + * %TASK_INTERRUPTIBLE - the routine may return early if a signal is + * delivered to the current task. In this case the remaining time + * in jiffies will be returned, or 0 if the timer expired in time + * + * The current task state is guaranteed to be TASK_RUNNING when this + * routine returns. + * + * In all cases the return value is guaranteed to be a non-negative + * time value. 
+ */ +static ktime_t __sched __schedule_timeout_hr(ktime_t time, void *addr) +{ + struct hrtimer_sleeper t; + ktime_t remain; + + hrtimer_init(t.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); + hrtimer_init_sleeper(t, current); + __timer_stats_hrtimer_set_start_info(t.timer, addr); + hrtimer_start(t.timer, time, HRTIMER_MODE_REL); + schedule(); + hrtimer_cancel(t.timer); + remain = hrtimer_get_remaining(t.timer); + + if (ktime_to_ns(remain) 0) + return ktime_set(0, 0); + else + return remain; +} + +fastcall ktime_t __sched schedule_timeout_hr(ktime_t time) +{ + return __schedule_timeout_hr(time, __builtin_return_address(0)); +} +EXPORT_SYMBOL_GPL(schedule_timeout_hr); + static int __sched do_nanosleep(struct hrtimer_sleeper *t, enum hrtimer_mode mode) { hrtimer_init_sleeper(t, current); -- - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC PATCH 3/3] change schedule_timeout to use hrtimers
According to the new timer statistics, many of the timers that expire come from schedule_timeout. Since the regular timer infrastructure is optimized for timers that don't expire, this might be a useful optimization. This also changes the timer stats to show the caller of schedule_timeout in the statistics rather than schedule_timeout itself. BUG: converting between jiffies and ktime is rather inefficient here. Signed-off-by: Arnd Bergmann [EMAIL PROTECTED] Index: linux-cg/kernel/hrtimer.c === --- linux-cg.orig/kernel/hrtimer.c +++ linux-cg/kernel/hrtimer.c @@ -1254,6 +1254,96 @@ fastcall ktime_t __sched schedule_timeou } EXPORT_SYMBOL_GPL(schedule_timeout_hr); +/** + * schedule_timeout - sleep until timeout + * @timeout: timeout value in jiffies + * + * Make the current task sleep until @timeout jiffies have + * elapsed. The routine will return immediately unless + * the current task state has been set (see set_current_state()). + * + * You can set the task state as follows - + * + * %TASK_UNINTERRUPTIBLE - at least @timeout jiffies are guaranteed to + * pass before the routine returns. The routine will return 0 + * + * %TASK_INTERRUPTIBLE - the routine may return early if a signal is + * delivered to the current task. In this case the remaining time + * in jiffies will be returned, or 0 if the timer expired in time + * + * The current task state is guaranteed to be TASK_RUNNING when this + * routine returns. + * + * Specifying a @timeout value of %MAX_SCHEDULE_TIMEOUT will schedule + * the CPU away without a bound on the timeout. In this case the return + * value will be %MAX_SCHEDULE_TIMEOUT. + * + * In all cases the return value is guaranteed to be non-negative. + */ +fastcall signed long __sched schedule_timeout(signed long timeout) +{ + ktime_t time; + struct timespec ts; + + switch (timeout) + { + case MAX_SCHEDULE_TIMEOUT: + /* +* These two special cases are useful to be comfortable +* in the caller. Nothing more. 
We could take +* MAX_SCHEDULE_TIMEOUT from one of the negative value +* but I' d like to return a valid offset (=0) to allow +* the caller to do everything it want with the retval. +*/ + schedule(); + goto out; + default: + /* +* Another bit of PARANOID. Note that the retval will be +* 0 since no piece of kernel is supposed to do a check +* for a negative retval of schedule_timeout() (since it +* should never happens anyway). You just have the printk() +* that will tell you if something is gone wrong and where. +*/ + if (timeout 0) { + printk(KERN_ERR schedule_timeout: wrong timeout + value %lx\n, timeout); + dump_stack(); + current-state = TASK_RUNNING; + goto out; + } + } + + /* FIXME: there ought to be an efficient ktime_to_jiffies +*and ktime_to_jiffies */ + jiffies_to_timespec(timeout, ts); + time = timespec_to_ktime(ts); + time = __schedule_timeout_hr(time, __builtin_return_address(0)); + ts = ktime_to_timespec(time); + timeout = timespec_to_jiffies(ts); + out: + return timeout 0 ? 0 : timeout; +} +EXPORT_SYMBOL(schedule_timeout); + +/* + * We can use __set_current_state() here because schedule_timeout() calls + * schedule() unconditionally. 
+ */ +signed long __sched schedule_timeout_interruptible(signed long timeout) +{ + __set_current_state(TASK_INTERRUPTIBLE); + return schedule_timeout(timeout); +} +EXPORT_SYMBOL(schedule_timeout_interruptible); + +signed long __sched schedule_timeout_uninterruptible(signed long timeout) +{ + __set_current_state(TASK_UNINTERRUPTIBLE); + return schedule_timeout(timeout); +} +EXPORT_SYMBOL(schedule_timeout_uninterruptible); + static int __sched do_nanosleep(struct hrtimer_sleeper *t, enum hrtimer_mode mode) { hrtimer_init_sleeper(t, current); Index: linux-cg/kernel/timer.c === --- linux-cg.orig/kernel/timer.c +++ linux-cg/kernel/timer.c @@ -1369,103 +1369,6 @@ asmlinkage long sys_getegid(void) #endif -static void process_timeout(unsigned long __data) -{ - wake_up_process((struct task_struct *)__data); -} - -/** - * schedule_timeout - sleep until timeout - * @timeout: timeout value in jiffies - * - * Make the current task sleep until @timeout jiffies have - * elapsed. The routine will return immediately unless - * the current task state has been set (see set_current_state()). - * - * You can set the task state as follows - - * - * %TASK_UNINTERRUPTIBLE - at least @timeout
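The FIXME in the patch above asks for an efficient direct conversion between jiffies and ktime instead of going through struct timespec. A minimal user-space sketch of what such a pair could look like follows. The names `ticks_to_ns()` and `ns_to_ticks()` and the tick rate `SKETCH_HZ` are assumptions made for illustration, not part of the patch, and the arithmetic is only exact when the tick rate divides one second evenly.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_HZ 250				/* assumed tick rate, not from the patch */
#define SKETCH_NSEC_PER_SEC 1000000000LL
#define SKETCH_NSEC_PER_TICK (SKETCH_NSEC_PER_SEC / SKETCH_HZ)

/* Ticks to nanoseconds; the caller must keep ticks small enough not to overflow. */
int64_t ticks_to_ns(long ticks)
{
	assert(ticks >= 0);
	return (int64_t)ticks * SKETCH_NSEC_PER_TICK;
}

/* Nanoseconds to ticks, rounded up so a sleeper never wakes before its deadline. */
long ns_to_ticks(int64_t ns)
{
	if (ns <= 0)
		return 0;
	return (long)((ns + SKETCH_NSEC_PER_TICK - 1) / SKETCH_NSEC_PER_TICK);
}
```

A special value such as MAX_SCHEDULE_TIMEOUT would still need to be handled before the multiplication, as the patch does with its switch statement.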
Re: [Cbe-oss-dev] [PATCH 14/22] spufs: use SPU master control to prevent wild SPU execution
On Friday 02 March 2007, Michael Ellerman wrote:
> There's also the error case for spu_run_init() which skips the master stop. I guess that's ok because we've only set the master control in the backing store, and the only way that will ever get propagated to an actual spu is by coming back through spufs_run_spu().

Hmm, the correct way would be to switch off the master control in there, afaics. Fixing it only in spu_run_init would mean that we also handle the case of spu_reacquire_runnable along with it.

> What originally caught my eye on this was the output from xmon. When we drop into xmon with no spu programs running and stop the spus, it reports that they _all_ have the master run enabled,

That looks right, there is no problem to have master control enabled, as long as user space can't access the spu through a context that is bound to it.

> and some of them have the runcntl enabled (those that have had spu programs run on them since boot it seems).

While this sounds wrong. Maybe the runcntl is active on those that have _not_ run since boot, which would make more sense. We should investigate this.

> It looks like the save/restore code sets the master bit in several places, but never sets/clears the runcntl, which seems bogus to me. So when we leave spufs_spu_run we do the master stop call:
>
> spu_mfc_sr1_set: spu: c0007ffdfc80 (15) sr1: 0x1b runcntl: 0x1
> Call Trace:
> [C196BAA0] [C000F920] .show_stack+0x68/0x1b0 (unreliable)
> [C196BB40] [D01475C0] .spu_hw_master_stop+0xa8/0x170 [spufs]
> [C196BBE0] [D0148598] .spufs_run_spu+0x5ec/0x770 [spufs]
> [C196BCC0] [D0144BA0] .do_spu_run+0xb4/0x180 [spufs]
> [C196BD80] [C003905C] .sys_spu_run+0xb0/0x108
> [C196BE30] [C0008634] syscall_exit+0x0/0x40
>
> But then the save/restore code sets it back on?

Right, the context save code needs to enable master control in order to run on the spu. However, that should be after all mappings to user space have been discarded.

Arnd
Re: Wanted: simple, safe x86 stack overflow detection
On Wednesday 28 February 2007, Chuck Ebbert wrote:
> Can we just put a canary in the threadinfo and check it on every task switch? What are the drawbacks?

It's not completely reliable, in case of functions that allocate far too much stack space.

You might want to take a look at the gcc support that Andreas Krebbel implemented for s390 to check for stack overflows: http://gcc.gnu.org/ml/gcc-patches/2004-08/msg01308.html

I think there are some additions planned for the next gcc release, but if you port this to i386, it will get you pretty far.

Arnd
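The canary idea raised in this exchange can be sketched in user space like this. It is not kernel code: `struct mock_task`, `task_init()`, `stack_overflowed()` and the magic value are invented for illustration, with the canary placed in the lowest word the way thread_info sits at the base of a kernel stack.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_WORDS 1024
#define STACK_CANARY 0x57AC6E9DUL	/* arbitrary magic value chosen for this sketch */

struct mock_task {
	unsigned long canary;			/* lowest word, below the stack */
	unsigned long stack[STACK_WORDS];	/* grows downward toward the canary */
};

void task_init(struct mock_task *t)
{
	assert(t != NULL);
	t->canary = STACK_CANARY;
}

/* Meant to be called on every (simulated) task switch; nonzero means overflow. */
int stack_overflowed(const struct mock_task *t)
{
	return t->canary != STACK_CANARY;
}
```

The check only fires if the overrun actually writes to the canary word. A function with a very large frame can step over it and corrupt memory further down without touching it, which is the unreliability mentioned in the reply.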
Re: [RFC] Heads up on sys_fallocate()
On Monday 05 March 2007, Jörn Engel wrote:
> That actually causes an interesting problem for compressing filesystems. The space consumed by blocks depends on their contents and how well it compresses. At the moment, the only option I see to support posix_fallocate for LogFS is to set an inode flag disabling compression, then allocate the blocks. But if the file already contains large amounts of compressed data, I have a problem. Disabling compression for a range within a file is not supported, so I can only return an error. But which one?

Using the current glibc implementation on a compressed file system ideally should be a very expensive no-op because you won't actually allocate much space for a file when writing zeroes to it. You also don't benefit from a contiguous allocation in logfs, since flash has uniform seek times across the whole medium.

I'd suggest you implement posix_fallocate as a real no-op and just return success without doing anything. You could also return ENOSPC in case the blocks requested by posix_fallocate don't fit on the medium without compression, but that is more or less just guesswork (like statfs is).

Arnd
Re: [RFC] Heads up on sys_fallocate()
On Monday 05 March 2007, Anton Altaparmakov wrote:
> An alternative would be to allocate blocks and then when the data is written perform the compression and free any blocks you do not need any more because the data has shrunk sufficiently. Depending on the implementation details this could potentially create horrible fragmentation as you would allocate a large consecutive region and then go and drop random blocks from that region thus making the file fragmented.

Unfortunately, this is not as easy on logfs, because there is no point in allocating a block when there is no data to write into it. Fragmentation on flash media is free, but you can never modify a block in place without erasing it first. This means it will always be written to a new location on the next write access.

One option that might work (similar to what you describe in your other mail) is to have a per-inode count of reserved blocks, without allocating specific blocks for them. The journal then needs to maintain the number of total reserved blocks for all files and keep that in sync with blocks that were reserved for specific inodes.

Arnd
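The per-inode reservation idea from this reply can be sketched as plain counters. This is not LogFS code: `struct mock_fs`, `struct mock_inode`, `reserve_blocks()` and `write_block()` are hypothetical, and journaling of the totals is left out entirely. The sketch only shows how the per-inode count and the file-system-wide total stay in sync.

```c
#include <assert.h>
#include <errno.h>

struct mock_fs {
	unsigned long free_blocks;	/* blocks not yet written */
	unsigned long reserved_total;	/* sum of all per-inode reservations */
};

struct mock_inode {
	struct mock_fs *fs;
	unsigned long reserved;		/* blocks promised to this inode */
};

/* Promise count blocks to the inode without choosing where they will live. */
int reserve_blocks(struct mock_inode *inode, unsigned long count)
{
	struct mock_fs *fs = inode->fs;

	assert(fs->reserved_total <= fs->free_blocks);
	if (fs->free_blocks - fs->reserved_total < count)
		return -ENOSPC;

	inode->reserved += count;
	fs->reserved_total += count;
	return 0;
}

/* A write consumes one block, drawing on the inode's reservation first. */
int write_block(struct mock_inode *inode)
{
	struct mock_fs *fs = inode->fs;

	if (inode->reserved) {
		inode->reserved--;
		fs->reserved_total--;
	} else if (fs->free_blocks == fs->reserved_total) {
		return -ENOSPC;		/* everything left is promised to others */
	}
	fs->free_blocks--;
	return 0;
}
```

An inode without a reservation can only use space that nobody else has been promised, which is what makes a successful reservation meaningful.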
Re: [PATCH -mm 1/5] Blackfin: blackfin architecture patch update
On Monday 05 March 2007, Wu, Bryan wrote:
> So could please give us some information about the merge window schedule, we may try to catch this.

The merge window opens after 2.6.21 gets released and is open for two weeks after that. The idea is however that you have everything ready at the start of the merge window.

> Oh, if we should fix this issue, there are lots of work to do because tons of drivers rely on this. Maybe after some team internal discussion, we will give a solution to this.

You can probably use a short perl script (or similar) to automate the conversion.

Arnd
Re: [PATCH -mm 1/5] Blackfin: blackfin architecture patch update
On Monday 05 March 2007, Aubrey Li wrote:
> On 3/4/07, Arnd Bergmann [EMAIL PROTECTED] wrote:
> > In general, please put EXPORT_SYMBOL lines below the definition of the symbol itself. This list of exports should only be used for symbols that come from assembly files.
>
> What is the right way to export symbol coming from c files?

As I said, below the symbol definition, like

	int global_var;
	EXPORT_SYMBOL(global_var);

	int global_function(void)
	{
		return 3;
	}
	EXPORT_SYMBOL(global_function);

> > This detection seems to me like a strange thing to do in setup_arch(). It should be possible to do this much later, at a point where the system is much less fragile and e.g. printk works. It could even be moved into some place in the mtd code itself, since other architectures might want to do the same thing.
>
> After download the rootfs image from host to the target ram, we need to move the image to the right place, so we need to know the size of the image at this time.

Well, it doesn't have to be in the modular part of the kernel, but some place later than setup_arch() would be a step in the right direction. If you need it before the file systems, an arch_initcall() might be the right place.

> > I'm curious: In your dual-core bf561, don't you actually need to implement something that maintains atomicity across cores rather than just across processes?
>
> Yes, bf561 is a dual-core processor, but we are using only one core of bf561 now. IMHO, BF561 architecture was not designed for SMP or NUMA.

Interesting, so what is the intended use of the other core? Does the hardware have any way of supporting concurrency between the cores, other than sending interrupts between them?

> > How does this fit in with the generic SPI code? Does it duplicate stuff from there, or do you use it?
>
> We use our own. We have dma which can be used for SPI operations.

I just looked again at your code. My question was more directed at whether you use your own SPI abstraction layer instead of drivers/spi, which you fortunately don't. The piece I was missing however is the spi_bfin5xx.c driver, which was not part of this patch, though you seem to rely on it. Is that already part of the -mm kernel?

Arnd
Re: [PATCH -mm 1/5] Blackfin: blackfin architecture patch update
On Monday 05 March 2007, Wu, Bryan wrote:
> Maybe NUMA is a solution, but it is not a wonderful solution.

NUMA doesn't help you. Linux only runs on cache-coherent NUMA, which this isn't.

> In some application product, BF561 core A is running Linux kernel +Applications while BF561 core B is just for some complicated video/audio codec algorithm. Any Linux multicore solution in BF561 situation is highly welcome.

You definitely can't use the cache mode in this case, but one idea that should make atomic instructions work is to always do these on one of the two cores, and use cross-core interrupts to trigger an update. It's probably pretty inefficient and you also need to do something about atomic updates (spinlock_t and atomic_t) when interrupts are disabled.

> Another question: when is the merge point from -mm to linus mainline, is it the same as the merge window after 2.6.21 released?

It's the same.

Arnd
Re: [RFC] [patch 4/6 -rt] powerpc 2.6.20-rt8: fix a runtime warnings for xmon
On Wednesday 07 March 2007, Ingo Molnar wrote:
> i'm not an xmon expert, but maybe it might make more sense to first disable preemption, then interrupts - otherwise you could be preempted right after having disabled these interrupts (and be scheduled to another CPU, etc.).
>
> What is the difference between local_irq_save() and the above 'disable interrupts' sequence? If it's not the same and xmon_core() relied on having hardirqs disabled then it might make sense to do a local_irq_save() there, instead of a preempt_disable().

Since relatively recently, powerpc no longer actually disables the hardware interrupts with local_irq_disable(), but rather sets a per-cpu flag that is checked when an actual interrupt comes in during the critical section. The mtmsr() sequence in xmon corresponds to hard_irq_disable() and should probably be changed to that, but then you still need the extra preempt_disable() / preempt_enable().

I think you're right about the sequence having to be

1. preempt_disable()
2. hard_irq_disable()
3.
4. hard_irq_enable()
5. preempt_enable()

Arnd
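The ordering in that list can be sketched in plain userspace C. This is only an illustration: the four primitives below are hypothetical stand-ins that track state in two variables, not the kernel functions, and `debugger_core()` is an assumed placeholder for whatever runs in the unnamed middle step.

```c
#include <assert.h>

/* Hypothetical userspace stand-ins for the kernel primitives named above. */
static int preempt_count;   /* greater than zero: preemption is disabled */
static int hard_irqs_off;   /* 1: hardware interrupts are masked */

static void preempt_disable(void)
{
        preempt_count++;
}

static void preempt_enable(void)
{
        assert(preempt_count > 0);
        preempt_count--;
}

static void hard_irq_disable(void)
{
        /* Masking interrupts first would leave a window for migration. */
        assert(preempt_count > 0);
        hard_irqs_off = 1;
}

static void hard_irq_enable(void)
{
        hard_irqs_off = 0;
}

/* Assumed placeholder for the work done in the middle step. */
static int debugger_core(void)
{
        /* Reports whether it really runs unpreemptible with IRQs masked. */
        return preempt_count > 0 && hard_irqs_off;
}

/* Preemption off, IRQs off, work, IRQs on, preemption on. */
int enter_debugger(void)
{
        int ok;

        preempt_disable();
        hard_irq_disable();
        ok = debugger_core();
        hard_irq_enable();
        preempt_enable();

        return ok;
}
```

The assert in the stand-in `hard_irq_disable()` encodes the point made in the quoted mail: preemption has to be off before interrupts are masked, and it is re-enabled last.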
Re: Linux v2.6.21-rc3
On Wednesday 07 March 2007 16:39:00 Linus Torvalds wrote:
> So did you hunt it down to a particular cases where it triggers?

IIRC, it crashed on boot in the powerpc iommu code when slab debugging is enabled. Not sure if it was on Cell or on benh's powerbook though.

Arnd
Re: A set of standard virtual devices?
On Tuesday 03 April 2007, H. Peter Anvin wrote:
> However, one probably wants to think about what the heck one actually means with virtualization in the absence of a lot of this stuff. PCI is probably the closest thing we have to a lowest common denominator for device detection.

I think that's true outside of s390, but a standardized virtual device interface should be able to work there as well. Interestingly, the s390 channel I/O also uses two 16 bit numbers to identify a device (type and model), just like PCI or USB, so in that light, we might be able to use the same number space for something entirely different depending on the virtual bus.

Arnd
Re: A set of standard virtual devices?
On Tuesday 03 April 2007, Cornelia Huck wrote:
> > I think that's true outside of s390, but a standardized virtual device interface should be able to work there as well. Interestingly, the s390 channel I/O also uses two 16 bit numbers to identify a device (type and model), just like PCI or USB, so in that light, we might be able to use the same number space for something entirely different depending on the virtual bus.
>
> Even if we used those ids for cu_type and dev_type, it would still be ugly IMO. It would be much cleaner to just define a very simple, easy to implement virtual bus without dragging implementation details for other types of devices around.

Right, but an interesting point is the question what to do when running another operating system as a guest under Linux, e.g. with kvm. Ideally, you'd want to use the same interface to announce the presence of the device, which can be done far more easily with PCI than using a new bus type that you'd need to implement for every OS, instead of just implementing the virtual PCI driver.

Using a 16 bit number to identify a specific interface sounds like a good idea to me, if only for the reason that it is a widely used approach. The alternative would be to use an ascii string, like we have for open-firmware devices on powerpc or sparc.

I think in either way, we need to abstract the driver for the virtual device from the underlying bus infrastructure, which is hypervisor and/or platform dependent.
The abstraction could work roughly like this:

== virt_dev.h ==

    struct virt_dev;

    struct virt_driver { /* platform independent */
            struct device_driver drv;
            struct pci_device_id *ids; /* not necessarily PCI */
    };

    struct virt_bus { /* platform dependent */
            long (*transfer)(struct virt_dev *dev, void *buffer,
                             unsigned long size, int type);
    };

    struct virt_dev {
            struct device dev;
            struct virt_driver *driver;
            struct virt_bus *bus;
            struct pci_device_id id;
            int irq;
    };

== virt_example.c ==

    static ssize_t virt_pipe_read(struct file *filp, char __user *buffer,
                                  size_t len, loff_t *off)
    {
            struct virt_dev *dev = filp->private_data;
            ssize_t ret = dev->bus->transfer(dev, buffer, len, READ);

            /* only advance the offset when data was transferred */
            if (ret > 0)
                    *off += ret;
            return ret;
    }

    static struct file_operations virt_pipe_fops = {
            .open = nonseekable_open,
            .read = virt_pipe_read,
    };

    static int virt_pipe_probe(struct device *dev)
    {
            struct virt_dev *vdev = to_virt_dev(dev);
            struct miscdevice *mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);

            if (!mdev)
                    return -ENOMEM;

            mdev->minor = MISC_DYNAMIC_MINOR;
            mdev->name = "virt_pipe";
            mdev->fops = &virt_pipe_fops;
            mdev->parent = dev;
            return misc_register(mdev);
    }

    static struct pci_device_id virt_pipe_ids[] = {
            { .vendor = PCI_VENDOR_LINUX, .device = 0x3456, },
            { } /* terminator */
    };
    MODULE_DEVICE_TABLE(pci, virt_pipe_ids);

    static struct virt_driver virt_pipe_driver = {
            .drv = {
                    .name = "virt_pipe",
                    .probe = virt_pipe_probe,
            },
            .ids = virt_pipe_ids,
    };

    static int virt_pipe_init(void)
    {
            return virt_driver_register(&virt_pipe_driver);
    }
    module_init(virt_pipe_init);

== virt_devtree.c ==

    static long virt_devtree_transfer(struct virt_dev *dev, void *buffer,
                                      unsigned long size, int type)
    {
            long ret;

            switch (type) {
            case READ:
                    ret = hcall(HV_READ, dev->dev.platform_data, buffer, size);
                    break;
            case WRITE:
                    ret = hcall(HV_WRITE, dev->dev.platform_data, buffer, size);
                    break;
            default:
                    BUG();
            }

            return ret;
    }

    static struct virt_bus virt_devtree_bus = {
            .transfer = virt_devtree_transfer,
    };

    static int virt_devtree_probe(struct of_device *ofdev,
                                  struct of_device_id *match)
    {
            struct virt_dev *vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);

            if (!vdev)
                    return -ENOMEM;

            vdev->bus = &virt_devtree_bus;
            vdev->dev.parent = &ofdev->dev;
            vdev->id.vendor = PCI_VENDOR_LINUX;
            vdev->id.device = *of_get_property(ofdev, "virt_dev_id");
            vdev->irq = of_irq_parse_and_map(ofdev, 0);

            return device_register(&vdev->dev);
    }

    static struct of_device_id virt_devtree_ids[] = {
            { .compatible = "virt-dev", },
            { } /* terminator */
    };

    static struct of_platform_driver virt_devtree_driver = {
            .probe = virt_devtree_probe,
            .match_table = virt_devtree_ids,
    };

== virt_pci.c ==

    static long virt_pci_transfer(struct virt_dev *dev, void *buffer,
                                  unsigned long size, int type)
    {
            struct virt_pci_regs __iomem *regs = dev->dev.platform_data;

            switch (type) {
            case READ:
                    mmio_insb(&regs->read_port, buffer, size);
                    break;
            case WRITE:
                    mmio_outsb(&regs->write_port, buffer, size);
                    break;
            default:
                    BUG();
            }
Re: A set of standard virtual devices?
On Tuesday 03 April 2007, Cornelia Huck wrote:
> On Tue, 3 Apr 2007 14:15:37 +0200, Arnd Bergmann [EMAIL PROTECTED] wrote:
>
> That's OK for a virtualized architecture where the base architecture already supports PCI. But a traditional s390 OS would be as unhappy with a PCI device as with a device of a completely new type :)

Sure, that was my point from the start.

> There are several options for virtualized devices (and I don't know why they shouldn't coexist):
>
> 1. Emulate a well-known device (like a e1000 network card on PCI or a model 3390 dasd on CCW). Existing operating systems can just use them, but it's a lot of work in the hypervisor.

Most hypervisors already do this, and it's an unrelated topic. What we're trying to achieve is to make sure not every hypervisor and simulator has to introduce its own set of drivers.

> >     struct virt_bus { /* platform dependent */
> >             long (*transfer)(struct virt_dev *dev, void *buffer,
> >                              unsigned long size, int type);
> >     };
>
> Should this embed a struct bus_type? Or reference a generic_virt_bus?

Yes, that should embed the bus_type.

> >     struct virt_dev {
> >             struct device dev;
> >             struct virt_driver *driver;
> >             struct virt_bus *bus;
> >             struct pci_device_id id;
> >             int irq;
> >     };
>
> And that's where I have problems :) The notion of irq is far too platform specific. I can bend my mind round using PCI-like ids for non-PCI virtualized devices, but an integer is far too small and too specific for a way to access the device.

Sorry, I've been working too long on the lesser architectures. IRQ numbers are evil indeed. However, I'm pretty sure that we need _some_ abstraction of an interrupt mechanism here. The easiest way is probably to have a callback function like

    int (*irq_handler)(struct virt_dev *, unsigned long message);

in the virt_dev.

Arnd
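As a rough illustration of the callback idea, here is a userspace sketch. It assumes a toy `struct bus_type` in place of the kernel one, and the names `virt_dev_notify()`, `record_message()` and `last_message` are made up for this example; only the `irq_handler` signature and the embedded `bus_type` come from the mail.

```c
#include <assert.h>
#include <stddef.h>

struct virt_dev;

/* Toy stand-in for the kernel's struct bus_type (assumption). */
struct bus_type {
        const char *name;
};

struct virt_bus { /* platform dependent */
        struct bus_type bus; /* embedded rather than referenced */
        long (*transfer)(struct virt_dev *dev, void *buffer,
                         unsigned long size, int type);
};

struct virt_dev {
        struct virt_bus *bus;
        /* replaces the integer irq field with a notification callback */
        int (*irq_handler)(struct virt_dev *dev, unsigned long message);
        unsigned long last_message; /* hypothetical driver state */
};

/* Hypothetical entry point a platform backend would call on a host signal. */
int virt_dev_notify(struct virt_dev *dev, unsigned long message)
{
        if (!dev->irq_handler)
                return -1; /* nobody is listening */
        return dev->irq_handler(dev, message);
}

/* Example handler: remember the message that was delivered. */
int record_message(struct virt_dev *dev, unsigned long message)
{
        dev->last_message = message;
        return 0;
}
```

With this shape the driver never sees an interrupt number; how the backend learns about the event (real IRQ, hcall completion, channel interrupt) stays inside the platform-specific code.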
Re: A set of standard virtual devices?
On Tuesday 03 April 2007, Cornelia Huck wrote:
> On s390, it would be more than strangeness. There's no implementation of PCI at all, someone would have to cook it up - and it wouldn't have any use beyond those special devices. Since there isn't any bus type that is available on *all* architectures, a generic virtual bus with very simple probing seems much saner...

I think we need to separate two problems here:

1. Probing: That's really what triggered the discussion, PCI probing is well-understood and implemented on _most_ platforms, so there is some value in reusing it. When you talk about 'very simple probing', I'm not sure what the most simple approach could be. Ideas that have been implemented before include:

   a) Have a limited set of device IDs (e.g. 65535 devices, or a hierarchic tree), and try to access each one of them in order to find out if it's there. We do that for PCI or CCW, for instance.

   b) Have an iterator in the hypervisor (or firmware), to return a handle to the first, next or child of a device. We do that for open firmware.

   c) Ask the hypervisor for an unused device of a given class, which needs to be returned to the hypervisor when no longer used. This is how the PS3 hypervisor works, but it does not play well with the Linux driver model.

2. Device access: When talking to a virtual device, you want to have at least a way to give commands to it and a way to get interrupts back. Again, multiple ideas have been used in the past, and we should choose a subset:

   a) PCI-like: mmio using memory and/or I/O space BAR setup, interrupt numbers and DMA to guest physical addresses.

   b) Channel-like: use an hcall to give commands to the hypervisor, passing down a device handle, command code and data areas in guest physical space. Interrupts return the device handle or an OS-defined per-device value.

   c) Minimalistic: Every device is mapped into the guest address space and can potentially be remapped into user space. The device memory can be shared between guests and/or with the host if that uses the same driver. The guest is able to signal the receiving end using an hcall and gets interrupts like in b).

   d) UNIX-like: devices appear like file descriptors, the guest can do operations like read/write/sync/mmap, potentially ioctl on them to talk to the host.

Arnd
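Probing option 1b) from the list above (an iterator that hands out first, next and child handles) could look roughly like the following. Everything here is hypothetical: the `hv_child()`, `hv_next()` and `hv_device_id()` calls are backed by a small static table so the walk can be run outside any hypervisor.

```c
#include <assert.h>
#include <stddef.h>

/* Toy device tree; handles are table indexes, -1 means "none". */
struct hv_node {
        int id;            /* 16 bit style device identifier */
        int first_child;
        int next_sibling;
};

static const struct hv_node hv_nodes[] = {
        { 0x10,  1, -1 },  /* 0: root, has children */
        { 0x20,  3,  2 },  /* 1: child of root, has one child */
        { 0x30, -1, -1 },  /* 2: second child of root */
        { 0x21, -1, -1 },  /* 3: child of node 1 */
};

/* Hypothetical iterator calls a hypervisor or firmware might offer. */
int hv_child(int handle)     { return hv_nodes[handle].first_child; }
int hv_next(int handle)      { return hv_nodes[handle].next_sibling; }
int hv_device_id(int handle) { return hv_nodes[handle].id; }

/*
 * Depth-first walk starting at 'handle'. Stores up to 'max' device ids
 * in 'found' and returns the running count of devices seen so far.
 */
int probe_tree(int handle, int *found, int max, int count)
{
        for (; handle >= 0; handle = hv_next(handle)) {
                if (count < max)
                        found[count] = hv_device_id(handle);
                count++;
                count = probe_tree(hv_child(handle), found, max, count);
        }
        return count;
}
```

Calling `probe_tree(0, found, max, 0)` visits every node exactly once without the guest having to guess which IDs might exist, which is the main difference from scanning a fixed ID space as in option 1a).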
Re: A set of standard virtual devices?
On Tuesday 03 April 2007, Jeremy Fitzhardinge wrote:
> Arnd Bergmann wrote:
> > I think we need to separate two problems here:
> >
> > 1. Probing: That's really what triggered the discussion, PCI probing is well-understood and implemented on _most_ platforms, so there is some value in reusing it. When you talk about 'very simple probing', I'm not sure what the most simple approach could be.
>
> Is probing an interesting problem to consider on its own? If there's some hypervisor-agnostic device driver in Linux, then obviously it needs some way to find the corresponding (virtual) hardware for it to talk to. But that probing mechanism will depend on the actual interface structure, and is just one of the many problems that need to be solved. There's no point in overloading PCI to probe for the device unless you're actually using PCI to talk to the device.

We already have device drivers for physical devices that can be attached to different buses. The EHCI USB is an example of a driver that can be for instance PCI, OF or an on-chip device. Moreover, you can have an abstracted device behind it that does not need to know about the transport, like the SCSI disk driver does not care if it is talking to an ATA, parallel SCSI or SAS chip, or even which controller that is.

> Let me say up front that I'm skeptical that we can come up with a single bus-like abstraction which can be a both simple and efficient interface to all the virtual architectures. I think a more fruitful path is to find what pieces of functionality can be made common, with the aim of having small, simple and self-contained hypervisor-specific backends.

I think this needs to be considered on a class by class basis.

> This thread started with a discussion about entropy sources. In theory you could implement it as simply as exposing a mmaped ringbuffer. There are some extra complexities deriving from the security requirements though; for example, all the entropy needs to be kept strictly private to the domain that consumes it.
>
> But beyond that, there are 3 other important classes of device:
>
> * console
> * disk
> * networking
>
> (There are obviously more, but these are the must-have.)
>
> Console already provides us with a model to work on, in the form of hvc-console. The hvc-console code itself has the bulk of the common console code, along with a set of very small hypervisor-specific backends. The Xen console implementation shrunk considerably when we switched to using it.

Console is also the least problematic interface, you can do it over practically anything.

> If we could do the same thing with disk and net, I would be very happy. For example, if we wanted to change the Xen frontend/backend disk interface, we could use SCSI as the basic protocol, and then convert netfront into a relatively simple scsi driver. There would still be a Xen-specific piece, but it should be fairly small and have a clean interface. Though the existing interface is pretty simple shove-this-block-there affair.

Doing a SCSI driver has been tried before, with ibmvscsi. Not good. The interesting question about block devices is how to handle concurrency and interrupt mitigation. An efficient interface should

- have asynchronous notification, not sleep until the transfer is complete
- allow multiple blocks to be in flight simultaneously, so the host can reorder the requests if it is smart enough
- give only a single interrupt when multiple transfers have completed

Minor optimizations could be

- give an interrupt early when some transfers are complete
- allow I/O barriers to be inserted in the stream
- allow marking blocks as more or less important (readahead vs. read)
- provide passthrough of SG_IO or similar for optical media (e.g. DVD writer)

> I'm not sure what similar common code could be extracted for network devices. I haven't looked into it all that closely.

One way to do networking would be to simply provide a shared memory area that everyone can write to, then use a ring buffer and atomic operations to synchronize between the guests, and a method to send interrupts to the others for flow control.

Arnd
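The ring buffer plus atomic operations idea can be sketched with C11 atomics. This is a deliberately reduced version under stated assumptions: one producer and one consumer only (the mail talks about an area that everyone can write to, which would need more than this), `int` payloads instead of packets, and no interrupt path. The names `struct ring`, `ring_put()` and `ring_get()` are made up for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define RING_SLOTS 8 /* power of two, so index wraparound stays consistent */

/* Would live in the memory area shared between the two sides. */
struct ring {
        _Atomic unsigned int head; /* next slot to fill, written by producer */
        _Atomic unsigned int tail; /* next slot to drain, written by consumer */
        int slot[RING_SLOTS];
};

/* Producer side: returns 0 on success, -1 if the ring is full. */
int ring_put(struct ring *r, int value)
{
        unsigned int head = atomic_load_explicit(&r->head, memory_order_relaxed);
        unsigned int tail = atomic_load_explicit(&r->tail, memory_order_acquire);

        if (head - tail == RING_SLOTS)
                return -1; /* full: this is where flow control would kick in */

        r->slot[head % RING_SLOTS] = value;
        /* release: the slot contents become visible before the new head */
        atomic_store_explicit(&r->head, head + 1, memory_order_release);
        return 0;
}

/* Consumer side: returns 0 and stores a value, or -1 if the ring is empty. */
int ring_get(struct ring *r, int *value)
{
        unsigned int tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
        unsigned int head = atomic_load_explicit(&r->head, memory_order_acquire);

        if (head == tail)
                return -1; /* empty */

        *value = r->slot[tail % RING_SLOTS];
        /* release: the slot may be reused only after it has been read */
        atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
        return 0;
}
```

A full ring is the point at which the interrupt mentioned in the mail would be needed: the producer stops, and the consumer signals it once slots are free again.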
Re: A set of standard virtual devices?
On Tuesday 03 April 2007, Jeremy Fitzhardinge wrote:
> > Doing a SCSI driver has been tried before, with ibmvscsi. Not good.
>
> OK, interesting. People had proposed using SCSI as the interface, but I wasn't aware of any results from doing that. How is it not good?

SCSI is really overengineered for something as simple as a block interface. A large part of the SCSI stack deals only with error handling, which you don't want to burden the guests with at all, since most error conditions can be handled fine by the host.

Another big aspect of SCSI is device enumeration and probing. Doing it the SCSI way is particularly pointless. It's much simpler to have one device with its own I/O interface at the hcall layer, and one interrupt number for the block device, instead of faking the full hca/bus/dev/lun hierarchy.

Arnd
Re: A set of standard virtual devices?
On Tuesday 03 April 2007, Jeremy Fitzhardinge wrote:
> That said, something like USB is probably the best bet for this kind of low-performance device. I think. Not that I really know anything about USB.

USB has the disadvantage that it is more complex than PCI and requires significantly more code to simulate on the host side. On the plus side, I think it should be possible to implement a virtual USB host on s390, which is not possible with PCI, but that again takes a lot of work to implement.

One interesting aspect of the PS3 hypervisor is that some of the low-speed interfaces are implemented as a virtual UART, meaning something that only has read and write operations and uses an interrupt for flow control. The implementation in drivers/ps3/vuart.c is probably more complex than what we want as a generic transport mechanism, but simply having a bidirectional data stream sounds like an ideal abstraction for the simple case. Some more or less obvious users of this include:

- console
- additional tty
- random
- slow network (using ppp)
- printer
- watchdog
- hid (e.g. mouse)
- system management (like ps3)
- fast network (in combination with shared memory segment)

The transport can be hypervisor specific, e.g. there could be a virtual PCI serial port on kvm, an hcall interface on the ps3 and a virtual CTC on s390 (kidding), while all of them can have the same kind of hardware _behind_ the serial connection.

Arnd
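A bidirectional stream with only read and write operations might be modelled like this. It is a sketch under assumptions, not the interface of drivers/ps3/vuart.c: `struct virt_stream`, `struct virt_stream_ops` and the loopback backend are invented for the example, and a short write stands in for the interrupt-based flow control.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct virt_stream;

/* The only two operations a transport backend has to provide. */
struct virt_stream_ops {
        long (*read)(struct virt_stream *s, void *buf, size_t len);
        long (*write)(struct virt_stream *s, const void *buf, size_t len);
};

struct virt_stream {
        const struct virt_stream_ops *ops;
        unsigned char fifo[64]; /* state of the loopback backend below */
        size_t used;
};

/* Loopback backend: written bytes are queued until they are read back. */
static long loop_write(struct virt_stream *s, const void *buf, size_t len)
{
        size_t room = sizeof(s->fifo) - s->used;

        if (len > room)
                len = room; /* short write: caller has to retry later */
        memcpy(s->fifo + s->used, buf, len);
        s->used += len;
        return (long)len;
}

static long loop_read(struct virt_stream *s, void *buf, size_t len)
{
        if (len > s->used)
                len = s->used;
        memcpy(buf, s->fifo, len);
        /* shift the unread remainder to the front of the queue */
        memmove(s->fifo, s->fifo + len, s->used - len);
        s->used -= len;
        return (long)len;
}

const struct virt_stream_ops loopback_ops = {
        .read = loop_read,
        .write = loop_write,
};
```

A console, watchdog or ppp user would only call `s->ops->read()` and `s->ops->write()`, so swapping the loopback backend for a PCI serial port or an hcall interface would not change the code above that layer.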
Re: missing madvise functionality
On Tuesday 03 April 2007, Ulrich Drepper wrote:
> The problem is glibc has to work around kernel limitations. If the malloc implementation detects that a large chunk of previously allocated memory is now free and unused it wants to return the memory to the system. What we currently have to do is this:
>
>   to free:   mmap(PROT_NONE) over the area
>   to reuse:  mprotect(PROT_READ|PROT_WRITE)
>
> Yep, that's expensive, both operations need to get locks preventing other threads from doing the same.

I thought this is what the read_zero_pagealigned hack [1] was used for (read from /dev/zero replaces target pages with empty_zero_page).

Now if read_zero_pagealigned does not solve _this_ scenario, is it good for anything else then? Can we simply kill that function as a misfeature and avoid future pain arising from it?

Arnd

[1] http://lkml.org/lkml/1997/1/16/49
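The free/reuse sequence quoted above can be written out with the plain mmap() and mprotect() calls. This is a minimal sketch of the pattern on Linux, not glibc's actual malloc code; the helper names `chunk_alloc()`, `chunk_free()` and `chunk_reuse()` are invented here, and the length is assumed to be a multiple of the page size.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* for MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Get a private, zero-filled, read-write area from the kernel. */
void *chunk_alloc(size_t len)
{
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        return p == MAP_FAILED ? NULL : p;
}

/*
 * "to free": map fresh PROT_NONE memory over the area. The old pages
 * are dropped, but the address range stays reserved for the process.
 */
int chunk_free(void *addr, size_t len)
{
        void *p = mmap(addr, len, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

        return p == addr ? 0 : -1;
}

/* "to reuse": make the range accessible again; it reads back as zeros. */
int chunk_reuse(void *addr, size_t len)
{
        return mprotect(addr, len, PROT_READ | PROT_WRITE);
}
```

Both `chunk_free()` and `chunk_reuse()` change the process address space, which is why the quoted text calls the pair expensive: each call needs the locks that serialize such changes against other threads.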
Re: A set of standard virtual devices?
On Wednesday 04 April 2007, H. Peter Anvin wrote:
> Note that at least for PIO-based devices, there is nothing that says you can't implement PCI over another transport, if you wish. It's really just a very simple RPC protocol.

The PIO aspect of PCI is simple, yes, except on architectures that don't have the concept of PIO or even uncached memory, but even that can be done by defining readl/writel/inl/outl/... as hcalls.

The tricky part about PCI is the device probing: everything about config space accesses, interrupt swizzling, bus/device/function numbers and base address registers becomes a pointless exercise when the other side is just faking it.

> DMA is trickier, as it makes the data appear into the address space of the guest in a way that is both device- and host-dependent (in the presence of PCI domains, IOMMU etc.) There may be reason to avoid DMA for that reason.

Right, PCI DMA and virtualization don't mix. DMA in general is fine though, as long as your devices (real or virtual) see the guest physical addresses as a contiguous 64 bit range and have well-defined semantics about what addresses are accessed in what way. When you think of file read/write syscalls as DMA into user space, it's a very clean concept. Async I/O somewhat less so, but still pretty good.

Arnd
Re: A set of standard virtual devices?
On Wednesday 04 April 2007, H. Peter Anvin wrote:
> Configuration space access is platform-dependent. It's only defined to work in a specific way on x86 platforms.
>
> Interrupt swizzling is really totally independent of PCI. ALL PCI really provides is up to four interrupts per device (not counting MSI/MSI-X) and an 8-bit writable field which the platform can choose to use to hold interrupt information. That's all. The rest is all platform information.
>
> PCI enumeration is hardly complex. Most of the stuff that doesn't apply to you you can generally ignore, as is done by other busses like HyperTransport when they emulate PCI.

You still don't get my point: On a platform that doesn't have interrupt numbers, and where most of the fields in the config space don't correspond to anything that is already there, you really don't want to invent a set of new hcalls that implement emulation, to get something as simple as a pipe.

    wc drivers/pci/*.[ch] include/asm-i386/{pci,io}.h lib/iomap*.c \
       arch/i386/pci/*.c kernel/irq/*.c
     17015  59037 463967 total

Even if you only need half of that code in reality, reimplementing all that in both the kernel and in the hypervisor is an enormous effort. We've seen that before on the ps3, which initially faked a virtual PCI bus just for the USB controller, but doing something like that requires adding abstraction layers, to decide whether to implement e.g. an inb as a hypercall or as a memory read.

> That being said, on platforms which are PCI-centric, such as x86, this of course makes it a lot easier to produce virtual devices which work across hypervisors, since the device model, of *any* operating system is set up to handle them.

Yes, as I said there are two separate problems. I really think that a standardized virtual driver interface should be modeled after kernel <-> user interfaces, not hardware <-> kernel interfaces. Once we know what operations we want (e.g. read, write and SIGIO, or some other set of primitives), it will be good to provide a virtual PCI device that can be used as one transport mechanism below it. Using PCI device IDs to tell what functionality is provided by the device would provide a reasonable method for autoprobing.

Arnd
Re: [JANITOR PROPOSAL] Switch ioctl functions to ->unlocked_ioctl
On Tuesday 08 January 2008, Andi Kleen wrote:
> > Thanks, Andi! I think it'd very useful change.
>
> Reminds me this is something that should be actually flagged in checkpatch.pl too
>
> Andy, it would be good if checkpatch.pl complained about .ioctl = as opposed to .unlocked_ioctl = ...

This is rather hard, as there are different data structures that all contain ->ioctl and/or ->unlocked_ioctl function pointers. Some of them already use ->ioctl in an unlocked fashion only, so blindly warning about this would give lots of false positives.

> Also perhaps if a whole new file_operations with a ioctl is added complain about missing compat_ioctl as a low priority warning? (might be ok if it's architecture specific on architectures without compat layer)

Also, not every data structure that provides a ->ioctl callback also has a ->compat_ioctl, although there should be fewer exceptions here.

Arnd
Re: New linux arch
On Monday 07 January 2008, Michal Simek wrote:
> I would like to ask you what is the best way to push these changes to kernel.org. I would like to know step by step how to do.

Adding the whole architecture tree will probably be too much for a single reviewer and almost certainly too much for the size limit of mails to lkml. On the other hand, there is not much point in merging the architecture code in multiple changesets if there is nothing at all you can do with part of it.

I suggest therefore that you split the code twice:

First, split every device driver into its own git changeset. Often, these have to go through a different set of mailing lists, e.g. network drivers go to [EMAIL PROTECTED] (see MAINTAINERS for details), while the actual architecture changeset should not have device drivers by itself. These are going to be the changesets that you have in your git tree and that get merged upstream eventually.

Then, split each of those changesets into reviewer-friendly chunks of less than 100kb. Don't worry if a patch ends up only having a few lines while others are considerably larger. For people that like to see a whole changeset, upload it as a combined patch to an http location that you mention in your patch 00/12 or so, and have the smaller patches as reply mails to that. Use either 'quilt mail' or 'git-format-patch' to do that work for you.

I think blackfin is a good example of how an architecture got merged, and how they resolved the initial problems. Read through the comments at http://lkml.org/lkml/2006/9/20/404 and related mails to see what can go wrong in such large projects and how to do it better.

Regarding the code itself, my assumption is that you started out copying from another architecture (everyone does that) and hacked on it until you had it working. This is not wrong by itself, but it would be really nice if we can make it easier for the next person to add an architecture.

My vision is that for each header file you copied from include/asm-i386 or similar and did not end up rewriting, you create a version in include/asm-generic and start using that instead of adding a private copy in your architecture. One example where this was already done is asm/errno.h, an example where you should do it is asm/stat.h.

It's similar for files like arch/microblaze/kernel/sys.c and pci.c: ideally, you shouldn't have these at all, but be able to just use completely generic code.

Arnd
Re: [JANITOR PROPOSAL] Switch ioctl functions to ->unlocked_ioctl
On Wednesday 09 January 2008, Andi Kleen wrote:
> I imagined it would check for
>
>     +struct file_operations ... = {
>     + ...
>     + .ioctl = ...
>
> That wouldn't catch the case of someone adding only .ioctl to an already existing file_operations which is not visible in the patch context, but that should be hopefully rare. The more common case is adding completely new operations

Right, this would work fine. We can probably even have a list of data structures that work like file_operations in this regard.

Arnd
Re: [PATCH 15/15] Add DEFINE_SPUFS_ATTRIBUTE()
On Thursday 13 September 2007, Michael Ellerman wrote:
> Well that'd be nice, but I don't see anywhere that that happens. AFAICT the acquire we do in the first coredump callback is the first the SPU contexts know about their PPE process dying. And spufs is still live, so I think we definitely need to grab the mutex, or we might race with userspace accessing spufs files.

Right, I was only thinking about the dumping process itself, but there may be other processes that still have files open for that context.

Arnd
Re: [PATCH v4 7/8] debugfs: allow access to signed values
On Thursday 20 December 2007, Stefano Brivio wrote:
> debugfs: allow access to signed values
>
> Add debugfs_create_s{8,16,32,64}. For these to work properly, we need to remove a cast in libfs, change the simple_attr_open prototype and thus fix the users as well.
>
> Cc: Johannes Berg [EMAIL PROTECTED]
> Cc: Mattias Nissler [EMAIL PROTECTED]
> To: Greg Kroah-Hartman [EMAIL PROTECTED]
> To: Arnd Bergmann [EMAIL PROTECTED]
> To: Akinobu Mita [EMAIL PROTECTED]
> Signed-off-by: Stefano Brivio [EMAIL PROTECTED]

Have you checked that spufs still builds? I would guess that you need to do the same interface changes there.

Also, Christoph has recently posted a suggestion for how to improve the interface to allow the 'get' operation to return an error: http://patchwork.ozlabs.org/cbe-oss-dev/patch?id=14962

I'd suggest consolidating the two changes in order to avoid merge conflicts.

Arnd
Re: [PATCH -mm 05/43] compat_binfmt_elf
On Thursday 20 December 2007, Roland McGrath wrote:

> This adds fs/compat_binfmt_elf.c, a wrapper around fs/binfmt_elf.c for 32-bit ELF support on 64-bit kernels. It can replace all the hand-rolled versions of this that each 32/64 arch has, which are all about the same.

Great stuff! I've attempted to do this a few times over the past years, but could never get my head around it. One more bit of broken compat code gone from the architectures!

Arnd
Re: [PATCH -mm 18/43] powerpc compat_binfmt_elf
On Friday 21 December 2007, Kyle McMartin wrote:

> Just taking a stab that hch means,
>
>     config BINFMT_COMPAT_ELF
>         def_bool n
>         depends on 64BIT

I'd call it COMPAT_BINFMT_ELF, for consistency with the file name. Also, the definition and the depends are redundant if you expect the option to be autoselected. You can do either of

    config COMPAT_BINFMT_ELF
        bool

or

    config COMPAT_BINFMT_ELF
        def_bool y
        depends on COMPAT

The second option makes sense at the point where all architectures with compat code are using the same compat_binfmt_elf code.

Arnd
Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.
On Thursday 22 November 2007, Andi Kleen wrote:

>  #define EXPORT_SYMBOL(sym) \
> -	__EXPORT_SYMBOL(sym, )
> +	__EXPORT_SYMBOL(sym, ,,, NULL)
>
>  #define EXPORT_SYMBOL_GPL(sym) \
> -	__EXPORT_SYMBOL(sym, _gpl)
> +	__EXPORT_SYMBOL(sym, _gpl,,, NULL)
>
>  #define EXPORT_SYMBOL_GPL_FUTURE(sym) \
> -	__EXPORT_SYMBOL(sym, _gpl_future)
> +	__EXPORT_SYMBOL(sym, _gpl_future,,, NULL)
>
> +/* Export symbol into namespace ns
> + * No _GPL variants because namespaces imply GPL only
> + */
> +#define EXPORT_SYMBOL_NS(ns, sym) \
> +	__EXPORT_SYMBOL(sym, _gpl,__##ns, NS_SEPARATOR #ns, #ns)

I think it would be good if you could specify a default namespace per module; that could reduce the amount of necessary changes significantly. For example, you can do

    #define EXPORT_SYMBOL_GLOBAL(sym) __EXPORT_SYMBOL(sym, _gpl,,, NULL)
    #ifndef MODULE_NAMESPACE
    #define EXPORT_SYMBOL_GPL(sym) EXPORT_SYMBOL_GLOBAL(sym)
    #else
    #define EXPORT_SYMBOL_GPL(sym) EXPORT_SYMBOL_NS(MODULE_NAMESPACE, sym)
    #endif

If we go that way, it may be useful to extend the namespace mechanism to non-GPL symbols as well, like

    #define EXPORT_SYMBOL(sym) \
        __EXPORT_SYMBOL(sym, ,__## MODULE_NAMESPACE, \
                        NS_SEPARATOR #MODULE_NAMESPACE, #MODULE_NAMESPACE)

Unfortunately, doing this automatic namespace selection requires setting the namespace before #include <linux/module.h>. One way to work around this could be to use Makefile magic so you can list a Makefile as

    obj-$(CONFIG_COMBINED) += combined.o
    combined-$(CONFIG_SUBOPTION) += combined_main.o combined_other.o
    obj-$(CONFIG_SINGLE) += single.o
    obj-$(CONFIG_OTHER) += other.o
    obj-$(CONFIG_API) += api.o

    NAMESPACE = subsys                    # default, used for other.o
    NAMESPACE_single.o = single           # used only for single.o
    NAMESPACE_combined.o = combined       # all parts of combined.o
    NAMESPACE_combined_other.o = special  # except this one
    NAMESPACE_api.o =                     # api.o is put into the global ns

The Makefile logic here would basically just follow the rules we have for CFLAGS etc, and then pass -DMODULE_NAMESPACE=$(NAMESPACE_$(obj)) to gcc.
Arnd
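One detail a default namespace has to get right is macro expansion order: the # and ## operators act on the unexpanded argument, so a name passed in via -DMODULE_NAMESPACE=... needs one extra level of macro indirection before it is stringified or pasted. The following userspace sketch shows the difference. It does not use the real __EXPORT_SYMBOL machinery: the EXPORT_NAME_* macros only build a string, NS_SEPARATOR is assumed to be ".", and all helper names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NS_SEPARATOR "."	/* assumption: the real separator is not shown */

/* Two-level stringification so that macro arguments are expanded first. */
#define STRINGIFY_1(x) #x
#define STRINGIFY(x) STRINGIFY_1(x)

/* Stand-ins for the export macros: they only build the symbol's name. */
#define EXPORT_NAME_GLOBAL(sym) #sym
#define EXPORT_NAME_NS(ns, sym) STRINGIFY(ns) NS_SEPARATOR #sym
/* Naive variant: #ns stringifies the argument before it is expanded. */
#define EXPORT_NAME_NS_NAIVE(ns, sym) #ns NS_SEPARATOR #sym

/* Would normally be supplied by the build as -DMODULE_NAMESPACE=subsys. */
#define MODULE_NAMESPACE subsys

/* Per-module default: global if no namespace was set, namespaced otherwise. */
#ifndef MODULE_NAMESPACE
#define EXPORT_NAME(sym) EXPORT_NAME_GLOBAL(sym)
#else
#define EXPORT_NAME(sym) EXPORT_NAME_NS(MODULE_NAMESPACE, sym)
#endif

const char *default_export_name(void)
{
	return EXPORT_NAME(my_func);
}

const char *naive_export_name(void)
{
	return EXPORT_NAME_NS_NAIVE(MODULE_NAMESPACE, my_func);
}

/* True if name has the form "<ns><separator>...". */
bool export_in_namespace(const char *name, const char *ns)
{
	size_t n;

	assert(name != NULL && ns != NULL);
	n = strlen(ns);
	return strncmp(name, ns, n) == 0 &&
	       strncmp(name + n, NS_SEPARATOR, strlen(NS_SEPARATOR)) == 0;
}
```

Here default_export_name() yields a name in the subsys namespace, while the naive variant ends up with the literal text MODULE_NAMESPACE as its namespace.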
Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.
On Thursday 29 November 2007, Andi Kleen wrote:

> > I think it would be good if you could specify a default namespace per module, that could reduce the amount of necessary changes significantly.
>
> But also give less documentation. It's also not that difficult to mark the exports once. I've forward ported such patches over a few kernels and didn't run into significant me

Part of your sentence seems to be missing, but I guess I understand your point. How many files did you annotate this way?

I can see it as being useful to have the namespace explicit in each symbol, but doing it once per module sounds like the 80% solution for 20% of the work, and the two don't even conflict. In the current kernel, I count 12644 exported symbols in 1646 files, in 540 directories.

One problem I can see with annotating every symbol is that it conflicts with other patches that add more exported functions to a file without adding the namespace, or that simply break because of context changes.

Arnd
Re: [PATCH] IB/ehca: Serialize HCA-related hCalls on POWER5
On Thursday 06 December 2007, Joachim Fenkes wrote:

>  	printk(KERN_INFO "eHCA Infiniband Device Driver (Version " HCAD_VERSION ")\n");
>
> +	/* Autodetect hCall locking -- we can't read the firmware version
> +	 * directly, but we know that starting with POWER6, all firmware
> +	 * versions are good.
> +	 */
> +	if (ehca_lock_hcalls == -1)
> +		ehca_lock_hcalls = !(cur_cpu_spec->cpu_user_features
> +			& PPC_FEATURE_ARCH_2_05);
> +
>  	ret = ehca_create_comp_pool();
>  	if (ret) {
>  		ehca_gen_err("Cannot create comp pool.");

We already talked about this yesterday, but I still feel that checking the instruction set of the CPU should not be used to determine whether a specific device driver implementation is used in the hypervisor.

At the very least, I think you should change this to read the hypervisor version number from the device tree, though the ideal solution would be to have the absence of this bug encoded in the device node for the ehca device itself.

Regarding the performance problem, have you checked whether converting all your spin_lock_irqsave to spin_lock/spin_lock_irq improves your performance on the older machines? Maybe it's already fast enough that way.

Arnd