from:"Arnd Bergmann"

Re: IRQ (routing ?) problem [was Re: epic100 in current -ac kernels]

2001-02-26 Thread Arnd Bergmann


I noticed that there have been updates to epic100 again and just wanted
to note that the problem remains:
2.4.2-ac3 still crashes, but it works fine when I use the epic100.c
from 2.4.0-test9, which was the last working version for me.

Arnd 

On Thu, 15 Feb 2001, ARND BERGMANN wrote:

 Sorry for the delay, I could not get physical access to the machine
 for the last few days.
 
 I was able to do some more testing today and found this:
 - The problem is not the IRQ /sharing/, after getting rid of all the
   other PCI cards, the problem was still there.
 - The only thing that seems to have any effect on the symptoms is the
   presence of the USB driver, either usb-uhci or uhci. I am not using
   USB at all. As described before, the system behaves in one of these
   ways:
* epic100 driver without DMA mapping (e.g. 2.4.0-ac9): normal operation
* driver with DMA mapping+USB driver loaded: lots of interrupts - slow
* driver with DMA mapping, USB driver not loaded: hang after ~2 seconds
 - I sometimes get 'spurious interrupt: IRQ7', even though no device is 
   connected there. Probably not important.
 
 On Sat, 10 Feb 2001, Francois Romieu wrote:
 
  
  The following information may help:
  - motherboard type
 Asus A7V, onboard USB hub and Promise ATA/100 chip
 
  - bios revision
 Can't see right now, system was bought in October 2000
 I think it was 1.004, but I am not sure.
 
  - lspci -x 
 see attachment, this was when I ripped out sound, tv and scsi
 
  - 2.4.2pre3 + whatever recent ac epic100 = ?
 Still no improvement until latest -ac (2.4.1-ac13)
 
 Arnd 
 
 
 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

test11: lockup when reading /proc/ide/hde/identify

2000-11-23 Thread ARND BERGMANN


Hi!

I think I found a bug in the IDE subsystem. When I do 'cat
/proc/ide/hde/identify', the system locks up completely, not
even Alt+SysRq+B helps. Everything else under /proc/ide works.
hdparm can cause the same symptoms, but I have not checked
when exactly it does so.

I have an Asus A7V mainboard with VIA 82C686A as first IDE
controller and an onboard Promise PDC20265 as second IDE
controller.
Both have a Fujitsu MPF3204AT as their primary master drive,
but the problem occurs only on the Promise adapter.

I have tried kernel 2.4.0-test11-pre6, test11-ac2 and 
ide.2.4.0-t11.1120, all with the same result, but I did not
try any older kernels, because I installed the machine
just two days ago.

Arnd 


epic100 in current -ac kernels

2001-02-08 Thread ARND BERGMANN


There seems to be some movement in the driver and the latest one
is not working for me (again), so I'm giving a subjective status report
for the versions I have tried lately:

Working epic100 drivers:
 - 2.4.0
 - 2.4.0-ac9

Broken epic100 drivers:
 - 2.4.0-ac4
 - 2.4.1-ac2
 - 2.4.1-ac4

I have not yet looked at the source to find the problem, but the other
kernels in between each seem to contain one of the versions above.
The symptom is always that after 'ifconfig eth0 up', the system slows down
to the point where I can hardly type on the keyboard and even 'ifconfig
eth0 down' takes several seconds (on an Athlon-800)!

The boot message is:
eth0: SMSC EPIC/100 83c170 at 0xd091e000, IRQ 11, 00:e0:29:6c:36:6f.
eth0: MII transceiver #3 control 3000 status 7849.
eth0: Autonegotiation advertising 01e1 link partner 0001.
epic100.c:v1.11 1/7/2001 Written by Donald Becker [EMAIL PROTECTED]
  http://www.scyld.com/network/epic100.html
 (unofficial 2.4.x kernel port, version 1.1.6, January 11, 2001)
PCI: Found IRQ 11 for device 00:0d.0
PCI: The same IRQ used for device 00:04.2
PCI: The same IRQ used for device 00:04.3
PCI: The same IRQ used for device 00:09.0  


The device on 00:04.[23] is a VT82C586B USB and on 00:09.0 an
Ensoniq 5880 AudioPCI (rev 02). I cannot change the IRQ settings
right now without physical access to the machine (it is locked).

At least with some broken versions, I also got these messages in
syslog (every 4 seconds):
Feb  7 21:10:06 project kernel: NETDEV WATCHDOG: eth0: transmit timed out
Feb  7 21:10:06 project kernel: eth0: Transmit timeout using MII device, Tx status 000b.
Feb  7 21:10:10 project kernel: NETDEV WATCHDOG: eth0: transmit timed out
Feb  7 21:10:10 project kernel: eth0: Transmit timeout using MII device, Tx status 000b.
Feb  7 21:10:14 project kernel: NETDEV WATCHDOG: eth0: transmit timed out
Feb  7 21:10:14 project kernel: eth0: Transmit timeout using MII device, Tx status 000b.
...

According to lspci, the card is:

00:0d.0 Ethernet controller: Standard Microsystems Corp [SMC] 83C170QF (rev 08)
    Subsystem: Standard Microsystems Corp [SMC]: Unknown device a020
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
    Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast TAbort- TAbort- MAbort- SERR- PERR-
    Latency: 32 (2000ns min, 7000ns max)
    Interrupt: pin A routed to IRQ 11
    Region 0: I/O ports at 9800 [size=256]
    Region 1: Memory at df80 (32-bit, non-prefetchable) [size=4K]
    Expansion ROM at unassigned [disabled] [size=64K]
    Capabilities: [dc] Power Management version 1
        Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
        Status: D0 PME-Enable- DSel=0 DScale=0 PME-


Arnd 


Re: epic100 in current -ac kernels

2001-02-08 Thread ARND BERGMANN


On Thu, 8 Feb 2001, Francois Romieu wrote:

  
  Working epic100 drivers:
   - 2.4.0
   - 2.4.0-ac9
 
 Could you give a look at ac12 (fine here) ?
 
No, does not work, same problem.

Arnd 


Re: epic100 in current -ac kernels

2001-02-09 Thread ARND BERGMANN


On Fri, 9 Feb 2001, Francois Romieu wrote:

 ARND BERGMANN [EMAIL PROTECTED] wrote:
  On Thu, 8 Feb 2001, Francois Romieu wrote:
  

Working epic100 drivers:
 - 2.4.0
 - 2.4.0-ac9
   
   Could you give a look at ac12 (fine here) ?
   
  No, does not work, same problem.
 
 The modifications between ac9 and ac12 come from the new DMA 
 mapping.
What about 2.4.0-ac5? That had the same problem as -ac12. Did it also have
the new DMA mapping?

 They added a bug for the (already buggy ?) big-endian
 machines. I would be surprised that something has *always* been 
 missing in the driver and your hardware triggers it*. IMHO the culprit 
 is to be found elsewhere.
Yes, I'm pretty sure the problem is not only the epic100 driver, now that
I have done some more investigation. With the broken drivers (I tried
2.4.0-ac12 and 2.4.1-ac5), something generates an enormous number of
interrupts as soon as I run 'ifconfig eth0 up'. Within 10 seconds, I got
roughly 95 interrupts on IRQ11, instead of 30!
After disabling the usb-uhci (I was using the JE driver) in the BIOS
setup, the system reproducibly locked up hard a few seconds after
'ifconfig eth0 up' instead of just getting slow.
Unfortunately, I have no way to also disable the sound card, but at least
it makes no difference whether the sound driver is loaded or not.
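
For anyone who wants to reproduce the interrupt count: one way is to take two
snapshots of /proc/interrupts about ten seconds apart and subtract the totals
for the IRQ in question. The following is only a user-space sketch. The helper
name irq_count() is made up for illustration, and it assumes the usual layout
of one line per IRQ ("number:", then one count per CPU, then the controller
and device names).

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: sum the per-CPU counters for one IRQ in a
 * snapshot of /proc/interrupts passed in as a NUL-terminated string.
 * Returns -1 if the snapshot has no line for that IRQ.
 */
long irq_count(const char *text, int irq)
{
    const char *line = text;

    assert(text != NULL);

    while (line != NULL && *line != '\0') {
        char *end;
        long n = strtol(line, &end, 10);    /* skips leading blanks */

        if (end != line && *end == ':' && n == irq) {
            long total = 0;
            const char *p = end + 1;

            /* Add up counters until the first non-numeric column. */
            for (;;) {
                while (*p == ' ' || *p == '\t')
                    p++;
                if (!isdigit((unsigned char)*p))
                    break;
                total += strtol(p, &end, 10);
                p = end;
            }
            return total;
        }

        /* Not our line (or a header such as "CPU0"): go to the next one. */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return -1;
}
```

Reading the file twice, calling irq_count(snapshot, 11) on each copy and
subtracting gives the number of interrupts in the interval, independent of
how many devices share the line.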

 I'd like to know what it's worth to share an irq with a pio audio card.
On Monday I can ask the system administrator for the keys so I can open
the machine and put the card into another slot. Right now, USB, sound and
network are hardwired to the same IRQ, that's how the system arrived here.

Arnd 


Re: IRQ (routing ?) problem [was Re: epic100 in current -ac kernels]

2001-02-15 Thread ARND BERGMANN


Sorry for the delay, I could not get physical access to the machine
for the last few days.

I was able to do some more testing today and found this:
- The problem is not the IRQ /sharing/, after getting rid of all the
  other PCI cards, the problem was still there.
- The only thing that seems to have any effect on the symptoms is the
  presence of the USB driver, either usb-uhci or uhci. I am not using
  USB at all. As described before, the system behaves in one of these
  ways:
   * epic100 driver without DMA mapping (e.g. 2.4.0-ac9): normal operation
   * driver with DMA mapping+USB driver loaded: lots of interrupts - slow
   * driver with DMA mapping, USB driver not loaded: hang after ~2 seconds
- I sometimes get 'spurious interrupt: IRQ7', even though no device is 
  connected there. Probably not important.

On Sat, 10 Feb 2001, Francois Romieu wrote:

 
 The following information may help:
 - motherboard type
Asus A7V, onboard USB hub and Promise ATA/100 chip

 - bios revision
Can't see right now, system was bought in October 2000
I think it was 1.004, but I am not sure.

 - lspci -x 
see attachment, this was when I ripped out sound, tv and scsi

 - 2.4.2pre3 + whatever recent ac epic100 = ?
Still no improvement until latest -ac (2.4.1-ac13)

Arnd 




00:00.0 Host bridge: VIA Technologies, Inc.: Unknown device 0305 (rev 02)
Subsystem: Asustek Computer, Inc.: Unknown device 8033
Flags: bus master, medium devsel, latency 0
Memory at e700 (32-bit, prefetchable) [size=16M]
Capabilities: [a0] AGP version 2.0
Capabilities: [c0] Power Management version 2
00: 06 11 05 03 06 00 10 22 02 00 00 06 00 00 00 00
10: 08 00 00 e7 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 43 10 33 80
30: 00 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 00

00:01.0 PCI bridge: VIA Technologies, Inc.: Unknown device 8305 (prog-if 00 [Normal decode])
Flags: bus master, 66Mhz, medium devsel, latency 0
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
Memory behind bridge: e280-e3df
Prefetchable memory behind bridge: e3f0-e6ff
Capabilities: [80] Power Management version 2
00: 06 11 05 83 07 00 30 22 00 00 04 06 00 00 01 00
10: 00 00 00 00 00 00 00 00 00 01 01 00 e0 d0 00 00
20: 80 e2 d0 e3 f0 e3 f0 e6 00 00 00 00 00 00 00 00
30: 00 00 00 00 80 00 00 00 00 00 00 00 00 00 08 00

00:04.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 22)
Subsystem: Asustek Computer, Inc.: Unknown device 8033
Flags: bus master, stepping, medium devsel, latency 0
00: 06 11 86 06 87 00 10 02 22 00 01 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 43 10 33 80
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:04.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 10) (prog-if 8a [Master SecP PriP])
Flags: bus master, medium devsel, latency 32
I/O ports at d800 [size=16]
Capabilities: [c0] Power Management version 2
00: 06 11 71 05 07 00 90 02 10 8a 01 01 00 20 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 d8 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 c0 00 00 00 00 00 00 00 ff 00 00 00

00:04.2 USB Controller: VIA Technologies, Inc. VT82C586B USB (rev 10) (prog-if 00 [UHCI])
Subsystem: Unknown device 0925:1234
Flags: bus master, medium devsel, latency 32, IRQ 11
I/O ports at d400 [size=32]
Capabilities: [80] Power Management version 2
00: 06 11 38 30 17 00 10 02 10 00 03 0c 08 20 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 d4 00 00 00 00 00 00 00 00 00 00 25 09 34 12
30: 00 00 00 00 80 00 00 00 00 00 00 00 0b 04 00 00

00:04.3 USB Controller: VIA Technologies, Inc. VT82C586B USB (rev 10) (prog-if 00 [UHCI])
Subsystem: Unknown device 0925:1234
Flags: bus master, medium devsel, latency 32, IRQ 11
I/O ports at d000 [size=32]
Capabilities: [80] Power Management version 2
00: 06 11 38 30 17 00 10 02 10 00 03 0c 08 20 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 d0 00 00 00 00 00 00 00 00 00 00 25 09 34 12
30: 00 00 00 00 80 00 00 00 00 00 00 00 0b 04 00 00

00:04.4 Host bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 30)
Flags: medium devsel, IRQ 9
Capabilities: [68] Power Management version 2
00: 06 11 57 30 00 00 90 02 30 00 00 06 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 68 00 00 00 00 00 00 00 00 00 00 00

00:0c.0 Ethernet controller: Standard Microsystems Corp [SMC] 83C170QF (rev 08)
Subsystem: Standard Microsystems Corp [SMC]: Unknown device a020
Flags: bus master, fast devsel, latency 32, IRQ 10
I/O ports at a400 [size=256]
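
The hex rows printed by 'lspci -x' are the first 64 bytes of each device's
PCI configuration space, so the IRQ routing can be read straight from them.
A minimal sketch of the decoding follows. The struct and function names are
invented for illustration; the offsets are those of the standard type 0
configuration header, and all multi-byte fields are little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the fields that matter for this thread. */
struct pci_summary {
    uint16_t vendor;        /* offset 0x00 */
    uint16_t device;        /* offset 0x02 */
    uint16_t subsys_vendor; /* offset 0x2c, type 0 headers only */
    uint16_t subsys_id;     /* offset 0x2e, type 0 headers only */
    uint8_t  irq_line;      /* offset 0x3c, IRQ assigned by BIOS/OS */
    uint8_t  irq_pin;       /* offset 0x3d, 1 = INTA ... 4 = INTD */
};

/* Read a little-endian 16-bit value from the config space bytes. */
static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode the first 64 bytes of config space as dumped by 'lspci -x'. */
struct pci_summary pci_decode(const uint8_t cfg[64])
{
    struct pci_summary s;

    assert(cfg != NULL);

    s.vendor        = le16(cfg + 0x00);
    s.device        = le16(cfg + 0x02);
    s.subsys_vendor = le16(cfg + 0x2c);
    s.subsys_id     = le16(cfg + 0x2e);
    s.irq_line      = cfg[0x3c];
    s.irq_pin       = cfg[0x3d];
    return s;
}
```

Applied to the 00:04.2 dump above, this gives vendor 1106, device 3038,
subsystem 0925:1234, interrupt line 0x0b (IRQ 11) and interrupt pin 4 (INTD),
which matches what lspci prints in the decoded lines.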

Re: epic100 aka smc etherpower II

2001-03-31 Thread Arnd Bergmann


Daniel Nofftz [EMAIL PROTECTED] wrote:

 i can`t get my smc etherpower ii working with the 2.4.3 kernel.
 now i have downgraded to 2.4.2 and it works again ...
 does anyone have a suggestion, what the problem is ?

Looks to me like the problem I had in February, see the thread
"epic100 in current -ac kernels" at 
http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg28523.html
After I had upgraded my BIOS, the problems were gone and I stopped
looking into it. The DMA mapping code first introduced in 2.4.0-ac2
(smallest diff here) originally triggered the bug, which had different
symptoms depending on the configuration of the chipset.
Note that I have a VIA VT8363 (KT133) chipset while this is a VT82C595 
(VP2) chipset, so it is apparently not limited to one very specific 
configuration.

Arnd 



Re: [RFC] MTD driver for MMC cards

2007-04-15 Thread Arnd Bergmann

This is a new version of the driver I posted back in January. I now
have hardware to test it and fixed a number of bugs, most of which
are the ones that Pierre told me about in the first place.

It now seems to work fine with the mtdblock driver, which of course
is entirely pointless.

I've tried using it with jffs2 once, but get an immediate oops,
which still needs some investigation.

I'm also not sure what to do about SDHC media, but probably the
easiest solution is to disallow them with this driver -- the
mtd layer doesn't deal with media larger than 4GB anyway at
this point.

There is also still some need for performance testing. Jörn
brought up the point that if a specific card can't have multiple
open erase blocks simultaneously, it's rather pointless for
logfs. It might still be useful to use jffs2 on those cards,
because AFAIK that only writes to one erase block at any
time.

Signed-off-by: Arnd Bergmann [EMAIL PROTECTED]

---

Index: olpc-2.6/drivers/mmc/mmc.c
===
--- olpc-2.6.orig/drivers/mmc/mmc.c
+++ olpc-2.6/drivers/mmc/mmc.c
@@ -621,6 +621,7 @@ static void mmc_decode_csd(struct mmc_ca
 			csd->r2w_factor = UNSTUFF_BITS(resp, 26, 3);
 			csd->write_blkbits = UNSTUFF_BITS(resp, 22, 4);
 			csd->write_partial = UNSTUFF_BITS(resp, 21, 1);
+			csd->erase_blksize = UNSTUFF_BITS(resp, 39, 7);
 			break;
 		case 1:
 			/*
@@ -649,6 +650,8 @@ static void mmc_decode_csd(struct mmc_ca
 			csd->r2w_factor = 4; /* Unused */
 			csd->write_blkbits = 9;
 			csd->write_partial = 0;
+#warning need to read au_size for sdhc
+			csd->erase_blksize = 0; // 8192  au_size;
 			break;
 		default:
 			printk("%s: unrecognised CSD structure version %d\n",
@@ -691,6 +694,8 @@ static void mmc_decode_csd(struct mmc_ca
 		csd->r2w_factor = UNSTUFF_BITS(resp, 26, 3);
 		csd->write_blkbits = UNSTUFF_BITS(resp, 22, 4);
 		csd->write_partial = UNSTUFF_BITS(resp, 21, 1);
+		csd->erase_blksize = (UNSTUFF_BITS(resp, 37, 5) + 1) *
+				     (UNSTUFF_BITS(resp, 42, 5) + 1);
 	}
 }
 
Index: olpc-2.6/include/linux/mmc/card.h
===
--- olpc-2.6.orig/include/linux/mmc/card.h
+++ olpc-2.6/include/linux/mmc/card.h
@@ -32,6 +32,7 @@ struct mmc_csd {
 	unsigned int		max_dtr;
 	unsigned int		read_blkbits;
 	unsigned int		write_blkbits;
+	unsigned int		erase_blksize;
 	unsigned int		capacity;
 	unsigned int		read_partial:1,
 				read_misalign:1,
Index: olpc-2.6/drivers/mmc/mmc_mtd.c
===
--- /dev/null
+++ olpc-2.6/drivers/mmc/mmc_mtd.c
@@ -0,0 +1,366 @@
+/*
+ * MTD driver for MMC cards
+ */
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/mmc/card.h>
+#include <linux/mmc/protocol.h>
+#include <linux/mmc/host.h>
+#include <linux/scatterlist.h>
+#include <linux/mtd/mtd.h>
+
+/*
+ * check if a write command was completed correctly, must be called
+ * with host claimed.
+ */
+static int mmc_mtd_get_status(struct mmc_card *card)
+{
+	int err;
+	struct mmc_command cmd;
+
+	do {
+		cmd = (struct mmc_command) {
+			.opcode = MMC_SEND_STATUS,
+			.arg = card->rca << 16,
+			.flags = MMC_RSP_R1 | MMC_CMD_AC,
+		};
+
+		err = mmc_wait_for_cmd(card->host, &cmd, 5);
+		if (err) {
+			dev_err(&card->dev, "error %d requesting status\n", err);
+			break;
+		}
+	} while (!(cmd.resp[0] & R1_READY_FOR_DATA));
+
+	return err;
+}
+
+/*
+ * erase a range of erase groups aligned to mtd->erase_size
+ */
+static int mmc_mtd_erase(struct mtd_info *mtd, struct erase_info *instr)
+{
+	struct mmc_card *card = mtd->priv;
+	struct mmc_command cmd[3] = { {
+			.opcode = MMC_ERASE_GROUP_START,
+			.arg = instr->addr,
+			.flags = MMC_RSP_R1 | MMC_CMD_AC,
+		}, {
+			.opcode = MMC_ERASE_GROUP_END,
+			.arg = instr->addr + instr->len,
+			.flags = MMC_RSP_R1 | MMC_CMD_AC,
+		}, {
+			.opcode = MMC_ERASE,
+			.flags = MMC_RSP_R1B | MMC_CMD_AC,
+		},
+	};
+	int err, i;
+
+	dev_dbg(&card->dev, "%s: from %d len %d\n", __FUNCTION__,
+		instr->addr, instr->len);
+
+	instr->state = MTD_ERASING;
+	err = 0
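
The erase-size arithmetic in the mmc_decode_csd() hunks above is easy to check
outside the kernel. The sketch below is a user-space stand-in, not the kernel
code: unstuff_bits() mimics what the UNSTUFF_BITS macro does, and
mmc_erase_groups() is a name invented here for the MMC case of the patch. It
assumes the 128-bit CSD is held as four 32-bit words with resp[0] carrying
the most significant bits and bit 0 being the lowest bit of resp[3].

```c
#include <assert.h>
#include <stdint.h>

/*
 * Extract `size` bits starting at bit `start` from a 128-bit response.
 * User-space approximation of the kernel's UNSTUFF_BITS macro.
 */
uint32_t unstuff_bits(const uint32_t resp[4], int start, int size)
{
    assert(size >= 1 && size <= 32);
    assert(start >= 0 && start + size <= 128);

    const uint32_t mask = size < 32 ? (1u << size) - 1u : 0xffffffffu;
    const int off = 3 - start / 32;     /* word holding bit `start` */
    const int shift = start & 31;       /* position inside that word */
    uint32_t res = resp[off] >> shift;

    /* Field straddles a word boundary: pull the rest from the next word. */
    if (size + shift > 32)
        res |= resp[off - 1] << (32 - shift);

    return res & mask;
}

/*
 * MMC case from the patch: the product of the two 5-bit fields at
 * bits 37 and 42, each incremented by one.
 */
uint32_t mmc_erase_groups(const uint32_t resp[4])
{
    return (unstuff_bits(resp, 37, 5) + 1) *
           (unstuff_bits(resp, 42, 5) + 1);
}
```

With the field at bit 37 set to 3 and the field at bit 42 set to 1, the
product is (3 + 1) * (1 + 1) = 8. As far as I understand the MMC
specification, that value is in units of write blocks, so it still has to be
scaled by the write block size before it can serve as an mtd erase size.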

Re: [mmc] alternative TI FM MMC/SD driver for 2.6.21-rc7

2007-04-19 Thread Arnd Bergmann

On Thursday 19 April 2007, Sergey Yanovich wrote:
 The device is present in many notebooks. Notebooks depend heavily on 
 suspend/resume functionality. tifm_core/7xx1/sd family is an ambitious, 
 but unfinished project. It used to crash on resuming, or hang on 
 suspending. A less common failure used to be triggered by a fast card 
 insert/removal sequence. Finally, tifm_sd module needs to be manually 
 inserted.

As very general comments, you should have the maintainer of the subsystem
(Pierre in this case) on Cc when posting a driver, and you should include
the patch inline in your mail, see Documentation/SubmittingPatches.

More specific to your patch:

You should include the Makefile and Kconfig changes in the same patch/mail,
no point splitting these out.

Don't define your own DBG macro, instead use the predefined dev_dbg()
that has a similar definition.

Your mmc_tifm_irq_chip() function does a _very_ long delay of 100
milliseconds. This is normally not acceptable, since it is a noticeable
time in which the system is completely unresponsive. Maybe you can convert
the tasklet to a workqueue, which lets you call msleep instead of mdelay.

Your use of pci_map_sg() looks wrong, you simply can't assume that the
return value is '1' in general. I've stumbled over that same problem
in the sdhci driver, so it may be inherent to the mmc layer and not
be driver specific.

Other than that, your driver looks pretty good to me.

Arnd 

Re: [PATCH 1/8] Kconfig: refine depends statements.

2007-04-21 Thread Arnd Bergmann

On Friday 20 April 2007, Martin Schwidefsky wrote:
 diff -urpN linux-2.6/drivers/auxdisplay/Kconfig 
 linux-2.6-patched/drivers/auxdisplay/Kconfig
 --- linux-2.6/drivers/auxdisplay/Kconfig2007-04-19 15:23:55.0 
 +0200
 +++ linux-2.6-patched/drivers/auxdisplay/Kconfig2007-04-19 
 15:49:17.0 +0200
 @@ -6,6 +6,7 @@
  #
  
  menu Auxiliary Display support
 +   depends on PARPORT_PC
  
  config KS0108
 tristate KS0108 LCD Controller

I would guess that this actually depends on PARPORT, not PARPORT_PC.

The rest of this patch looks good.

Arnd 

Re: [PATCH 2/8] Kconfig: unwanted menus for s390.

2007-04-21 Thread Arnd Bergmann

On Friday 20 April 2007, Martin Schwidefsky wrote:
 diff -urpN linux-2.6/drivers/char/ipmi/Kconfig 
 linux-2.6-patched/drivers/char/ipmi/Kconfig
 --- linux-2.6/drivers/char/ipmi/Kconfig 2007-02-04 19:44:54.0 +0100
 +++ linux-2.6-patched/drivers/char/ipmi/Kconfig 2007-04-19 15:49:55.0 
 +0200
 @@ -3,6 +3,8 @@
  #
  
  menu IPMI
 +   depends on !S390
 +
  config IPMI_HANDLER
         tristate 'IPMI top-level message handler'
         help

I think I made this comment the last time we discussed this topic, but don't
remember the exact outcome.

I would prefer to not have 'depends on !S390' but rather 'depends on MMIO',
because that is what really drives stuff like IPMI: they expect the device
to be reachable through the use of ioremap or inX/outX instructions, which
don't exist on s390.

While it's unlikely that another architecture has the same restriction,
it expresses much more clearly what you mean.

In drivers/Kconfig, you can then simply add a

config MMIO
def_bool !S390

There are a few exceptions though, that I think should not depend on MMIO:

 --- linux-2.6/drivers/dma/Kconfig 2007-04-19 15:24:33.0 +0200
 +++ linux-2.6-patched/drivers/dma/Kconfig 2007-04-19 15:49:55.0 
 +0200
 @@ -3,6 +3,7 @@
  #
  
  menu DMA Engine support
 + depends on !S390
  
  config DMA_ENGINE
   bool Support for DMA engines

I'd leave the menu enabled. If the DMA engine infrastructure becomes more widely
used, you may want to add an implementation for s390 using millicoded
instructions like xor-string or copy-page.

 diff -urpN linux-2.6/drivers/input/Kconfig 
 linux-2.6-patched/drivers/input/Kconfig
 --- linux-2.6/drivers/input/Kconfig   2007-02-04 19:44:54.0 +0100
 +++ linux-2.6-patched/drivers/input/Kconfig   2007-04-19 15:49:55.0 
 +0200
 @@ -3,6 +3,7 @@
  #
  
  menu Input device support
 + depends on !S390
  
  config INPUT
   tristate Generic input layer (needed for keyboard, mouse, ...) if 
 EMBEDDED

Probably leave this as !S390. One could imagine channel-attached input devices
or the idea of interpreting a terminal as an input device, but no driver
currently does and probably none ever will.

 diff -urpN linux-2.6/drivers/isdn/Kconfig 
 linux-2.6-patched/drivers/isdn/Kconfig
 --- linux-2.6/drivers/isdn/Kconfig2007-02-04 19:44:54.0 +0100
 +++ linux-2.6-patched/drivers/isdn/Kconfig2007-04-19 15:49:55.0 
 +0200
 @@ -3,6 +3,7 @@
  #
  
  menu ISDN subsystem
 + depends on !S390
  
  config ISDN
   tristate ISDN support

Same here, actually there was an IBM 2216 ISDN adapter with channel attachment,
but I don't think anybody wants to add a driver for that one.

 diff -urpN linux-2.6/drivers/misc/Kconfig 
 linux-2.6-patched/drivers/misc/Kconfig
 --- linux-2.6/drivers/misc/Kconfig2007-04-19 15:24:35.0 +0200
 +++ linux-2.6-patched/drivers/misc/Kconfig2007-04-19 15:49:55.0 
 +0200
 @@ -3,6 +3,7 @@
  #
  
  menu Misc devices
 + depends on !S390
  
  config IBM_ASM
   tristate Device driver for IBM RSA service processor

Maybe just leave the menu open, all drivers in it are already depending on PCI
or similar and someone might add a driver that does work on s390 here.

 diff -urpN linux-2.6/drivers/net/phy/Kconfig 
 linux-2.6-patched/drivers/net/phy/Kconfig
 --- linux-2.6/drivers/net/phy/Kconfig 2007-02-04 19:44:54.0 +0100
 +++ linux-2.6-patched/drivers/net/phy/Kconfig 2007-04-19 15:49:55.0 
 +0200
 @@ -3,6 +3,7 @@
  #
  
  menu PHY device support
 + depends on !S390
  
  config PHYLIB
   tristate PHY Device support and infrastructure

Also depends on !S390, not MMIO. A future network adapter might give you access
to the phy device through other means than MMIO.

 diff -urpN linux-2.6/drivers/rtc/Kconfig linux-2.6-patched/drivers/rtc/Kconfig
 --- linux-2.6/drivers/rtc/Kconfig 2007-04-19 15:24:39.0 +0200
 +++ linux-2.6-patched/drivers/rtc/Kconfig 2007-04-19 15:49:55.0 
 +0200
 @@ -3,6 +3,7 @@
  #
  
  menu Real Time Clock
 + depends on !S390
  
  config RTC_LIB
   tristate

Applications might actually want to use the RTC interface to access the system
time or get accurate timers, but the rtc drivers are all very dependent on
either MMIO or I2C. Not sure what would be best here.

Arnd 

Re: [PATCH 3/8] Kconfig: unwanted config options for s390.

2007-04-21 Thread Arnd Bergmann

On Friday 20 April 2007, Martin Schwidefsky wrote:

 diff -urpN linux-2.6/drivers/char/Kconfig 
 linux-2.6-patched/drivers/char/Kconfig
 --- linux-2.6/drivers/char/Kconfig2007-04-19 15:49:51.0 +0200
 +++ linux-2.6-patched/drivers/char/Kconfig2007-04-19 15:50:50.0 
 +0200
 @@ -6,6 +6,7 @@ menu Character devices
  
  config VT
   bool Virtual terminal if EMBEDDED
 + depends on !S390
   select INPUT
   default y if !VIOCONS
   ---help---

ok

 @@ -81,6 +82,7 @@ config VT_HW_CONSOLE_BINDING
  
  config SERIAL_NONSTANDARD
   bool Non-standard serial port support
 + depends on !S390
   ---help---
 Say Y here if you have any non-standard serial boards -- boards
 which aren't supported using the standard dumb serial driver.

depends on MMIO

 @@ -774,7 +776,7 @@ config NVRAM
  
  config RTC
   tristate "Enhanced Real Time Clock Support"
 - depends on !PPC && !PARISC && !IA64 && !M68K && (!SPARC || PCI) && !FRV && !ARM && !SUPERH
 + depends on !PPC && !PARISC && !IA64 && !M68K && (!SPARC || PCI) && !FRV && !ARM && !SUPERH && !S390
   ---help---
 If you say Y here and create a character special file /dev/rtc with
 major number 10 and minor number 135 using mknod ("man mknod"), you
 @@ -822,7 +824,7 @@ config SGI_IP27_RTC
  
  config GEN_RTC
   tristate "Generic /dev/rtc emulation"
 - depends on RTC!=y && !IA64 && !ARM && !M32R && !SPARC && !FRV
 + depends on RTC!=y && !IA64 && !ARM && !M32R && !SPARC && !FRV && !S390
   ---help---
 If you say Y here and create a character special file /dev/rtc with
 major number 10 and minor number 135 using mknod ("man mknod"), you

ok.

This one is bad in general and should probably be a select from the
architecture, but that should not stop you from adding another architecture...

 @@ -878,6 +880,7 @@ config DTLK
  
  config R3964
   tristate Siemens R3964 line discipline
 + depends on !S390
   ---help---
 This driver allows synchronous communication with devices using the
 Siemens R3964 packet protocol. Unless you are dealing with special

Does it build? I don't see a point in disabling this one just because there are
no users. Most architectures also don't have users for this one, but it
doesn't hurt to be able to build it using allyesconfig.

Arnd 

Re: [PATCH 7/8] Kconfig: silicon backplane dependency.

2007-04-21 Thread Arnd Bergmann

On Friday 20 April 2007, Martin Schwidefsky wrote:

 From: Martin Schwidefsky [EMAIL PROTECTED]

 Make the Sonics Silicon Backplane menu dependent on the two buses
 it can be found on.
 Goes on top of git-wireless.patch.

 Cc: Michael Buesch [EMAIL PROTECTED]
 Cc: John W. Linville [EMAIL PROTECTED]
 Signed-off-by: Martin Schwidefsky [EMAIL PROTECTED]
 ---

  drivers/ssb/Kconfig |    1 +
  1 files changed, 1 insertion(+)

 diff -urpN linux-2.6/drivers/ssb/Kconfig linux-2.6-patched/drivers/ssb/Kconfig
 --- linux-2.6/drivers/ssb/Kconfig   2007-04-19 15:24:40.0 +0200
 +++ linux-2.6-patched/drivers/ssb/Kconfig   2007-04-19 15:55:44.0 
 +0200
 @@ -1,4 +1,5 @@
  menu Sonics Silicon Backplane
 +   depends on PCI || PCMCIA

No, this doesn't look right. There are other devices that come with
SiliconBackplane but are not PCI or PCMCIA style devices.

I'd make this 'depends on MMIO' as well if you add that option.

Arnd 

Re: [PATCH 2/8] Kconfig: unwanted menus for s390.

2007-04-21 Thread Arnd Bergmann

On Sunday 22 April 2007, Arnd Bergmann wrote:
 I would prefer to not have 'depends on !S390' but rather 'depends on MMIO',
 because that is what really drives stuff like IPMI: they expect the device
 to be reachable through the use of ioremap or inX/outX instructions, which
 don't exist on s390.
 
 While it's unlikely that another architecture has the same restriction,
 it expresses much more clearly what you mean.
 
 In drivers/Kconfig, you can then simply add a
 
 config MMIO
 def_bool !S390

I just saw that we already have an option like that, with a slightly different
name.

arch/s390/Kconfig contains

config NO_IOMEM
def_bool y

and lib/Kconfig contains

config HAS_IOMEM
boolean
depends on !NO_IOMEM
default y

You should probably just use one of these two to disable any driver that
uses ioremap or similar.

Arnd 

Re: Fw: [PATCH][RFC] PCMCIA support for 8xx using platform devices

2007-04-22 Thread Arnd Bergmann

On Sunday 22 April 2007, Vitaly Bordug wrote:
 This utilizes PCMCIA on mpc885ads and mpc866ads from arch/powerpc. In the
 new approach, direct IMMR accesses from within drivers/ were totally
 eliminated, that requires hardware_enable, hardware_disable, voltage_set
 board-specific functions to be moved over to BSP code section   
 (arch/powerpc/platforms/8xx in 885 case). There is just no way to have
 both arch/ppc and arch/powerpc approaches to work simultaneously because
 of that.  

Maybe I'm missing a key issue here, but what's the point of adding
more platform_devices for stuff that is already in the device tree?
Shouldn't this be made an of_platform_driver instead so you can
use the existing of_device directly?

Arnd 

Re: [PATCH 7/8] Kconfig: silicon backplane dependency.

2007-04-23 Thread Arnd Bergmann

On Monday 23 April 2007, Martin Schwidefsky wrote:
 The current Kconfig code does not check all select statements if they
 can be enabled before allowing the config option that does the select.
 So the rule for using select statements is that the depends line of the
 config option that selects another config option needs to be at least as
 restrictive as the depends line of the selected option. Hence I'll add
 the HAS_IOMEM depends to B44 as well. Okay ?

Isn't B44 already behind a WIRELESS or IEEE80211 or similar option that
can't be selected on s390?

Arnd 

Re: [PATCH 7/8] Kconfig: silicon backplane dependency.

2007-04-23 Thread Arnd Bergmann

On Monday 23 April 2007, Martin Schwidefsky wrote:
 
  Isn't B44 already behind a WIRELESS or IEEE80211 or similar option that
  can't be selected on s390?
 
 No, the option can be found in drivers/net/Kconfig under menu Ethernet
 (10 or 100Mbit).

Ah, I was confusing it with b43.

Depends on HAS_IOMEM sounds good then. I'd prefer to make it
'depends on SSB' instead of 'select SSB', but I don't want to get into
that argument ;-)

Arnd 

Re: [PATCH/RESEND] ehea: fix for dlpar and sysfs entries

2007-04-23 Thread Arnd Bergmann

On Monday 23 April 2007, Jan-Bernd Themann wrote:
 - dlpar fix: 
 certain resources may only be allocated when first
 logical port is available, and must be removed when
 last logical port has been removed
 
 - sysfs entries:
 create symbolic link from each logical port to ehea driver
 

I can't see anything wrong with the patch contents, but if you know that there
are two changes, you really should make it two separate patches.

Arnd 

Re: [PATCH 0/9] Kconfig: cleanup s390 v2.

2007-04-23 Thread Arnd Bergmann

On Monday 23 April 2007, Martin Schwidefsky wrote:
 I've added the results of the review to the Kconfig cleanup patches
 for s390. Patch #2 has been split, one half has all the HAS_IOMEM
 depends lines the other the remaining !S390 depends lines.
 

They all look good to me now

Re: [PATCH 7/7] revoke: wire up s390 system calls

2007-03-09 Thread Arnd Bergmann

On Friday 09 March 2007, Pekka J Enberg wrote:
 
 From: Serge E. Hallyn [EMAIL PROTECTED]
 
 Make revokeat and frevoke system calls available to user-space on s390.
 
 Signed-off-by: Serge E. Hallyn [EMAIL PROTECTED]
 Signed-off-by: Pekka Enberg [EMAIL PROTECTED]

Looks good to me, but you really should go through Martin, since he
has an overview of what syscall numbers may already be assigned
in some other patch he has queued up.

Arnd 

Re: [PATCH x86 for review III] [1/29] i386: avoid gcc extension

2007-02-13 Thread Arnd Bergmann

On Monday 12 February 2007 17:51, Andi Kleen wrote:
 setcc() in math-emu is written as a gcc extension statement expression
 macro that returns a value.  However, it's not used that way and it's not
 needed like that, so just make it a do-while non-extension macro so that we
 don't use an extension when it's not needed.
 

The patch looks good but it doesn't match the description any more, since
you now use a function...

[PATCH] Open Firmware serial port driver

2007-02-13 Thread Arnd Bergmann

This can be used for serial ports that are connected to an
OF platform bus but are not autodetected by the legacy
serial support.
It will automatically take over devices that come from the
legacy serial detection, which usually is only one device.

In some cases, rtas may be set up to use the serial port
in the firmware, which allows easier debugging before probing
the serial ports. In this case, the used-by-rtas property
must be set by the firmware. This patch also adds code to the
legacy serial driver to check for this.

Signed-off-by: Arnd Bergmann [EMAIL PROTECTED]

---

Who will handle this driver? It is powerpc specific and
hooks into powerpc code at one place, but it's also a 
new driver for the (orphaned) serial layer.

Could either Paul or Andrew merge this, or whoever
else feels responsible?

 arch/powerpc/kernel/legacy_serial.c |   15 +
 drivers/serial/Kconfig  |   10 +
 drivers/serial/Makefile |1
 drivers/serial/of_serial.c  |  143 
 4 files changed, 169 insertions(+)

Index: linux-2.6/drivers/serial/Makefile
===
--- linux-2.6.orig/drivers/serial/Makefile
+++ linux-2.6/drivers/serial/Makefile
@@ -58,3 +58,4 @@ obj-$(CONFIG_SERIAL_SGI_IOC3) += ioc3_se
 obj-$(CONFIG_SERIAL_ATMEL) += atmel_serial.o
 obj-$(CONFIG_SERIAL_UARTLITE) += uartlite.o
 obj-$(CONFIG_SERIAL_NETX) += netx-serial.o
+obj-$(CONFIG_SERIAL_OF_PLATFORM) += of_serial.o
Index: linux-2.6/drivers/serial/of_serial.c
===
--- /dev/null
+++ linux-2.6/drivers/serial/of_serial.c
@@ -0,0 +1,143 @@
+/*
+ *  Serial Port driver for Open Firmware platform devices
+ *
+ *Copyright (C) 2006 Arnd Bergmann [EMAIL PROTECTED], IBM Corp.
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ *
+ */
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/serial_core.h>
+#include <linux/serial_8250.h>
+
+#include <asm/of_platform.h>
+#include <asm/prom.h>
+
+/*
+ * Fill a struct uart_port for a given device node
+ */
+static int __devinit of_platform_serial_setup(struct of_device *ofdev,
+   int type, struct uart_port *port)
+{
+   struct resource resource;
+   struct device_node *np = ofdev->node;
+   const unsigned int *clk, *spd;
+   int ret;
+
+   memset(port, 0, sizeof *port);
+   spd = get_property(np, "current-speed", NULL);
+   clk = get_property(np, "clock-frequency", NULL);
+   if (!clk) {
+       dev_warn(&ofdev->dev, "no clock-frequency property set\n");
+       return -ENODEV;
+   }
+
+   ret = of_address_to_resource(np, 0, &resource);
+   if (ret) {
+       dev_warn(&ofdev->dev, "invalid address\n");
+       return ret;
+   }
+
+   spin_lock_init(&port->lock);
+   port->mapbase = resource.start;
+   port->irq = irq_of_parse_and_map(np, 0);
+   port->iotype = UPIO_MEM;
+   port->type = type;
+   port->uartclk = *clk;
+   port->flags = UPF_SHARE_IRQ | UPF_BOOT_AUTOCONF | UPF_IOREMAP;
+   port->dev = &ofdev->dev;
+   port->custom_divisor = *clk / (16 * (*spd));
+
+   return 0;
+}
+
+/*
+ * Try to register a serial port
+ */
+static int __devinit of_platform_serial_probe(struct of_device *ofdev,
+   const struct of_device_id *id)
+{
+   struct uart_port port;
+   int port_type;
+   int ret;
+
+   if (of_find_property(ofdev->node, "used-by-rtas", NULL))
+       return -EBUSY;
+
+   port_type = (unsigned long)id->data;
+   ret = of_platform_serial_setup(ofdev, port_type, &port);
+   if (ret)
+       goto out;
+
+   switch (port_type) {
+   case PORT_UNKNOWN:
+       dev_info(&ofdev->dev, "Unknown serial port found, "
+           "attempting to use 8250 driver\n");
+       /* fallthrough */
+   case PORT_8250 ... PORT_MAX_8250:
+       ret = serial8250_register_port(&port);
+       break;
+   default:
+       /* need to add code for these */
+       ret = -ENODEV;
+       break;
+   }
+   if (ret < 0)
+       goto out;
+
+   ofdev->dev.driver_data = (void *)(unsigned long)ret;
+   return 0;
+out:
+   irq_dispose_mapping(port.irq);
+   return ret;
+}
+
+/*
+ * Release a line
+ */
+static int of_platform_serial_remove(struct of_device *ofdev)
+{
+   int line = (unsigned long)ofdev->dev.driver_data;
+   serial8250_unregister_port(line);
+   return 0;
+}
+
+/*
+ * A few common types, add more as needed.
+ */
+static struct of_device_id __devinitdata of_platform_serial_table[] = {
+   { .type = "serial", .compatible = "ns8250",   .data

Re: export of_find_property

2007-02-14 Thread Arnd Bergmann

On Wednesday 14 February 2007 22:54, Dave Jones wrote:
 Without this, building drivers/serial/of_serial.c as a module fails.
 
 WARNING: .of_find_property [drivers/serial/of_serial.ko] undefined!
 
 Signed-off-by: Dave Jones [EMAIL PROTECTED]

Acked-by: Arnd Bergmann [EMAIL PROTECTED]

Sorry about that one. This was introduced by a last-minute change,
and I didn't retest building as a module with it.

Re: [Cbe-oss-dev] [RFC, PATCH] CELL Oprofile SPU profiling updated patch

2007-02-15 Thread Arnd Bergmann

On Thursday 15 February 2007 00:52, Carl Love wrote:


 --- linux-2.6.20-rc1.orig/arch/powerpc/oprofile/Kconfig   2007-01-18 
 16:43:14.0 -0600
 +++ linux-2.6.20-rc1/arch/powerpc/oprofile/Kconfig2007-02-13 
 19:04:46.271028904 -0600
 @@ -7,7 +7,8 @@
  
  config OPROFILE
   tristate OProfile system profiling (EXPERIMENTAL)
 - depends on PROFILING
 + default m
 + depends on SPU_FS && PROFILING
   help
 OProfile is a profiling system capable of profiling the
 whole system, include the kernel, kernel modules, libraries,

Milton already commented on this being wrong. I think what you want
is
depends on PROFILING && (SPU_FS = n || SPU_FS)

That should make sure that when SPU_FS=m, OPROFILE cannot be 'y'.
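To make the effect of that dependency line easier to check, here is a rough user-space model of Kconfig's tristate arithmetic. It assumes the usual rules (n < m < y, "&&" is the minimum, "||" is the maximum, and a comparison yields y or n); the helper names are made up for this sketch and are not kernel code.

```c
/* Kconfig tristate values, ordered so that n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* Kconfig "&&" is the minimum of both operands, "||" the maximum. */
static enum tristate tri_and(enum tristate a, enum tristate b)
{
    return a < b ? a : b;
}

static enum tristate tri_or(enum tristate a, enum tristate b)
{
    return a > b ? a : b;
}

/* A comparison such as "SPU_FS = n" evaluates to y when true, n otherwise. */
static enum tristate tri_eq(enum tristate a, enum tristate b)
{
    return a == b ? TRI_Y : TRI_N;
}

/*
 * Highest value OPROFILE may take under
 *     depends on PROFILING && (SPU_FS = n || SPU_FS)
 */
static enum tristate oprofile_limit(enum tristate profiling,
                                    enum tristate spu_fs)
{
    return tri_and(profiling, tri_or(tri_eq(spu_fs, TRI_N), spu_fs));
}
```

Under this model the expression caps OPROFILE at 'm' only when SPU_FS=m; with SPU_FS=n or SPU_FS=y, OPROFILE may still be 'y' or 'm'.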

 @@ -15,3 +16,10 @@
  
 If unsure, say N.
  
 +config OPROFILE_CELL
 + bool "OProfile for Cell Broadband Engine"
 + depends on SPU_FS && OPROFILE
 + default y
 + help
 +   OProfile for Cell BE requires special support enabled
 +   by this option.

You should at least mention that this allows profiling the spus.
  
 +#define EFWCALL  ENOSYS /* Use an existing error number that is as
 +  * close as possible for a FW call that failed.
 +  * The probability of the call failing is
 +  * very low.  Passing up the error number
 +  * ensures that the user will see an error
 +  * message saying OProfile did not start.
 +  * Dmesg will contain an accurate message
 +  * about the failure.
 +  */

ENOSYS looks wrong though. It would appear to the user as if the oprofile
function in the kernel was not present. I'd suggest EIO, and not use 
an extra define for that.


  static int
  rtas_ibm_cbe_perftools(int subfunc, int passthru,
  void *address, unsigned long length)
  {
   u64 paddr = __pa(address);
  
 - return rtas_call(pm_rtas_token, 5, 1, NULL, subfunc, passthru,
 -                  paddr >> 32, paddr & 0xffffffff, length);
 + pm_rtas_token = rtas_token("ibm,cbe-perftools");
 +
 + if (unlikely(pm_rtas_token == RTAS_UNKNOWN_SERVICE)) {
 +         printk(KERN_ERR
 +                "%s: rtas token ibm,cbe-perftools unknown\n",
 +                __FUNCTION__);
 +         return -EFWCALL;
 + } else {
 +
 +         return rtas_call(pm_rtas_token, 5, 1, NULL, subfunc,
 +                          passthru, paddr >> 32, paddr & 0xffffffff, length);
 + }
  }

Are you now reading the rtas token every time you call rtas? that seems
like a waste of time.
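One way to avoid the repeated lookup is to resolve the token once and reuse it. The following is only a user-space sketch of that idea: slow_token_lookup() and TOKEN_UNKNOWN are stand-ins invented here for rtas_token() and RTAS_UNKNOWN_SERVICE, and the returned value 42 is arbitrary.

```c
#define TOKEN_UNKNOWN (-1)   /* stand-in for RTAS_UNKNOWN_SERVICE */

static int lookup_count;     /* counts how often the slow lookup runs */

/* Hypothetical stand-in for rtas_token(): pretend the lookup is expensive. */
static int slow_token_lookup(const char *name)
{
    (void)name;
    lookup_count++;
    return 42;
}

/*
 * Resolve the token on first use and reuse the cached value afterwards.
 * If the lookup fails, the cache stays empty and the next call retries.
 */
static int cached_perftools_token(void)
{
    static int token = TOKEN_UNKNOWN;

    if (token == TOKEN_UNKNOWN)
        token = slow_token_lookup("ibm,cbe-perftools");
    return token;
}
```

In a driver the same effect is usually achieved by looking the token up once at init time and failing initialization if it is unknown.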


 +#define size 24
 +#define ENTRIES  (0x1<<8) /* 256 */
 +#define MAXLFSR  0xFFFFFF
 +
 +int initial_lfsr[] =
 +{16777215, 3797240, 13519805, 11602690, 6497030, 7614675, 2328937, 2889445,
 + 12364575, 8723156, 2450594, 16280864, 14742496, 10904589, 6434212, 4996256,
 + 5814270, 13014041, 9825245, 410260, 904096, 15151047, 15487695, 3061843,
 + 16482682, 7938572, 4893279, 9390321, 4320879, 5686402, 1711063, 10176714,
 + 4512270, 1057359, 16700434, 5731602, 2070114, 16030890, 1208230, 15603106,
 + 11857845, 6470172, 1362790, 7316876, 8534496, 1629197, 10003072, 1714539,
 + 1814669, 7106700, 5427154, 3395151, 3683327, 12950450, 16620273, 12122372,
 + 7194999, 9952750, 3608260, 13604295, 2266835, 14943567, 7079230, 777380,
 + 4516801, 1737661, 8730333, 13796927, 3247181, 9950017, 3481896, 16527555,
 + 13116123, 14505033, 9781119, 4860212, 7403253, 13264219, 12269980, 100120,
 + 664506, 607795, 8274553, 13133688, 6215305, 13208866, 16439693, 3320753,
 + 8773582, 13874619, 1784784, 4513501, 11002978, 9318515, 3038856, 14254582,
 + 15484958, 15967857, 13504461, 13657322, 14724513, 13955736, 5695315, 7330509,
 + 12630101, 6826854, 439712, 4609055, 13288878, 1309632, 4996398, 11392266,
 + 793740, 7653789, 2472670, 14641200, 5164364, 5482529, 10415855, 1629108,
 + 2012376, 13661123, 14655718, 9534083, 16637925, 2537745, 9787923, 12750103,
 + 4660370, 3283461, 14862772, 7034955, 6679872, 8918232, 6506913, 103649,
 + 6085577, 13324033, 14251613, 11058220, 11998181, 3100233, 468898, 7104918,
 + 12498413, 14408165, 1208514, 15712321, 3088687, 14778333, 3632503, 11151952,
 + 98896, 9159367, 8866146, 4780737, 4925758, 12362320, 4122783, 8543358,
 + 7056879, 10876914, 6282881, 1686625, 5100373, 4573666, 9265515, 13593840,
 + 5853060, 110, 4237111, 1576, 14344137, 4608332, 6590210, 13745050,
 + 10916568, 12340402, 7145275, 4417153, 2300360, 12079643, 7608534, 15238251,
 + 4947424, 7014722, 3984546, 7168073, 10759589, 16293080, 3757181, 4577717,
 + 5163790, 2488841, 4650617, 3650022, 5440654, 1814617, 6939232, 15540909,
 + 501788, 1060986, 5058235, 5078222, 3734500, 10762065, 390862, 5172712,
 + 1070780, 7904429, 1669757, 3439997, 2956788, 14944927, 12496638, 994152,
 + 8901173, 11827497, 4268056, 15725859, 1694506,

Re: [Cbe-oss-dev] [RFC, PATCH] CELL Oprofile SPU profiling updated patch

2007-02-15 Thread Arnd Bergmann

On Thursday 15 February 2007 17:15, Maynard Johnson wrote:
 +void spu_set_profile_private(struct spu_context * ctx, void * profile_info,
 +      struct kref * prof_info_kref,
 +      void (* prof_info_release) (struct kref * kref))
 +{
 + ctx->profile_private = profile_info;
 + ctx->prof_priv_kref = prof_info_kref;
 + ctx->prof_priv_release = prof_info_release;
 +}
 +EXPORT_SYMBOL_GPL(spu_set_profile_private);
     
 
 
 I think you don't need the profile_private member here, if you just use
 container_of with ctx->prof_priv_kref in all users.
   
 
 Sorry, I don't follow. We want the profile_private to be stored in the 
 spu_context, don't we?  How else would I be able to do that?  And 
 besides, wouldn't container_of need the struct name of profile_private?  
 SPUFS doesn't have access to the type.

The idea was to have spu_get_profile_private return the kref pointer,
and then change the user of that to do

+   if (!spu_info[spu_num] && the_spu) {
+       spu_info[spu_num] = container_of(
+           spu_get_profile_private(the_spu->ctx),
+           struct cached_info, cache_kref);
+       if (spu_info[spu_num])
+           kref_get(&spu_info[spu_num]->cache_ref);
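For readers unfamiliar with the pattern, this is roughly how container_of recovers the enclosing object from a pointer to an embedded member. It is a user-space sketch: struct kref_like and struct cached_info_like are simplified stand-ins invented for the example, not the real kernel or oprofile types.

```c
#include <stddef.h>

/* User-space version of the kernel's container_of(). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct kref_like {                    /* stand-in for struct kref */
    int refcount;
};

struct cached_info_like {             /* stand-in for struct cached_info */
    int payload;
    struct kref_like cache_kref;      /* embedded, not a pointer */
};

/* Given only the embedded kref, step back to the object that contains it. */
static struct cached_info_like *info_from_kref(struct kref_like *kref)
{
    return container_of(kref, struct cached_info_like, cache_kref);
}
```

Note that container_of applied to a NULL pointer does not yield NULL unless the member sits at offset 0, so any NULL check belongs on the kref pointer before the conversion.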

Re: [Cbe-oss-dev] [RFC, PATCH] CELL Oprofile SPU profiling updated patch

2007-02-15 Thread Arnd Bergmann

On Thursday 15 February 2007 21:21, Carl Love wrote:

 I have done some quick measurements.  The above method limits the loop
 to at most 2^16 iterations.  Based on running the algorithm in user
 space, it takes about 3ms of computation time to do the loop 2^16 times.
 
 At the very least, we need to put the resched in say every 10,000
 iterations which would be about every 0.5ms.  Should we do a resched
 more often?  

Yes, just to be on the safe side, I'd suggest to do it every 1000
iterations.
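The shape of such a loop could look like the sketch below. Everything in it is illustrative: yield_point() stands in for cond_resched() and merely counts calls, and step() is an arbitrary 24-bit shift register update, not the actual Cell LFSR polynomial.

```c
#define RESCHED_INTERVAL 1000   /* iterations between yield points */

static unsigned long yield_count;

/* Stand-in for cond_resched(); here it only counts invocations. */
static void yield_point(void)
{
    yield_count++;
}

/* Illustrative 24-bit shift register step, NOT the real Cell polynomial. */
static unsigned int step(unsigned int v)
{
    unsigned int bit = ((v >> 23) ^ (v >> 22) ^ (v >> 21) ^ (v >> 16)) & 1u;

    return ((v << 1) | bit) & 0xFFFFFFu;
}

/* Advance the register n times, offering to yield every 1000 iterations. */
static unsigned int advance(unsigned int v, unsigned long n)
{
    unsigned long i;

    for (i = 0; i < n; i++) {
        v = step(v);
        if ((i + 1) % RESCHED_INTERVAL == 0)
            yield_point();
    }
    return v;
}
```

For the worst case of 2^16 iterations mentioned above, this reaches the yield point 65 times.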
 
 Additionally we could up the size of the table to 512 which would reduce
 the maximum time to about 1.5ms.  What do people think about increasing
 the table size?

No, that won't help too much. I'd say 256 or 128 entries is the most
we should have.

 As for using a logarithmic spacing of the precomputed values, this
 approach means that the space between the precomputed values at the high
 end would be much larger than 2^14, assuming 256 precomputed values.
 That means it could take much longer than 3ms to get the needed LFSR
 value for a large N.  By evenly spacing the precomputed values, we can
 ensure that for all N it will take less than 3ms to get the value.
 Personally, I am more comfortable with a hard limit on the compute time
 than a variable time that could get much bigger than the 1ms threshold
 that Arnd wants for resched.  Any thoughts?

When using precomputed values on a logarithmic scale, I'd recommend
just rounding to the closest value and accepting the relative inaccuracy,
instead of using the precomputed value as the base and then calculating
from there.
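As a sketch of what "rounding to the closest value" could mean, assume one precomputed entry per power of two (an assumption made for this example; the spacing could equally be finer). The lookup then reduces to finding the nearest exponent, with no iteration from a base value:

```c
/*
 * Return k such that 2^k is the power of two closest to n.
 * Ties round up. Assumes n is well below the width of unsigned long,
 * which holds for the 24-bit counts discussed here.
 */
static unsigned int nearest_log2(unsigned long n)
{
    unsigned int k = 0;

    if (n <= 1)
        return 0;
    while ((1UL << (k + 1)) <= n)   /* find floor(log2(n)) */
        k++;
    /* round up when n is at least halfway to the next power of two */
    if (n - (1UL << k) >= (1UL << (k + 1)) - n)
        k++;
    return k;
}
```

A hypothetical table indexed by this exponent would then supply the precomputed LFSR value directly, trading accuracy in the sample count for a constant lookup time.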

Arnd 

Re: [Cbe-oss-dev] [RFC, PATCH] CELL Oprofile SPU profiling updated patch

2007-02-15 Thread Arnd Bergmann

On Thursday 15 February 2007 22:50, Paul E. McKenney wrote:
 Is this 1.5ms with interrupts disabled?  This time period is problematic
 from a realtime perspective if so -- need to be able to preempt.

No, interrupts should be enabled here. Still, 1.5ms is probably a little
too long without a cond_resched() in case kernel preemption is disabled.

Arnd 

Re: [Cbe-oss-dev] [RFC, PATCH] CELL Oprofile SPU profiling updated patch

2007-02-16 Thread Arnd Bergmann

On Friday 16 February 2007 01:32, Maynard Johnson wrote:
 config OPROFILE_CELL
         bool "OProfile for Cell Broadband Engine"
         depends on OPROFILE && SPU_FS
         default y if ((SPU_FS = y && OPROFILE = y) || (SPU_FS = m && OPROFILE = m))
         help
           Profiling of Cell BE SPUs requires special support enabled
           by this option.  Both SPU_FS and OPROFILE options must be
           set 'y' or both be set 'm'.
 =
 
 Can anyone see a problem with any of this . . . or perhaps a suggestion 
 of a better way?

The text suggests it doesn't allow SPU_FS=y with OPROFILE=m, which I think
should be allowed. I also don't see any place in the code where you actually
use CONFIG_OPROFILE_CELL.

Ideally, you should be able to have an oprofile_spu module that can be
loaded after spufs.ko and oprofile.ko. In that case you only need

config OPROFILE_SPU
	depends on OPROFILE && SPU_FS
	default y

and it will automatically build oprofile_spu as a module if one of the two
is a module and won't build it if one of them is disabled.

Arnd 

Re: [RFC] killing the NR_IRQS arrays.

2007-02-16 Thread Arnd Bergmann

On Friday 16 February 2007 13:10, Eric W. Biederman wrote:
 To do this I believe will require a s/unsigned int irq/struct irq_desc *irq/
 throughout the entire kernel.  Getting the arch specific code and the
 generic kernel infrastructure fixed and ready for that change looks
 like a pain but pretty doable.

We did something like this a few years back on the s390 architecture, which
happens to be lucky enough not to share any interrupt based drivers with
any of the other architectures.

It helped a lot on s390, and I think the change will be beneficial on others
as well, e.g. powerpc already uses 'virtual' interrupt numbers to collapse
the large (sparse) range of interrupt numbers into 512 unique numbers. This
could easily be avoided if there was simply an array of irq_desc structures
per interrupt controller.

However, I also think we should maintain the old interface, and introduce
a new one to deal only with those cases that benefit from it (MSI, Xen,
powerpc VIO, ...). This means one subsystem can be converted at a time.

I don't think there is a point converting the legacy ISA interrupts to
a different interface, as the concept of IRQ numbers is part of the 
subsystem itself (if you want to call ISA a subsystem...).

For PCI, it makes a lot more sense to use something else, considering
that PCI interrupts are defined as 'pins' instead of 'lines'. An
interrupt pin is defined per slot, while the line is per bus, and in
a system with multiple PCI buses the line is still not necessarily
unique.

One interface I could imagine for PCI devices would be

/* generic functions */
int request_irq_desc(struct irq_desc *desc, irq_handler_t handler,
		unsigned long irqflags, const char *devname, void *dev_id);
int free_irq_desc(struct irq_desc *desc, void *dev_id);

/* legacy functions */
int request_irq(int irq, irq_handler_t handler,
		unsigned long irqflags, const char *devname, void *dev_id)
{
	return request_irq_desc(lookup_irq_desc(irq), handler, irqflags,
				devname, dev_id);
}
int free_irq(int irq, void *dev_id)
{
	return free_irq_desc(lookup_irq_desc(irq), dev_id);
}

/* pci specific */
struct irq_desc *pci_request_irq(struct pci_device *dev, int pin,
				 irq_handler_t handler)
{
	struct irq_desc *desc = pci_lookup_irq(dev, pin);
	int ret;

	if (!desc)
		return NULL;

	ret = request_irq_desc(desc, handler, IRQF_SHARED,
				dev->dev.bus_id, dev);
	if (ret < 0)
		return NULL;
	return desc;
}
int pci_free_irq(struct pci_device *dev, int pin)
{
	return free_irq_desc(pci_lookup_irq(dev, pin), dev);
}

Now I don't know enough about MSI yet, but I could imagine
that something along these lines would work as well, and we
could simply require all drivers that want to support MSI
to use the new interfaces.
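To show how the legacy entry point can sit on top of a descriptor-based one, here is a small user-space mock. It is a simplified sketch under stated assumptions, not kernel code: the types, the fixed 16-entry table, and the trimmed argument lists are all invented for the example.

```c
#include <errno.h>
#include <stddef.h>

typedef int (*irq_handler_t)(int irq, void *dev_id);

struct irq_desc {
    irq_handler_t handler;
    void *dev_id;
};

#define NR_LEGACY_IRQS 16
static struct irq_desc legacy_descs[NR_LEGACY_IRQS];

/* Only the legacy path needs a number-to-descriptor table. */
static struct irq_desc *lookup_irq_desc(int irq)
{
    if (irq < 0 || irq >= NR_LEGACY_IRQS)
        return NULL;
    return &legacy_descs[irq];
}

/* Descriptor-based registration: no global IRQ number involved. */
static int request_irq_desc(struct irq_desc *desc, irq_handler_t handler,
                            void *dev_id)
{
    if (!desc || !handler)
        return -EINVAL;
    if (desc->handler)
        return -EBUSY;          /* this mock does not model sharing */
    desc->handler = handler;
    desc->dev_id = dev_id;
    return 0;
}

/* Legacy wrapper: translate the number, then use the new interface. */
static int request_irq(int irq, irq_handler_t handler, void *dev_id)
{
    return request_irq_desc(lookup_irq_desc(irq), handler, dev_id);
}

/* Trivial handler used to exercise the mock. */
static int demo_handler(int irq, void *dev_id)
{
    (void)irq;
    (void)dev_id;
    return 1;
}
```

A bus-specific helper in the style of pci_request_irq() would obtain its descriptor from the bus code instead of from lookup_irq_desc(), so drivers converted to it never see an IRQ number.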

Arnd 

Re: [RFC] killing the NR_IRQS arrays.

2007-02-16 Thread Arnd Bergmann

On Friday 16 February 2007 20:52, Russell King wrote:
 On Fri, Feb 16, 2007 at 08:45:58PM +0100, Arnd Bergmann wrote:
  We did something like this a few years back on the s390 architecture, which
  happens to be lucky enough not to share any interrupt based drivers with
  any of the other architectures.
 
 What you're proposing is looking similar to a proposal I put forward some
 4 years ago, but was rejected.  Maybe times have changed and there's a
 need for it now.

Yes, I think times have changed, with the increased popularity of MSI
and paravirtualized devices. A few points on your old proposal though:

- Doing it per architecture no longer sounds feasible, I think it would
  need to be done per subsystem so that the drivers can be adapted to
  a new interface, and most drivers are used across multiple architectures.
- struct irq sounds much more fitting than struct irq_desc
- creating new irq_foo() functions to replace foo_irq() also sounds right.
- I don't see the point in splitting request_irq into irq_request and
  irq_register.
- doing subsystem specific abstractions ideally allows the drivers to
  not even need to worry about the irq pointer, significantly simplifying
  the interface for register/unregister.

Arnd 

Re: [RFC] killing the NR_IRQS arrays.

2007-02-16 Thread Arnd Bergmann

On Friday 16 February 2007 23:37, Benjamin Herrenschmidt wrote:
 You might want to have a look at the powerpc API with its remapping
 capabilities. It's very nice for handling multiple domain spaces. It
 might be of some use for you.

I don't consider the powerpc virtual IRQs a solution for the problem.
While I believe you did the right thing for powerpc with generalizing
this over all its platforms, it really isn't more than a workaround
for the problem that we can't deal well with the static irq_desc
array.

When that problem is now getting worse on other architectures, we
should try to get it right on all of them, rather than spreading
the workaround further.

Arnd 

Re: [PATCH 12/44 take 2] [UBI] allocation unit implementation

2007-02-17 Thread Arnd Bergmann

On Saturday 17 February 2007 17:55, Artem Bityutskiy wrote:
 diff -auNrp tmp-from/drivers/mtd/ubi/alloc.c tmp-to/drivers/mtd/ubi/alloc.c

 +#include "ubi.h"
 +#include "alloc.h"
 +#include "io.h"
 +#include "background.h"
 +#include "wl.h"
 +#include "debug.h"
 +#include "eba.h"
 +#include "scan.h"

I don't see much point in having one local header for each of these,
you could simply put all of the declarations into one header in the
ubi directory.

 +
 +#define BGT_WORK_SLAB_NAME        "ubi_bgt_work_slab"
 +#define WL_ERASE_WORK_SLAB_NAME   "ubi_wl_erase_work_slab"
 +#define WL_ENTRY_SLAB_NAME        "ubi_wl_entry_slab"
 +#define WL_PROT_ENTRY_SLAB_NAME   "ubi_wl_prow_entry_slab"
 +#define EBA_LTREE_ENTRY_SLAB_NAME "ubi_eba_ltree_entry_slab"
 +#define SCAN_EB_SLAB_NAME         "ubi_scan_leb"
 +#define SCAN_VOLUME_SLAB_NAME     "ubi_scan_volume"

These macros seem rather pointless, each of them is only used
once, and the macro name directly corresponds to the contents.

 +static struct kmem_cache *bgt_work_slab;
 +static struct kmem_cache *wl_erase_work_slab;
 +static struct kmem_cache *wl_entries_slab;
 +static struct kmem_cache *wl_prot_entry_slab;
 +static struct kmem_cache *eba_ltree_entry_slab;
 +static struct kmem_cache *scan_eb_slab;
 +static struct kmem_cache *scan_volume_slab;

Do you really need all these slab caches? If a cache only contains
a small number of objects, e.g. one per volume, then you're much
better off using a regular kmalloc.

 +void *ubi_kzalloc(size_t size)
 +{
 +        void *ret;
 +
 +        ret = kzalloc(size, GFP_KERNEL);
 +        if (unlikely(!ret)) {
 +                ubi_err("cannot allocate %zd bytes", size);
 +                dump_stack();
 +                return NULL;
 +        }
 +
 +        return ret;
 +}
 +
 +void *ubi_kmalloc(size_t size)
 +{
 +        void *ret;
 +
 +        ret = kmalloc(size, GFP_KERNEL);
 +        if (unlikely(!ret)) {
 +                ubi_err("cannot allocate %zd bytes", size);
 +                dump_stack();
 +                return NULL;
 +        }
 +
 +        return ret;
 +}
 +
 +void ubi_kfree(const void *obj)
 +{
 +        if (unlikely(!obj))
 +                return;
 +        kfree(obj);
 +}

These look somewhat too complex. Don't introduce your own generic
infrastructure if you can help it. IIRC, when kmalloc fails, you
already get the full stack trace from the buddy allocator, so
this is just duplication. Better use the regular kzalloc/kfree
calls directly.

 +struct ubi_ec_hdr *ubi_zalloc_ec_hdr(const struct ubi_info *ubi)
 +{
 +        struct ubi_ec_hdr *ec_hdr;
 +        const struct ubi_io_info *io = ubi->io;
 +
 +        ec_hdr = kzalloc(io->ec_hdr_alsize, GFP_KERNEL);
 +        if (unlikely(!ec_hdr)) {
 +                ubi_err("cannot allocate %d bytes", io->ec_hdr_alsize);
 +                dump_stack();
 +                return NULL;
 +        }
 +
 +        return ec_hdr;
 +}
 +
 +void ubi_free_ec_hdr(const struct ubi_info *ubi, struct ubi_ec_hdr *ec_hdr)
 +{
 +        if (unlikely(!ec_hdr))
 +                return;
 +        kfree(ec_hdr);
 +}

same for this and the others. Unless the allocation is done in many
places in the code from a single slab cache, just call kmem_cache_alloc
or kmalloc directly.

Arnd 

Re: [PATCH 10/44 take 2] [UBI] debug unit implementation

2007-02-17 Thread Arnd Bergmann

On Saturday 17 February 2007 17:55, Artem Bityutskiy wrote:
 
 diff -auNrp tmp-from/drivers/mtd/ubi/debug.c tmp-to/drivers/mtd/ubi/debug.c
 --- tmp-from/drivers/mtd/ubi/debug.c1970-01-01 02:00:00.0 +0200
 +++ tmp-to/drivers/mtd/ubi/debug.c  2007-02-17 18:07:26.0 +0200

This whole file looks like it can be removed, as nothing in here
is really relevant for regular operation. I'm sure that much of it
was a good help in developing the code and finding the bugs in here,
but why would you want to merge it into the mainline kernel?

Arnd 

Re: [PATCH 05/44 take 2] [UBI] internal common header

2007-02-17 Thread Arnd Bergmann

On Saturday 17 February 2007 17:54, Artem Bityutskiy wrote:

 +/* Maximum number of supported UBI devices */
 +#define UBI_MAX_INSTANCES 32

Does this need to be limited?

 +/* UBI messages printk level */
 +#define UBI_MSG_LEVEL  KERN_INFO
 +#define UBI_WARN_LEVEL KERN_WARNING
 +#define UBI_ERR_LEVEL  KERN_ERR
 +
 +/* Prefixes of UBI messages */
 +#define UBI_MSG_PREF  "UBI:"
 +#define UBI_WARN_PREF "UBI warning:"
 +#define UBI_ERR_PREF  "UBI error:"
 +
 +/* Normal UBI messages */
 +#define ubi_msg(fmt, ...)   \
 + printk(UBI_MSG_LEVEL UBI_MSG_PREF " " fmt "\n", ##__VA_ARGS__)
 +/* UBI warning messages */
 +#define ubi_warn(fmt, ...)  \
 + printk(UBI_WARN_LEVEL UBI_WARN_PREF " %s: " fmt "\n", __FUNCTION__, \
 +        ##__VA_ARGS__)
 +/* UBI error messages */
 +#define ubi_err(fmt, ...)   \
 + printk(UBI_ERR_LEVEL UBI_ERR_PREF " %s " fmt "\n", __FUNCTION__,\
 +        ##__VA_ARGS__)

You shouldn't need these helpers, just use the regular dev_dbg, dev_info
and related macros.

 +/**
 + * struct ubi_info - UBI device description structure
 + *
 + * @ubi_num: number of the UBI device
 + * @io: input/output unit information
 + * @bgt: background thread unit information
 + * @wl: wear-leveling unit information
 + * @beb: bad eraseblock handling unit information
 + * @vmt: volume management unit information
 + * @ivol: internal volume management unit information
 + * @vtbl: volume table unit information
 + * @acc: accounting unit information
 + * @upd: update unit information
 + * @eba: EBA unit information
 + * @uif: user interface unit information
 + */
 +struct ubi_info {
 + int ubi_num;
 + struct ubi_io_info   *io;
 + struct ubi_bgt_info  *bgt;
 + struct ubi_wl_info   *wl;
 + struct ubi_beb_info  *beb;
 + struct ubi_vmt_info  *vmt;
 + struct ubi_ivol_info *ivol;
 + struct ubi_vtbl_info *vtbl;
 + struct ubi_acc_info  *acc;
 + struct ubi_upd_info  *upd;
 + struct ubi_eba_info  *eba;
 + struct ubi_uif_info  *uif;
 +};

I don't know what went wrong here, but this does not at all
look ok. The members in here probably should all be part
of the ubi_info structure itself.

Arnd 

Re: [PATCH 41/44 take 2] [UBI] gluebi unit header

2007-02-17 Thread Arnd Bergmann

On Saturday 17 February 2007 17:57, Artem Bityutskiy wrote:
 + * This unit is responsible for emulating MTD devices on top of UBI devices.
 + * This sounds strange, but it is in fact quite useful to make legacy software
 + * work on top of UBI. New software should use native UBI API instead.
 + *
 + * Gluebi emulated MTD devices of MTD_UBIVOLUME type. Their minimal I/O unit
 + * size (mtd->writesize) is equivalent to the underlying flash minimal I/O
 + * unit. The eraseblock size is equivalent to the logical UBI volume eraseblock
 + * size.

This approach doesn't seem to make sense at all. If the MTD device interface
is flawed, the right approach should be to fix that instead. After all,
there are not many users of the MTD interface, so you should be able to
adapt them.

In fact, I would expect that there is much more reason to merge the existing
MTD interface with the block interface in the kernel, but you now introduce
a third interface that is unrelated to the first two, and make another
conversion to convert it back?

Let's assume I want to use the wear levelling capabilities of UBI on top
of an SD card, and use the ext3 file system on top of it. I get a stack of

1. MMC
2. block2mtd
3. UBI
4. gluebi
5. mtdblock
6. VFS

when in an ideal world, it should just be

1. MMC
2. UBI
3. VFS

Arnd 

Re: [PATCH 09/44 take 2] [UBI] debug unit header

2007-02-17 Thread Arnd Bergmann

On Saturday 17 February 2007 17:55, Artem Bityutskiy wrote:
 +
 +/**
 + * UBI debugging unit.
 + *
 + * UBI provides rich debugging capabilities which are implemented in
 + * this unit.

Stop right here. You should be doing one thing and do it right.
Since the point of your patches is to do volume management for MTD,
it should do just that.

If you feel that Linux needs rich debugging capabilities, then submit
a patch for that independent of UBI.

Arnd 

Re: [PATCH 03/44 take 2] [UBI] user-space API header

2007-02-17 Thread Arnd Bergmann

On Saturday 17 February 2007 17:54, Artem Bityutskiy wrote:
 +struct ubi_mkvol_req {
 +   int32_t vol_id;
 +   int32_t alignment;
 +   int64_t bytes;
 +   int8_t vol_type;
 +   int8_t padding[9];
 +   int16_t name_len;
 +   __user const char *name;
 +} __attribute__ ((packed));

This structure is not suitable for an ioctl call, because it has
incompatible layout between 32 and 64 bit processes. The easiest
fix for this would be to change the 'name' field to an array
instead of a pointer.
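
A minimal sketch of that fix, assuming a fixed upper bound on the name
length (UBI_MAX_VOLUME_NAME and its value are assumptions made for this
example, not taken from the patch). With the name stored inline, the
structure contains no pointer, so its size and member offsets are the same
for 32 and 64 bit processes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UBI_MAX_VOLUME_NAME 127	/* assumed limit, for illustration only */

struct ubi_mkvol_req {
	int32_t vol_id;
	int32_t alignment;
	int64_t bytes;
	int8_t  vol_type;
	int8_t  padding[9];
	int16_t name_len;
	char    name[UBI_MAX_VOLUME_NAME + 1];	/* inline array, no pointer */
} __attribute__ ((packed));

/* Only fixed-width members: the layout no longer depends on pointer size. */
static_assert(offsetof(struct ubi_mkvol_req, name) == 28,
	      "name follows 28 bytes of fixed-width members");
static_assert(sizeof(struct ubi_mkvol_req) == 28 + UBI_MAX_VOLUME_NAME + 1,
	      "no hidden padding");
```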

Arnd 

Re: [PATCH 41/44 take 2] [UBI] gluebi unit header

2007-02-17 Thread Arnd Bergmann

On Sunday 18 February 2007 03:04, Josh Boyer wrote:
 No, the MTD interface isn't flawed.  gluebi is present to make things like
 JFFS2 work on top of UBI volumes with very little adaptations.  If you go
 changing _every_ MTD user to now use either an MTD device or a native UBI
 device, then the code for those users just gets bloated.

Right, that was my point. If the MTD API in the kernel is not flawed, why
do we need the 'native' UBI interface? Just merge gluebi into UBI and
get rid of the extra abstraction.

 Assuming your SD card isn't doing wear-leveling itself within the device,
 yes that is what you would get.  

While probably all modern SD cards have some amount of wear leveling
built in, I wouldn't want to rely on that for anything but the simple
large-file-on-fatfs (jpeg or mp3) case. Using UBI on top of the
native wear-leveling sounds like the right solution.

 Or you could do something slightly more sane 
 and use:
 
 1. MMC
 2. block2mtd
 3. JFFS2

Not on a 4GB SD medium, with the current jffs2 version. The problem
is that jffs2 doesn't scale that well, so you want a different fs.
Since logfs isn't stable yet, you end up with something like ext3,
which in turn means that you need a UBI-like concept to avoid
wearing out the blocks that store your metadata.

Arnd 

Re: [PATCH 41/44 take 2] [UBI] gluebi unit header

2007-02-18 Thread Arnd Bergmann

On Sunday 18 February 2007 04:02:17 Josh Boyer wrote:
 On Sun, Feb 18, 2007 at 03:15:23AM +0100, Arnd Bergmann wrote:
  On Sunday 18 February 2007 03:04, Josh Boyer wrote:
   No, the MTD interface isn't flawed.  gluebi is present to make things
   like JFFS2 work on top of UBI volumes with very little adaptations.  If
   you go changing _every_ MTD user to now use either an MTD device or a
   native UBI device, then the code for those users just gets bloated.
 
  Right, that was my point. If the MTD API in the kernel is not flawed, why
  do we need the 'native' UBI interface? Just merge gluebi into UBI and
  get rid of the extra abstraction.

 That suggestion came up several times.  gluebi represents a compromise
 between the two groups.  IIRC, the issue was that representing UBI volumes
 as MTD devices only makes sense in the dynamic volume case.  Static UBI
 volumes require special write/update handling and so there was a need for
 a native interface anyway.

Which brings me back to my original point ;-)

I'm sure this has been discussed before, but I'd still like to understand
what is so special with 'static UBI volumes' that they can't be used with
a slightly extended MTD interface.

Arnd 

Re: [patch 1/13] signal/timer/event fds v6 - anonymous inode source ...

2007-03-17 Thread Arnd Bergmann

On Friday 16 March 2007 01:22:15 Davide Libenzi wrote:

 +
 +static int ainofs_delete_dentry(struct dentry *dentry);
 +static struct inode *aino_getinode(void);
 +static struct inode *aino_mkinode(void);
 +static int ainofs_get_sb(struct file_system_type *fs_type, int flags,
 +  const char *dev_name, void *data, struct vfsmount *mnt);
 +

In general, it would be good if you could just reorder your functions
so that you don't need any forward declarations like these. It makes
reviewing from bottom to top a little easier and it becomes obvious
that there are no recursions in the code.

 +static struct vfsmount *aino_mnt __read_mostly;
 +static struct inode *aino_inode;
 +static struct file_operations aino_fops = { };

Iirc, file_operations should be const.

 +int aino_getfd(int *pfd, struct inode **pinode, struct file **pfile,
 +char const *name, const struct file_operations *fops, void *priv)
 +{

Since this is meant to be a generic interface that can be used
from other subsystems, a kerneldoc style comment would be nice

 +static int __init aino_init(void)
 +{
 +
 + if (register_filesystem(&aino_fs_type))
 + goto epanic;
 +
 + aino_mnt = kern_mount(&aino_fs_type);
 + if (IS_ERR(aino_mnt))
 + goto epanic;
 +
 + aino_inode = aino_mkinode();
 + if (IS_ERR(aino_inode))
 + goto epanic;
 +
 + return 0;
 +
 +epanic:
 + panic("aino_init() failed\n");
 +}

panic() is a little harsh from a loadable module. If you mean
the aino support to be used as a module, this should probably
just return an error.

 +static void __exit aino_exit(void)
 +{
 + iput(aino_inode);
 + unregister_filesystem(&aino_fs_type);
 + mntput(aino_mnt);
 +}

but since the Makefile always has it as built-in, maybe you should
instead just kill the exit function and use fs_initcall instead
of init_module().
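
For the first alternative (returning an error instead of calling panic()),
the usual shape is a chain of goto labels that undoes the steps already
taken. The following is a userspace sketch of that pattern only: every
*_stub function and the fail_at knob are stand-ins invented for this
example, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>

/* Test knob: which step should fail (0 = none). Hypothetical. */
static int fail_at;
static int registered, mounted;

static int register_fs_stub(void)
{
	if (fail_at == 1)
		return -EBUSY;
	registered = 1;
	return 0;
}

static void unregister_fs_stub(void)
{
	assert(registered);
	registered = 0;
}

static int mount_stub(void)
{
	if (fail_at == 2)
		return -ENOMEM;
	mounted = 1;
	return 0;
}

static void umount_stub(void)
{
	assert(mounted);
	mounted = 0;
}

static int mkinode_stub(void)
{
	return fail_at == 3 ? -ENOMEM : 0;
}

/* Returns 0 or a negative errno; on failure, earlier steps are undone. */
static int aino_init_sketch(void)
{
	int err;

	err = register_fs_stub();
	if (err)
		goto out;
	err = mount_stub();
	if (err)
		goto out_unregister;
	err = mkinode_stub();
	if (err)
		goto out_umount;
	return 0;

out_umount:
	umount_stub();
out_unregister:
	unregister_fs_stub();
out:
	return err;
}
```

If the code stays built-in only, the cleanup path matters less, which is
why dropping the exit function and using fs_initcall is the simpler route.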

Arnd 

Re: [patch 2/13] signal/timer/event fds v6 - signalfd core ...

2007-03-17 Thread Arnd Bergmann

On Friday 16 March 2007 01:22:15 Davide Libenzi wrote:

 +
 +static struct sighand_struct *signalfd_get_sighand(struct signalfd_ctx *ctx,
 +  unsigned long *flags);
 +static void signalfd_put_sighand(struct signalfd_ctx *ctx,
 +  struct sighand_struct *sighand,
 +  unsigned long *flags);
 +static void signalfd_cleanup(struct signalfd_ctx *ctx);
 +static int signalfd_close(struct inode *inode, struct file *file);
 +static unsigned int signalfd_poll(struct file *file, poll_table *wait);
 +static int signalfd_copyinfo(struct signalfd_siginfo __user *uinfo,
 +  siginfo_t const *kinfo);
 +static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count,
 +  loff_t *ppos);
 +

see my comment about forward declarations in the previous mail

 +asmlinkage long sys_signalfd(int ufd, sigset_t __user *user_mask, size_t sizemask)
 +{
 + int error;
 + unsigned long flags;
 + sigset_t sigmask;
 + struct signalfd_ctx *ctx;
 + struct sighand_struct *sighand;
 + struct file *file;
 + struct inode *inode;
 +
 + error = -EINVAL;
 + if (sizemask != sizeof(sigset_t) ||
 + copy_from_user(&sigmask, user_mask, sizeof(sigmask)))
 + goto err_exit;

sizeof(sigset_t) may be different for native and 32-bit compat code.
It would be good if you could handle sizemask==4 && sizeof(sigset_t)==8
in this code, so that there is no need for an extra compat_sys_signalfd
function.

 + if ((sighand = signalfd_get_sighand(ctx, &flags)) != NULL) {
 + if (next_signal(&ctx->tsk->pending, &ctx->sigmask) > 0 ||
 + next_signal(&ctx->tsk->signal->shared_pending,
 + &ctx->sigmask) > 0)
 + events |= POLLIN;
 + signalfd_put_sighand(ctx, sighand, &flags);
 + } else
 + events |= POLLIN;
 +
 + return events;
 +}

I never really understood the events mask, but other subsystems often
use (POLLIN | POLLRDNORM) instead of just POLLIN. Is there a reason
for not returning POLLRDNORM here?

 +static int signalfd_copyinfo(struct signalfd_siginfo __user *uinfo,
 +  siginfo_t const *kinfo)
 +{
 + long err;
 +
 + err = __clear_user(uinfo, sizeof(*uinfo));
 +
 + /*
 +  * If you change siginfo_t structure, please be sure
 +  * this code is fixed accordingly.
 +  */
 + err |= __put_user(kinfo->si_signo, &uinfo->signo);
 + err |= __put_user(kinfo->si_errno, &uinfo->err);
 + err |= __put_user((short)kinfo->si_code, &uinfo->code);
 + switch (kinfo->si_code & __SI_MASK) {
 + case __SI_KILL:
 + err |= __put_user(kinfo->si_pid, &uinfo->pid);
 + err |= __put_user(kinfo->si_uid, &uinfo->uid);
 + break;
 + case __SI_TIMER:
 +  err |= __put_user(kinfo->si_tid, &uinfo->tid);
 +  err |= __put_user(kinfo->si_overrun, &uinfo->overrun);
 +  err |= __put_user(kinfo->si_ptr, &uinfo->svptr);
 + break;
 + case __SI_POLL:
 + err |= __put_user(kinfo->si_band, &uinfo->band);
 + err |= __put_user(kinfo->si_fd, &uinfo->fd);
 + break;
 + case __SI_FAULT:
 + err |= __put_user(kinfo->si_addr, &uinfo->addr);
 +#ifdef __ARCH_SI_TRAPNO
 + err |= __put_user(kinfo->si_trapno, &uinfo->trapno);
 +#endif
 + break;
 + case __SI_CHLD:
 + err |= __put_user(kinfo->si_pid, &uinfo->pid);
 + err |= __put_user(kinfo->si_uid, &uinfo->uid);
 + err |= __put_user(kinfo->si_status, &uinfo->status);
 + err |= __put_user(kinfo->si_utime, &uinfo->utime);
 + err |= __put_user(kinfo->si_stime, &uinfo->stime);
 + break;
 + case __SI_RT: /* This is not generated by the kernel as of now. */
 + case __SI_MESGQ: /* But this is */
 + err |= __put_user(kinfo->si_pid, &uinfo->pid);
 + err |= __put_user(kinfo->si_uid, &uinfo->uid);
 + err |= __put_user(kinfo->si_ptr, &uinfo->svptr);
 + break;
 + default: /* this is just in case for now ... */
 + err |= __put_user(kinfo->si_pid, &uinfo->pid);
 + err |= __put_user(kinfo->si_uid, &uinfo->uid);
 + break;
 + }
 +
 + return err ? -EFAULT: sizeof(*uinfo);
 +}

Doing it this way looks rather inefficient to me. I think it's
better to just prepare the signalfd_siginfo on the stack and
do a single copy_to_user.
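
A userspace sketch of that shape, under stated assumptions: the two
structures are trimmed, hypothetical stand-ins for siginfo_t and
signalfd_siginfo, and copy_to_user_stub() only imitates the kernel
convention of returning the number of bytes that could not be copied.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Trimmed stand-ins for the kernel-side and user-visible structures. */
struct kinfo_sketch {
	int signo, err, code, pid, uid;
};

struct uinfo_sketch {
	int32_t signo, err, code, pid, uid;
	uint8_t pad[108];	/* room for future members */
};

/* Imitates copy_to_user(): returns the number of bytes NOT copied. */
static unsigned long copy_to_user_stub(void *to, const void *from,
				       unsigned long n)
{
	if (!to)
		return n;
	memcpy(to, from, n);
	return 0;
}

static long copyinfo_sketch(struct uinfo_sketch *uinfo,
			    const struct kinfo_sketch *kinfo)
{
	struct uinfo_sketch tmp;

	assert(kinfo != NULL);

	/* Build the whole record on the stack; unused bytes stay zero. */
	memset(&tmp, 0, sizeof(tmp));
	tmp.signo = kinfo->signo;
	tmp.err = kinfo->err;
	tmp.code = kinfo->code;
	tmp.pid = kinfo->pid;
	tmp.uid = kinfo->uid;

	/* One user access instead of one per member. */
	if (copy_to_user_stub(uinfo, &tmp, sizeof(tmp)))
		return -EFAULT;
	return (long)sizeof(tmp);
}
```

The switch on the si_code class would go between the memset and the copy;
it then only touches the stack copy.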

Also, what's the reasoning behind defining a new structure
instead of just returning siginfo_t? Sure siginfo_t is ugly
but it is a well-defined structure and users already deal
with the problems it causes.

 +static void __exit signalfd_exit(void)
 +{
 + kmem_cache_destroy(signalfd_ctx_cachep);
 +}
 +
 +module_init(signalfd_init);
 +module_exit(signalfd_exit);
 +
 +MODULE_LICENSE("GPL");

Since this file defines a syscall, it can't

Re: [patch 6/13] signal/timer/event fds v6 - timerfd core ...

2007-03-17 Thread Arnd Bergmann

On Friday 16 March 2007 01:22:15 Davide Libenzi wrote:
 This patch introduces a new system call for timer events delivered
 through file descriptors. This allows timer events to be used with
 standard POSIX poll(2), select(2) and read(2). As a consequence of
 supporting the Linux f_op->poll subsystem, they can be used with
 epoll(2) too.

Half of my comments about signalfd also apply to the code in here.

Arnd 

Re: [patch 2/13] signal/timer/event fds v6 - signalfd core ...

2007-03-17 Thread Arnd Bergmann

On Saturday 17 March 2007 22:35:08 Arnd Bergmann wrote:
 Also, what's the reasoning behind defining a new structure
 instead of just returning siginfo_t? Sure siginfo_t is ugly
 but it is a well-defined structure and users already deal
 with the problems it causes.

Ok, found the answer myself, fops->read() must not do the
conversion to compat_siginfo_t on a 64 bit kernel, that would
just be too ugly for words.

Arnd 

Re: [patch 2/13] signal/timer/event fds v6 - signalfd core ...

2007-03-18 Thread Arnd Bergmann

On Sunday 18 March 2007, Davide Libenzi wrote:
 bah, __put_user is basically a move, so I don't think that efficiency would
 be that different (assuming that it'd matter in this case). The only thing
 many __put_user do, is increase the exception table sizes.

The cost of user access functions varies a lot depending on the
architectures. Those platforms with a 4G/4G split e.g. need to do more
than a simple move, and for s390 it may even come down to an indirect
function call, which incurs significant register pressure.

Arnd 

Re: [PATCH -mm 1/4] Blackfin: architecture update patch

2007-03-21 Thread Arnd Bergmann

On Wednesday 21 March 2007, Wu, Bryan wrote:

 @@ -97,6 +97,11 @@ static inline void leds_switch(int flag)
  /*
   * The idle loop on BFIN
   */
 +#ifdef CONFIG_IDLE_L1
 +static inline void default_idle(void)__attribute__((l1_text));
 +void cpu_idle(void)__attribute__((l1_text));
 +#endif
 +

A forward declaration for an inline function seems rather pointless.
Moreover, marking default_idle both l1_text and inline seems
contradicting, right?

 diff -purN linux-2.6-orig/include/asm-blackfin/asm-offsets.h linux-2.6/include/asm-blackfin/asm-offsets.h
 --- linux-2.6-orig/include/asm-blackfin/asm-offsets.h 1970-01-01 08:00:00.0 +0800
 +++ linux-2.6/include/asm-blackfin/asm-offsets.h  2007-03-21 15:21:10.0 +0800
 @@ -0,0 +1,89 @@
 +#ifndef __ASM_OFFSETS_H__
 +#define __ASM_OFFSETS_H__
 +/*
 + * DO NOT MODIFY.
 + *
 + * This file was generated by Kbuild

This file should be in the exclude list for your diff, it is generally not
shipped with the kernel sources.

 +#ifndef __ASSEMBLY__
 +
 +static inline unsigned char readb(volatile unsigned char *addr)
 +{

The prototype for this should normally contain an __iomem.
This kind of error is normally caught by running 'make C=1'
to use the 'sparse' tool. If you have not run that yet,
you should start to, as it finds a number of common bugs.

 +/*
 + * Map some physical address range into the kernel address space.
 + */
 +static inline void *__ioremap(unsigned long physaddr, unsigned long size,
 + int cacheflag)
 +{
 + return (void *)physaddr;
 +}

Likewise, this should return an __iomem pointer.
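
As a rough illustration of both annotations: the macro definitions below
follow what the kernel headers of that time did as far as I remember (empty
for the compiler, attributes only when sparse defines __CHECKER__), and the
*_sketch function names are made up so that the example builds in user
space.

```c
#include <assert.h>

#ifdef __CHECKER__
# define __iomem __attribute__((noderef, address_space(2)))
# define __force __attribute__((force))
#else
# define __iomem
# define __force
#endif

/* The accessor takes an __iomem pointer; the cast marks the one place
 * where such a pointer is deliberately dereferenced. */
static inline unsigned char readb_sketch(const volatile void __iomem *addr)
{
	assert(addr != (const volatile void __iomem *)0);
	return *(const volatile unsigned char __force *)addr;
}

/* On a no-MMU platform this is a 1:1 mapping, but the result is still
 * typed as I/O memory so callers cannot dereference it directly. */
static inline void __iomem *ioremap_sketch(unsigned long physaddr,
					   unsigned long size, int cacheflag)
{
	(void)size;
	(void)cacheflag;
	return (void __iomem *)physaddr;
}
```

With these prototypes, code that passes a plain pointer to the accessor or
dereferences the ioremap result is what 'make C=1' is meant to flag.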

The rest of the patch looks good to me.

Arnd 

Re: [PATCH -mm 1/4] Blackfin: architecture update patch

2007-03-21 Thread Arnd Bergmann

On Wednesday 21 March 2007, Wu, Bryan wrote:
 I sent 4 mails to LKML, but this one got lost. Arnd, did you receive this
 email from LKML?

The mail was around 400kb, while the limit for lkml is 100kb.

Arnd 

Re: [PATCH -mm 1/4] Blackfin: architecture update patch

2007-03-21 Thread Arnd Bergmann

On Wednesday 21 March 2007, Wu, Bryan wrote:
 1) Some issues are fixed according to LKML patch review.
 2) Remove not supported BF535 code
 3) Fixed some bugs from blackfin.uclinux.org SVN update
 Here is the updated patch for 2.6.21-rc4-mm1

One rather general but important comment:

You need to get used to providing smaller, one fix per mail, patches.
As long as the full tree is waiting in -mm, this is not as important,
as I assume that Andrew will fold the whole architecture support into
a big patch before submitting to Linus, but as soon as it's in, such
big patches will not be acceptable any more.

If you're not already doing it, look into how 'quilt' or similar tools
help you with this.

Arnd 

Re: remove-unused-header-file-include-linux-elfnoteh.patch

2007-03-21 Thread Arnd Bergmann

On Wednesday 21 March 2007 22:36:42 Jeremy Fitzhardinge wrote:
 Please don't.  We need it.

 BTW, I didn't see this one go by, and I couldn't see it searching
 around.  Did it get posted to lkml?

I think it was only on the janitor list. It was considered
obviously correct since it does not get installed by
headers_install and did not seem to be used anywhere in
the kernel.

Could you explain how this file is used in the kernel? Robert
probably wants to update his script to handle this correctly.

Arnd 

Re: BLK_DEV_MD with CONFIG_NET

2007-03-21 Thread Arnd Bergmann

On Wednesday 21 March 2007 13:02:46 Sam Ravnborg wrote:
  Anything which is ever exported to modules, which ought to
  be the situation in this case, should be obj-y not lib-y
  right?

 That is also my understanding of lib-y - I should update makefiles.txt
 to reflect this..

Strictly speaking, it could well be obj-m instead of obj-y if it
is _only_ used by modules. OTOH, it makes the Makefile a lot simpler
to not optimize for this case.

Arnd 

Re: [PATCH -mm] Blackfin arch: add kdebug header file

2007-03-26 Thread Arnd Bergmann

I can see nothing wrong with your patches, but you should make the
patch descriptions a little clearer:

On Monday 26 March 2007, Wu, Bryan wrote:
 Hi folks,

No need for this line, if it's there, Andrew just needs to remove
it from the changelog.

 This patch adds kdebug.h header file to blackfin architecture.

This line is completely redundant, as it states the same information
as the subject. You should give some background information here,
like:

kdebug.h is needed for kprobes.

For trivial patches where the subject already tells the whole story
(e.g. 'remove redundant declaration of foo'), just leave out the
description entirely except for the Signed-off-by.

Arnd 
you can even leave out the description

Re: Questions about porting perfmon2 to powerpc

2007-04-05 Thread Arnd Bergmann

On Thursday 05 April 2007, Kevin Corry wrote:

 First, the stock 2.6.20 kernel has a prototype in include/linux/smp.h for a 
 function called smp_call_function_single(). However, this routine is only 
 implemented on i386, x86_64, ia64, and mips. Perfmon2 apparently needs to 
 call this to run a function on a specific CPU. Powerpc provides an 
 smp_call_function() routine to run a function on all active CPUs, so I used 
 that as a basis to add an smp_call_function_single() routine. I've included 
 the patch below and was wondering if it looked like a sane approach.

The function itself looks good, but since it's very similar to the existing
smp_call_function(), you should probably try to share some of the code,
e.g. by making a helper function that gets an argument to decide whether
to run on a specific CPU or on all CPUs.

 Next, we ran into a problem related to Perfmon2 initialization and sysfs. The 
 problem turned out to be that the powerpc version of topology_init() is 
 defined as an __initcall() routine, but Perfmon2's initialization is done as 
 a subsys_initcall() routine. Thus, Perfmon2 tries to initialize its sysfs 
 information before some of the powerpc cpu information has been initialized. 
 However, on all other architectures, topology_init() is defined as a 
 subsys_initcall() routine, so this problem was not seen on any other 
 platforms. Changing the powerpc version of topology_init() to a 
 subsys_initcall() seems to have fixed the bug. However, I'm not sure if that 
 is going to cause problems elsewhere in the powerpc code. I've included the 
 patch below (after the smp-call-function-single patch). Does anyone know if 
 this change is safe, or if there was a specific reason that topology_init() 
 was left as an __initcall() on powerpc?

In general, it's better to do initcalls as late as possible, so __initcall()
is preferred over subsys_initcall() if both work. Have you tried doing it
the other way and starting perfmon2 from a regular __initcall()?

Arnd 

Re: Questions about porting perfmon2 to powerpc

2007-04-05 Thread Arnd Bergmann

On Thursday 05 April 2007, Kevin Corry wrote:
 For the moment, I made the change to topology_init() since it was the
 simplest fix to get things working. I have considered switching the perfmon2
 initialization to __initcall(), but there are apparently some timing issues
 with ensuring that the perfmon2 core code is initialized before any of its
 sub-modules. Since they could all be compiled statically in the kernel, I'm
 not sure if there's a way to ensure the ordering of calls within a single
 initcall level. I'll need to ask Stephane if there were any other reasons why
 subsys_initcall() was used for perfmon2.

If they all come from the same directory, you can simply order them in
the Makefile. If a module in arch/ needs to be initialized after one in
drivers/, that's not possible though, and changing topology_init() should
be the best option.

Arnd 

Re: [PATCH 01/01] New FBDev driver for Intel Vermilion Range

2007-04-05 Thread Arnd Bergmann

On Thursday 05 April 2007, Alan Hourihane wrote:
 As for the above, I've noticed that drivers/video/epson1355fb.c also has
 this wording and is under the GPL.

Yes, many files have it, but that doesn't make it right ;-)

Arnd 

Re: [PATCH] merge compat_ioctl.h into compat_ioctl.c

2007-04-08 Thread Arnd Bergmann

On Sunday 08 April 2007, Christoph Hellwig wrote:
 
 Now that there is no arch-specific compat ioctl handling left there
 is no point in having a separate compat_ioctl.h, so merge it into
 compat_ioctl.c

Yes, definitely a good idea.

 Signed-off-by: Christoph Hellwig [EMAIL PROTECTED]
 
Acked-by: Arnd Bergmann [EMAIL PROTECTED]

On a similar subject, how about merging include/linux/ioctl32.h and the ioctl
bits of fs/compat.c into fs/compat_ioctl.c as well to make it completely
self-contained?

Arnd 

Re: [PATCH 03/44 take 2] [UBI] user-space API header

2007-02-20 Thread Arnd Bergmann

On Tuesday 20 February 2007 14:07, Artem Bityutskiy wrote:
 
  This structure is not suitable for an ioctl call, because it has
  incompatible layout between 32 and 64 bit processes. The easiest
  fix for this would be to change the 'name' field to an array
  instead of a pointer.
 
 Will be fixed thanks. Just out of curiosity, could you please provide an
 example when this may be a problem.

On a 64 bit kernel with a 32 bit user app calling this ioctl, the kernel
would read the pointer value from the 8 bytes at the end, which means that
it reads four bytes past the end of the structure and interprets
whatever it finds there as part of the pointer, instead of using only the
first four bytes as the lower half.
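
The mismatch can be modelled in user space by spelling out both layouts
with fixed-width members. This is a sketch only: the struct and function
names are made up, and uint32_t / uint64_t stand in for the pointer as a 32
bit process and a 64 bit kernel would see it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Layout as a 32 bit process fills it in: 4 byte pointer, 32 bytes total. */
struct mkvol_req_32 {
	int32_t vol_id;
	int32_t alignment;
	int64_t bytes;
	int8_t  vol_type;
	int8_t  padding[9];
	int16_t name_len;
	uint32_t name;
} __attribute__ ((packed));

/* Layout as a 64 bit kernel reads it: 8 byte pointer, 36 bytes total. */
struct mkvol_req_64 {
	int32_t vol_id;
	int32_t alignment;
	int64_t bytes;
	int8_t  vol_type;
	int8_t  padding[9];
	int16_t name_len;
	uint64_t name;
} __attribute__ ((packed));

static_assert(sizeof(struct mkvol_req_32) == 32, "32 bit layout");
static_assert(sizeof(struct mkvol_req_64) == 36, "64 bit layout");

/* Reads the request the way a 64 bit kernel would, i.e. 36 bytes. */
static uint64_t name_seen_by_64bit_kernel(const void *user_buf)
{
	struct mkvol_req_64 req;

	assert(user_buf != NULL);
	memcpy(&req, user_buf, sizeof(req));
	return req.name;
}
```

Feeding this function a buffer that holds a mkvol_req_32 followed by four
unrelated bytes yields a pointer value that mixes the application's 32 bit
pointer with those unrelated bytes.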

Arnd 

Re: [Cbe-oss-dev] [RFC, PATCH] CELL Oprofile SPU profiling updated patch

2007-02-26 Thread Arnd Bergmann

On Thursday 22 February 2007, Carl Love wrote:
 This patch updates the existing arch/powerpc/oprofile/op_model_cell.c
 to add in the SPU profiling capabilities.  In addition, a 'cell' subdirectory
 was added to arch/powerpc/oprofile to hold Cell-specific SPU profiling
 code.

There was a significant amount of whitespace breakage in this patch,
which I cleaned up. The patch below consists of the other things
I changed as a further cleanup. Note that I changed the format
of the context switch record, which I found too complicated, as
I described on IRC last week.

Arnd 

--
Subject: cleanup spu oprofile code

From: Arnd Bergmann [EMAIL PROTECTED]
This cleans up some of the new oprofile code. It's mostly
cosmetic changes, like the way multi-line comments are formatted.
The most significant change is a simplification of the
context-switch record format.

It does mean the oprofile report tool needs to be adapted,
but I'm sure that it pays off in the end.

Signed-off-by: Arnd Bergmann [EMAIL PROTECTED]
Index: linux-2.6/arch/powerpc/oprofile/cell/spu_task_sync.c
===
--- linux-2.6.orig/arch/powerpc/oprofile/cell/spu_task_sync.c
+++ linux-2.6/arch/powerpc/oprofile/cell/spu_task_sync.c
@@ -61,11 +61,12 @@ static void destroy_cached_info(struct k
 static struct cached_info * get_cached_info(struct spu * the_spu, int spu_num)
 {
struct kref * ref;
-   struct cached_info * ret_info = NULL;
+   struct cached_info * ret_info;
if (spu_num >= num_spu_nodes) {
printk(KERN_ERR "SPU_PROF: "
   "%s, line %d: Invalid index %d into spu info cache\n",
   __FUNCTION__, __LINE__, spu_num);
+   ret_info = NULL;
goto out;
}
if (!spu_info[spu_num] && the_spu) {
@@ -89,9 +90,9 @@ static struct cached_info * get_cached_i
 static int
 prepare_cached_spu_info(struct spu * spu, unsigned int objectId)
 {
-   unsigned long flags = 0;
+   unsigned long flags;
struct vma_to_fileoffset_map * new_map;
-   int retval = 0;
+   int retval;
struct cached_info * info;
 
/* We won't bother getting cache_lock here since
@@ -112,6 +113,7 @@ prepare_cached_spu_info(struct spu * spu
printk(KERN_ERR "SPU_PROF: "
   "%s, line %d: create vma_map failed\n",
   __FUNCTION__, __LINE__);
+   retval = -ENOMEM;
goto err_alloc;
}
new_map = create_vma_map(spu, objectId);
@@ -119,6 +121,7 @@ prepare_cached_spu_info(struct spu * spu
printk(KERN_ERR "SPU_PROF: "
   "%s, line %d: create vma_map failed\n",
   __FUNCTION__, __LINE__);
+   retval = -ENOMEM;
goto err_alloc;
}
 
@@ -144,7 +147,7 @@ prepare_cached_spu_info(struct spu * spu
goto out;
 
 err_alloc:
-   retval = -1;
+   kfree(info);
 out:
return retval;
 }
@@ -215,11 +218,9 @@ static inline unsigned long fast_get_dco
 static unsigned long
 get_exec_dcookie_and_offset(struct spu * spu, unsigned int * offsetp,
unsigned long * spu_bin_dcookie,
-   unsigned long * shlib_dcookie,
unsigned int spu_ref)
 {
unsigned long app_cookie = 0;
-   unsigned long * image_cookie = NULL;
unsigned int my_offset = 0;
struct file * app = NULL;
struct vm_area_struct * vma;
@@ -252,24 +253,17 @@ get_exec_dcookie_and_offset(struct spu *
 my_offset, spu_ref,
 vma->vm_file->f_dentry->d_name.name);
*offsetp = my_offset;
-   if (my_offset == 0)
-   image_cookie = spu_bin_dcookie;
-   else if (vma->vm_file != app)
-   image_cookie = shlib_dcookie;
break;
}
 
-   if (image_cookie) {
-   *image_cookie = fast_get_dcookie(vma->vm_file->f_dentry,
+   *spu_bin_dcookie = fast_get_dcookie(vma->vm_file->f_dentry,
 vma->vm_file->f_vfsmnt);
-   pr_debug("got dcookie for %s\n",
-vma->vm_file->f_dentry->d_name.name);
-   }
+   pr_debug("got dcookie for %s\n", vma->vm_file->f_dentry->d_name.name);
 
- out:
+out:
return app_cookie;
 
- fail_no_image_cookie:
+fail_no_image_cookie:
printk(KERN_ERR "SPU_PROF: "
"%s, line %d: Cannot find dcookie for SPU binary\n",
__FUNCTION__, __LINE__);
@@ -285,18 +279,18 @@ get_exec_dcookie_and_offset(struct spu *
 static int process_context_switch(struct spu * spu, unsigned int objectId)
 {
unsigned long flags;
-   int retval = 0;
-   unsigned int offset = 0;
-   unsigned long spu_cookie = 0, app_dcookie = 0, shlib_cookie = 0;
+   int retval;
+   unsigned int offset

Re: [Cbe-oss-dev] [RFC, PATCH] CELL Oprofile SPU profiling updated patch

2007-02-27 Thread Arnd Bergmann

On Tuesday 27 February 2007, Maynard Johnson wrote:
 I have applied the cleanup patch that Arnd sent, but had to fix up a
 few things:
    -  Bug fix:  Initialize retval in spu_task_sync.c, line 95, otherwise
       this function returns non-zero and OProfile fails.
    -  Remove unused codes in include/linux/oprofile.h
    -  Compile warnings:  Initialize offset and spu_cookie at lines 283
       and 284 in spu_task_sync.c
 
 With these changes and some userspace changes that were necessary to 
 correspond with Arnd's changes, our testing was successful.
 
 A fixup patch is attached.
 

The patch does not contain any of the metadata I need to apply it
(subject, description, signed-off-by).

 @@ -280,8 +280,8 @@ static int process_context_switch(struct
  {
 unsigned long flags;
 int retval;
 -   unsigned int offset;
 -   unsigned long spu_cookie, app_dcookie;
 +   unsigned int offset = 0;
 +   unsigned long spu_cookie = 0, app_dcookie;
 retval = prepare_cached_spu_info(spu, objectId);
 if (retval)
 goto out;

No, this is wrong. Leaving the variables uninitialized at least warns
you about the bug you have in this function: when there is anything wrong,
you just continue writing the record with zero offset and dcookie values
in it. Instead, you should handle the error condition somewhere
down the code.

It's harmless most of the time, but you really should not be painting
over your bugs by blindly initializing variables.

 diff -paur linux-orig/include/linux/oprofile.h 
 linux-new/include/linux/oprofile.h
 --- linux-orig/include/linux/oprofile.h 2007-02-27 14:41:29.0 -0600
 +++ linux-new/include/linux/oprofile.h  2007-02-27 14:43:18.0 -0600
 @@ -36,9 +36,6 @@
  #define XEN_ENTER_SWITCH_CODE  10
  #define SPU_PROFILING_CODE 11
  #define SPU_CTX_SWITCH_CODE    12
 -#define SPU_OFFSET_CODE        13
 -#define SPU_COOKIE_CODE        14
 -#define SPU_SHLIB_COOKIE_CODE  15
  
  struct super_block;
  struct dentry;
 
Right, I forgot about this.

Arnd 


Re: [RFC] killing the NR_IRQS arrays.

2007-02-27 Thread Arnd Bergmann

On Tuesday 27 February 2007, Eric W. Biederman wrote:
 * Add a variation of the API in interrupt.h that uses
   struct irq *irq instead of unsigned int irq
   
   Probably replacing request_irq with irq_request or something
   trivial like that.
 
   This will need to touch all of different irq implementation back
   ends, but only very lightly.
 
 * Convert the generic irq code to use struct irq * everywhere it
   currently uses unsigned int irq.
 
 * Start on the conversions of drivers and subsystems picking on
   the easy ones first :)

Introducing the irq_request() etc. functions that take a struct irq*
instead of an int sounds good, but I'd hope we can avoid using those
in device drivers and do a separate abstraction for each bus_type
that deals with interrupts. I'm not sure if that's possible for
each bus_type, but the ones I have worked with in the past should
allow that:

pci: each device/function has a unique irq, drivers need not know
 about it afaics.
isa/pnp: numbers from 1 to 15 are the right abstraction here, that's
 how isa has worked for ages.
s390: got rid of irq numbers already
ofw: an open firmware device can have a number of interrupts, but
 like PCI, the driver only needs to know things like 'first
 irq of this device', not how it's connected
ps3: irqs are requested from the firmware for each device, this
 can happen under the covers.
mmc, usb, phy, ieee1394: these already have a high-level abstraction
 for interrupt events
platform: dunno, probably these really should use the struct irq
 directly
eisa, mca, pcmcia, zorro, ...: no idea, but possibly similar to PCI.

Note that we can even start converting device drivers first, before
moving away from irq numbers. A typical PCI driver should get
somewhat simpler by the conversion, and when they are all converted,
we can replace pci_dev->irq with a struct irq* under the covers.

Arnd 

Re: [RFC] killing the NR_IRQS arrays.

2007-02-28 Thread Arnd Bergmann

On Wednesday 28 February 2007, Eric W. Biederman wrote:
 Arnd Bergmann [EMAIL PROTECTED] writes:
 
 
  Introducing the irq_request() etc. functions that take a struct irq*
  instead of an int sounds good, but I'd hope we can avoid using those
  in device drivers and do a separate abstraction for each bus_type
  that deals with interrupts. I'm not sure if that's possible for
  each bus_type, but the ones I have worked with in the past should
  allow that:
 
  pci: each device/function has a unique irq, drivers need not know
   about it afaics.
 Then there is msi and with msi-x you can have up to 4K irqs.

I have to admit I still don't really understand how this works
at all. Can a driver that uses msi-x have different handlers
for each of those interrupts registered simultaneously?

I would expect that instead there should be only one 'struct irq'
for the device, with the handler getting a 12 bit number argument.

  s390: got rid of irq numbers already
 
 Yes.  I should really look at that more and see if I could bring
 s390 into the generic irq code with my planned changes.

I don't think there is much point in changing the s390 code, but
the way it is solved there may be interesting for other buses
as well. The interrupt handler there is not being registered
explicitly, but is part of the driver (in case of subchannel)
or of the device (in case of ccw_device) data structure.

Similarly, in a pci device, one could imagine that the
struct pci_driver contains a irq_handler_t member that
is registered from the pci_device_probe() function
if present.
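
A rough user-space sketch of that idea follows. Everything in it is made up
for illustration (struct mock_driver, mock_device_probe(), mock_deliver_irq());
it is not the PCI core and not an existing kernel interface. It only shows the
bus code hooking up an optional handler from the driver structure at probe
time, so that the driver never names an irq number.

```c
#include <stddef.h>

struct mock_device;

/* Handler type; the device itself is passed as the cookie. */
typedef int (*mock_irq_handler_t)(struct mock_device *dev);

struct mock_driver {
	const char *name;
	int (*probe)(struct mock_device *dev);
	mock_irq_handler_t irq_handler;	/* optional, may be NULL */
};

struct mock_device {
	struct mock_driver *driver;
	mock_irq_handler_t active_handler;	/* set by the bus code */
	int irq_count;
};

/* Bus core: run the driver's probe, then hook up the handler if present. */
static int mock_device_probe(struct mock_device *dev, struct mock_driver *drv)
{
	int ret;

	dev->driver = drv;
	dev->active_handler = NULL;

	ret = drv->probe ? drv->probe(dev) : 0;
	if (ret)
		return ret;

	if (drv->irq_handler)
		dev->active_handler = drv->irq_handler;
	return 0;
}

/* Bus core: deliver an interrupt to whatever handler was registered. */
static int mock_deliver_irq(struct mock_device *dev)
{
	if (!dev->active_handler)
		return 0;	/* nobody registered */
	return dev->active_handler(dev);
}

/* Example driver: it never mentions an irq number. */
static int demo_irq(struct mock_device *dev)
{
	dev->irq_count++;
	return 1;	/* handled */
}

static struct mock_driver demo_driver = {
	.name = "demo",
	.probe = NULL,
	.irq_handler = demo_irq,
};
```

Calling mock_device_probe(&dev, &demo_driver) followed by
mock_deliver_irq(&dev) ends up in demo_irq() with the device as its argument.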

  Note that we can even start converting device drivers first, before
  moving away from irq numbers. A typical PCI driver should get
  somewhat simpler by the conversion, and when they are all converted,
  we can replace pci_dev->irq with a struct irq* under the covers.
 
 Reasonable if it is easy and straight forward.
 Something like pci_request_irq(dev,) and the helper looks at
 dev->irq under the covers and calls request_irq or whatever makes
 sense.  Is this what you are thinking?  Examples would help me here.

Ok, I had an example in one of my previous posts, but based on the
discussion since then, it has become significantly simpler, basically
reducing the work to

struct irq *pci_irq_request(struct pci_device *dev,
			    irq_handler_t handler)
{
	/* the function returns a pointer, so wrap the error code */
	if (!dev->irq)
		return ERR_PTR(-ENODEV);

	return irq_request(dev->irq, handler, IRQF_SHARED,
			   dev->driver->name, dev);
}

int pci_irq_free(struct pci_device *dev)
{
	return irq_free(dev->irq, dev);
}

The most significant change of this to the current code
would be that we can pass arguments down to irq_request
automatically, e.g. the irq handler can always get the
pci_device as its dev_id.

 For talking to user space I expect we will have numbers for a long time
 to come yet.

I was wondering about that. Do you only mean /proc/interrupts or
are there other user interfaces we need to worry about?
For /proc/interrupts, what could break if we have interrupt numbers
only local to each controller and potentially duplicate numbers
in the list? It's good to be paranoid about changes to proc files,
but I can definitely see value in having meaningful interrupt
numbers in there instead of making up a more or less random mapping
to a flat number space.

Arnd 

Re: [Cbe-oss-dev] [PATCH 14/22] spufs: use SPU master control to prevent wild SPU execution

2007-03-01 Thread Arnd Bergmann

On Thursday 01 March 2007, Michael Ellerman wrote:
 On Mon, 2006-11-20 at 18:45 +0100, Arnd Bergmann wrote:
  plain text document attachment (spufs-master-control.diff)
  When the user changes the runcontrol register, an SPU might be
  running without a process being attached to it and waiting for
  events. In order to prevent this, make sure we always disable
  the priv1 master control when we're not inside of spu_run.
 
 Hi Arnd,
 
 Sorry I didn't comment on this when you sent it, I wasn't paying enough
 attention. This patch confuses me, you say we should make sure we always
 disable the master control when we're not inside spu_run, but I see
 several exit paths where we leave the master run bit enabled - or maybe
 I'm reading it wrong.

I think you're right, there is at least one path that I now saw
getting out of spufs_run_spu incorrectly. In particular, when
spu_reacquire_runnable() fails, we never call the master stop,
which is a bug, but should happen very infrequently in practice.
Do you see another case where we end up with the same problem?

If not, I'll prepare a patch to fix this one case.

Arnd 


Re: [PATCH -mm 1/5] Blackfin: blackfin architecture patch update

2007-03-03 Thread Arnd Bergmann

On Thursday 01 March 2007 05:14:40 Wu, Bryan wrote:
 The whole patch is located at URL:
 https://blackfin.uclinux.org/gf/download/frsrelease/39/2583/blackfin-arch.patch
 The incremental patch is located at URL:
 https://blackfin.uclinux.org/gf/download/frsrelease/39/2584/blackfin-arch-mm2-update.patch

I'm not sure if that was intentional, but the second patch does not apply
on top of the -mm kernel but rather patches the old patch itself.
This basically makes it impossible to review just that part, so better
provide the diff between the kernel with the old patch and the kernel
with the new patch next time.

OTOH, from what I could see from the contents, the changes themselves
look pretty good, I'd probably add my 'Acked-by' if I could read that
patch more easily ;-)

Arnd 

Re: [PATCH -mm 1/5] Blackfin: blackfin architecture patch update

2007-03-03 Thread Arnd Bergmann

On Thursday 01 March 2007 05:14:40 Wu, Bryan wrote:
 Here is the update version of blackfin-arch.patch in -mm tree.
 simply add support to utrace and it was tested on blackfin STAMP board
 as well as other following patches.

Wow, this has come a long way since I looked at the patches last
year, good work!

I've gone through the complete patch again now, and these are the
issues I've found in it. None of these are show-stoppers and I'd
like to see it all go in during the next merge window. There should
be enough time until then to address these points:

 +EXPORT_SYMBOL(__ioremap);
 +EXPORT_SYMBOL(strcmp);
 +EXPORT_SYMBOL(strncmp);
 +EXPORT_SYMBOL(dump_thread);
 +
 +EXPORT_SYMBOL(ip_fast_csum);
 +
 +EXPORT_SYMBOL(kernel_thread);
 +
 +EXPORT_SYMBOL(__up);
 +EXPORT_SYMBOL(__down);
 +EXPORT_SYMBOL(__down_trylock);
 +EXPORT_SYMBOL(__down_interruptible);
 +
 +EXPORT_SYMBOL(is_in_rom);

In general, please put EXPORT_SYMBOL lines below the definition
of the symbol itself. This list of exports should only be used
for symbols that come from assembly files.

You should probably also think about whether some of them are
better done as EXPORT_SYMBOL_GPL.

 + pending = bfin_read_IPEND() & ~0x8000;
 + other_ints = pending & (pending - 1);
 + if (other_ints == 0)
 + lower_to_irq14();
 + irq_exit();
 + 

The last line here has trailing whitespace. While this gets automatically
removed by akpm's scripts, you're normally better off not adding it
in the first place, because it may cause your follow-on patches not
to apply, aside from being wrong to start with.

 +void machine_halt(void)
 +{
 + for (;;)
 + /* nothing */ ;
 +}
 +
 +void machine_power_off(void)
 +{
 + for (;;)
 + /* nothing */ ;
 +}

It might be nicer to make this

	for (;;)
		asm volatile ("idle");

Otherwise you end up burning CPU cycles after a halt without
any particular need.

 +#if defined(CONFIG_MTD_UCLINUX)
 + /* generic memory mapped MTD driver */
 + memory_mtd_end = memory_end;
 +
 + mtd_phys = _ramstart;
 + mtd_size = PAGE_ALIGN(*((unsigned long *)(mtd_phys + 8)));
 +
 +# if defined(CONFIG_EXT2_FS) || defined(CONFIG_EXT3_FS)
 + if (*((unsigned short *)(mtd_phys + 0x438)) == EXT2_SUPER_MAGIC)
 + mtd_size =
 + PAGE_ALIGN(*((unsigned long *)(mtd_phys + 0x404)) << 10);
 +# endif
 +
 +# if defined(CONFIG_CRAMFS)
 + if (*((unsigned long *)(mtd_phys)) == CRAMFS_MAGIC)
 + mtd_size = PAGE_ALIGN(*((unsigned long *)(mtd_phys + 0x4)));
 +# endif
 +
 +# if defined(CONFIG_ROMFS_FS)
 + if (((unsigned long *)mtd_phys)[0] == ROMSB_WORD0
 + && ((unsigned long *)mtd_phys)[1] == ROMSB_WORD1)
 + mtd_size =
 + PAGE_ALIGN(be32_to_cpu(((unsigned long *)mtd_phys)[2]));

This detection seems to me like a strange thing to do in setup_arch().
It should be possible to do this much later, at a point where the system
is much less fragile and e.g. printk works. It could even be moved into
some place in the mtd code itself, since other architectures might want
to do the same thing.

 +#if defined(CONFIG_BF561)
 +static struct cpu cpu[2];
 +#else
 +static struct cpu cpu[1];
 +#endif
 +static int __init topology_init(void)
 +{
 +#if defined (CONFIG_BF561)
 + register_cpu(&cpu[0], 0);
 + register_cpu(&cpu[1], 1);
 + return 0;
 +#else
 + return register_cpu(cpu, 0);
 +#endif
 +}

I think you should try to avoid the special-case stuff here. You can
have CONFIG_NR_CPUS in Kconfig set dependent on CONFIG_BF561 and change
the code here (and similarly in other places) to

static struct cpu cpu[NR_CPUS];
static int __init topology_init(void)
{
	int i;

	for (i = 0; i < NR_CPUS; i++)
		register_cpu(&cpu[i], i);
	return 0;
}

 +	for (i = ZERO_P; i <= L2_MEM; i++) {
 +
 +		if (cplb_data[i].valid) {
 +
 +			as_1m = cplb_data[i].start % SIZE_1M;
 +
 +			/* We need to make sure all sections are properly 1M aligned
 +			 * However between Kernel Memory and the Kernel mtd section, depending on the
 +			 * rootfs size, there can be overlapping memory areas.
 +			 */
 +
 +			if (as_1m) {
 +#ifdef CONFIG_MTD_UCLINUX
 +				if (i == SDRAM_RAM_MTD) {
 +					if ((cplb_data[SDRAM_KERN].end + 1) > cplb_data[SDRAM_RAM_MTD].start)
 +						cplb_data[SDRAM_RAM_MTD].start = (cplb_data[i].start & (-2*SIZE_1M)) + SIZE_1M;

I count 6 levels of indentation, which severely limits readability,
especially when you have terms this complex in the last level.
Please try to split up functions like this into smaller units.

 +/*
 + * ++roman (07/09/96): implemented signal stacks (specially for tosemu on
 + * Atari :-) Current limitation: Only one sigstack can be active at one time.

Re: [PATCH -mm 1/5] Blackfin: blackfin architecture patch update

2007-03-03 Thread Arnd Bergmann

On Saturday 03 March 2007 23:50:02 bert hubert wrote:
    for (;;)
    asm volatile (idle);

 This looks remarkably like relax_cpu()

Actually not: cpu_relax() is defined as barrier(), it can't
call idle because that might make it sleep for an indefinite
amount of time (until the next interrupt, but only if they
are enabled).

Some nice architectures provide a hardware mechanism to do
cpu_relax, like going to low-power mode for a few microseconds,
but this one doesn't seem to have it.
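
For illustration, a small user-space sketch of the difference. The barrier()
and cpu_relax() definitions below are written out for GCC/Clang as an
assumption of how a barrier-only cpu_relax() looks; spin_until_set() is a
made-up helper. The loop keeps polling and never sleeps, which is exactly
what an "idle" instruction would not guarantee.

```c
/* Compiler-only barrier: forces memory to be re-read, emits no instruction. */
#define barrier()	__asm__ __volatile__("" ::: "memory")
#define cpu_relax()	barrier()

/* Spin until *flag becomes non-zero or max_spins iterations have passed. */
static int spin_until_set(const int *flag, unsigned long max_spins)
{
	unsigned long i;

	for (i = 0; i < max_spins; i++) {
		if (*flag)
			return 1;
		cpu_relax();	/* returns immediately, never sleeps */
	}
	return 0;
}
```

Replacing cpu_relax() here with something that waits for the next interrupt
would make the loop's latency depend on interrupts being enabled and arriving.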

Arnd 

Re: [RFC] Heads up on sys_fallocate()

2007-03-03 Thread Arnd Bergmann

On Friday 02 March 2007 00:38:19 Christoph Hellwig wrote:
  Forgive me if I haven't put enough thought into it, but would it be
  useful to create a generic_fallocate() that writes zeroed pages for any
  non-existent pages in the range?  I don't know how glibc currently
  implements posix_fallocate(), but maybe the kernel could do it more
  efficiently, even in generic code.  Maybe we don't care, since the major
  file systems can probably do something better in their own code.

 I'd be more happy to have the write out zeroes loop in glibc.  And
 glibc needs to have it anyway, for older kernels.

A generic_fallocate makes sense to me iff we can do it in the kernel
significantly more efficiently than in glibc, e.g. by using only
a single page in page cache instead of one for each page to be preallocated.

If glibc is smart enough to do an optimal implementation, I fully agree
with you.
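
For reference, the user-space "write out zeroes" loop under discussion could
look roughly like the sketch below. This is not glibc's actual
posix_fallocate() fallback; fallocate_by_zeroing() is a made-up name, and the
simplified version only extends the file and does not fill holes inside the
existing data.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>

/*
 * Extend fd so that [offset, offset + len) is backed by written zeroes.
 * Only the part beyond the current end of file is written; holes inside
 * the existing file are NOT filled in this simplified version.
 * Returns 0 on success or an errno value on failure.
 */
static int fallocate_by_zeroing(int fd, off_t offset, off_t len)
{
	char zeroes[4096];
	struct stat st;
	off_t pos, end;

	if (offset < 0 || len <= 0)
		return EINVAL;
	if (fstat(fd, &st) < 0)
		return errno;

	end = offset + len;
	pos = st.st_size;	/* never touch existing data */
	memset(zeroes, 0, sizeof(zeroes));

	while (pos < end) {
		size_t chunk = sizeof(zeroes);
		ssize_t n;

		if ((off_t)chunk > end - pos)
			chunk = (size_t)(end - pos);
		n = pwrite(fd, zeroes, chunk, pos);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return errno;
		}
		if (n == 0)
			return EIO;	/* no progress, give up */
		pos += n;
	}
	return 0;
}
```

Every chunk goes through the page cache as ordinary dirty data, which is the
cost a smarter in-kernel implementation would have to beat.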

Arnd 

Re: [PATCH] configfs: add missing mutex_unlock()

2007-03-04 Thread Arnd Bergmann

On Sunday 04 March 2007 14:38:12 Akinobu Mita wrote:
 @@ -1168,8 +1168,10 @@ int configfs_register_subsystem(struct c
  
 err = -ENOMEM;
 dentry = d_alloc(configfs_sb->s_root, &name);
 -   if (!dentry)
 +   if (!dentry) {
 +   mutex_unlock(&configfs_sb->s_root->d_inode->i_mutex);
 goto out_release;
 +   }
  
 d_add(dentry, NULL);

This should be changed to jump to a new exit point, before the mutex_unlock
at the end of the function. Having multiple places in the function that
release the same lock easily leads to the kind of bug you are fixing here.
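
A minimal sketch of the pattern I mean, in stand-alone C. It is not the
configfs code: demo_lock()/demo_unlock() are a counter standing in for the
mutex, and register_item() is a made-up function. Every failure after the
lock is taken jumps to a label in front of the single unlock.

```c
#include <errno.h>
#include <stdlib.h>

/* Stand-in for a real mutex: counts how often the lock is held. */
static int lock_depth;

static void demo_lock(void)   { lock_depth++; }
static void demo_unlock(void) { lock_depth--; }

/*
 * All error paths taken while the lock is held end up at out_unlock,
 * so the lock is released in exactly one place.
 */
static int register_item(size_t size, int fail_late)
{
	void *item;
	int err;

	demo_lock();

	err = -ENOMEM;
	item = malloc(size);
	if (!item)
		goto out_unlock;

	err = -EINVAL;
	if (fail_late)
		goto out_free;

	err = 0;	/* success; a real caller would keep the item */
out_free:
	free(item);
out_unlock:
	demo_unlock();
	return err;
}
```

Adding another failure case later only needs another goto, not another copy
of the unlock call.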

Arnd 

Re: [PATCH 8/9] mtd: Allow mtd block device drivers to have a custom ioctl function

2007-03-04 Thread Arnd Bergmann

On Friday 02 March 2007 16:55:02 Richard Purdie wrote:
 Allow mtd block drivers to customise their ioctl functions. Also
 allow the drivers to obtain the gendisk struct since ioctl
 functions can need this.

Are you sure that this is a good idea? I'd rather not open
up this method of letting the individual drivers do bad things.

 This also moves the mtd ioctl functions from locked to unlocked.
 As far as I can see, nothing in the mtd code has locking problems.

This part looks fine to me.

Arnd 

Re: [RFC] Heads up on sys_fallocate()

2007-03-04 Thread Arnd Bergmann

On Sunday 04 March 2007, Anton Altaparmakov wrote:
  A generic_fallocate makes sense to me iff we can do it in the kernel
  significantly more efficiently than in glibc, e.g. by using only
  a single page in page cache instead of one for each page to be
  preallocated.
 
  If glibc is smart enough to do an optimal implementation, I fully
  agree with you.
 
 glibc cannot ever be smart enough because a file system driver will  
 always know better and be able to do things in a much more optimized  
 way.

Ok, that's not what I meant. It's obvious that the file system itself
can do better than both VFS and glibc. The question is whether VFS can
be better than glibc on file systems that don't offer their own
implementation of the fallocate operation.

Arnd 

[RFC PATCH 0/3] RFC: using hrtimers for in-kernel timeouts

2007-03-04 Thread Arnd Bergmann

I've played around with the new timer statistics to see which timers might
benefit from being moved from traditional timers to hrtimers.

Since my understanding is that timer_list timers are not really meant to
expire, this seems to include a lot of what comes in through
schedule_timeout, in particular select() and futex wait.

I have no idea if what I was attempting is even the right approach to
start with, but I want to share the patches in case it is ;-).

Maybe someone is interested in running some low-level benchmarks on this
or point out any bugs in the code.

Arnd 


[RFC PATCH 2/3] use hrtimer in select and pselect

2007-03-04 Thread Arnd Bergmann

This changes the select and pselect system calls to use the
new schedule_timeout_hr function. Since many applications
use the select function instead of nanosleep, this provides
a higher resolution sleep to them.
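
The application-side pattern referred to here looks roughly like the sketch
below; select_sleep_us() is a made-up helper name. With the existing code the
timeout is rounded up to whole jiffies, so the actual sleep is only as
precise as 1/HZ.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Sleep for the given number of microseconds using select() with no
 * file descriptors. Returns 0 on timeout, -1 on error (see errno).
 */
static int select_sleep_us(long usec)
{
	struct timeval tv;

	tv.tv_sec = usec / 1000000L;
	tv.tv_usec = usec % 1000000L;
	return select(0, NULL, NULL, NULL, &tv);
}
```

A caller asking for select_sleep_us(500) is the kind of user that would see
a shorter, more accurate sleep with an hrtimer-based timeout.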

BUG: the same needs to be done for the compat syscalls, the
current patch breaks building on 64 bit machines.

Signed-off-by: Arnd Bergmann [EMAIL PROTECTED]

Index: linux-cg/fs/select.c
===
--- linux-cg.orig/fs/select.c
+++ linux-cg/fs/select.c
@@ -189,7 +189,7 @@ get_max:
 #define POLLOUT_SET (POLLWRBAND | POLLWRNORM | POLLOUT | POLLERR)
 #define POLLEX_SET (POLLPRI)
 
-int do_select(int n, fd_set_bits *fds, s64 *timeout)
+int do_select(int n, fd_set_bits *fds, ktime_t *timeout)
 {
struct poll_wqueues table;
poll_table *wait;
@@ -205,12 +205,11 @@ int do_select(int n, fd_set_bits *fds, s
 
poll_initwait(&table);
wait = &table.pt;
-   if (!*timeout)
+   if (timeout && !timeout->tv64)
wait = NULL;
retval = 0;
for (;;) {
unsigned long *rinp, *routp, *rexp, *inp, *outp, *exp;
-   long __timeout;
 
set_current_state(TASK_INTERRUPTIBLE);
 
@@ -266,27 +265,19 @@ int do_select(int n, fd_set_bits *fds, s
*rexp = res_ex;
}
wait = NULL;
-   if (retval || !*timeout || signal_pending(current))
+   if (retval || (timeout && !timeout->tv64)
+   || signal_pending(current))
break;
if(table.error) {
retval = table.error;
break;
}
 
-   if (*timeout < 0) {
+   if (!timeout || timeout->tv64 < 0)
/* Wait indefinitely */
-   __timeout = MAX_SCHEDULE_TIMEOUT;
-   } else if (unlikely(*timeout >= (s64)MAX_SCHEDULE_TIMEOUT - 1)) {
-   /* Wait for longer than MAX_SCHEDULE_TIMEOUT. Do it in a loop */
-   __timeout = MAX_SCHEDULE_TIMEOUT - 1;
-   *timeout -= __timeout;
-   } else {
-   __timeout = *timeout;
-   *timeout = 0;
-   }
-   __timeout = schedule_timeout(__timeout);
-   if (*timeout >= 0)
-   *timeout += __timeout;
+   schedule();
+   else
+   *timeout = schedule_timeout_hr(*timeout);
}
__set_current_state(TASK_RUNNING);
 
@@ -307,7 +298,7 @@ int do_select(int n, fd_set_bits *fds, s
((unsigned long) (MAX_SCHEDULE_TIMEOUT / HZ)-1)
 
 static int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
-  fd_set __user *exp, s64 *timeout)
+  fd_set __user *exp, ktime_t *timeout)
 {
fd_set_bits fds;
void *bits;
@@ -384,7 +375,7 @@ out_nofds:
 asmlinkage long sys_select(int n, fd_set __user *inp, fd_set __user *outp,
fd_set __user *exp, struct timeval __user *tvp)
 {
-   s64 timeout = -1;
+   ktime_t timeout, *timeoutp = NULL;
struct timeval tv;
int ret;
 
@@ -395,24 +386,20 @@ asmlinkage long sys_select(int n, fd_set
if (tv.tv_sec < 0 || tv.tv_usec < 0)
return -EINVAL;
 
+   timeout = timeval_to_ktime(tv);
/* Cast to u64 to make GCC stop complaining */
-   if ((u64)tv.tv_sec >= (u64)MAX_INT64_SECONDS)
-   timeout = -1;   /* infinite */
-   else {
-   timeout = ROUND_UP(tv.tv_usec, USEC_PER_SEC/HZ);
-   timeout += tv.tv_sec * HZ;
-   }
+   if ((u64)tv.tv_sec < (u64)MAX_INT64_SECONDS)
+   timeoutp = &timeout;
}
 
-   ret = core_sys_select(n, inp, outp, exp, &timeout);
+   ret = core_sys_select(n, inp, outp, exp, timeoutp);
 
if (tvp) {
struct timeval rtv;
 
if (current->personality & STICKY_TIMEOUTS)
goto sticky;
-   rtv.tv_usec = jiffies_to_usecs(do_div((*(u64*)&timeout), HZ));
-   rtv.tv_sec = timeout;
+   rtv = ktime_to_timeval(timeout);
if (timeval_compare(&rtv, &tv) >= 0)
rtv = tv;
if (copy_to_user(tvp, &rtv, sizeof(rtv))) {
@@ -438,7 +425,7 @@ asmlinkage long sys_pselect7(int n, fd_s
fd_set __user *exp, struct timespec __user *tsp,
const sigset_t __user *sigmask, size_t sigsetsize)
 {
-   s64 timeout = MAX_SCHEDULE_TIMEOUT;
+   ktime_t timeout, *timeoutp = NULL;
sigset_t ksigmask, sigsaved;
struct timespec ts;
int ret;
@@ -450,13 +437,11 @@ asmlinkage long sys_pselect7(int n, fd_s

[RFC PATCH 1/3] introduce schedule_timeout_hr

2007-03-04 Thread Arnd Bergmann

The new schedule_timeout_hr function is a variant of schedule_timeout
that uses hrtimers internally. Consequently, its argument and
return value are ktime_t.

Signed-off-by: Arnd Bergmann [EMAIL PROTECTED]

Index: linux-cg/include/linux/sched.h
===
--- linux-cg.orig/include/linux/sched.h
+++ linux-cg/include/linux/sched.h
@@ -246,6 +246,8 @@ extern int in_sched_functions(unsigned l
 
 #define MAX_SCHEDULE_TIMEOUT LONG_MAX
 extern signed long FASTCALL(schedule_timeout(signed long timeout));
+extern ktime_t FASTCALL(schedule_timeout_hr(ktime_t timeout));
+
 extern signed long schedule_timeout_interruptible(signed long timeout);
 extern signed long schedule_timeout_uninterruptible(signed long timeout);
 asmlinkage void schedule(void);
Index: linux-cg/kernel/hrtimer.c
===
--- linux-cg.orig/kernel/hrtimer.c
+++ linux-cg/kernel/hrtimer.c
@@ -1206,6 +1206,54 @@ void hrtimer_init_sleeper(struct hrtimer
 #endif
 }
 
+/**
+ * schedule_timeout_hr - sleep until timeout
+ * @timeout: timeout value
+ *
+ * Make the current task sleep until @timeout has elapsed.
+ * The routine will return immediately unless the current task
+ * state has been set (see set_current_state()).
+ *
+ * You can set the task state as follows -
+ *
+ * %TASK_UNINTERRUPTIBLE - at least @timeout is guaranteed to
+ * pass before the routine returns. The routine will return 0
+ *
+ * %TASK_INTERRUPTIBLE - the routine may return early if a signal is
+ * delivered to the current task. In this case the remaining time
+ * will be returned, or 0 if the timer expired in time
+ *
+ * The current task state is guaranteed to be TASK_RUNNING when this
+ * routine returns.
+ *
+ * In all cases the return value is guaranteed to be a non-negative
+ * time value.
+ */
+static ktime_t __sched __schedule_timeout_hr(ktime_t time, void *addr)
+{
+   struct hrtimer_sleeper t;
+   ktime_t remain;
+
+   hrtimer_init(&t.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+   hrtimer_init_sleeper(&t, current);
+   __timer_stats_hrtimer_set_start_info(&t.timer, addr);
+   hrtimer_start(&t.timer, time, HRTIMER_MODE_REL);
+   schedule();
+   hrtimer_cancel(&t.timer);
+   remain = hrtimer_get_remaining(&t.timer);
+
+   if (ktime_to_ns(remain) < 0)
+   return ktime_set(0, 0);
+   else
+   return remain;
+}
+
+fastcall ktime_t __sched schedule_timeout_hr(ktime_t time)
+{
+   return __schedule_timeout_hr(time, __builtin_return_address(0));
+}
+EXPORT_SYMBOL_GPL(schedule_timeout_hr);
+
 static int __sched do_nanosleep(struct hrtimer_sleeper *t, enum hrtimer_mode mode)
 {
hrtimer_init_sleeper(t, current);


[RFC PATCH 3/3] change schedule_timeout to use hrtimers

2007-03-04 Thread Arnd Bergmann

According to the new timer statistics, many of the
timers that expire come from schedule_timeout.
Since the regular timer infrastructure is optimized
for timers that don't expire, this might be a useful
optimization.

This also changes the timer stats to show the caller
of schedule_timeout in the statistics rather than
schedule_timeout itself.

BUG: converting between jiffies and ktime is rather
 inefficient here.
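
As a rough idea of what a direct conversion could look like, here is a
user-space sketch. The names (DEMO_HZ, demo_jiffies_to_ns(),
demo_ns_to_jiffies()) are made up, the tick rate is an assumed constant that
divides one second evenly, and the time value is treated as a plain signed
64-bit nanosecond count. None of this is an existing kernel helper.

```c
#include <stdint.h>

#define DEMO_HZ		250		/* assumed tick rate */
#define NSEC_PER_SEC	1000000000LL
#define NSEC_PER_JIFFY	(NSEC_PER_SEC / DEMO_HZ)

/* jiffies -> nanoseconds: a single multiplication. */
static int64_t demo_jiffies_to_ns(long j)
{
	return (int64_t)j * NSEC_PER_JIFFY;
}

/* nanoseconds -> jiffies, rounding up so a timeout never ends early. */
static long demo_ns_to_jiffies(int64_t ns)
{
	return (long)((ns + NSEC_PER_JIFFY - 1) / NSEC_PER_JIFFY);
}
```

This avoids the detour through struct timespec, at the price of assuming a
tick length that is an exact number of nanoseconds.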

Signed-off-by: Arnd Bergmann [EMAIL PROTECTED]

Index: linux-cg/kernel/hrtimer.c
===
--- linux-cg.orig/kernel/hrtimer.c
+++ linux-cg/kernel/hrtimer.c
@@ -1254,6 +1254,96 @@ fastcall ktime_t __sched schedule_timeou
 }
 EXPORT_SYMBOL_GPL(schedule_timeout_hr);
 
+/**
+ * schedule_timeout - sleep until timeout
+ * @timeout: timeout value in jiffies
+ *
+ * Make the current task sleep until @timeout jiffies have
+ * elapsed. The routine will return immediately unless
+ * the current task state has been set (see set_current_state()).
+ *
+ * You can set the task state as follows -
+ *
+ * %TASK_UNINTERRUPTIBLE - at least @timeout jiffies are guaranteed to
+ * pass before the routine returns. The routine will return 0
+ *
+ * %TASK_INTERRUPTIBLE - the routine may return early if a signal is
+ * delivered to the current task. In this case the remaining time
+ * in jiffies will be returned, or 0 if the timer expired in time
+ *
+ * The current task state is guaranteed to be TASK_RUNNING when this
+ * routine returns.
+ *
+ * Specifying a @timeout value of %MAX_SCHEDULE_TIMEOUT will schedule
+ * the CPU away without a bound on the timeout. In this case the return
+ * value will be %MAX_SCHEDULE_TIMEOUT.
+ *
+ * In all cases the return value is guaranteed to be non-negative.
+ */
+fastcall signed long __sched schedule_timeout(signed long timeout)
+{
+   ktime_t time;
+   struct timespec ts;
+
+   switch (timeout)
+   {
+   case MAX_SCHEDULE_TIMEOUT:
+   /*
+* These two special cases are useful to be comfortable
+* in the caller. Nothing more. We could take
+* MAX_SCHEDULE_TIMEOUT from one of the negative value
+* but I'd like to return a valid offset (>=0) to allow
+* the caller to do everything it want with the retval.
+*/
+   schedule();
+   goto out;
+   default:
+   /*
+* Another bit of PARANOID. Note that the retval will be
+* 0 since no piece of kernel is supposed to do a check
+* for a negative retval of schedule_timeout() (since it
+* should never happens anyway). You just have the printk()
+* that will tell you if something is gone wrong and where.
+*/
+   if (timeout < 0) {
+   printk(KERN_ERR "schedule_timeout: wrong timeout "
+   "value %lx\n", timeout);
+   dump_stack();
+   current->state = TASK_RUNNING;
+   goto out;
+   }
+   }
+
+   /* FIXME: there ought to be an efficient jiffies_to_ktime
+    * and ktime_to_jiffies */
+   jiffies_to_timespec(timeout, &ts);
+   time = timespec_to_ktime(ts);
+   time = __schedule_timeout_hr(time, __builtin_return_address(0));
+   ts = ktime_to_timespec(time);
+   timeout = timespec_to_jiffies(&ts);
+ out:
+   return timeout < 0 ? 0 : timeout;
+}
+EXPORT_SYMBOL(schedule_timeout);
+
+/*
+ * We can use __set_current_state() here because schedule_timeout() calls
+ * schedule() unconditionally.
+ */
+signed long __sched schedule_timeout_interruptible(signed long timeout)
+{
+   __set_current_state(TASK_INTERRUPTIBLE);
+   return schedule_timeout(timeout);
+}
+EXPORT_SYMBOL(schedule_timeout_interruptible);
+
+signed long __sched schedule_timeout_uninterruptible(signed long timeout)
+{
+   __set_current_state(TASK_UNINTERRUPTIBLE);
+   return schedule_timeout(timeout);
+}
+EXPORT_SYMBOL(schedule_timeout_uninterruptible);
+
 static int __sched do_nanosleep(struct hrtimer_sleeper *t, enum hrtimer_mode mode)
 {
hrtimer_init_sleeper(t, current);
Index: linux-cg/kernel/timer.c
===
--- linux-cg.orig/kernel/timer.c
+++ linux-cg/kernel/timer.c
@@ -1369,103 +1369,6 @@ asmlinkage long sys_getegid(void)
 
 #endif
 
-static void process_timeout(unsigned long __data)
-{
-   wake_up_process((struct task_struct *)__data);
-}
-
-/**
- * schedule_timeout - sleep until timeout
- * @timeout: timeout value in jiffies
- *
- * Make the current task sleep until @timeout jiffies have
- * elapsed. The routine will return immediately unless
- * the current task state has been set (see set_current_state()).
- *
- * You can set the task state as follows -
- *
- * %TASK_UNINTERRUPTIBLE - at least @timeout

Re: [Cbe-oss-dev] [PATCH 14/22] spufs: use SPU master control to prevent wild SPU execution

2007-03-04 Thread Arnd Bergmann

On Friday 02 March 2007, Michael Ellerman wrote:
 There's also the error case for spu_run_init() which skips the master
 stop. I guess that's ok because we've only set the master control in the
 backing store, and the only way that will ever get propagated to an
 actual spu is by coming back through spufs_run_spu().

Hmm, the correct way would be to switch off the master control in there,
afaics. Fixing it only in spu_run_init would mean that we also handle
the case of spu_reacquire_runnable along with it.

 What originally caught my eye on this was the output from xmon. When we
 drop into xmon with no spu programs running and stop the spus, it
 reports that they _all_ have the master run enabled,

That looks right, there is no problem to have master control enabled,
as long as user space can't access the spu through a context that is
bound to it.

 and some of them 
 have the runcntl enabled (those that have had spu programs run on them
 since boot it seems).

This, however, sounds wrong. Maybe the runcntl is active on those that have
_not_ run since boot, which would make more sense. We should investigate
this.

 It looks like the save/restore code sets the master bit in several
 places, but never sets/clears the runcntl, which seems bogus to me.
 
 So when we leave spufs_spu_run we do the master stop call:
 
 spu_mfc_sr1_set: spu: c0007ffdfc80 (15) sr1: 0x1b runcntl: 0x1
 Call Trace:
 [C196BAA0] [C000F920] .show_stack+0x68/0x1b0 (unreliable)
 [C196BB40] [D01475C0] .spu_hw_master_stop+0xa8/0x170 [spufs]
 [C196BBE0] [D0148598] .spufs_run_spu+0x5ec/0x770 [spufs]
 [C196BCC0] [D0144BA0] .do_spu_run+0xb4/0x180 [spufs]
 [C196BD80] [C003905C] .sys_spu_run+0xb0/0x108
 [C196BE30] [C0008634] syscall_exit+0x0/0x40
 
 
 But then the save/restore code sets it back on?

Right, the context save code needs to enable master control in order to
run on the spu. However, that should be after all mappings to user space
have been discarded.

Arnd 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Wanted: simple, safe x86 stack overflow detection

2007-03-04 Thread Arnd Bergmann

On Wednesday 28 February 2007, Chuck Ebbert wrote:
 Can we just put a canary in the threadinfo and check it on every
 task switch? What are the drawbacks?

It's not completely reliable, in case of functions that allocate
far too much stack space. You might want to take a look at the
gcc support that Andreas Krebbel implemented for s390 to check
for stack overflows:

http://gcc.gnu.org/ml/gcc-patches/2004-08/msg01308.html

I think there are some additions planned for the next gcc release,
but if you port this to i386, it will get you pretty far.
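
A minimal user-space sketch of the canary idea, to make the trade-off
concrete. This is a model, not kernel code: struct fake_thread_info,
STACK_CANARY and check_canary_on_switch() are made-up names, and the
"stack" is just a byte array that is filled from the top down. It only
catches overflows that actually write over the canary; a function with a
huge frame that skips past it would go unnoticed, which is the
limitation mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FAKE_STACK_SIZE 8192
#define STACK_CANARY 0x57AC6E9DUL	/* arbitrary magic value (assumption) */

/* Model of a kernel stack: thread info sits at the lowest address,
 * the stack grows down towards it from the top of the area. */
struct fake_thread_info {
	unsigned long canary;
};

union fake_stack {
	struct fake_thread_info info;
	unsigned char bytes[FAKE_STACK_SIZE];
};

static void init_stack(union fake_stack *s)
{
	assert(s != NULL);
	memset(s->bytes, 0, sizeof(s->bytes));
	s->info.canary = STACK_CANARY;
}

/* Would be called from the task switch path: returns 0 if the canary
 * is intact, -1 if something has overwritten it. */
static int check_canary_on_switch(const union fake_stack *s)
{
	return s->info.canary == STACK_CANARY ? 0 : -1;
}

/* Simulate 'depth' bytes of stack usage, written from the top down. */
static void use_stack(union fake_stack *s, size_t depth)
{
	assert(depth <= FAKE_STACK_SIZE);
	memset(s->bytes + FAKE_STACK_SIZE - depth, 0xAA, depth);
}
```

The check is cheap, but it only fires at the next task switch, after the
damage has been done.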

Arnd 

Re: [RFC] Heads up on sys_fallocate()

2007-03-04 Thread Arnd Bergmann

On Monday 05 March 2007, Jörn Engel wrote:
 That actually causes an interesting problem for compressing filesystems.
 The space consumed by blocks depends on their contents and how well it
 compresses.  At the moment, the only option I see to support
 posix_fallocate for LogFS is to set an inode flag disabling compression,
 then allocate the blocks.
 
 But if the file already contains large amounts of compressed data, I
 have a problem.  Disabling compression for a range within a file is not
 supported, so I can only return an error.  But which one?

Using the current glibc implementation on a compressed file system ideally
should be a very expensive no-op because you won't actually allocate much
space for a file when writing zeroes to it. You also don't benefit from a
contiguous allocation in logfs, since flash has uniform seek times over
all the medium.

I'd suggest you implement posix_fallocate as a real nop and just return
success without doing anything. You could also return ENOSPC in case
the blocks requested by posix_fallocate don't fit on the medium without
compression, but that is more or less just guesswork (like statfs is).
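
A rough sketch of that policy, in user-space C. struct fake_fs and
fake_fallocate() are hypothetical names, not LogFS code, and the check
ignores blocks the file already occupies. Like posix_fallocate(), it
returns 0 or a positive errno value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the free space on a compressing file system. */
struct fake_fs {
	uint64_t block_size;
	uint64_t free_blocks;
};

/*
 * Never allocate anything, but reject requests that could not fit on
 * the medium even if the data turned out to be incompressible.
 */
static int fake_fallocate(const struct fake_fs *fs, uint64_t offset,
			  uint64_t len)
{
	uint64_t blocks;

	assert(fs != NULL && fs->block_size > 0);
	(void)offset;		/* simplification: existing blocks ignored */
	if (len == 0)
		return EINVAL;	/* posix_fallocate() rejects len == 0 */
	/* round up to whole blocks */
	blocks = len / fs->block_size + (len % fs->block_size != 0);
	if (blocks > fs->free_blocks)
		return ENOSPC;
	return 0;		/* nothing reserved, nothing written */
}
```

The ENOSPC answer is only an estimate, for the same reason the statfs
numbers are.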

Arnd 

Re: [RFC] Heads up on sys_fallocate()

2007-03-04 Thread Arnd Bergmann

On Monday 05 March 2007, Anton Altaparmakov wrote:
 An alternative would be to allocate blocks and then when the data is  
 written perform the compression and free any blocks you do not need  
 any more because the data has shrunk sufficiently.  Depending on the  
 implementation details this could potentially create horrible  
 fragmentation as you would allocate a large consecutive region and  
 then go and drop random blocks from that region thus making the file  
 fragmented.

Unfortunately, this is not as easy on logfs, because there is no point
in allocating a block when there is no data to write into it. Fragmentation
on flash media is free, but you can never modify a block in place without
erasing it first. This means it will always be written to a new location
on the next write access.

One option that might work (similar to what you describe in your other mail)
is to have a per-inode count of reserved blocks, without allocating specific
blocks for them. The journal then needs to maintain the number of total
reserved blocks for all files and keep that in sync with blocks that were
reserved for specific inodes.
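
The accounting could look roughly like the sketch below. All names
(struct fake_sb, struct fake_inode, reserve_blocks(), write_block()) are
invented for illustration, and the journal is reduced to two counters
that have to stay in sync with the per-inode counts.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct fake_sb {		/* file system wide state, kept in the journal */
	uint64_t free_blocks;
	uint64_t reserved_total;	/* sum of all per-inode reservations */
};

struct fake_inode {
	struct fake_sb *sb;
	uint64_t reserved;	/* blocks promised to this inode */
};

/* Promise n blocks to the inode without picking specific blocks. */
static int reserve_blocks(struct fake_inode *inode, uint64_t n)
{
	struct fake_sb *sb = inode->sb;

	assert(sb != NULL && sb->reserved_total <= sb->free_blocks);
	if (n > sb->free_blocks - sb->reserved_total)
		return -ENOSPC;
	inode->reserved += n;
	sb->reserved_total += n;
	return 0;
}

/* Write one block: consume a reservation if the inode has one,
 * otherwise compete for the unreserved remainder. */
static int write_block(struct fake_inode *inode)
{
	struct fake_sb *sb = inode->sb;

	if (inode->reserved > 0) {
		inode->reserved--;
		sb->reserved_total--;
	} else if (sb->free_blocks == sb->reserved_total) {
		return -ENOSPC;
	}
	sb->free_blocks--;
	return 0;
}
```

Writes to an inode without a reservation fail once only reserved space
is left, while inodes holding a reservation can still write.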

Arnd 

Re: [PATCH -mm 1/5] Blackfin: blackfin architecture patch update

2007-03-05 Thread Arnd Bergmann

On Monday 05 March 2007, Wu, Bryan wrote:
 
 So could you please give us some information about the merge window
 schedule, we may try to catch this.
 
The merge window opens after 2.6.21 gets released and is open for
two weeks after that. The idea is however that you have everything
ready at the start of the merge window.

 Oh, if we should fix this issue, there is a lot of work to do because
 tons of drivers rely on this. Maybe after some team internal discussion,
 we will give a solution to this.

You can probably use a short perl script (or similar) to automate the
conversion.

Arnd 

Re: [PATCH -mm 1/5] Blackfin: blackfin architecture patch update

2007-03-05 Thread Arnd Bergmann

On Monday 05 March 2007, Aubrey Li wrote:
 On 3/4/07, Arnd Bergmann [EMAIL PROTECTED] wrote:

  In general, please put EXPORT_SYMBOL lines below the definition
  of the symbol itself. This list of exports should only be used
  for symbols that come from assembly files.
 
 What is the right way to export symbol coming from c files?
 
As I said, below the symbol definition, like

int global_var;
EXPORT_SYMBOL(global_var);

int global_function(void)
{
	return 3;
}
EXPORT_SYMBOL(global_function);

  This detection seems to me like a strange thing to do in setup_arch().
  It should be possible to do this much later, at a point where the system
  is much less fragile and e.g. printk works. It could even be moved into
  some place in the mtd code itself, since other architectures might want
  to do the same thing.
 
 After downloading the rootfs image from host to the target ram, we need
 to move the image to the right place, so we need to know the size of
 the image at this time.

Well, it doesn't have to be in the modular part of the kernel, but some
place later than setup_arch() would be a step in the right direction.
If you need it before the file systems, an arch_initcall() might be
the right place.

  I'm curious: In your dual-core bf561, don't you actually need to implement
  something that maintains atomicity across cores rather than just across
  processes?
 
 Yes, bf561 is a dual-core processor, but we are using only one core of
 bf561 now.
 IMHO, BF561 architecture was not designed for SMP or NUMA.

Interesting, so what is the intended use of the other core? Does the
hardware have any way of supporting concurrency between the cores,
other than sending interrupts between them?

  How does this fit in with the generic SPI code? Does it duplicate stuff
  from there, or do you use it?
 
 We use our own. We have dma which can be used for SPI operations.

I just looked again at your code. My question was more directed at whether
you use your own SPI abstraction layer instead of drivers/spi, which
you fortunately don't. The piece I was missing however is the spi_bfin5xx.c
driver, which was not part of this patch, though you seem to rely on it.
Is that already part of the -mm kernel?

Arnd 

Re: [PATCH -mm 1/5] Blackfin: blackfin architecture patch update

2007-03-05 Thread Arnd Bergmann

On Monday 05 March 2007, Wu, Bryan wrote:
 Maybe NUMA is a solution, but it is not a wonderful solution.

NUMA doesn't help you. Linux only runs on cache-coherent NUMA,
which this isn't.

 In some application product, BF561 core A is running Linux kernel
 +Applications while BF561 core B is just for some complicated
 video/audio codec algorithm.
 
 Any Linux multicore solution in BF561 situation is highly welcome.

You definitely can't use the cache mode in this case, but one idea
that should make atomic instructions work is to always do these
on one of the two cores, and use cross-core interrupts to trigger
an update. It's probably pretty inefficient and you also need to
do something about atomic updates (spinlock_t and atomic_t) when
interrupts are disabled.
 
 Another question: when is the merge point from -mm to linus mainline, is
 it the same as the merge window after 2.6.21 released?

It's the same.

Arnd 

Re: [RFC] [patch 4/6 -rt] powerpc 2.6.20-rt8: fix a runtime warnings for xmon

2007-03-07 Thread Arnd Bergmann

On Wednesday 07 March 2007, Ingo Molnar wrote:
 i'm not an xmon expert, but maybe it might make more sense to first 
 disable preemption, then interrupts - otherwise you could be preempted 
 right after having disabled these interrupts (and be scheduled to 
 another CPU, etc.). What is the difference between local_irq_save() and 
 the above 'disable interrupts' sequence? If it's not the same and 
 xmon_core() relied on having hardirqs disabled then it might make sense 
 to do a local_irq_save() there, instead of a preempt_disable().

Since relatively recently, powerpc no longer actually disables
the hardware interrupts with local_irq_disable(), but rather sets
a per-cpu flag that will be checked if an actual interrupt comes
in as part of the critical section.

The mtmsr() sequence in xmon corresponds to hard_irq_disable()
and should probably be changed to that, but then you still need
the extra preempt_disable() / preempt_enable().

I think you're right about the sequence having to be
1. preempt_disable()
2. hard_irq_disable()
3.
4. hard_irq_enable()
5. preempt_enable()
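
The difference between the soft flag and really masking interrupts can
be modelled in user space. The sketch below is a simplified model with
made-up names (struct fake_cpu, fake_raise_irq() and friends), not the
powerpc implementation: with only the soft flag cleared, the interrupt
still enters the low-level entry code and is deferred there, while a
hard disable keeps it out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-CPU state in this model; all names are hypothetical. */
struct fake_cpu {
	bool soft_enabled;	/* what local_irq_disable() toggles */
	bool hard_enabled;	/* the real interrupt enable bit */
	bool pending;		/* an interrupt arrived while soft-disabled */
	int handled;		/* interrupts whose handler actually ran */
};

static void fake_local_irq_disable(struct fake_cpu *c)
{
	c->soft_enabled = false;	/* cheap: no hardware access */
}

static void fake_hard_irq_disable(struct fake_cpu *c)
{
	c->soft_enabled = false;
	c->hard_enabled = false;
}

/* Hardware raises an interrupt. Returns true if the CPU took it. */
static bool fake_raise_irq(struct fake_cpu *c)
{
	assert(c != NULL);
	if (!c->hard_enabled)
		return false;		/* stays pending in hardware */
	if (!c->soft_enabled) {
		/* entry code sees the flag, defers and masks for real */
		c->pending = true;
		c->hard_enabled = false;
		return true;
	}
	c->handled++;
	return true;
}

static void fake_local_irq_enable(struct fake_cpu *c)
{
	c->soft_enabled = true;
	c->hard_enabled = true;
	if (c->pending) {		/* replay what was deferred */
		c->pending = false;
		c->handled++;
	}
}
```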

Arnd 

Re: Linux v2.6.21-rc3

2007-03-07 Thread Arnd Bergmann

On Wednesday 07 March 2007 16:39:00 Linus Torvalds wrote:
 So did you hunt it down to a particular case where it triggers?

IIRC, it crashed on boot in the powerpc iommu code when slab
debugging is enabled. Not sure if it was on Cell or on benh's
powerbook though.

Arnd 

Re: A set of standard virtual devices?

2007-04-03 Thread Arnd Bergmann

On Tuesday 03 April 2007, H. Peter Anvin wrote:
 However, one probably wants to think about what the heck one actually 
 means with virtualization in the absence of a lot of this stuff.  PCI 
 is probably the closest thing we have to a lowest common denominator for 
 device detection.

I think that's true outside of s390, but a standardized virtual device
interface should be able to work there as well. Interestingly, the
s390 channel I/O also uses two 16 bit numbers to identify a device
(type and model), just like PCI or USB, so in that light, we might
be able to use the same number space for something entirely different
depending on the virtual bus.
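
One way to read this is that a driver matches on a pair of 16 bit
numbers and does not care which bus reported them. A small sketch under
that assumption follows; struct virt_id, the fake_pci_dev/fake_ccw_dev
descriptors and the ID values are all invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bus-neutral identifier: two 16 bit numbers, whatever bus they came from. */
struct virt_id {
	uint16_t type;
	uint16_t model;
};

/* Hypothetical per-bus descriptors, reduced to the fields used here. */
struct fake_pci_dev { uint16_t vendor, device; };
struct fake_ccw_dev { uint16_t dev_type, dev_model; };

static struct virt_id virt_id_from_pci(const struct fake_pci_dev *p)
{
	return (struct virt_id){ p->vendor, p->device };
}

static struct virt_id virt_id_from_ccw(const struct fake_ccw_dev *c)
{
	return (struct virt_id){ c->dev_type, c->dev_model };
}

/* Return the index of the first matching table entry, or -1. */
static int virt_match_id(const struct virt_id *table, size_t n,
			 struct virt_id id)
{
	assert(table != NULL);
	for (size_t i = 0; i < n; i++)
		if (table[i].type == id.type && table[i].model == id.model)
			return (int)i;
	return -1;
}
```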

Arnd 

Re: A set of standard virtual devices?

2007-04-03 Thread Arnd Bergmann

On Tuesday 03 April 2007, Cornelia Huck wrote:
  
  I think that's true outside of s390, but a standardized virtual device
  interface should be able to work there as well. Interestingly, the
  s390 channel I/O also uses two 16 bit numbers to identify a device
  (type and model), just like PCI or USB, so in that light, we might
  be able to use the same number space for something entirely different
  depending on the virtual bus.
 
 Even if we used those ids for cu_type and dev_type, it would still be
 ugly IMO. It would be much cleaner to just define a very simple, easy
 to implement virtual bus without dragging implementation details for
 other types of devices around.

Right, but an interesting point is the question what to do when running
another operating system as a guest under Linux, e.g. with kvm.

Ideally, you'd want to use the same interface to announce the presence
of the device, which can be done far more easily with PCI than using
a new bus type that you'd need to implement for every OS, instead of
just implementing the virtual PCI driver.

Using a 16 bit number to identify a specific interface sounds like
a good idea to me, if only for the reason that it is a widely used
approach. The alternative would be to use an ascii string, like we
have for open-firmware devices on powerpc or sparc.

I think in either way, we need to abstract the driver for the virtual
device from the underlying bus infrastructure, which is hypervisor
and/or platform dependent. The abstraction could work roughly like this:


==
virt_dev.h
==
struct virt_dev;

struct virt_driver { /* platform independent */
	struct device_driver drv;
	struct pci_device_id *ids; /* not necessarily PCI */
};
struct virt_bus {
	/* platform dependent */
	long (*transfer)(struct virt_dev *dev, void *buffer,
			 unsigned long size, int type);
};
struct virt_dev {
	struct device dev;
	struct virt_driver *driver;
	struct virt_bus *bus;
	struct pci_device_id id;
	int irq;
};
==
virt_example.c
==
static ssize_t virt_pipe_read(struct file *filp, char __user *buffer,
			      size_t len, loff_t *off)
{
	struct virt_dev *dev = filp->private_data;
	ssize_t ret = dev->bus->transfer(dev, buffer, len, READ);

	if (ret > 0)
		*off += ret;
	return ret;
}
static struct file_operations virt_pipe_fops = {
	.open = nonseekable_open,
	.read = virt_pipe_read,
};
static int virt_pipe_probe(struct device *dev)
{
	struct virt_dev *vdev = to_virt_dev(dev);
	struct miscdev *mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);

	if (!mdev)
		return -ENOMEM;
	mdev->name = "virt_pipe";
	mdev->fops = &virt_pipe_fops;
	mdev->parent = dev;
	return register_miscdev(mdev);
}
static struct pci_device_id virt_pipe_id = {
	.vendor = PCI_VENDOR_LINUX, .device = 0x3456,
};
MODULE_DEVICE_TABLE(pci, virt_pipe_id);
static struct virt_driver virt_pipe_driver = {
	.drv = {
		.name = "virt_pipe",
		.probe = virt_pipe_probe,
	},
	.ids = &virt_pipe_id,
};
static int virt_pipe_init(void)
{
	return virt_driver_register(&virt_pipe_driver);
}
module_init(virt_pipe_init);
==
virt_devtree.c
==
static long virt_devtree_transfer(struct virt_dev *dev, void *buffer,
				  unsigned long size, int type)
{
	long ret;

	switch (type) {
	case READ:
		ret = hcall(HV_READ, dev->dev.platform_data, buffer, size);
		break;
	case WRITE:
		ret = hcall(HV_WRITE, dev->dev.platform_data, buffer, size);
		break;
	default:
		BUG();
	}
	return ret;
}
static struct virt_bus virt_devtree_bus = {
	.transfer = virt_devtree_transfer,
};
static int virt_devtree_probe(struct of_device *ofdev,
			      struct of_device_id *match)
{
	struct virt_dev *vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);

	if (!vdev)
		return -ENOMEM;
	vdev->bus = &virt_devtree_bus;
	vdev->dev.parent = &ofdev->dev;
	vdev->id.vendor = PCI_VENDOR_LINUX;
	vdev->id.device = *of_get_property(ofdev, "virt_dev_id");
	vdev->irq = of_irq_parse_and_map(ofdev, 0);
	return device_register(&vdev->dev);
}
struct of_device_id virt_devtree_ids = {
	.compatible = "virt-dev",
};
static struct of_platform_driver virt_devtree_driver = {
	.probe = virt_devtree_probe,
	.match_table = &virt_devtree_ids,
};
==
virt_pci.c
==
static long virt_pci_transfer(struct virt_dev *dev, void *buffer,
			      unsigned long size, int type)
{
	struct virt_pci_regs __iomem *regs = dev->dev.platform_data;

	switch (type) {
	case READ:
		mmio_insb(&regs->read_port, buffer, size);
		break;
	case WRITE:
		mmio_outsb(&regs->write_port, buffer, size);
		break;
	default:
		BUG();
	}

Re: A set of standard virtual devices?

2007-04-03 Thread Arnd Bergmann

On Tuesday 03 April 2007, Cornelia Huck wrote:
 On Tue, 3 Apr 2007 14:15:37 +0200, Arnd Bergmann [EMAIL PROTECTED] wrote:
 
 That's OK for a virtualized architecture where the base architecture
 already supports PCI. But a traditional s390 OS would be as unhappy
 with a PCI device as with a device of a completely new type :)

Sure, that was my point from the start.

 There are several options for virtualized devices (and I don't know why
 they shouldn't coexist):
 
 1. Emulate a well-known device (like a e1000 network card on PCI or a
 model 3390 dasd on CCW). Existing operating systems can just use them,
 but it's a lot of work in the hypervisor.

Most hypervisors already do this, and it's an unrelated topic. 
What we're trying to achieve is to make sure not every hypervisor
and simulator has to introduce its own set of drivers.


  struct virt_bus {
  	/* platform dependent */
  	long (*transfer)(struct virt_dev *dev, void *buffer,
  			 unsigned long size, int type);
  };
 
 Should this embed a struct bus_type? Or reference a generic_virt_bus?

yes, that should embed the bus_type.

  struct virt_dev {
  	struct device dev;
  	struct virt_driver *driver;
  	struct virt_bus *bus;
  	struct pci_device_id id;
  	int irq;
  };
 
 And that's where I have problems :) The notion of irq is far too
 platform specific. I can bend my mind round using PCI-like ids for
 non-PCI virtualized devices, but an integer is far too small and too
 specific for a way to access the device.

Sorry, I've been working too long on the lesser architectures.
IRQ numbers are evil indeed.
However, I'm pretty sure that we need _some_ abstraction of an
interrupt mechanism here. The easiest way is probably to have a
callback function like
int (*irq_handler)(struct virt_dev*, unsigned long message);
in the virt_dev.
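
A stand-alone sketch of how such a callback could be wired up. The
reduced struct virt_dev, virt_dev_notify() and example_handler() are
assumptions made for illustration; only the irq_handler prototype is
taken from the text above.

```c
#include <assert.h>
#include <stddef.h>

struct virt_dev;

typedef int (*virt_irq_handler_t)(struct virt_dev *, unsigned long message);

struct virt_dev {
	virt_irq_handler_t irq_handler;	/* set by the device driver */
	unsigned long last_message;	/* driver-private data in this model */
};

/* Called by the platform-specific bus code when the hypervisor signals
 * the device, however that signal arrives on the platform. */
static int virt_dev_notify(struct virt_dev *dev, unsigned long message)
{
	assert(dev != NULL);
	if (!dev->irq_handler)
		return -1;	/* nobody is listening */
	return dev->irq_handler(dev, message);
}

/* Example driver handler: just remember what was signalled. */
static int example_handler(struct virt_dev *dev, unsigned long message)
{
	dev->last_message = message;
	return 0;
}
```

The driver never sees an interrupt number this way, only the device and
an opaque message.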

Arnd 

Re: A set of standard virtual devices?

2007-04-03 Thread Arnd Bergmann

On Tuesday 03 April 2007, Cornelia Huck wrote:
 On s390, it would be more than strangeness. There's no implementation
 of PCI at all, someone would have to cook it up - and it wouldn't have
 any use beyond those special devices. Since there isn't any bus type
 that is available on *all* architectures, a generic virtual bus with
 very simple probing seems much saner...

I think we need to separate two problems here:

1. Probing:
That's really what triggered the discussion, PCI probing is well-understood
and implemented on _most_ platforms, so there is some value in reusing it.
When you talk about 'very simple probing', I'm not sure what the most simple
approach could be. Ideas that have been implemented before include:
a) have a limited set of device IDs (e.g. 65535 devices, or a hierarchic tree),
   and try to access each one of them in order to find out if it's there. We
   do that for PCI or CCW, for instance.
b) Have an iterator in the hypervisor (or firmware), to return a handle to
   the first, next or child of a device. We do that for open firmware.
c) ask the hypervisor for an unused device of a given class, which needs to
   be returned to the hypervisor when no longer used. This is how the PS3
   hypervisor works, but it does not play well with the Linux driver model.

2. Device access:
When talking to a virtual device, you want to have at least a way to give
commands to it and a way to get interrupts back. Again, multiple ideas
have been used in the past, and we should choose a subset:
a) PCI-like: mmio using memory and/or I/O space BAR setup, interrupt
   numbers and DMA to guest physical addresses.
b) Channel-like: use an hcall to give commands to the hypervisor, passing
   down a device handle command code and data areas in guest physical space.
   Interrupts return the device handle or a OS-defined per-device value.
c) Minimalistic: Every device is mapped into the guest address space and
   can potentially be remapped into user space. The device memory can be
   shared between guests and/or with the host if that uses the same driver.
   The guest is able to signal the receiving end using an hcall and gets
   interrupts like in b)
d) UNIX-like: devices appear like file descriptors, the guest can do
   operations like read/write/sync/mmap, potentially ioctl on them to talk
   to the host.
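
To make probing options 1a and 1b concrete, here is a small user-space
model of both. struct fake_hv stands in for the hypervisor and all
function names are invented; a real implementation would issue hcalls or
config space reads instead of reading an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_MAX_ID 16

/* Stand-in for the hypervisor: which device IDs are present. */
struct fake_hv {
	uint8_t present[FAKE_MAX_ID];
};

/* 1a) try every possible ID in turn and see if something answers */
static size_t probe_by_scan(const struct fake_hv *hv, int *found, size_t max)
{
	size_t n = 0;

	assert(hv != NULL && found != NULL);
	for (int id = 0; id < FAKE_MAX_ID && n < max; id++)
		if (hv->present[id])
			found[n++] = id;
	return n;
}

/* 1b) the hypervisor hands out the device following 'prev', or -1;
 * pass -1 to get the first one */
static int fake_hv_next_device(const struct fake_hv *hv, int prev)
{
	for (int id = prev + 1; id < FAKE_MAX_ID; id++)
		if (hv->present[id])
			return id;
	return -1;
}

static size_t probe_by_iterator(const struct fake_hv *hv, int *found,
				size_t max)
{
	size_t n = 0;

	for (int id = fake_hv_next_device(hv, -1); id >= 0 && n < max;
	     id = fake_hv_next_device(hv, id))
		found[n++] = id;
	return n;
}
```

Both find the same devices; the scan costs one access per possible ID,
the iterator one call per existing device.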

Arnd 

Re: A set of standard virtual devices?

2007-04-03 Thread Arnd Bergmann

On Tuesday 03 April 2007, Jeremy Fitzhardinge wrote:
 Arnd Bergmann wrote:
  I think we need to separate two problems here:
 
  1. Probing:
  That's really what triggered the discussion, PCI probing is well-understood
  and implemented on _most_ platforms, so there is some value in reusing it.
  When you talk about 'very simple probing', I'm not sure what the most simple
  approach could be. 
 
 Is probing an interesting problem to consider on its own?  If there's
 some hypervisor-agnostic device driver in Linux, then obviously it needs
 some way to find the corresponding (virtual) hardware for it to talk
 to.  But that probing mechanism will depend on the actual interface
 structure, and is just one of the many problems that need to be solved. 
 There's no point in overloading PCI to probe for the device unless
 you're actually using PCI to talk to the device.

We already have device drivers for physical devices that can be attached
to different buses. The EHCI USB is an example of a driver that can 
be for instance PCI, OF or an on-chip device. Moreover, you can have an
abstracted device behind it that does not need to know about the transport,
like the SCSI disk driver does not care if it is talking to an ATA, 
parallel SCSI or SAS chip, or even which controller that is.

 Let me say up front that I'm skeptical that we can come up with a single
 bus-like abstraction which can be both a simple and efficient interface
 to all the virtual architectures.  I think a more fruitful path is to
 find what pieces of functionality can be made common, with the aim of
 having small, simple and self-contained hypervisor-specific backends.
 
 I think this needs to be considered on a class by class basis.  This
 thread started with a discussion about entropy sources.  In theory you
 could implement it as simply as exposing a mmaped ringbuffer.  There are
 some extra complexities deriving from the security requirements though;
 for example, all the entropy needs to be kept strictly private to the
 domain that consumes it.
 
 But beyond that, there are 3 other important classes of device:
 
 * console
 * disk
 * networking
 
 (There are obviously more, but these are the must-have.)
 
 Console already provides us with a model to work on, in the form of
 hvc-console.  The hvc-console code itself has the bulk of the common
 console code, along with a set of very small hypervisor-specific
 backends. The Xen console implementation shrunk considerably when we
 switched to using it.

console is also the least problematic interface, you can do it over
practically anything.
 
 If we could do the same thing with disk and net, I would be very happy.
 
 For example, if we wanted to change the Xen frontend/backend disk
 interface, we could use SCSI as the basic protocol, and then convert
 netfront into a relatively simple scsi driver.  There would still be a
 Xen-specific piece, but it should be fairly small and have a clean
 interface.  Though the existing interface is a pretty simple
 shove-this-block-there affair.

Doing a SCSI driver has been tried before, with ibmvscsi. Not good.
The interesting question about block devices is how to handle concurrency
and interrupt mitigation. An efficient interface should

- have asynchronous notification, not sleep until the transfer is complete
- allow multiple blocks to be in flight simultaneously, so the host can
  reorder the requests if it is smart enough
- give only a single interrupt when multiple transfers have completed

minor optimizations could be
- give an interrupt early when some transfers are complete
- allow I/O barriers to be inserted in the stream
- allow marking blocks as more or less important (readahead vs. read)
- provide passthrough of SG_IO or similar for optical media
  (e.g. DVD writer)
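
The three main requirements can be sketched with a small tagged request
queue. This is a user-space model with invented names (struct blk_queue,
blk_submit(), host_complete(), blk_reap()); the "interrupt" is just a
counter, and none of the optional features are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_DEPTH 8

struct blk_req {
	uint64_t sector;
	bool in_flight;
	bool done;
};

struct blk_queue {
	struct blk_req slots[QUEUE_DEPTH];
	unsigned int interrupts;	/* how often the guest was interrupted */
};

/* Guest side: queue a request without waiting. Returns a tag or -1. */
static int blk_submit(struct blk_queue *q, uint64_t sector)
{
	assert(q != NULL);
	for (int tag = 0; tag < QUEUE_DEPTH; tag++) {
		if (!q->slots[tag].in_flight) {
			q->slots[tag] = (struct blk_req){ sector, true, false };
			return tag;
		}
	}
	return -1;	/* queue full, caller has to retry later */
}

/* Host side: complete any subset, in any order, then signal once. */
static void host_complete(struct blk_queue *q, const int *tags, size_t n)
{
	for (size_t i = 0; i < n; i++)
		q->slots[tags[i]].done = true;
	if (n > 0)
		q->interrupts++;	/* one interrupt for the whole batch */
}

/* Guest interrupt handler: reap everything that has finished. */
static unsigned int blk_reap(struct blk_queue *q)
{
	unsigned int reaped = 0;

	for (int tag = 0; tag < QUEUE_DEPTH; tag++) {
		struct blk_req *r = &q->slots[tag];

		if (r->in_flight && r->done) {
			r->in_flight = false;
			reaped++;
		}
	}
	return reaped;
}
```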

 I'm not sure what similar common code could be extracted for network
 devices.  I haven't looked into it all that closely.

One way to do networking would be to simply provide a shared memory area
that everyone can write to, then use a ring buffer and atomic operations
to synchronize between the guests, and a method to send interrupts to the
others for flow control.
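
A minimal sketch of the ring buffer part, using C11 atomics. It is
deliberately narrower than what is described above: one producer and one
consumer only, packets reduced to a 32 bit value, and the interrupt for
flow control replaced by a "ring full" return value. struct shm_ring,
ring_put() and ring_get() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 4	/* must be a power of two */

struct shm_ring {
	_Atomic uint32_t head;		/* next slot the producer writes */
	_Atomic uint32_t tail;		/* next slot the consumer reads */
	uint32_t pkt[RING_SLOTS];	/* stand-in for packet descriptors */
};

/* Producer. Returns false when the ring is full (flow control case). */
static bool ring_put(struct shm_ring *r, uint32_t pkt)
{
	uint32_t head, tail;

	assert(r != NULL);
	head = atomic_load_explicit(&r->head, memory_order_relaxed);
	tail = atomic_load_explicit(&r->tail, memory_order_acquire);
	if (head - tail == RING_SLOTS)
		return false;
	r->pkt[head % RING_SLOTS] = pkt;
	/* publish the slot only after its contents are written */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}

/* Consumer. Returns false when the ring is empty. */
static bool ring_get(struct shm_ring *r, uint32_t *pkt)
{
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (head == tail)
		return false;
	*pkt = r->pkt[tail % RING_SLOTS];
	/* release the slot only after its contents are read */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}
```

With more than two guests writing to the same area, the producer side
would need an atomic reservation step that this sketch does not have.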

Arnd 

Re: A set of standard virtual devices?

2007-04-03 Thread Arnd Bergmann

On Tuesday 03 April 2007, Jeremy Fitzhardinge wrote:
  Doing a SCSI driver has been tried before, with ibmvscsi. Not good.
    
 
 OK, interesting.  People had proposed using SCSI as the interface, but I
 wasn't aware of any results from doing that.  How is it not good?
 

SCSI is really overengineered for something as simple as a block interface.
A large part of the SCSI stack deals only with error handling, which
you don't want to burden the guests with at all, since most error conditions
can be handled fine by the host.
Another big aspect of SCSI is device enumeration and probing. Doing it
the SCSI way is particularly pointless. It's much simpler to have one
device with its own I/O interface at the hcall layer, and one interrupt
number for the block device, instead of faking the full hca/bus/dev/lun
hierarchy.

Arnd 

Re: A set of standard virtual devices?

2007-04-03 Thread Arnd Bergmann

On Tuesday 03 April 2007, Jeremy Fitzhardinge wrote:
 That said, something like USB is probably the best bet for this kind of
 low-performance device.  I think.  Not that I really know anything about
 USB.

USB has the disadvantage that it is more complex than PCI and requires
significantly more code to simulate on the host side.

On the plus side, I think it should be possible to implement a virtual
USB host on s390, which is not possible with PCI, but that again takes
a lot of work to implement.

One interesting aspect of the PS3 hypervisor is that some of the
low-speed interfaces are implemented as a virtual UART, meaning
something that only has read and write operations and uses an
interrupt for flow control. The implementation in 
drivers/ps3/vuart.c is probably more complex than what we want
as a generic transport mechanism, but simply having a bidirectional
data stream sounds like an ideal abstraction for the simple
case. Some more or less obvious users of this include:

- console
- additional tty
- random
- slow network (using ppp)
- printer
- watchdog
- hid (e.g. mouse)
- system management (like ps3)
- fast network (in combination with
  shared memory segment)

The transport can be hypervisor specific, e.g. there could be
a virtual PCI serial port on kvm, an hcall interface on the ps3
and a virtual CTC on s390 (kidding), while all of them can have
the same kind of hardware _behind_ the serial connection.
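
The split between a generic stream and a hypervisor-specific transport
could look roughly like this. Everything here is an assumption made for
illustration: struct vstream, struct vstream_ops and the loopback
transport, which stands in for a PCI serial port or an hcall interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct vstream;

/* What a hypervisor-specific transport has to provide. */
struct vstream_ops {
	long (*read)(struct vstream *s, void *buf, size_t len);
	long (*write)(struct vstream *s, const void *buf, size_t len);
};

struct vstream {
	const struct vstream_ops *ops;
	unsigned char fifo[64];	/* used by the loopback transport only */
	size_t used;
};

/* Loopback transport: whatever is written can be read back. */
static long loop_write(struct vstream *s, const void *buf, size_t len)
{
	size_t room = sizeof(s->fifo) - s->used;

	if (len > room)
		len = room;	/* short write acts as flow control */
	memcpy(s->fifo + s->used, buf, len);
	s->used += len;
	return (long)len;
}

static long loop_read(struct vstream *s, void *buf, size_t len)
{
	if (len > s->used)
		len = s->used;
	memcpy(buf, s->fifo, len);
	memmove(s->fifo, s->fifo + len, s->used - len);
	s->used -= len;
	return (long)len;
}

static const struct vstream_ops loop_ops = { loop_read, loop_write };

/* Device drivers (console, rng, ...) only ever call these two. */
static long vstream_read(struct vstream *s, void *buf, size_t len)
{
	assert(s != NULL && s->ops != NULL);
	return s->ops->read(s, buf, len);
}

static long vstream_write(struct vstream *s, const void *buf, size_t len)
{
	assert(s != NULL && s->ops != NULL);
	return s->ops->write(s, buf, len);
}
```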

Arnd 

Re: missing madvise functionality

2007-04-03 Thread Arnd Bergmann

On Tuesday 03 April 2007, Ulrich Drepper wrote:
 The problem is glibc has to work around kernel limitations.  If the
 malloc implementation detects that a large chunk of previously allocated
 memory is now free and unused it wants to return the memory to the
 system.  What we currently have to do is this:
 
   to free:      mmap(PROT_NONE) over the area
   to reuse:     mprotect(PROT_READ|PROT_WRITE)
 
 Yep, that's expensive, both operations need to get locks preventing
 other threads from doing the same.

I thought this is what the read_zero_pagealigned hack [1] was used
for (read from /dev/zero replaces target pages with empty_zero_page).
Now if read_zero_pagealigned does not solve _this_ scenario, is it
good for anything else then? 
Can we simply kill that function as a misfeature and avoid future
pain arising from it?
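
For reference, the free/reuse sequence Ulrich describes can be written
down as the following Linux user-space sketch. region_release() and
region_reuse() are made-up helper names, not glibc internals; the
mmap() and mprotect() calls are the standard ones.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* "free": map fresh, inaccessible anonymous memory over the range,
 * which lets the kernel drop the pages that were there before. */
static int region_release(void *addr, size_t len)
{
	void *p;

	assert(addr != NULL);
	p = mmap(addr, len, PROT_NONE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	return p == MAP_FAILED ? -1 : 0;
}

/* "reuse": make the range accessible again; it reads as zeroes. */
static int region_reuse(void *addr, size_t len)
{
	return mprotect(addr, len, PROT_READ | PROT_WRITE);
}
```

Both calls modify the address space of the whole process, which is where
the locking cost mentioned in the quoted text comes from.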

Arnd 

[1] http://lkml.org/lkml/1997/1/16/49

Re: A set of standard virtual devices?

2007-04-03 Thread Arnd Bergmann

On Wednesday 04 April 2007, H. Peter Anvin wrote:
 Note that at least for PIO-based devices, there is nothing that says you 
 can't implement PCI over another transport, if you wish.  It's really 
 just a very simple RPC protocol.

The PIO aspect of PCI is simple, yes, except on architectures that don't
have the concept of PIO or even uncached memory, but even that can
be done by defining readl/writel/inl/outl/... as hcalls.
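
A toy model of that last point: accessors with the shape of
readl()/writel() that go through a hypercall instead of touching memory.
fake_hcall(), its two call numbers and the register file behind it are
invented for this sketch; a real hcall would trap into the hypervisor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_REGS 16

/* Stand-in for the hypervisor side of a device: a small register file. */
static uint32_t fake_device_regs[FAKE_REGS];

enum { FAKE_HV_MMIO_READ32, FAKE_HV_MMIO_WRITE32 };	/* hypothetical */

/* Pretend hypercall. Returns 0 on success, -1 on a bad request. */
static long fake_hcall(int nr, unsigned long reg, uint32_t *val)
{
	assert(val != NULL);
	if (reg >= FAKE_REGS)
		return -1;
	switch (nr) {
	case FAKE_HV_MMIO_READ32:
		*val = fake_device_regs[reg];
		return 0;
	case FAKE_HV_MMIO_WRITE32:
		fake_device_regs[reg] = *val;
		return 0;
	}
	return -1;
}

/* Same shape as readl(), but routed through the hcall. */
static uint32_t fake_readl(unsigned long reg)
{
	uint32_t val = 0xffffffffu;	/* all ones if the read fails */

	fake_hcall(FAKE_HV_MMIO_READ32, reg, &val);
	return val;
}

/* Same shape as writel(): value first, then the register. */
static void fake_writel(uint32_t val, unsigned long reg)
{
	fake_hcall(FAKE_HV_MMIO_WRITE32, reg, &val);
}
```

Every register access becomes a round trip to the hypervisor, so this
only makes sense for devices that need few accesses per operation.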

The tricky part about PCI is the device probing, everything about config
space accesses, interrupt swizzling, bus/device/function numbers and
base address registers becomes a pointless exercise when the other side
is just faking it.

 DMA is trickier, as it makes the data appear into the address space of 
 the guest in a way that is both device- and host-dependent (in the 
 presence of PCI domains, IOMMU etc.)  There may be reason to avoid DMA 
 for that reason.

Right, PCI DMA and virtualization don't mix. DMA in general is fine though,
as long as your devices (real or virtual) see the guest physical addresses
as a contiguous 64 bit range and have well-defined semantics about what
addresses are accessed in what way.
When you think of file read/write syscalls as DMA into user space, it's
a very clean concept. Async I/O somewhat less so, but still pretty good.

Arnd 

Re: A set of standard virtual devices?

2007-04-04 Thread Arnd Bergmann

On Wednesday 04 April 2007, H. Peter Anvin wrote:
 Configuration space access is platform-dependent.  It's only defined to 
 work in a specific way on x86 platforms.
 
 Interrupt swizzling is really totally independent of PCI.  ALL PCI 
 really provides is up to four interrupts per device (not counting 
 MSI/MSI-X) and an 8-bit writable field which the platform can choose to 
 use to hold interrupt information.  That's all.  The rest is all 
 platform information.
 
 PCI enumeration is hardly complex.  Most of the stuff that doesn't apply 
 to you you can generally ignore, as is done by other busses like 
 HyperTransport when they emulate PCI.

You still don't get my point: On a platform that doesn't have interrupt
numbers, and where most of the fields in the config space don't correspond
to anything that is already there, you really don't want to invent
a set of new hcalls that implement emulation, just to get something as
simple as a pipe.

wc drivers/pci/*.[ch] include/asm-i386/{pci,io}.h lib/iomap*.c \
arch/i386/pci/*.c kernel/irq/*.c
17015  59037 463967 total

Even if you only need half of that code in reality, reimplementing
all that in both the kernel and the hypervisor is an enormous
effort. We've seen that before on the ps3, which initially faked
a virtual PCI bus just for the USB controller, but doing something
like that requires adding abstraction layers to decide whether to
implement e.g. an inb as a hypercall or as a memory read.

 That being said, on platforms which are PCI-centric, such as x86, this 
 of course makes it a lot easier to produce virtual devices which work 
 across hypervisors, since the device model, of *any* operating system is 
 set up to handle them.

Yes, as I said there are two separate problems. I really think that
a standardized virtual driver interface should be modeled after
kernel - user interfaces, not hardware - kernel interfaces.

Once we know what operations we want (e.g. read, write and SIGIO,
or some other set of primitives), it will be good to provide a
virtual PCI device that can be used as one transport mechanism
below it. Using PCI device IDs to tell what functionality is
provided by the device would provide a reasonable method for
autoprobing.
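
A minimal sketch of ID-based autoprobing, assuming a table that maps
vendor/device ID pairs to the functionality they provide. The IDs, the struct
and the function names are hypothetical and only illustrate the lookup; they
are not an existing kernel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vendor/device IDs, invented for this sketch. */
#define VIRT_VENDOR_ID   0x1234
#define VIRT_DEV_CONSOLE 0x0001
#define VIRT_DEV_BLOCK   0x0002

struct virt_driver {
	uint16_t vendor;
	uint16_t device;
	const char *function;	/* what the device provides */
};

static const struct virt_driver virt_drivers[] = {
	{ VIRT_VENDOR_ID, VIRT_DEV_CONSOLE, "console" },
	{ VIRT_VENDOR_ID, VIRT_DEV_BLOCK,   "block"   },
};

#define VIRT_NR_DRIVERS (sizeof(virt_drivers) / sizeof(virt_drivers[0]))

/* Return the function name for an ID pair, or NULL if nothing matches. */
static const char *virt_probe(uint16_t vendor, uint16_t device)
{
	size_t i;

	for (i = 0; i < VIRT_NR_DRIVERS; i++) {
		assert(virt_drivers[i].function != NULL);
		if (virt_drivers[i].vendor == vendor &&
		    virt_drivers[i].device == device)
			return virt_drivers[i].function;
	}
	return NULL;
}
```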

Arnd 

Re: [JANITOR PROPOSAL] Switch ioctl functions to ->unlocked_ioctl

2008-01-08 Thread Arnd Bergmann

On Tuesday 08 January 2008, Andi Kleen wrote:
  Thanks, Andi! I think it'd very useful change.
 
 Reminds me this is something that should be actually flagged
 in checkpatch.pl too
 
 Andy, it would be good if checkpatch.pl complained about .ioctl = 
 as opposed to .unlocked_ioctl = ...

This is rather hard, as there are different data structures that
all contain ->ioctl and/or ->unlocked_ioctl function pointers.
Some of them already use ->ioctl in an unlocked fashion only,
so blindly warning about this would give lots of false positives.
 
 Also perhaps if a whole new file_operations with a ioctl is added
 complain about missing compat_ioctl as a low priority warning?
 (might be ok if it's architecture specific on architectures without
 compat layer)

Also, not every data structure that provides a ->ioctl callback
also has a ->compat_ioctl, although there should be fewer exceptions
here.

Arnd 

Re: New linux arch

2008-01-08 Thread Arnd Bergmann

On Monday 07 January 2008, Michal Simek wrote:
 I would like to ask you what is the best way to push these changes to 
 kernel.org.
 
 I would like to know step by step how to do.
 

Adding the whole architecture tree will probably be too much for
a single reviewer and almost certainly too much for the size limit
of mails to lkml.
On the other hand, there is not much point in merging the architecture
code in multiple changesets if there is nothing at all you can do
with part of it. I suggest therefore that you split the code twice:

First, split every device driver into its own git changeset. Often,
these have to go through a different set of mailing lists, e.g.
network drivers go to [EMAIL PROTECTED] (see MAINTAINERS for
details), while the actual architecture changeset should not contain
any device drivers itself. These are going to be the changesets
that you have in your git tree and that eventually get merged upstream.

Then, split each of those changesets into reviewer-friendly
chunks of less than 100kb. Don't worry if one patch ends up having only
a few lines while others are considerably larger.
For people that like to see a whole changeset, upload it as a combined
patch to an http location that you mention in your patch 00/12 or so,
and have the smaller patches as reply mails to that.
Use either 'quilt mail' or 'git-format-patch' to do that work for you.

I think blackfin is a good example of how an architecture got merged,
and how they resolved the initial problems. Read through the comments
at http://lkml.org/lkml/2006/9/20/404 and related mails to see what
can go wrong in such large projects and how to do it better.

Regarding the code itself, my assumption is that you started out
copying from another architecture (everyone does that) and hacked on
it until you had it working. This is not wrong by itself, but it would
be really nice if we can make it easier for the next person to
add an architecture. My vision is that for each header file you
copied from include/asm-i386 or similar and did not end up rewriting,
you create a version in include/asm-generic and start using that
instead of adding a private copy in your architecture.
One example where this was already done is asm/errno.h, an example
where you should do it is asm/stat.h.
It's similar for files like arch/microblaze/kernel/sys.c and pci.c:
ideally, you shouldn't have these at all, but be able to just
use completely generic code.

Arnd 

Re: [JANITOR PROPOSAL] Switch ioctl functions to ->unlocked_ioctl

2008-01-08 Thread Arnd Bergmann

On Wednesday 09 January 2008, Andi Kleen wrote:
 I imagined it would check for 
 
 +struct file_operations ... = { 
 +      ...
 +   .ioctl = ...
 
 That wouldn't catch the case of someone adding only .ioctl to an 
 already existing file_operations which is not visible in the patch context, 
 but that should be hopefully rare. The more common case is adding
 completely new operations

Right, this would work fine. We can probably even have a list of
data structures that work like file_operations in this regard.
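
A rough sketch of the check described above. checkpatch.pl itself is a Perl
script; C is used here only for illustration, and the function names and the
line-array input format are assumptions of this sketch. It counts ".ioctl ="
initializers that appear inside a file_operations definition added by the
patch, and deliberately ignores ".ioctl" added to a structure that is only
visible as context.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* True if an added line, after '+' and whitespace, reads ".ioctl =". */
static int is_plain_ioctl_init(const char *line)
{
	const char *p = line + 1;	/* skip the leading '+' */

	while (isspace((unsigned char)*p))
		p++;
	if (strncmp(p, ".ioctl", 6) != 0)
		return 0;		/* also rejects .unlocked_ioctl */
	p += 6;
	while (isspace((unsigned char)*p))
		p++;
	return *p == '=';
}

/* Count ".ioctl =" initializers inside newly added file_operations. */
static int count_new_fops_ioctl(const char *const *lines, size_t n)
{
	int in_new_fops = 0, hits = 0;
	size_t i;

	assert(lines != NULL || n == 0);
	for (i = 0; i < n; i++) {
		const char *l = lines[i];

		if (l[0] != '+') {	/* context or removed line */
			in_new_fops = 0;
			continue;
		}
		if (strstr(l, "struct file_operations") && strchr(l, '{'))
			in_new_fops = 1;
		else if (in_new_fops && is_plain_ioctl_init(l))
			hits++;
		if (strchr(l, '}'))	/* end of the initializer */
			in_new_fops = 0;
	}
	return hits;
}
```

Extending this to "a list of data structures that work like file_operations"
would mean replacing the single strstr() pattern with a table of struct names.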

Arnd 

Re: [PATCH 15/15] Add DEFINE_SPUFS_ATTRIBUTE()

2007-09-13 Thread Arnd Bergmann

On Thursday 13 September 2007, Michael Ellerman wrote:
 Well that'd be nice, but I don't see anywhere that that happens. AFAICT
 the acquire we do in the first coredump callback is the first the SPU
 contexts know about their PPE process dying. And spufs is still live, so
 I think we definitely need to grab the mutex, or we might race with
 userspace accessing spufs files.

Right, I was only thinking about the dumping process itself, but there
may be other processes that still have files open for that context.

Arnd 

Re: [PATCH v4 7/8] debugfs: allow access to signed values

2007-12-20 Thread Arnd Bergmann

On Thursday 20 December 2007, Stefano Brivio wrote:
 debugfs: allow access to signed values
 
 Add debugfs_create_s{8,16,32,64}. For these to work properly, we need to 
 remove
 a cast in libfs, change the simple_attr_open prototype and thus fix the users 
 as
 well.
 
 Cc: Johannes Berg [EMAIL PROTECTED]
 Cc: Mattias Nissler [EMAIL PROTECTED]
 To: Greg Kroah-Hartman [EMAIL PROTECTED]
 To: Arnd Bergmann [EMAIL PROTECTED]
 To: Akinobu Mita [EMAIL PROTECTED]
 Signed-off-by: Stefano Brivio [EMAIL PROTECTED]

Have you checked that spufs still builds? I would guess that you need
to do the same interface changes there.

Also, Christoph has recently posted a suggestion for how to improve
the interface to allow the 'get' operation to return an error:
http://patchwork.ozlabs.org/cbe-oss-dev/patch?id=14962

I'd suggest consolidating the two changes in order to avoid merge
conflicts.
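
To illustrate the idea of a 'get' operation that can return an error, here is
a minimal userspace sketch. It is not the patch from the link above; the
struct and function names are hypothetical. The value is returned through a
pointer so that the return code is free to carry an errno-style error.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical attribute; 'get' reports errors separately from the value. */
struct demo_attr {
	int (*get)(void *data, uint64_t *val);
	void *data;
};

struct demo_ctx {
	int running;
	uint64_t counter;
};

static int demo_counter_get(void *data, uint64_t *val)
{
	struct demo_ctx *ctx = data;

	if (!ctx->running)
		return -EAGAIN;	/* no valid value to report right now */
	*val = ctx->counter;
	return 0;
}

/* Read through the attribute, passing any error back to the caller. */
static int demo_attr_read(const struct demo_attr *attr, uint64_t *val)
{
	assert(attr && attr->get);
	return attr->get(attr->data, val);
}
```

With a 'get' that returns the value directly, the -EAGAIN case above could
not be distinguished from a legitimate value.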

Arnd 

Re: [PATCH -mm 05/43] compat_binfmt_elf

2007-12-21 Thread Arnd Bergmann

On Thursday 20 December 2007, Roland McGrath wrote:
 This adds fs/compat_binfmt_elf.c, a wrapper around fs/binfmt_elf.c for
 32-bit ELF support on 64-bit kernels.  It can replace all the hand-rolled
 versions of this that each 32/64 arch has, which are all about the same.

Great stuff! I've attempted to do this a few times over the past years,
but could never get my head around it. One more bit of broken compat
code gone from the architectures!

Arnd 

Re: [PATCH -mm 18/43] powerpc compat_binfmt_elf

2007-12-21 Thread Arnd Bergmann

On Friday 21 December 2007, Kyle McMartin wrote:
 Just taking a stab that hch means,
 
 config BINFMT_COMPAT_ELF
 def_bool n
 depends on 64BIT
 

I'd call it COMPAT_BINFMT_ELF, for consistency with the file name.
Also, the definition and the depends are redundant if you expect the
option to be autoselected. You can do either of

config COMPAT_BINFMT_ELF
	bool

or

config COMPAT_BINFMT_ELF
	def_bool y
	depends on COMPAT

The second option makes sense at the point where all architectures with
compat code are using the same compat_binfmt_elf code.

Arnd 

Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.

2007-11-29 Thread Arnd Bergmann

On Thursday 22 November 2007, Andi Kleen wrote:
  #define EXPORT_SYMBOL(sym) \
 -   __EXPORT_SYMBOL(sym, )
 +   __EXPORT_SYMBOL(sym, ,,, NULL)
  
  #define EXPORT_SYMBOL_GPL(sym) \
 -   __EXPORT_SYMBOL(sym, _gpl)
 +   __EXPORT_SYMBOL(sym, _gpl,,, NULL)
  
  #define EXPORT_SYMBOL_GPL_FUTURE(sym)  \
 -   __EXPORT_SYMBOL(sym, _gpl_future)
 +   __EXPORT_SYMBOL(sym, _gpl_future,,, NULL)
  
 +/* Export symbol into namespace ns
 + * No _GPL variants because namespaces imply GPL only
 + */
 +#define EXPORT_SYMBOL_NS(ns, sym)  \
 +   __EXPORT_SYMBOL(sym, _gpl,__##ns, NS_SEPARATOR #ns, #ns)
  

I think it would be good if you could specify a default namespace
per module, that could reduce the amount of necessary changes significantly.

For example, you can do

#define EXPORT_SYMBOL_GLOBAL(sym) __EXPORT_SYMBOL(sym, _gpl,,, NULL)
#ifndef MODULE_NAMESPACE
#define EXPORT_SYMBOL_GPL(sym) EXPORT_SYMBOL_GLOBAL(sym)
#else
#define EXPORT_SYMBOL_GPL(sym) EXPORT_SYMBOL_NS(MODULE_NAMESPACE, sym)
#endif

If we go that way, it may be useful to extend the namespace mechanism to
non-GPL symbols as well, like

#define EXPORT_SYMBOL(sym) __EXPORT_SYMBOL(sym, ,__## MODULE_NAMESPACE, \
	NS_SEPARATOR #MODULE_NAMESPACE, #MODULE_NAMESPACE)

Unfortunately, doing this automatic namespace selection requires setting
the namespace before #include <linux/module.h>. One way to work around this
could be to use Makefile magic so you can write a Makefile as

obj-$(CONFIG_COMBINED) += combined.o
combined-$(CONFIG_SUBOPTION) += combined_main.o combined_other.o
obj-$(CONFIG_SINGLE) += single.o
obj-$(CONFIG_OTHER) += other.o
obj-$(CONFIG_API) += api.o

NAMESPACE = subsys                   # default, used for other.o
NAMESPACE_single.o = single          # used only for single.o
NAMESPACE_combined.o = combined      # all parts of combined.o
NAMESPACE_combined_other.o = special # except this one
NAMESPACE_api.o =                    # api.o is put into the global ns

The Makefile logic here would basically just follow the rules we have for
CFLAGS etc, and then pass -DMODULE_NAMESPACE=$(NAMESPACE_$(obj)) to gcc.
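
A small userspace sketch of how a namespace passed as -DMODULE_NAMESPACE=...
could be picked up at the point of export. One detail matters here: the '#'
operator only applies to macro parameters, so turning MODULE_NAMESPACE into a
string needs a two-level helper macro. DEMO_EXPORT_SYMBOL, struct export_entry
and the helper names are invented for this sketch and are not the macros from
the patch.

```c
#include <assert.h>
#include <string.h>

/* Two-level expansion so the argument is expanded before '#' applies. */
#define NS_STRINGIFY_1(x) #x
#define NS_STRINGIFY(x)   NS_STRINGIFY_1(x)

/*
 * MODULE_NAMESPACE would normally arrive from the build system as
 * -DMODULE_NAMESPACE=...; it is defined here only to keep the sketch
 * self-contained.
 */
#ifndef MODULE_NAMESPACE
#define MODULE_NAMESPACE subsys
#endif

struct export_entry {
	const char *name;
	const char *ns;
};

/* Hypothetical stand-in for EXPORT_SYMBOL: records symbol and namespace. */
#define DEMO_EXPORT_SYMBOL(sym) \
	static const struct export_entry export_##sym = { \
		#sym, NS_STRINGIFY(MODULE_NAMESPACE) \
	}

/* True if the exported entry belongs to namespace 'ns'. */
static int export_in_namespace(const struct export_entry *e, const char *ns)
{
	assert(e != NULL && ns != NULL);
	return strcmp(e->ns, ns) == 0;
}

int frobnicate(void) { return 0; }
DEMO_EXPORT_SYMBOL(frobnicate);
```

Because the namespace comes from the command line, the source file itself
needs no per-symbol annotation.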

Arnd 

Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.

2007-11-29 Thread Arnd Bergmann

On Thursday 29 November 2007, Andi Kleen wrote:
  I think it would be good if you could specify a default namespace
  per module, that could reduce the amount of necessary changes significantly.
 
 But also give less documentation. It's also not that difficult to mark
 the exports once. I've forward ported such patches over a few kernels
 and didn't run into significant me

Part of your sentence seems to be missing, but I guess I understand your
point. How many files did you annotate this way? I can see it as being
useful to have the namespace explicit in each symbol, but doing it once
per module sounds like the 80% solution for 20% of the work, and the
two don't even conflict. In the current kernel, I count 12644 exported
symbols in 1646 files, in 540 directories.

One problem I can see with annotating every symbol is that it conflicts
with other patches that add more exported functions to a file without
adding the namespace, or that simply break because of context changes.

Arnd 

Re: [PATCH] IB/ehca: Serialize HCA-related hCalls on POWER5

2007-12-06 Thread Arnd Bergmann

On Thursday 06 December 2007, Joachim Fenkes wrote:
 	printk(KERN_INFO "eHCA Infiniband Device Driver "
 	       "(Version " HCAD_VERSION ")\n");
  
 +	/* Autodetect hCall locking -- we can't read the firmware version
 +	 * directly, but we know that starting with POWER6, all firmware
 +	 * versions are good.
 +	 */
 +	if (ehca_lock_hcalls == -1)
 +		ehca_lock_hcalls = !(cur_cpu_spec->cpu_user_features
 +				     & PPC_FEATURE_ARCH_2_05);
 +
 	ret = ehca_create_comp_pool();
 	if (ret) {
 		ehca_gen_err("Cannot create comp pool.");

We already talked about this yesterday, but I still feel that checking the
instruction set of the CPU should not be used to determine whether a
specific device driver implementation is used in the hypervisor.

At the very least, I think you should change this to read the hypervisor
version number from the device tree, though the ideal solution would be
to have the absence of this bug encoded in the device node for the ehca
device itself.

Regarding the performance problem, have you checked whether converting all
your spin_lock_irqsave to spin_lock/spin_lock_irq improves your performance
on the older machines? Maybe it's already fast enough that way.

Arnd 
