Re: 2.6.30-rc6: Problem with an SSD disk on Freescale PowerPC mpc8315e-rdb, works fine on x86

2009-06-09 Thread Leon Woestenberg
Adding the sata_fsl.c developers to the recipients:

On Mon, Jun 8, 2009 at 4:59 PM, Leon Woestenberg <leon.woestenb...@gmail.com> wrote:
 Hello,

 Using 2.6.30-rc6, I get the following problems when I read from an SSD
 disk connected to the 3.0 Gb SATA controller of the MPC8315E SoC rev 1.0.

 See below for the output from two dd read runs.

 The disk behaves fine on an x86 box.

 What can I do to (help) pin-point the problem?

 Regards,

 Leon.


 r...@mpc8315e-rdb:~# dd if=/dev/sda of=/dev/null bs=4k
 ata2: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x2 frozen
 ata2.00: cmd c8/00:3e:1e:e0:01/00:00:00:00:00/e0 tag 0 dma 31744 in
         res 50/00:3e:e0:df:01/00:00:00:00:00/e0 Emask 0x1 (device error)
 ata2.00: status: { DRDY }
 ata2: hard resetting link
 ata2: Signature Update detected @ 3528 msecs
 ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
 ata2.00: configured for UDMA/133
 sd 1:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x08
 sd 1:0:0:0: [sda] Sense Key : 0xb [current] [descriptor]
 Descriptor sense data with sense descriptors (in hex):
        72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
        00 01 df e0
 sd 1:0:0:0: [sda] ASC=0x0 ASCQ=0x0
 end_request: I/O error, dev sda, sector 122910
 __ratelimit: 52 callbacks suppressed
 Buffer I/O error on device sda, logical block 122910
 Buffer I/O error on device sda, logical block 122911
 Buffer I/O error on device sda, logical block 122912
 Buffer I/O error on device sda, logical block 122913
 Buffer I/O error on device sda, logical block 122914
 Buffer I/O error on device sda, logical block 122915
 Buffer I/O error on device sda, logical block 122916
 Buffer I/O error on device sda, logical block 122917
 Buffer I/O error on device sda, logical block 122918
 Buffer I/O error on device sda, logical block 122919
 ata2: EH complete
 dd: /dev/sda: Input/output error


 r...@mpc8315e-rdb:~# dd if=/dev/sda of=/dev/null bs=4k
 ata2: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x2 frozen
 ata2.00: cmd c8/00:32:9a:6e:00/00:00:00:00:00/e0 tag 0 dma 25600 in
         res 50/00:3e:5c:6e:00/00:00:00:00:00/e0 Emask 0x1 (device error)
 ata2.00: status: { DRDY }
 ata2: hard resetting link
 ata2: Signature Update detected @ 3528 msecs
 ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
 ata2.00: configured for UDMA/133
 sd 1:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x08
 sd 1:0:0:0: [sda] Sense Key : 0xb [current] [descriptor]
 Descriptor sense data with sense descriptors (in hex):
        72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
        00 00 6e 5c
 sd 1:0:0:0: [sda] ASC=0x0 ASCQ=0x0
 end_request: I/O error, dev sda, sector 28314
 __ratelimit: 52 callbacks suppressed
 Buffer I/O error on device sda, logical block 28314
 Buffer I/O error on device sda, logical block 28315
 Buffer I/O error on device sda, logical block 28316
 Buffer I/O error on device sda, logical block 28317
 Buffer I/O error on device sda, logical block 28318
 Buffer I/O error on device sda, logical block 28319
 Buffer I/O error on device sda, logical block 28320
 Buffer I/O error on device sda, logical block 28321
 Buffer I/O error on device sda, logical block 28322
 Buffer I/O error on device sda, logical block 28323
 ata2: EH complete
 dd: /dev/sda: Input/output error
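
For anyone decoding the two failures above: the "cmd" line printed by libata
is the ATA taskfile, in the order
command/feature:nsect:lbal:lbam:lbah/(five HOB bytes)/device. Here is a
minimal sketch of pulling the LBA and transfer size out of it, assuming the
28-bit LBA layout used by READ DMA (0xc8) and 512-byte sectors. The struct
and function names are made up for illustration and are not libata's.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical holder for the taskfile bytes we care about. */
struct taskfile {
	uint8_t command, feature, nsect, lbal, lbam, lbah, device;
};

/* Parse e.g. "c8/00:3e:1e:e0:01/00:00:00:00:00/e0". Returns 0 on success. */
static int parse_taskfile(const char *s, struct taskfile *tf)
{
	unsigned int v[12];

	assert(s != NULL && tf != NULL);
	if (sscanf(s, "%2x/%2x:%2x:%2x:%2x:%2x/%2x:%2x:%2x:%2x:%2x/%2x",
		   &v[0], &v[1], &v[2], &v[3], &v[4], &v[5],
		   &v[6], &v[7], &v[8], &v[9], &v[10], &v[11]) != 12)
		return -1;

	tf->command = (uint8_t)v[0];
	tf->feature = (uint8_t)v[1];
	tf->nsect   = (uint8_t)v[2];
	tf->lbal    = (uint8_t)v[3];
	tf->lbam    = (uint8_t)v[4];
	tf->lbah    = (uint8_t)v[5];
	/* v[6]..v[10] are the HOB (LBA48) bytes, unused by LBA28 commands */
	tf->device  = (uint8_t)v[11];
	return 0;
}

/* LBA28: bits 27:24 live in the low nibble of the device register. */
static uint32_t lba28(const struct taskfile *tf)
{
	return ((uint32_t)(tf->device & 0x0f) << 24) |
	       ((uint32_t)tf->lbah << 16) |
	       ((uint32_t)tf->lbam << 8) |
	       tf->lbal;
}

/* Transfer length in bytes, assuming 512-byte sectors; nsect 0 means 256. */
static uint32_t transfer_bytes(const struct taskfile *tf)
{
	uint32_t sectors = tf->nsect ? tf->nsect : 256;

	return sectors * 512u;
}
```

Fed the first cmd line, this yields LBA 122910 and 31744 bytes, matching
"sector 122910" and "dma 31744" in the log; the second yields 28314 and
25600.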


 r...@mpc8315e-rdb:~# uname -a
 Linux mpc8315e-rdb 2.6.30-rc6 #1 Mon Jun 8 15:54:00 CEST 2009 ppc unknown

 r...@mpc8315e-rdb:~# hdparm -i /dev/sda

 /dev/sda:

  Model=Solidata X SSD                          , FwRev=0955    , SerialNo=...
  Config={ HardSect NotMFM HdSw15uSec Fixed DTR10Mbs }
  RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=0
  BuffType=unknown, BuffSize=0kB, MaxMultSect=128, MultSect=?1?
  CurCHS=16383/16/63, CurSects=16514064, LBA=no
  IORDY=no, tPIO={min:240,w/IORDY:120}
  PIO modes:  pio0 pio3 pio4
  UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5
  AdvancedPM=yes: disabled (255) WriteCache=disabled
  Drive conforms to: Unspecified:  ATA/ATAPI-4 ATA/ATAPI-5 ATA/ATAPI-6 ATA/ATAPI-7

  * signifies the current active mode



 --
 Leon




-- 
Leon
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH v4] zone_reclaim is always 0 by default

2009-06-09 Thread Robin Holt
On Mon, Jun 08, 2009 at 12:50:48PM +0100, Mel Gorman wrote:

Let me start by saying I agree completely with everything you wrote and
still disagree with this patch, but was willing to compromise and work
around this for our upcoming x86_64 machine by putting a value add
into our packaging of adding a sysctl that turns reclaim back on.

...
  Index: b/arch/powerpc/include/asm/topology.h
  ===
  --- a/arch/powerpc/include/asm/topology.h
  +++ b/arch/powerpc/include/asm/topology.h
  @@ -10,6 +10,12 @@ struct device_node;
   
   #include <asm/mmzone.h>
   
  +/*
  + * Distance above which we begin to use zone reclaim
  + */
  +#define RECLAIM_DISTANCE 20
  +
  +
 
 Where is the ia-64-specific modifier to RECLAIM_DISTANCE?

It was already defined as 15 in arch/ia64/include/asm/topology.h

Thanks,
Robin


Re: [PATCH v4] zone_reclaim is always 0 by default

2009-06-09 Thread Mel Gorman
On Tue, Jun 09, 2009 at 04:55:07AM -0500, Robin Holt wrote:
 On Mon, Jun 08, 2009 at 12:50:48PM +0100, Mel Gorman wrote:
 
 Let me start by saying I agree completely with everything you wrote and
 still disagree with this patch, but was willing to compromise and work
 around this for our upcoming x86_64 machine by putting a value add
 into our packaging of adding a sysctl that turns reclaim back on.
 

To be honest, I'm more leaning towards a NACK than an ACK on this one. I
don't support enough NUMA machines to feel strongly enough about it but
unconditionally setting zone_reclaim_mode to 0 on x86-64 just because i7's
might be there seems ill-advised to me and will have other consequences for
existing more traditional x86-64 NUMA machines.

 ...
   Index: b/arch/powerpc/include/asm/topology.h
   ===
   --- a/arch/powerpc/include/asm/topology.h
   +++ b/arch/powerpc/include/asm/topology.h
   @@ -10,6 +10,12 @@ struct device_node;

#include <asm/mmzone.h>

   +/*
   + * Distance above which we begin to use zone reclaim
   + */
   +#define RECLAIM_DISTANCE 20
   +
   +
  
  Where is the ia-64-specific modifier to RECLAIM_DISTANCE?
 
 It was already defined as 15 in arch/ia64/include/asm/topology.h
 

/me slaps self

thanks

-- 
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab


[PATCH] mpc83xx/usb.c: fix usb mux setup for mpc834x

2009-06-09 Thread Peter Korsgaard
usb0 and usb1 mux settings in the sicrl register were swapped (twice!)
in mpc834x_usb_cfg(), leading to various strange issues with fsl-ehci
and full speed devices.

The USB port config on mpc834x is done using 2 muxes: Port 0 is always
used for MPH port 0, and port 1 can either be used for MPH port 1 or DR
(unless DR uses UTMI phy or OTG, then it uses both ports) - See 8349 RM
figure 1-4.

mpc834x_usb_cfg() had this inverted for the DR, and it also had the bit
positions of the usb0 / usb1 mux settings swapped. It would basically
work if you specified port1 instead of port0 for the MPH controller (and
happened to use ULPI phys), which is what all the 834x dts have done,
even though that configuration is physically invalid.

Instead fix mpc834x_usb_cfg() and adjust the dts files to match reality.
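
The port routing described above can be written down as a small model. This
is only a sketch of the rule as stated in this message (port 0 belongs to
MPH port 0, port 1 goes to MPH port 1 or DR, and DR takes both ports when it
uses the UTMI phy, i.e. the utmi/utmi_wide case in the diff, or OTG). It
does not touch the SICRL bits, and all names in it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum port_owner { OWNER_MPH0, OWNER_MPH1, OWNER_DR };

struct port_map {
	enum port_owner port0;
	enum port_owner port1;
};

/* Decide which controller owns each physical port. */
static struct port_map route_usb_ports(bool dr_enabled, bool dr_utmi,
				       bool dr_otg)
{
	struct port_map map = { OWNER_MPH0, OWNER_MPH1 };

	/* UTMI/OTG only make sense if the DR controller is in use. */
	assert(dr_enabled || !(dr_utmi || dr_otg));

	if (dr_enabled) {
		map.port1 = OWNER_DR;		/* DR always takes port 1 */
		if (dr_utmi || dr_otg)
			map.port0 = OWNER_DR;	/* ...and port 0 as well */
	}
	return map;
}
```

In this model an MPH controller on a ULPI phy can only ever sit on port 0
or port 1 as MPH0 or MPH1 respectively, which is why "port0" is the valid
setting for MPH port 0 in the dts files.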

Signed-off-by: Peter Korsgaard <jac...@sunsite.dk>
---
 arch/powerpc/boot/dts/asp834x-redboot.dts |    2 +-
 arch/powerpc/boot/dts/mpc8349emitx.dts    |    2 +-
 arch/powerpc/boot/dts/mpc834x_mds.dts     |    2 +-
 arch/powerpc/boot/dts/sbc8349.dts         |    2 +-
 arch/powerpc/platforms/83xx/mpc83xx.h     |    4 ++--
 arch/powerpc/platforms/83xx/usb.c         |   10 +++++-----
 6 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/boot/dts/asp834x-redboot.dts b/arch/powerpc/boot/dts/asp834x-redboot.dts
index 7da84fd..261d10c 100644
--- a/arch/powerpc/boot/dts/asp834x-redboot.dts
+++ b/arch/powerpc/boot/dts/asp834x-redboot.dts
@@ -167,7 +167,7 @@
 			interrupt-parent = <&ipic>;
 			interrupts = <39 0x8>;
 			phy_type = "ulpi";
-			port1;
+			port0;
 		};
 		/* phy type (ULPI, UTMI, UTMI_WIDE, SERIAL) */
 		u...@23000 {
diff --git a/arch/powerpc/boot/dts/mpc8349emitx.dts b/arch/powerpc/boot/dts/mpc8349emitx.dts
index 1ae38f0..e540d44 100644
--- a/arch/powerpc/boot/dts/mpc8349emitx.dts
+++ b/arch/powerpc/boot/dts/mpc8349emitx.dts
@@ -156,7 +156,7 @@
 			interrupt-parent = <&ipic>;
 			interrupts = <39 0x8>;
 			phy_type = "ulpi";
-			port1;
+			port0;
 		};
 
 		u...@23000 {
diff --git a/arch/powerpc/boot/dts/mpc834x_mds.dts b/arch/powerpc/boot/dts/mpc834x_mds.dts
index d9f0a23..a667fe7 100644
--- a/arch/powerpc/boot/dts/mpc834x_mds.dts
+++ b/arch/powerpc/boot/dts/mpc834x_mds.dts
@@ -153,7 +153,7 @@
 			interrupt-parent = <&ipic>;
 			interrupts = <39 0x8>;
 			phy_type = "ulpi";
-			port1;
+			port0;
 		};
 		/* phy type (ULPI, UTMI, UTMI_WIDE, SERIAL) */
 		u...@23000 {
diff --git a/arch/powerpc/boot/dts/sbc8349.dts b/arch/powerpc/boot/dts/sbc8349.dts
index a36dbbc..c7e1c4b 100644
--- a/arch/powerpc/boot/dts/sbc8349.dts
+++ b/arch/powerpc/boot/dts/sbc8349.dts
@@ -144,7 +144,7 @@
 			interrupt-parent = <&ipic>;
 			interrupts = <39 0x8>;
 			phy_type = "ulpi";
-			port1;
+			port0;
 		};
 		/* phy type (ULPI, UTMI, UTMI_WIDE, SERIAL) */
 		u...@23000 {
diff --git a/arch/powerpc/platforms/83xx/mpc83xx.h b/arch/powerpc/platforms/83xx/mpc83xx.h
index 83cfe51..d1dc5b0 100644
--- a/arch/powerpc/platforms/83xx/mpc83xx.h
+++ b/arch/powerpc/platforms/83xx/mpc83xx.h
@@ -22,8 +22,8 @@
 /* system i/o configuration register low */
 #define MPC83XX_SICRL_OFFS 0x114
 #define MPC834X_SICRL_USB_MASK 0x6000
-#define MPC834X_SICRL_USB0 0x4000
-#define MPC834X_SICRL_USB1 0x2000
+#define MPC834X_SICRL_USB0 0x2000
+#define MPC834X_SICRL_USB1 0x4000
 #define MPC831X_SICRL_USB_MASK 0x0c00
 #define MPC831X_SICRL_USB_ULPI 0x0800
 #define MPC8315_SICRL_USB_MASK 0x00fc
diff --git a/arch/powerpc/platforms/83xx/usb.c b/arch/powerpc/platforms/83xx/usb.c
index 11e1fac..f53eba3 100644
--- a/arch/powerpc/platforms/83xx/usb.c
+++ b/arch/powerpc/platforms/83xx/usb.c
@@ -51,21 +51,21 @@ int mpc834x_usb_cfg(void)
 					!strcmp(prop, "utmi_wide"))) {
 			sicrl |= MPC834X_SICRL_USB0 | MPC834X_SICRL_USB1;
 			sicrh |= MPC834X_SICRH_USB_UTMI;
-			port1_is_dr = 1;
+			port0_is_dr = 1;
 		} else if (prop && !strcmp(prop, "serial")) {
 			dr_mode = of_get_property(np, "dr_mode", NULL);
 			if (dr_mode && !strcmp(dr_mode, "otg")) {
 				sicrl |= MPC834X_SICRL_USB0 | MPC834X_SICRL_USB1;
-				port1_is_dr = 1;
+				port0_is_dr = 1;
 			} else {
-				sicrl |= 

Re: [PATCH v4] zone_reclaim is always 0 by default

2009-06-09 Thread Robin Holt
On Tue, Jun 09, 2009 at 11:37:55AM +0100, Mel Gorman wrote:
 On Tue, Jun 09, 2009 at 04:55:07AM -0500, Robin Holt wrote:
  On Mon, Jun 08, 2009 at 12:50:48PM +0100, Mel Gorman wrote:
  
  Let me start by saying I agree completely with everything you wrote and
  still disagree with this patch, but was willing to compromise and work
  around this for our upcoming x86_64 machine by putting a value add
  into our packaging of adding a sysctl that turns reclaim back on.
  
 
 To be honest, I'm more leaning towards a NACK than an ACK on this one. I
 don't support enough NUMA machines to feel strongly enough about it but
 unconditionally setting zone_reclaim_mode to 0 on x86-64 just because i7's
 might be there seems ill-advised to me and will have other consequences for
 existing more traditional x86-64 NUMA machines.

I was sort-of planning on coming up with an x86_64 arch specific function
for setting zone_reclaim_mode, but didn't like the direction things
were going.

Something to the effect of...
--- 20090609.orig/mm/page_alloc.c   2009-06-09 06:51:34.0 -0500
+++ 20090609/mm/page_alloc.c2009-06-09 06:55:00.160762069 -0500
@@ -2326,12 +2326,7 @@ static void build_zonelists(pg_data_t *p
 	while ((node = find_next_best_node(local_node, &used_mask)) >= 0) {
 		int distance = node_distance(local_node, node);
 
-		/*
-		 * If another node is sufficiently far away then it is better
-		 * to reclaim pages in a zone before going off node.
-		 */
-		if (distance > RECLAIM_DISTANCE)
-			zone_reclaim_mode = 1;
+		zone_reclaim_mode = arch_zone_reclaim_mode(distance);
 
 		/*
 		 * We don't want to pressure a particular node.

And then letting each arch define an arch_zone_reclaim_mode().  If other
values are needed in the determination, we would add parameters to
reflect this.

For ia64, add

static inline int ia64_zone_reclaim_mode(int distance)
{
	if (distance > 15)
		return 1;
	return 0;
}

#define arch_zone_reclaim_mode(_d)	ia64_zone_reclaim_mode(_d)


Then, inside x86_64_zone_reclaim_mode(), I could make it something like
	if (distance > 40 || is_uv_system())
		return 1;

In the end, I didn't think this fight was worth fighting given how ugly
this felt.  Upon second thought, I am beginning to think it is not that
bad, but I also don't think it is that good either.
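
To make the idea above concrete outside the kernel, here is a minimal
userspace sketch. It assumes each architecture supplies a distance threshold
(15 for ia64 and 40 for x86_64, as in the fragments above) plus an optional
"always reclaim" override standing in for is_uv_system(). The struct and
compute_zone_reclaim_mode() are hypothetical and are not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-arch policy; not an existing kernel structure. */
struct arch_reclaim_policy {
	int  reclaim_distance;	/* enable reclaim above this distance */
	bool always_reclaim;	/* stands in for the is_uv_system() case */
};

static int arch_zone_reclaim_mode(const struct arch_reclaim_policy *p,
				  int distance)
{
	assert(p != NULL);
	return (p->always_reclaim || distance > p->reclaim_distance) ? 1 : 0;
}

/* Walk the distances to every other node, as build_zonelists() would. */
static int compute_zone_reclaim_mode(const struct arch_reclaim_policy *p,
				     const int *distances, size_t n)
{
	int mode = 0;
	size_t i;

	for (i = 0; i < n; i++)
		mode |= arch_zone_reclaim_mode(p, distances[i]);
	return mode;
}
```

One design note: the sketch ORs the per-node results, so a near node visited
later cannot clear the flag. A plain assignment inside the loop, as in the
diff fragment above, would let the last node visited decide the mode.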

Thanks,
Robin


Re: [PATCH v4] zone_reclaim is always 0 by default

2009-06-09 Thread KOSAKI Motohiro
Hi

Sorry for the late response; my e-mail reading speed is very slow ;-)

First, could you please read the past thread?
I think many topics of this mail have already been discussed.


 On Thu, Jun 04, 2009 at 07:23:15PM +0900, KOSAKI Motohiro wrote:
  
  Current linux policy is, zone_reclaim_mode is enabled by default if the 
  machine
  has large remote node distance. it's because we could assume that large 
  distance
  mean large server until recently.
  
 
 We don't make assumptions about the server being large, small or otherwise. 
 The
 affinity tables reporting a distance of 20 or more is saying remote memory
 has twice the latency of local memory. This is true irrespective of workload
 and implies that going off-node has a real penalty regardless of workload.

No.
Now we are talking about off-node allocation vs. unnecessary file cache dropping.
IOW, off-node allocation vs. disk access.

So the benefit doesn't depend only on off-node distance, but also on the
workload's IO tendency and IO speed.

Fujitsu has a 64-core ia64 HPC box, and zone-reclaim sometimes caused a
performance regression even on that box.

So I don't think this problem is a small vs. large machine issue,
nor an i7 issue.
High-speed P2P CPU-integrated memory controllers expose an old issue.


  In general, workload depended configration shouldn't put into default 
  settings.
  
  However, current code is long standing about two year. Highest POWER and 
  IA64 HPC machine
  (only) use this setting.
  
  Thus, x86 and almost rest architecture change default setting, but Only 
  power and ia64
  remain current configuration for backward-compatibility.
  
 
 What about if it's x86-64-based NUMA but it's not i7 based. There, the
 NUMA distances might really mean something and that zone_reclaim behaviour
 is desirable.

hmmm..
I don't want to ignore AMD; I think it's a common characteristic of machines
with P2P interconnects and integrated memory controllers.

Also, I don't want to detect the CPU family or similar, because we would need
to update such code every time Intel makes a new CPU.

Can we detect a P2P interconnect machine? I'm not sure.


 I think if we're going down the road of setting the default, it shouldn't be
 per-architecture defaults as such. Other choices for addressing this might be;
 
 1. Make RECLAIM_DISTANCE a variable on x86. Set it to 20 by default, and 5
(or some other sensible figure) on i7
 
 2. There should be a per-arch modifier callback for the affinity
distances. If the x86 code detects the CPU is an i7, it can reduce the
reported latencies to be more in line with expected reality.
 
 3. Do not use zone_reclaim() for file-backed data if more than 20% of memory
overall is free. The difficulty is figuring out if the allocation is for
file pages.
 
 4. Change zone_reclaim_mode default to mean "do your best to figure it
    out". Patch 1 would default large distances to 1 to see what happens.
    Then apply a heuristic when in figure-it-out mode and using
    reclaim_mode == 1
 
    If we have locally reclaimed 2% of the nodes memory in file pages
    within the last 5 seconds when >= 20% of total physical memory was
    free, then set the reclaim_mode to 0 on the assumption the node is
    mostly caching pages and shouldn't be reclaimed to avoid excessive IO
 
 Option 1 would appear to be the most straight-forward but option 2
 should be doable. Option 3 and 4 could turn into a rats nest and I would
 consider those approaches a bit more drastic.

hmhm.
I think the key point of options 1 and 2 is a proper way to detect the hardware.

Options 3 and 4 are the ideas I prefer. I like a workload-adaptive heuristic,
but as you already pointed out, it's hard because the page allocator doesn't
know the purpose of an allocation ;)


  @@ -10,6 +10,12 @@ struct device_node;
   
   #include <asm/mmzone.h>
   
  +/*
  + * Distance above which we begin to use zone reclaim
  + */
  +#define RECLAIM_DISTANCE 20
  +
  +
 
 Where is the ia-64-specific modifier to RECLAIM_DISTANCE?


arch/ia64/include/asm/topology.h has

/*
 * Distance above which we begin to use zone reclaim
 */
#define RECLAIM_DISTANCE 15


I don't think distance == 15 is a proper machine-independent definition,
but it is a long-lived definition ;)






[PATCH 2.6.31] ehca: Tolerate dynamic memory operations and huge pages

2009-06-09 Thread Hannes Hering
This patch implements toleration of dynamic memory operations and 16 GB
gigantic pages. On module load the driver walks through available system
memory, checks for available memory ranges and then registers the kernel
internal memory region accordingly. The translation of address ranges is
implemented via a 3-level busmap.

Signed-off-by: Hannes Hering <heri...@de.ibm.com>

---
This patch is built and tested against infiniband.git. Please apply for 2.6.31.

Regards

Hannes
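
As a reading aid for the 3-level busmap mentioned in the description: with
13 index bits per level (EHCA_DIR_INDEX_SHIFT in the patch), a memory
section number splits into top/dir/ent indices roughly as sketched here.
This is an illustration under that assumption, not the driver's code.
split_section() and join_section() are hypothetical names, and how the
driver derives the section number from an address is not shown.

```c
#include <assert.h>
#include <stdint.h>

#define DIR_INDEX_SHIFT 13			/* 8k entries per level */
#define TOP_INDEX_SHIFT (DIR_INDEX_SHIFT * 2)
#define MAP_ENTRIES     (1u << DIR_INDEX_SHIFT)
#define INDEX_MASK      (MAP_ENTRIES - 1)

struct bmap_index {
	uint32_t top, dir, ent;
};

/* Split a section number into the three table indices. */
static struct bmap_index split_section(uint64_t section)
{
	struct bmap_index idx;

	/* three 13-bit levels cover 39 bits of section number */
	assert(section < (UINT64_C(1) << (3 * DIR_INDEX_SHIFT)));

	idx.top = (uint32_t)((section >> TOP_INDEX_SHIFT) & INDEX_MASK);
	idx.dir = (uint32_t)((section >> DIR_INDEX_SHIFT) & INDEX_MASK);
	idx.ent = (uint32_t)(section & INDEX_MASK);
	return idx;
}

/* Inverse of split_section(). */
static uint64_t join_section(struct bmap_index idx)
{
	return ((uint64_t)idx.top << TOP_INDEX_SHIFT) |
	       ((uint64_t)idx.dir << DIR_INDEX_SHIFT) |
	       idx.ent;
}
```

A sparse 3-level layout like this lets lower-level tables exist only for
the memory ranges that are actually present, which fits the patch's goal
of tolerating memory being added and removed at runtime.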

Index: infiniband/drivers/infiniband/hw/ehca/ehca_main.c
===
--- infiniband.orig/drivers/infiniband/hw/ehca/ehca_main.c  2009-06-09 14:20:37.0 +0200
+++ infiniband/drivers/infiniband/hw/ehca/ehca_main.c   2009-06-09 14:20:47.0 +0200
@@ -52,7 +52,7 @@
 #include "ehca_tools.h"
 #include "hcp_if.h"
 
-#define HCAD_VERSION "0026"
+#define HCAD_VERSION "0027"
 
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_AUTHOR("Christoph Raisch <rai...@de.ibm.com>");
@@ -506,6 +506,7 @@
 	shca->ib_device.detach_mcast	= ehca_detach_mcast;
 	shca->ib_device.process_mad	= ehca_process_mad;
 	shca->ib_device.mmap		= ehca_mmap;
+	shca->ib_device.dma_ops		= &ehca_dma_mapping_ops;
 
 	if (EHCA_BMASK_GET(HCA_CAP_SRQ, shca->hca_cap)) {
 		shca->ib_device.uverbs_cmd_mask |=
@@ -1028,17 +1029,23 @@
 		goto module_init1;
 	}
 
+	ret = ehca_create_busmap();
+	if (ret) {
+		ehca_gen_err("Cannot create busmap.");
+		goto module_init2;
+	}
+
 	ret = ibmebus_register_driver(&ehca_driver);
 	if (ret) {
 		ehca_gen_err("Cannot register eHCA device driver");
 		ret = -EINVAL;
-		goto module_init2;
+		goto module_init3;
 	}
 
 	ret = register_memory_notifier(&ehca_mem_nb);
 	if (ret) {
 		ehca_gen_err("Failed registering memory add/remove notifier");
-		goto module_init3;
+		goto module_init4;
 	}
 
 	if (ehca_poll_all_eqs != 1) {
@@ -1053,9 +1060,12 @@
 
 	return 0;
 
-module_init3:
+module_init4:
 	ibmebus_unregister_driver(&ehca_driver);
 
+module_init3:
+	ehca_destroy_busmap();
+
 module_init2:
 	ehca_destroy_slab_caches();
 
@@ -1073,6 +1083,8 @@
 
 	unregister_memory_notifier(&ehca_mem_nb);
 
+	ehca_destroy_busmap();
+
 	ehca_destroy_slab_caches();
 
 	ehca_destroy_comp_pool();
Index: infiniband/drivers/infiniband/hw/ehca/ehca_mrmw.c
===
--- infiniband.orig/drivers/infiniband/hw/ehca/ehca_mrmw.c  2009-06-09 14:20:37.0 +0200
+++ infiniband/drivers/infiniband/hw/ehca/ehca_mrmw.c   2009-06-09 14:20:47.0 +0200
@@ -53,6 +53,39 @@
 /* max number of rpages (per hcall register_rpages) */
 #define MAX_RPAGES 512
 
+/* DMEM toleration management */
+#define EHCA_SECTSHIFT         SECTION_SIZE_BITS
+#define EHCA_SECTSIZE          (1UL << EHCA_SECTSHIFT)
+#define EHCA_HUGEPAGESHIFT     34
+#define EHCA_HUGEPAGE_SIZE     (1UL << EHCA_HUGEPAGESHIFT)
+#define EHCA_HUGEPAGE_PFN_MASK ((EHCA_HUGEPAGE_SIZE - 1) >> PAGE_SHIFT)
+#define EHCA_INVAL_ADDR        0xULL
+#define EHCA_DIR_INDEX_SHIFT 13   /* 8k Entries in 64k block */
+#define EHCA_TOP_INDEX_SHIFT (EHCA_DIR_INDEX_SHIFT * 2)
+#define EHCA_MAP_ENTRIES (1 << EHCA_DIR_INDEX_SHIFT)
+#define EHCA_TOP_MAP_SIZE (0x1)   /* currently fixed map size */
+#define EHCA_DIR_MAP_SIZE (0x1)
+#define EHCA_ENT_MAP_SIZE (0x1)
+#define EHCA_INDEX_MASK (EHCA_MAP_ENTRIES - 1)
+#define EHCA_REG_MR 0
+#define EHCA_REG_BUSMAP_MR (~0)
+
+static unsigned long ehca_mr_len;
+/*
+ * Memory map data structures
+ */
+struct ehca_dir_bmap {
+   u64 ent[EHCA_MAP_ENTRIES];
+};
+struct ehca_top_bmap {
+   struct ehca_dir_bmap *dir[EHCA_MAP_ENTRIES];
+};
+struct ehca_bmap {
+   struct ehca_top_bmap *top[EHCA_MAP_ENTRIES];
+};
+
+static struct ehca_bmap *ehca_bmap;
+
 static struct kmem_cache *mr_cache;
 static struct kmem_cache *mw_cache;
 
@@ -68,6 +101,8 @@
 #define EHCA_MR_PGSHIFT1M  20
 #define EHCA_MR_PGSHIFT16M 24
 
+static u64 ehca_map_vaddr(void *caddr);
+
 static u32 ehca_encode_hwpage_size(u32 pgsize)
 {
int log = ilog2(pgsize);
@@ -135,7 +170,8 @@
goto get_dma_mr_exit0;
}
 
-	ret = ehca_reg_maxmr(shca, e_maxmr, (u64 *)KERNELBASE,
+	ret = ehca_reg_maxmr(shca, e_maxmr,
+			     (void *)ehca_map_vaddr((void *)KERNELBASE),
 			     mr_access_flags, e_pd,
 			     &e_maxmr->ib.ib_mr.lkey,
 			     &e_maxmr->ib.ib_mr.rkey);
@@ -251,7 +287,7 @@
 
ret = ehca_reg_mr(shca, e_mr, iova_start, size, mr_access_flags,
  e_pd, 

Re: [Powerpc/SLQB] Next June 06 : BUG during scsi initialization

2009-06-09 Thread Nick Piggin
On Mon, Jun 08, 2009 at 05:42:14PM +0530, Sachin Sant wrote:
 Pekka J Enberg wrote:
 Hi Sachin,
 __slab_alloc_page: nid=2, cache_node=c000de01ba00, 
 cache_list=c000de01ba00
 __slab_alloc_page: nid=2, cache_node=c000de01bd00, 
 cache_list=c000de01bd00
 __slab_alloc_page: nid=2, cache_node=c000de01ba00, cache_lisBUG: spinlock 
 bad magic on CPU#1, modprobe/62
  lock: c08c4280, .magic: 7dcc61f0, .owner:  || status == 
 __GCONV_INCOMPLETE_INPUT || status == __GCONV_FULL_OUTPUT/724596736, 
 .owner_cpu: 4095
 Call Trace:
 [c000c7da36d0] [c00116e0] .show_stack+0x6c/0x16c (unreliable)
 [c000c7da3780] [c0365bcc] .spin_bug+0xb0/0xd4
 [c000c7da3810] [c0365e94] ._raw_spin_lock+0x48/0x184
 [c000c7da38b0] [c05de4f8] ._spin_lock+0x10/0x24
 [c000c7da3920] [c0141240] .__slab_alloc_page+0x410/0x4b4
 [c000c7da39e0] [c0142804] .kmem_cache_alloc+0x13c/0x21c
 [c000c7da3aa0] [c01431dc] .kmem_cache_create+0x294/0x2a8
 [c000c7da3b90] [d0ea1438] .scsi_init_queue+0x38/0x170 [scsi_mod]
 [c000c7da3c20] [d0ea1334] .init_scsi+0x1c/0xe8 [scsi_mod]
 [c000c7da3ca0] [c00092c0] .do_one_initcall+0x80/0x19c
 [c000c7da3d90] [c00c09c8] .SyS_init_module+0xe0/0x244
 [c000c7da3e30] [c0008534] syscall_exit+0x0/0x40

I can't really work it out. It seems to be the kmem_cache_cache which has
a problem, but there have already been lots of caches created and even
this same cache_node already used right beforehand with no problem.

Unless a CPU or node comes up or something right at this point, or the
caller is scheduled onto a different CPU... The oopses all seem to
have CPU#1, whereas the boot CPU is probably #0 (these CPUs are on node 0
and memory is only on nodes 1 and 2, where there are no CPUs, if I read
correctly).

I still can't see the reason for the failure, but can you try this
patch please and show dmesg?

---
 mm/slqb.c |   34 +++---
 1 file changed, 31 insertions(+), 3 deletions(-)

Index: linux-2.6/mm/slqb.c
===
--- linux-2.6.orig/mm/slqb.c
+++ linux-2.6/mm/slqb.c
@@ -963,6 +963,7 @@ static struct slqb_page *allocate_slab(s
 
 	flags |= s->allocflags;
 
+	flags &= ~0x2000;
 	page = (struct slqb_page *)alloc_pages_node(node, flags, s->order);
 	if (!page)
 		return NULL;
@@ -1357,6 +1358,8 @@ static noinline void *__slab_alloc_page(
 	unsigned int colour;
 	void *object;
 
+	if (gfpflags & 0x2000)
+		printk("SLQB: __slab_alloc_page cpu=%d request node=%d\n", smp_processor_id(), node);
 	c = get_cpu_slab(s, smp_processor_id());
 	colour = c->colour_next;
 	c->colour_next += s->colour_off;
@@ -1374,6 +1377,8 @@ static noinline void *__slab_alloc_page(
 	if (unlikely(!page))
 		return page;
 
+	if (gfpflags & 0x2000)
+		printk("SLQB: __slab_alloc_page cpu=%d,nid=%d request node=%d page node=%d\n", smp_processor_id(), numa_node_id(), node, slqb_page_to_nid(page));
 	if (!NUMA_BUILD || likely(slqb_page_to_nid(page) == numa_node_id())) {
 		struct kmem_cache_cpu *c;
 		int cpu = smp_processor_id();
@@ -1382,6 +1387,7 @@ static noinline void *__slab_alloc_page(
 		l = &c->list;
 		page->list = l;
 
+		printk("SLQB: __slab_alloc_page spin_lock(%p)\n", &l->page_lock);
 		spin_lock(&l->page_lock);
 		l->nr_slabs++;
 		l->nr_partial++;
@@ -1398,6 +1404,8 @@ static noinline void *__slab_alloc_page(
 		l = &n->list;
 		page->list = l;
 
+		printk("SLQB: __slab_alloc_page spin_lock(%p)\n", &n->list_lock);
+		printk("SLQB: __slab_alloc_page spin_lock(%p)\n", &l->page_lock);
 		spin_lock(&n->list_lock);
 		spin_lock(&l->page_lock);
 		l->nr_slabs++;
@@ -1411,6 +1419,7 @@ static noinline void *__slab_alloc_page(
 #endif
 	}
 	VM_BUG_ON(!object);
+	printk("SLQB: __slab_alloc_page OK\n");
 	return object;
 }
 
@@ -1440,6 +1449,8 @@ static void *__remote_slab_alloc_node(st
 	struct kmem_cache_list *l;
 	void *object;
 
+	if (gfpflags & 0x2000)
+		printk("SLQB: __remote_slab_alloc_node cpu=%d request node=%d\n", smp_processor_id(), node);
 	n = s->node_slab[node];
 	if (unlikely(!n)) /* node has no memory */
 		return NULL;
@@ -1541,7 +1552,11 @@ static __always_inline void *slab_alloc(
 
 again:
 	local_irq_save(flags);
+	if (gfpflags & 0x2000)
+		printk("SLQB: slab_alloc cpu=%d,nid=%d request node=%d\n", smp_processor_id(), numa_node_id(), node);
 	object = __slab_alloc(s, gfpflags, node);
+	if (gfpflags & 0x2000)
+		printk("SLQB: slab_alloc cpu=%d return=%p\n", smp_processor_id(), object);

Re: [PATCH v4] zone_reclaim is always 0 by default

2009-06-09 Thread Mel Gorman
On Tue, Jun 09, 2009 at 10:48:34PM +0900, KOSAKI Motohiro wrote:
 Hi
 
 sorry for late responce. my e-mail reading speed is very slow ;-)
 
 First, Could you please read past thread?
 I think many topic of this mail are already discussed.
 

I think I caught them all but the horrible fact of the matter is that
whether zone_reclaim_mode should be 1 or 0 on NUMA machines is "it depends".
There are arguments for both and no clear winner.

 
  On Thu, Jun 04, 2009 at 07:23:15PM +0900, KOSAKI Motohiro wrote:
   
   Current linux policy is, zone_reclaim_mode is enabled by default if the 
   machine
   has large remote node distance. it's because we could assume that large 
   distance
   mean large server until recently.
   
  
  We don't make assumptions about the server being large, small or otherwise. 
  The
  affinity tables reporting a distance of 20 or more is saying remote memory
  has twice the latency of local memory. This is true irrespective of 
  workload
  and implies that going off-node has a real penalty regardless of workload.
 
 No.
 Now, we talk about off-node allocation vs unnecessary file cache dropping.
 IOW, off-node allocation vs disk access.
 

Even if we used GFP flags to identify the file pages, there is no guarantee
that we are taking the correct action to keep relevant pages in memory.

 Then, the worth doesn't only depend on off-node distance, but also depend on
 workload IO tendency and IO speed.
 
 Fujitsu has 64 core ia64 HPC box, zone-reclaim sometimes made performance
 degression although its box. 
 

I bet if it was 0, the off-node accesses would sometimes cause a
performance regression as well :(

 So, I don't think this problem is small vs large machine issue.
 nor i7 issue.
 high-speed P2P CPU integrated memory controller expose old issue.
 
 
   In general, workload depended configration shouldn't put into default 
   settings.
   
   However, current code is long standing about two year. Highest POWER and 
   IA64 HPC machine
   (only) use this setting.
   
   Thus, x86 and almost rest architecture change default setting, but Only 
   power and ia64
   remain current configuration for backward-compatibility.
   
  
  What about if it's x86-64-based NUMA but it's not i7 based. There, the
  NUMA distances might really mean something and that zone_reclaim behaviour
  is desirable.
 
 hmmm..
 I don't hope ignore AMD, I think it's common characterastic of P2P and
 integrated memory controller machine.
 
 Also, I don't hope detect CPU family or similar, because we need update
 such code evey when Intel makes new cpu.
 
 Can we detect P2P interconnect machine? I'm not sure.
 

I've no idea. It's not just I7 because some of the AMD chips will have
integrated memory controllers as well. We were somewhat depending on the
affinity information providing the necessary information.

  I think if we're going down the road of setting the default, it shouldn't be
  per-architecture defaults as such. Other choices for addressing this might 
  be;
  
  1. Make RECLAIM_DISTANCE a variable on x86. Set it to 20 by default, and 5
 (or some other sensible figure) on i7
  
  2. There should be a per-arch modifier callback for the affinity
 distances. If the x86 code detects the CPU is an i7, it can reduce the
 reported latencies to be more in line with expected reality.
  
  3. Do not use zone_reclaim() for file-backed data if more than 20% of memory
 overall is free. The difficulty is figuring out if the allocation is for
 file pages.
  
  4. Change zone_reclaim_mode default to mean "do your best to figure it
     out". Patch 1 would default large distances to 1 to see what happens.
     Then apply a heuristic when in figure-it-out mode and using
     reclaim_mode == 1
  
     If we have locally reclaimed 2% of the nodes memory in file pages
     within the last 5 seconds when >= 20% of total physical memory was
     free, then set the reclaim_mode to 0 on the assumption the node is
     mostly caching pages and shouldn't be reclaimed to avoid excessive IO
  
  Option 1 would appear to be the most straight-forward but option 2
  should be doable. Option 3 and 4 could turn into a rats nest and I would
  consider those approaches a bit more drastic.
 
 hmhm. 
 I think the key-point of option 1 and 2 are proper hardware detecting way.
 
 option 3 and 4 are more prefere idea to me. I like workload adapted heuristic.
 but you already pointed out its hard, because page-allocator don't know
 allocation purpose ;)
 

Option 3 may be undoable. Even if the allocations are tagged as "this is
a file-backed allocation", we have no way of detecting how important
that is to the overall workload. Option 4 would be the preference. It's
a heuristic that might let us down, but the administrator can override
it and fix the reclaim_mode in the event we get it wrong.

 
   @@ -10,6 +10,12 @@ struct device_node;

#include <asm/mmzone.h>

   +/*
   + * Distance above which we begin to use 

Re: [BUILD FAILURE 01/04] Next June 04:PPC64 randconfig [drivers/staging/comedi/drivers.o]

2009-06-09 Thread Subrata Modak
On Tue, 2009-06-09 at 13:50 +1000, Benjamin Herrenschmidt wrote:
 On Sun, 2009-06-07 at 20:06 +0530, Subrata Modak wrote:
  On Sat, 2009-06-06 at 09:36 -0400, Frank Mori Hess wrote:
   On Saturday 06 June 2009, Greg KH wrote:
Frank and Ian, any thoughts about the vmap call in the
comedi_buf_alloc() call?  Why is it using PAGE_KERNEL_NOCACHE, and what
is the prealloc_buf buffer used for?
   
   It is a circular buffer used to hold data streaming either to or from a 
   board (for example when producing an analog output waveform).  Reads and 
   writes to the device files read/write to the circular buffer, plus a few 
   drivers do dma directly to/from it.  I personally don't have a problem 
   with requiring drivers to have their own dma buffers and making them copy 
   data between their private dma buffers and the main circular buffer.  I 
   guess the original design wanted to support zero-copy dma.
  
  Great to hear that. How about a patch that solves my build problem on
  PPC64(the problem seems to be existing for long) ? 
 
 In any case, doing PAGE_KERNEL_NOCACHE for DMA memory is incorrect on
 many architectures. So at this stage, there's not much option but ifdef I
 suspect for now until this is fixed properly.

OK, but I am not sure whether Greg will agree to this. If that is OK, is the
patch I sent earlier acceptable?

http://lkml.org/lkml/2009/6/5/462

Regards--
Subrata

 
 It does make sense to want to have some memory like that shared between
 user space and DMA, though I don't know what the right approach that
 works on all archs is at this stage. Worth asking the Alsa guys, I think
 they have similar issues :-)
 
 But doing double buffering might do the trick fine for now.
 
 Cheers,
 Ben.
 
 
 
 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [BUILD FAILURE 02/04] Next June 04:PPC64 randconfig [drivers/usb/host/ohci-hcd.o]

2009-06-09 Thread Subrata Modak
On Fri, 2009-06-05 at 13:26 -0500, Subrata Modak wrote:
 On Thu, 2009-06-04 at 10:07 -0400, Jon Smirl wrote:
 On Thu, Jun 4, 2009 at 9:31 AM, Subrata Modak
  subr...@linux.vnet.ibm.com wrote:
   CC  drivers/usb/host/ohci-hcd.o
   In file included from drivers/usb/host/ohci-hcd.c:1060:
   drivers/usb/host/ohci-ppc-of.c:242:2: error: #error No endianess 
   selected for ppc-of-ohci
   make[3]: *** [drivers/usb/host/ohci-hcd.o] Error 1
   make[2]: *** [drivers/usb/host] Error 2
   make[1]: *** [drivers/usb] Error 2
   make: *** [drivers] Error 2
  
   I reported this earlier, and there were some discussions:
   http://groups.google.co.kr/group/linux.kernel/browse_thread/thread/edff9d5572d3d225
  
  Proposed patch by Arnd should fix this. It has not been merged.
  http://lkml.org/lkml/2009/4/22/49
 
 Correct, it fixes the issue. However, since few changes might have gone
 to the Kconfig, the patch does not apply cleanly. Below is the patch, just
 a retake of the earlier one, but on the latest code. 
 
 David,
 
 Can you please pickup the following patch ?

David,

Will you be the one merging this patch, or do I need to send it to
somebody else?

Regards--
Subrata

 
 Signed-off-by: Arnd Bergmann a...@arndb.de,
 Resent-by: Subrata Modak subr...@linux.vnet.ibm.com
 ---
 
 --- linux-2.6.30-rc8/drivers/usb/host/Kconfig.orig 2009-06-05 10:31:30.0 -0500
 +++ linux-2.6.30-rc8/drivers/usb/host/Kconfig 2009-06-05 10:37:53.0 -0500
 @@ -181,26 +181,26 @@ config USB_OHCI_HCD_PPC_SOC
  Enables support for the USB controller on the MPC52xx or
  STB03xxx processor chip.  If unsure, say Y.
 
 -config USB_OHCI_HCD_PPC_OF
 - bool "OHCI support for PPC USB controller on OF platform bus"
 - depends on USB_OHCI_HCD && PPC_OF
 - default y
 - ---help---
 -   Enables support for the USB controller PowerPC present on the
 -   OpenFirmware platform bus.
 -
  config USB_OHCI_HCD_PPC_OF_BE
 - bool "Support big endian HC"
 - depends on USB_OHCI_HCD_PPC_OF
 - default y
 + bool "OHCI support for OF platform bus (big endian)"
 + depends on USB_OHCI_HCD && PPC_OF
   select USB_OHCI_BIG_ENDIAN_DESC
   select USB_OHCI_BIG_ENDIAN_MMIO
 + ---help---
 + Enables support for big-endian USB controllers present on the
 + OpenFirmware platform bus.
 
  config USB_OHCI_HCD_PPC_OF_LE
 - bool "Support little endian HC"
 - depends on USB_OHCI_HCD_PPC_OF
 - default n
 + bool "OHCI support for OF platform bus (little endian)"
 + depends on USB_OHCI_HCD && PPC_OF
   select USB_OHCI_LITTLE_ENDIAN
 + ---help---
 + Enables support for little-endian USB controllers present on the
 + OpenFirmware platform bus.
 +
 + config USB_OHCI_HCD_PPC_OF
 + bool
 + default USB_OHCI_HCD_PPC_OF_BE || USB_OHCI_HCD_PPC_OF_LE
 
  config USB_OHCI_HCD_PCI
   bool "OHCI support for PCI-bus USB controllers"
 
 ---
 Regards--
 Subrata
 



Re: [PATCH -next] powerpc/85xx: Add support for X-ES MPC85xx boards

2009-06-09 Thread Nate Case
On Mon, 2009-06-08 at 17:52 -0500, Kumar Gala wrote:
  +static void xes_mpc85xx_configure_l1(void)
  +{
[snip]
 
 I'd prefer we move this into __setup_cpu_e500v1/__setup_cpu_e500v2 so  
 its done for all processors regardless of platform.

How does something like this look?  Let me know and I can test and
submit it separately.

- Nate

diff --git a/arch/powerpc/kernel/cpu_setup_fsl_booke.S b/arch/powerpc/kernel/cpu_setup_fsl_booke.S
index eb4b9ad..546804f 100644
--- a/arch/powerpc/kernel/cpu_setup_fsl_booke.S
+++ b/arch/powerpc/kernel/cpu_setup_fsl_booke.S
@@ -17,6 +17,34 @@
 #include <asm/cputable.h>
 #include <asm/ppc_asm.h>
 
+_GLOBAL(__e500_icache_enable)
+   mfspr   r3, SPRN_L1CSR1
+   oris    r3, r3, l1csr1_...@h
+   ori r3, r3, (L1CSR1_ICFI | L1CSR1_ICE)
+   mtspr   SPRN_L1CSR1, r3 /* Enable I-Cache */
+   isync
+   blr
+
+_GLOBAL(__e500_dcache_enable)
+   msync
+   isync
+   li  r3, 0
+   mtspr   SPRN_L1CSR0, r3 /* Disable */
+   msync
+   isync
+   li  r3, L1CSR0_DCFI
+   mtspr   SPRN_L1CSR0, r3 /* Invalidate */
+   msync
+   isync
+   mfspr   r3, SPRN_L1CSR0
+   oris    r3, r3, l1csr0_...@h
+   ori r3, r3, (L1CSR0_DCFI | L1CSR0_DCE)
+   msync
+   isync
+   mtspr   SPRN_L1CSR0, r3 /* Enable */
+   isync
+   blr
+
 _GLOBAL(__setup_cpu_e200)
    /* enable dedicated debug exception handling resources (Debug APU) */
    mfspr   r3,SPRN_HID0
@@ -25,7 +53,16 @@ _GLOBAL(__setup_cpu_e200)
    b   __setup_e200_ivors
 _GLOBAL(__setup_cpu_e500v1)
 _GLOBAL(__setup_cpu_e500v2)
-   b   __setup_e500_ivors
+   mflr    r4
+   bl  __e500_icache_enable
+   bl  __e500_dcache_enable
+   bl  __setup_e500_ivors
+   mtlr    r4
+   blr
 _GLOBAL(__setup_cpu_e500mc)
-   b   __setup_e500mc_ivors
-
+   mflr    r4
+   bl  __e500_icache_enable
+   bl  __e500_dcache_enable
+   bl  __setup_e500mc_ivors
+   mtlr    r4
+   blr




Re: [BUILD FAILURE 01/04] Next June 04:PPC64 randconfig [drivers/staging/comedi/drivers.o]

2009-06-09 Thread Geert Uytterhoeven
On Tue, Jun 9, 2009 at 20:34, Subrata Modaksubr...@linux.vnet.ibm.com wrote:
 On Tue, 2009-06-09 at 13:50 +1000, Benjamin Herrenschmidt wrote:
 On Sun, 2009-06-07 at 20:06 +0530, Subrata Modak wrote:
  On Sat, 2009-06-06 at 09:36 -0400, Frank Mori Hess wrote:
   On Saturday 06 June 2009, Greg KH wrote:
Frank and Ian, any thoughts about the vmap call in the
comedi_buf_alloc() call?  Why is it using PAGE_KERNEL_NOCACHE, and what
is the prealloc_buf buffer used for?
  
   It is a circular buffer used to hold data streaming either to or from a
   board (for example when producing an analog output waveform).  Reads and
   writes to the device files read/write to the circular buffer, plus a few
   drivers do dma directly to/from it.  I personally don't have a problem
   with requiring drivers to have their own dma buffers and making them copy
   data between their private dma buffers and the main circular buffer.  I
   guess the original design wanted to support zero-copy dma.
 
  Great to hear that. How about a patch that solves my build problem on
  PPC64(the problem seems to be existing for long) ?

 In any case, doing PAGE_KERNEL_NOCACHE for DMA memory is incorrect on
 many architectures. So at this stage, there's not much option but ifdef I
 suspect for now until this is fixed properly.

 OK, but I am not sure whether Greg will agree to this. If that is OK, is the
 patch I sent earlier acceptable?

 http://lkml.org/lkml/2009/6/5/462

Your patch helps powerpc only. Compilation is still broken on most
other architectures.

 It does make sense to want to have some memory like that shared between
 user space and DMA, though I don't know what the right approach that
 works on all archs is at this stage. Worth asking the Alsa guys, I think
 they have similar issues :-)

 But doing double buffering might do the trick fine for now.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say programmer or something like that.
-- Linus Torvalds

Re: [PATCH -next] powerpc/85xx: Add support for X-ES MPC85xx boards

2009-06-09 Thread Kumar Gala


On Jun 9, 2009, at 1:53 PM, Nate Case wrote:


On Mon, 2009-06-08 at 17:52 -0500, Kumar Gala wrote:

+static void xes_mpc85xx_configure_l1(void)
+{

[snip]


I'd prefer we move this into __setup_cpu_e500v1/__setup_cpu_e500v2 so
its done for all processors regardless of platform.


How does something like this look?  Let me know and I can test and
submit it separately.

- Nate

diff --git a/arch/powerpc/kernel/cpu_setup_fsl_booke.S b/arch/powerpc/kernel/cpu_setup_fsl_booke.S

index eb4b9ad..546804f 100644
--- a/arch/powerpc/kernel/cpu_setup_fsl_booke.S
+++ b/arch/powerpc/kernel/cpu_setup_fsl_booke.S
@@ -17,6 +17,34 @@
#include <asm/cputable.h>
#include <asm/ppc_asm.h>

+_GLOBAL(__e500_icache_enable)


I'd prefer we test to see if the cache is enabled and, if it is, just return.



+   mfspr   r3, SPRN_L1CSR1
+   oris    r3, r3, l1csr1_...@h
+   ori r3, r3, (L1CSR1_ICFI | L1CSR1_ICE)
+   mtspr   SPRN_L1CSR1, r3 /* Enable I-Cache */
+   isync
+   blr
+
+_GLOBAL(__e500_dcache_enable)


I'd prefer we test to see if the cache is enabled and, if it is, just return.


+   msync
+   isync
+   li  r3, 0
+   mtspr   SPRN_L1CSR0, r3 /* Disable */
+   msync
+   isync
+   li  r3, L1CSR0_DCFI


should probably flash reset the locks as well.



+   mtspr   SPRN_L1CSR0, r3 /* Invalidate */
+   msync
+   isync
+   mfspr   r3, SPRN_L1CSR0
+   oris    r3, r3, l1csr0_...@h
+   ori r3, r3, (L1CSR0_DCFI | L1CSR0_DCE)
+   msync
+   isync
+   mtspr   SPRN_L1CSR0, r3 /* Enable */
+   isync
+   blr
+
_GLOBAL(__setup_cpu_e20



[PATCH] powerpc/mpc52xx/mtd: fix mtd-ram access for 16-bit Local Plus Bus

2009-06-09 Thread Albrecht Dreß

Hi all,

this patch adds support for RAM chips connected to the Local Plus Bus  
of a MPC5200B in 16-bit mode.  As no single byte write accesses are  
allowed by the bus in this mode, a byte write has to be split into a  
word read - modify - write sequence (mpc52xx_memcpy2lpb16, as  
fix/extension for memcpy_toio; note that memcpy_fromio *does* work just  
fine).  It has been tested in conjunction with Wolfram Sang's mtd-ram  
[1] and Sascha Hauer's jffs unaligned access [2] patches on 2.6.29.1,  
with a Renesas static RAM connected in 16-bit Large Flash mode.
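
The read-modify-write idea described above can be illustrated outside the kernel. The following is only a userspace sketch under stated assumptions: the "bus" is a plain byte array, and the helper names (bus_read16, bus_write16, bus_write8_rmw, copy_to_bus16) are hypothetical and not part of the patch. It shows how a single byte can be stored when only whole, aligned 16-bit accesses are permitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for a bus that only accepts aligned 16-bit accesses. */
static uint16_t bus_read16(const uint8_t *base, size_t off)
{
    uint16_t w;

    assert((off & 1) == 0);            /* only aligned word reads are legal */
    memcpy(&w, base + off, sizeof w);
    return w;
}

static void bus_write16(uint8_t *base, size_t off, uint16_t w)
{
    assert((off & 1) == 0);            /* only aligned word writes are legal */
    memcpy(base + off, &w, sizeof w);
}

/* Store one byte by reading the containing word, patching one byte lane
 * and writing the whole word back. */
static void bus_write8_rmw(uint8_t *base, size_t off, uint8_t val)
{
    size_t word_off = off & ~(size_t)1;
    uint16_t w = bus_read16(base, word_off);
    uint8_t lanes[2];

    memcpy(lanes, &w, sizeof lanes);   /* view the word as two byte lanes */
    lanes[off & 1] = val;              /* replace only the target byte */
    memcpy(&w, lanes, sizeof w);
    bus_write16(base, word_off, w);
}

/* Copy n bytes: odd leading byte, then aligned words, then a trailing byte. */
static void copy_to_bus16(uint8_t *base, size_t off,
                          const uint8_t *src, size_t n)
{
    if (n && (off & 1)) {
        bus_write8_rmw(base, off, *src);
        off++; src++; n--;
    }
    while (n >= 2) {
        uint16_t w;

        memcpy(&w, src, sizeof w);
        bus_write16(base, off, w);
        off += 2; src += 2; n -= 2;
    }
    if (n)
        bus_write8_rmw(base, off, *src);
}
```

The sketch patches byte lanes through memcpy so that it behaves the same on big- and little-endian hosts; the patch itself targets the big-endian MPC5200B and additionally uses 32-bit copies for the bulk of the data.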


[1]  
http://lists.ozlabs.org/pipermail/linuxppc-dev/2009-June/072794.html

[2] http://article.gmane.org/gmane.linux.drivers.mtd/21521

Signed-off-by: Albrecht Dreß albrecht.dr...@arcor.de
Cc: Grant Likely grant.lik...@secretlab.ca
Cc: David Woodhouse dw...@infradead.org
Cc: linuxppc-...@ozlabs.org

---

diff -u linux-2.6.29.1.orig/arch/powerpc/platforms/52xx/mpc52xx_common.c linux-2.6.29.1/arch/powerpc/platforms/52xx/mpc52xx_common.c
--- linux-2.6.29.1.orig/arch/powerpc/platforms/52xx/mpc52xx_common.c 2009-04-02 22:55:27.0 +0200
+++ linux-2.6.29.1/arch/powerpc/platforms/52xx/mpc52xx_common.c 2009-06-09 21:16:22.0 +0200
@@ -225,3 +225,59 @@
    while (1);
 }
+
+/**
+ * mpc52xx_memcpy2lpb16: copy data to the Local Plus Bus in 16-bit mode which
+ * doesn't allow byte accesses
+ */
+void
+mpc52xx_memcpy2lpb16(volatile void __iomem *dest, const void *src,
+                     unsigned long n)
+{
+   void *vdest = (void __force *) dest;
+
+   __asm__ __volatile__ ("sync" : : : "memory");
+
+   if (((unsigned long) vdest & 1) != 0) {
+       u8 buf[2];
+
+       *(u16 *)buf = *((volatile u16 *)(vdest - 1));
+       buf[1] = *((u8 *)src);
+       *((volatile u16 *)(vdest - 1)) = *(u16 *)buf;
+       src++;
+       vdest++;
+       n--;
+   }
+
+   /* looks weird, but helps the optimiser... */
+   if (n >= 4) {
+       unsigned long chunks = n >> 2;
+       volatile u32 * _dst = (volatile u32 *)(vdest - 4);
+       volatile u32 * _src = (volatile u32 *)(src - 4);
+
+       vdest += chunks << 2;
+       src += chunks << 2;
+       do {
+           *++_dst = *++_src;
+       } while (--chunks);
+       n &= 3;
+   }
+
+   if (n >= 2) {
+       *((volatile u16 *)vdest) = *((volatile u16 *)src);
+       src += 2;
+       vdest += 2;
+       n -= 2;
+   }
+
+   if (n > 0) {
+       u8 buf[2];
+
+       *(u16 *)buf = *((volatile u16 *)vdest);
+       buf[0] = *((u8 *)src);
+       *((volatile u16 *)vdest) = *(u16 *)buf;
+   }
+
+   __asm__ __volatile__ ("sync" : : : "memory");
+}
+EXPORT_SYMBOL(mpc52xx_memcpy2lpb16);
diff -u linux-2.6.29.1.orig/arch/powerpc/include/asm/mpc52xx.h linux-2.6.29.1/arch/powerpc/include/asm/mpc52xx.h
--- linux-2.6.29.1.orig/arch/powerpc/include/asm/mpc52xx.h 2009-04-02 22:55:27.0 +0200
+++ linux-2.6.29.1/arch/powerpc/include/asm/mpc52xx.h 2009-06-09 21:14:31.0 +0200
@@ -274,6 +274,8 @@
 extern void mpc52xx_map_common_devices(void);
 extern int mpc52xx_set_psc_clkdiv(int psc_id, int clkdiv);
 extern void mpc52xx_restart(char *cmd);
+extern void mpc52xx_memcpy2lpb16(volatile void __iomem *dest, const void *src,
+                                 unsigned long n);

 /* mpc52xx_pic.c */
 extern void mpc52xx_init_irq(void);
diff -u linux-2.6.29.1.orig/include/linux/mtd/map.h linux-2.6.29.1/include/linux/mtd/map.h
--- linux-2.6.29.1.orig/include/linux/mtd/map.h 2009-04-02 22:55:27.0 +0200
+++ linux-2.6.29.1/include/linux/mtd/map.h 2009-06-08 14:28:05.0 +0200
@@ -13,6 +13,9 @@
 #include <asm/unaligned.h>
 #include <asm/system.h>
 #include <asm/io.h>
+#ifdef CONFIG_PPC_MPC52xx
+#include <asm/mpc52xx.h>
+#endif

 #ifdef CONFIG_MTD_MAP_BANK_WIDTH_1
 #define map_bankwidth(map) 1
@@ -417,6 +420,11 @@

 static inline void inline_map_copy_to(struct map_info *map, unsigned long to, const void *from, ssize_t len)
 {
+#ifdef CONFIG_PPC_MPC52xx
+   if (map->bankwidth == 2)
+       mpc52xx_memcpy2lpb16(map->virt + to, from, len);
+   else
+#endif
    memcpy_toio(map->virt + to, from, len);
 }




Re: [PATCH v4] zone_reclaim is always 0 by default

2009-06-09 Thread Andrew Morton
On Tue, 9 Jun 2009 07:02:14 -0500
Robin Holt h...@sgi.com wrote:

 On Tue, Jun 09, 2009 at 11:37:55AM +0100, Mel Gorman wrote:
  On Tue, Jun 09, 2009 at 04:55:07AM -0500, Robin Holt wrote:
   On Mon, Jun 08, 2009 at 12:50:48PM +0100, Mel Gorman wrote:
   
   Let me start by saying I agree completely with everything you wrote and
   still disagree with this patch, but was willing to compromise and work
   around this for our upcoming x86_64 machine by putting a value add
   into our packaging of adding a sysctl that turns reclaim back on.
   
  
  To be honest, I'm more leaning towards a NACK than an ACK on this one. I
  don't support enough NUMA machines to feel strongly enough about it but
  unconditionally setting zone_reclaim_mode to 0 on x86-64 just because i7's
  might be there seems ill-advised to me and will have other consequences for
  existing more traditional x86-64 NUMA machines.
 
 I was sort-of planning on coming up with an x86_64 arch specific function
 for setting zone_reclaim_mode, but didn't like the direction things
 were going.
 
 Something to the effect of...
 --- 20090609.orig/mm/page_alloc.c   2009-06-09 06:51:34.0 -0500
 +++ 20090609/mm/page_alloc.c        2009-06-09 06:55:00.160762069 -0500
 @@ -2326,12 +2326,7 @@ static void build_zonelists(pg_data_t *p
     while ((node = find_next_best_node(local_node, used_mask)) >= 0) {
         int distance = node_distance(local_node, node);
  
 -       /*
 -        * If another node is sufficiently far away then it is better
 -        * to reclaim pages in a zone before going off node.
 -        */
 -       if (distance > RECLAIM_DISTANCE)
 -           zone_reclaim_mode = 1;
 +       zone_reclaim_mode = arch_zone_reclaim_mode(distance);
  
         /*
          * We don't want to pressure a particular node.
 
 And then letting each arch define an arch_zone_reclaim_mode().  If other
 values are needed in the determination, we would add parameters to
 reflect this.
 
 For ia64, add
 
 static inline ia64_zone_reclaim_mode(int distance)
 {
    if (distance > 15)
        return 1;
 }
 
 #define   arch_zone_reclaim_mode(_d)  ia64_zone_reclaim_mode(_d)
 
 
 Then, inside x86_64_zone_reclaim_mode(), I could make it something like
    if (distance > 40 || is_uv_system())
        return 1;
 
 In the end, I didn't think this fight was worth fighting given how ugly
 this felt.  Upon second thought, I am beginning to think it is not that
 bad, but I also don't think it is that good either.
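
For readers following the proposal quoted above, here is a self-contained userspace sketch of what a per-architecture hook could look like once it returns a value on every path. All demo_* names are hypothetical. The thresholds 15 and 40 are the figures mentioned in the mail, and the uv_system flag stands in for is_uv_system(). As quoted, the assignment inside the loop would run once per node, so the last node visited would decide the mode; the sketch ORs the per-node results instead, which is an assumption about the intent rather than something stated in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Thresholds taken from the figures quoted in the mail; names are made up. */
#define DEMO_IA64_RECLAIM_DISTANCE    15
#define DEMO_X86_64_RECLAIM_DISTANCE  40

/* Returns 1 if zone reclaim should be on for a node at this distance. */
static inline int demo_ia64_zone_reclaim_mode(int distance)
{
    assert(distance >= 0);             /* node distances are non-negative */
    return distance > DEMO_IA64_RECLAIM_DISTANCE;
}

/* uv_system is a stand-in for the is_uv_system() check in the mail. */
static inline int demo_x86_64_zone_reclaim_mode(int distance, bool uv_system)
{
    assert(distance >= 0);
    return distance > DEMO_X86_64_RECLAIM_DISTANCE || uv_system;
}

/* Accumulate over all nodes so that one far node is enough to enable
 * reclaim, and a later near node cannot switch it back off. */
static int demo_reclaim_mode_for_nodes(const int *distances, size_t n,
                                       bool uv_system)
{
    int mode = 0;
    size_t i;

    for (i = 0; i < n; i++)
        mode |= demo_x86_64_zone_reclaim_mode(distances[i], uv_system);
    return mode;
}
```

Whatever the hook computes only supplies a default; as discussed in the thread, the administrator can still override zone_reclaim_mode at run time.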
 

We've done worse before now...

Is it not possible to work out at runtime whether zone reclaim mode is
beneficial?

Given that zone_reclaim_mode is settable from initscripts, why all the
fuss?

Is anyone testing RECLAIM_WRITE and RECLAIM_SWAP, btw?

The root cause of this problem: having something called mode.  Any
time we put a mode in the kernel, we get in a mess trying to work out
when to set it and to what.

I think I'll drop this patch for now.



Re: [BUILD FAILURE 01/04] Next June 04:PPC64 randconfig [drivers/staging/comedi/drivers.o]

2009-06-09 Thread Benjamin Herrenschmidt

  In any case, doing PAGE_KERNEL_NOCACHE for DMA memory is incorrect on
  many architectures. So at this stage, there's not much option but ifdef I
  suspect for now until this is fixed properly.
 
 OK, but I am not sure whether Greg will agree to this. If that is OK, is the
 patch I sent earlier acceptable?
 
 http://lkml.org/lkml/2009/6/5/462

Not really.

You probably want to use a constant (call it MY_DMA_MAP_PGPROT), and
in a header, you have a bunch of ifdef's that set it to PAGE_KERNEL,
PAGE_KERNEL_NOCACHE or PAGE_KERNEL_NC depending on what's needed.

Today, you can pretty much assume that

 - x86*, sparc*, ia64*, alpha, ... needs PAGE_KERNEL
 - powerpc needs PAGE_KERNEL if !CONFIG_NOT_COHERENT_CACHE
 - powerpc needs PAGE_KERNEL_NC if CONFIG_NOT_COHERENT_CACHE
 - ARM and MIPS, I think, needs PAGE_KERNEL_NOCACHE
 - ... others I don't know.
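
The header suggested above might be sketched roughly as follows. This is a userspace illustration only: the demo_pgprot enum stands in for the kernel's real pgprot_t values so that the fragment compiles on its own, and the fallback to PAGE_KERNEL for architectures not named in the list is an assumption, not something established in this thread. CONFIG_PPC, CONFIG_ARM, CONFIG_MIPS and CONFIG_NOT_COHERENT_CACHE are the usual Kconfig symbols; none of them is defined in a plain userspace build, so the default branch is taken there.

```c
#include <assert.h>

/* Stand-ins for the kernel's pgprot_t values (illustrative only). */
enum demo_pgprot {
    DEMO_PAGE_KERNEL,
    DEMO_PAGE_KERNEL_NC,
    DEMO_PAGE_KERNEL_NOCACHE,
};

/* One constant, chosen once in a header, used by every vmap() caller. */
#if defined(CONFIG_PPC) && defined(CONFIG_NOT_COHERENT_CACHE)
# define MY_DMA_MAP_PGPROT DEMO_PAGE_KERNEL_NC
#elif defined(CONFIG_ARM) || defined(CONFIG_MIPS)
# define MY_DMA_MAP_PGPROT DEMO_PAGE_KERNEL_NOCACHE
#else
/* x86, sparc, ia64, alpha, cache-coherent powerpc; assumed default otherwise */
# define MY_DMA_MAP_PGPROT DEMO_PAGE_KERNEL
#endif

/* Helper for printing which protection was selected. */
static const char *demo_pgprot_name(enum demo_pgprot p)
{
    switch (p) {
    case DEMO_PAGE_KERNEL:         return "PAGE_KERNEL";
    case DEMO_PAGE_KERNEL_NC:      return "PAGE_KERNEL_NC";
    case DEMO_PAGE_KERNEL_NOCACHE: return "PAGE_KERNEL_NOCACHE";
    }
    assert(!"unknown pgprot value");
    return "?";
}
```

Keeping the choice in one macro means the driver code itself carries no ifdefs; only the header needs to change when another architecture's requirements become known.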

Cheers,
Ben.

 Regards--
 Subrata
 
  
  It does make sense to want to have some memory like that shared between
  user space and DMA, though I don't know what the right approach that
  works on all archs is at this stage. Worth asking the Alsa guys, I think
  they have similar issues :-)
  
  But doing double buffering might do the trick fine for now.
  
  Cheers,
  Ben.
  
  
  
  



Info threads hangs in Linux-2.6.29 with KGDBOE

2009-06-09 Thread srikanth krishnakar
Hi All,

ISSUE: "info threads" hangs in KGDBOE
Kernel: Linux-2.6.29
Architectures affected: PowerPC (ppc32), x86
---

I am running kernel Linux-2.6.29 on a PowerPC Xilinx target with KGDBOE
enabled. The problem appears when I run "info threads" after
connecting to the target. The following is the error:

(gdb) target remote udp:10.161.2.35:6443

warning: The remote protocol may be unreliable over UDP.

Some events may be lost, rendering further debugging impossible.

Remote debugging using udp:10.161.2.35:6443

kgdb_breakpoint () at kernel/kgdb.c:1803

1803    arch_kgdb_breakpoint();

(gdb) info threads
[New Thread -2]
[New Thread 2]
[New Thread 3]
[New Thread 4]
[New Thread 5]
[New Thread 6]
[New Thread 59]
[New Thread 67]
[New Thread 101]
[New Thread 102]
[New Thread 103]
[New Thread 104]
[New Thread 105]
 14 Thread 105 (nfsiod)  __switch_to (prev=<value optimized out>, new=0xcf89c100) at arch/powerpc/kernel/process.c:411
 13 Thread 104 (aio/0)  __switch_to (prev=<value optimized out>, new=0xcf82f4e0) at arch/powerpc/kernel/process.c:411
 12 Thread 103 (kswapd0)  __switch_to (prev=<value optimized out>, new=0xcf82f4e0) at arch/powerpc/kernel/process.c:411
 11 Thread 102 (pdflush)  __switch_to (prev=<value optimized out>, new=0xcf82e880) at arch/powerpc/kernel/process.c:411
 10 Thread 101 (pdflush)  Ignoring packet error, continuing...
Ignoring packet error, continuing...
Ignoring packet error, continuing...
Ignoring packet error, continuing...
Ignoring packet error, continuing...

Finally the kernel dies after these error messages. This issue is not seen
up to kernel version Linux-2.6.28.10, where KGDBOE works fine on x86 and PowerPC.
The bug is seen on x86 (32-bit) and PowerPC from kernel version Linux-2.6.29 onwards.

I hope this is not a raw_smp_processor_id issue; the CPU ID returned on
both architectures is 0. Which patch in netpoll* or in any net device could
have caused this issue?

One more thing to note: on both x86 and PowerPC the kernel dies exactly after
the reply for four threads/packets.


Thanks,
Srikanth Krishnakar

[PATCH] of_serial: Add UPF_FIXED_TYPE flag

2009-06-09 Thread Dave Mitchell
This patch adds the UPF_FIXED_TYPE flag which will bypass the
8250's autoconfig probe for uart type. The uart type identified
by the of_serial's parse of the flat device tree will be utilized
as defined.
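
As a rough illustration of the behaviour described above, the following userspace sketch models a port setup that skips type probing when a "fixed type" flag is set. Everything here is a stand-in: the demo_* names, the flag value and the pretend probe are hypothetical and are not the 8250 driver's actual code or constants.

```c
#include <assert.h>

/* Illustrative stand-ins only; these are not the kernel's definitions. */
enum demo_uart_type {
    DEMO_PORT_UNKNOWN,
    DEMO_PORT_8250,
    DEMO_PORT_16550A,
};

#define DEMO_UPF_FIXED_TYPE (1u << 0)

/* Pretend hardware probe; a real probe would poke the UART registers
 * and may misidentify the part. */
static enum demo_uart_type demo_autoconfig(void)
{
    return DEMO_PORT_8250;
}

/* With the fixed-type flag set and a known declared type, trust the
 * declaration (here: what the device tree said) and do not probe. */
static enum demo_uart_type demo_config_port(enum demo_uart_type declared,
                                            unsigned int flags)
{
    assert(declared <= DEMO_PORT_16550A);
    if ((flags & DEMO_UPF_FIXED_TYPE) && declared != DEMO_PORT_UNKNOWN)
        return declared;
    return demo_autoconfig();
}
```

In this model a port declared as a 16550A keeps that type when the flag is set, and falls back to whatever the probe reports when it is not.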

Signed-off-by: Dave Mitchell dmitch...@amcc.com
---
 drivers/serial/of_serial.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/serial/of_serial.c b/drivers/serial/of_serial.c
index 14f8fa9..3f2027c 100644
--- a/drivers/serial/of_serial.c
+++ b/drivers/serial/of_serial.c
@@ -67,7 +67,7 @@ static int __devinit of_platform_serial_setup(struct of_device *ofdev,
    port->type = type;
    port->uartclk = *clk;
    port->flags = UPF_SHARE_IRQ | UPF_BOOT_AUTOCONF | UPF_IOREMAP
-       | UPF_FIXED_PORT;
+       | UPF_FIXED_PORT | UPF_FIXED_TYPE;
    port->dev = &ofdev->dev;
    /* If current-speed was set, then try not to change it. */
    if (spd)
-- 
1.6.3.2



Re: [PATCH 2.6.31] ehca: Tolerate dynamic memory operations and huge pages

2009-06-09 Thread Michael Ellerman
On Tue, 2009-06-09 at 15:59 +0200, Hannes Hering wrote:
 This patch implements toleration of dynamic memory operations and 16 GB
 gigantic pages. On module load the driver walks through available system
 memory, checks for available memory ranges and then registers the kernel
 internal memory region accordingly. The translation of address ranges is
 implemented via a 3-level busmap.

Hi Hannes,

For those of us who haven't read the HEA spec lately, can you give us
some more detail on that? :)

How does it interact with kexec/kdump?

 +static int ehca_update_busmap(unsigned long pfn, unsigned long nr_pages)
 +{
 + unsigned long i, start_section, end_section;
 + int top, dir, idx;
 +
 + if (!nr_pages)
 + return 0;
 +
 + if (!ehca_bmap) {
 + ehca_bmap = kmalloc(sizeof(struct ehca_bmap), GFP_KERNEL);
 + if (!ehca_bmap)
 + return -ENOMEM;
 + /* Set map block to 0xFF according to EHCA_INVAL_ADDR */
 + memset(ehca_bmap, 0xFF, EHCA_TOP_MAP_SIZE);
 + }
 +
 + start_section = phys_to_abs(pfn * PAGE_SIZE) / EHCA_SECTSIZE;
 + end_section = phys_to_abs((pfn + nr_pages) * PAGE_SIZE) / EHCA_SECTSIZE;


phys_to_abs() ? As below, or does it come from somewhere else?

 arch/powerpc/include/asm/abs_addr.h:
 static inline unsigned long phys_to_abs(unsigned long pa)
 {
     unsigned long chunk;
 
     /* This is a no-op on non-iSeries */
     if (!firmware_has_feature(FW_FEATURE_ISERIES))
         return pa;
 
     chunk = addr_to_chunk(pa);
 
     if (chunk < mschunks_map.num_chunks)
         chunk = mschunks_map.mapping[chunk];
 
     return chunk_to_addr(chunk) + (pa & MSCHUNKS_OFFSET_MASK);
 }


cheers



Re: [BUILD FAILURE 02/04] Next June 04:PPC64 randconfig [drivers/usb/host/ohci-hcd.o]

2009-06-09 Thread David Brownell
On Friday 05 June 2009, Subrata Modak wrote:
 Correct, it fixes the issue. However, since few changes might have gone
 to the Kconfig, the patch does not apply cleanly. Below is the patch, just
 a retake of the earlier one, but on the latest code. 

And it got mangled a bit along the way.  Plus, the original one
goofed up Kconfig dependency displays ... both issues fixed in
this version, against current mainline GIT.

If someone can verify all four PPC/OF/OHCI configs build on
on PPC64, I'm OK with it.

- Dave


== CUT HERE
From: Arnd Bergmann a...@arndb.de
Subject: fix build failure for PPC64 randconfig [usb/ohci]

We could just make the USB_OHCI_HCD_PPC_OF option implicit
and selected only if at least one of USB_OHCI_HCD_PPC_OF_BE
and USB_OHCI_HCD_PPC_OF_LE are set.

[ dbrown...@users.sourceforge.net: fix patch manglation and dependencies ]

Signed-off-by: Arnd Bergmann a...@arndb.de
Resent-by: Subrata Modak subr...@linux.vnet.ibm.com
Signed-off-by: David Brownell dbrown...@users.sourceforge.net
---
 drivers/usb/host/Kconfig |   29 +++--
 1 file changed, 15 insertions(+), 14 deletions(-)

--- a/drivers/usb/host/Kconfig
+++ b/drivers/usb/host/Kconfig
@@ -180,26 +180,27 @@ config USB_OHCI_HCD_PPC_SOC
      Enables support for the USB controller on the MPC52xx or
      STB03xxx processor chip.  If unsure, say Y.
 
-config USB_OHCI_HCD_PPC_OF
-   bool "OHCI support for PPC USB controller on OF platform bus"
-   depends on USB_OHCI_HCD && PPC_OF
-   default y
-   ---help---
-     Enables support for the USB controller PowerPC present on the
-     OpenFirmware platform bus.
-
 config USB_OHCI_HCD_PPC_OF_BE
-   bool "Support big endian HC"
-   depends on USB_OHCI_HCD_PPC_OF
-   default y
+   bool "OHCI support for OF platform bus (big endian)"
+   depends on USB_OHCI_HCD && PPC_OF
    select USB_OHCI_BIG_ENDIAN_DESC
    select USB_OHCI_BIG_ENDIAN_MMIO
+   ---help---
+     Enables support for big-endian USB controllers present on the
+     OpenFirmware platform bus.
 
 config USB_OHCI_HCD_PPC_OF_LE
-   bool "Support little endian HC"
-   depends on USB_OHCI_HCD_PPC_OF
-   default n
+   bool "OHCI support for OF platform bus (little endian)"
+   depends on USB_OHCI_HCD && PPC_OF
    select USB_OHCI_LITTLE_ENDIAN
+   ---help---
+     Enables support for little-endian USB controllers present on the
+     OpenFirmware platform bus.
+
+config USB_OHCI_HCD_PPC_OF
+   bool
+   depends on USB_OHCI_HCD && PPC_OF
+   default USB_OHCI_HCD_PPC_OF_BE || USB_OHCI_HCD_PPC_OF_LE
 
 config USB_OHCI_HCD_PCI
    bool "OHCI support for PCI-bus USB controllers"



next branch update

2009-06-09 Thread Benjamin Herrenschmidt
Hi !

I've updated my next branch with the following patches. We're getting
real close to the merge window now, so if something is missing, please
holler ASAP.

Cheers,
Ben.

Becky Bruce (1):
  powerpc: Add support for swiotlb on 32-bit

Benjamin Herrenschmidt (8):
  powerpc/mm: Fix some SMP issues with MMU context handling
  powerpc/mm: Fix a AB-BA deadlock scenario with nohash MMU context lock
  powerpc: Set init_bootmem_done on NUMA platforms as well
  powerpc: Move VMX and VSX asm code to vector.S
  powerpc: Introduce CONFIG_PPC_BOOK3S
  powerpc: Split exception handling out of head_64.S
  powerpc: Separate PACA fields for server CPUs
  powerpc: Shield code specific to 64-bit server processors

Grant Likely (1):
  powerpc/virtex: refactor intc driver and add support for i8259 cascading

John Linn (1):
  fbdev: Add PLB support and cleanup DCR in xilinxfb driver.

Roderick Colenbrander (3):
  powerpc/virtex: Add support for Xilinx PCI host bridge
  powerpc/virtex: Add Xilinx ML510 reference design support
  powerpc/virtex: Add ml510 reference design device tree

Roland McGrath (1):
  powerpc: Add PTRACE_SINGLEBLOCK support

Stephen Rothwell (4):
  powerpc/pseries: Fix warnings when printing resource_size_t
  powerpc/xmon: Remove unused variable in xmon.c
  powerpc: Fix warning when printing a resource_size_t
  powerpc/spufs: Remove unused error path





Re: next branch update

2009-06-09 Thread Grant Likely
On Tue, Jun 9, 2009 at 9:14 PM, Benjamin Herrenschmidt
b...@kernel.crashing.org wrote:

 Hi !

 I've updated my next branch with the following patches. We're getting
 real close to the merge window now, so if something is missing, please
 holler ASAP.

Just these two; but I see you've got them marked under review:

http://patchwork.ozlabs.org/patch/28191/
http://patchwork.ozlabs.org/patch/27752/

g.

--
Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.