Re: [U-Boot] cuImage and multi image?

2010-09-23 Thread Shawn Jin
 Can you paste the whole log from the u-boot prompt?

 In the previous run the ramdisk image was corrupted because the single
 image was loaded at 0x800000. But the boot message showed that the
 initrd image was at 0x0066c000-0x009ae825, so it extended past the 8MB
 mark.

 However, after the load address was changed to 0x04000000 (64MB), the
 ramdisk still seemed corrupted, but with different error messages.

 = bootm
 ## Booting image at 0400 ...
   Image Name:   Linux-2.6.33.5
   Image Type:   PowerPC Linux Kernel Image (gzip compressed)
   Data Size:    4424922 Bytes =  4.2 MB
   Load Address: 0040
   Entry Point:  00400554
   Verifying Checksum ... OK
   Uncompressing Kernel Image ... OK
 Memory - 0x0 0x800 (128MB)
 ENET0: local-mac-address - 00:09:9b:01:58:64
 CPU clock-frequency - 0x7270e00 (120MHz)
 CPU timebase-frequency - 0x7270e0 (8MHz)
 CPU bus-frequency - 0x3938700 (60MHz)

 zImage starting: loaded at 0x0040 (sp: 0x07d1cbd0)
 Allocating 0x22a1e1 bytes for kernel ...
 gunzipping (0x - 0x0040c000:0x0066b0ac)...done 0x21c6c8 bytes
 Attached initrd image at 0x0066c000-0x009ae825
 initrd head: 0x1f8b0808

 Linux/PowerPC load: root=/dev/ram
 Finalizing device tree... flat tree at 0x9bb300
 Using my870 machine description
 Linux version 2.6.33.5 (sh...@ubuntu) (gcc version 4.2.2) #4 Tue Sep
 21 09:23:51 PDT 2010
 Found initrd at 0xc066c000:0xc09ae825

The following boot log is from loading the same kernel and the same
ramdisk as two separate images. The difference is that when booting
from two separate images, the ramdisk is loaded at the top of RAM
(0x79d9000-0x7d1b825), whereas when booting from the single image, the
ramdisk is placed immediately after the uncompressed kernel image
(0x0066c000-0x009ae825). I'm not familiar with how the kernel uses
memory, but it seems clear from this failure that the kernel overwrites
the area where the initrd is located.

Can anyone shed some light on why the kernel would overwrite the
initrd area? BTW, if the initrd is small enough, the single-image
method works well. Maybe we should relocate the initrd to the top of
available RAM, just like u-boot's bootm does?
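
As a rough illustration of the relocation idea above, the following C
sketch computes a page-aligned start address for an initrd placed just
below a chosen limit, and checks whether two address ranges overlap. The
helper names, the 4KB page size and the 0x07d1c000 limit used in the note
below are assumptions chosen for illustration; this is not the actual
U-Boot or bootwrapper code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for alignment (hypothetical, for illustration). */
#define SKETCH_PAGE_SIZE 4096u

/* True if the half-open ranges [a_start, a_end) and [b_start, b_end) overlap. */
bool ranges_overlap(uint32_t a_start, uint32_t a_end,
		    uint32_t b_start, uint32_t b_end)
{
	return a_start < b_end && b_start < a_end;
}

/*
 * Start address for an initrd of `size` bytes placed as high as possible
 * below `limit`, rounded down to a page boundary.
 * The caller must ensure size <= limit.
 */
uint32_t initrd_start_below(uint32_t limit, uint32_t size)
{
	return (limit - size) & ~(SKETCH_PAGE_SIZE - 1u);
}
```

With the 0x342825-byte ramdisk from the logs and an assumed limit of
0x07d1c000, initrd_start_below() yields 0x079d9000, which matches the
address in the two-image log, although the limit bootm really used is
not shown there.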

= bootm 100 200
## Booting image at 0100 ...
   Image Name:   Linux-2.6.33.5
   Image Type:   PowerPC Linux Kernel Image (gzip compressed)
   Data Size:1040228 Bytes = 1015.8 kB
   Load Address: 0040
   Entry Point:  00400554
   Verifying Checksum ... OK
   Uncompressing Kernel Image ... OK
## Loading RAMDisk Image at 0200 ...
   Image Name:   16MB Ramdisk
   Image Type:   PowerPC Linux RAMDisk Image (gzip compressed)
   Data Size:3418149 Bytes =  3.3 MB
   Load Address: 
   Entry Point:  
   Verifying Checksum ... OK
   Loading Ramdisk to 079d9000, end 07d1b825 ... OK
Memory - 0x0 0x800 (128MB)
ENET0: local-mac-address - 00:09:9b:01:58:64
CPU clock-frequency - 0x7270e00 (120MHz)
CPU timebase-frequency - 0x7270e0 (8MHz)
CPU bus-frequency - 0x3938700 (60MHz)

zImage starting: loaded at 0x0040 (sp: 0x07d1cbd0)
Allocating 0x22a1e1 bytes for kernel ...
gunzipping (0x - 0x0040c000:0x0066b0ac)...done 0x21c6c8 bytes
Using loader supplied ramdisk at 0x79d9000-0x7d1b825
initrd head: 0x1f8b0808

Linux/PowerPC load: root=/dev/ram
Finalizing device tree... flat tree at 0x678300
Using my870 machine description
Linux version 2.6.33.5 (sh...@ubuntu) (gcc version 4.2.2) #4 Tue Sep
21 09:23:51 PDT 2010
Found initrd at 0xc79d9000:0xc7d1b825

Thanks,
-Shawn.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev



[PATCH 12/20] powerpc: change to new flag variables

2010-09-23 Thread matt mooney
Replace EXTRA_CFLAGS with ccflags-y and EXTRA_AFLAGS with asflags-y.

Signed-off-by: matt mooney m...@muteddisk.com
---
 arch/powerpc/kernel/vdso32/Makefile     |    6 +++---
 arch/powerpc/kernel/vdso64/Makefile     |    6 +++---
 arch/powerpc/kvm/Makefile               |    2 +-
 arch/powerpc/lib/Makefile               |    4 +---
 arch/powerpc/math-emu/Makefile          |    2 +-
 arch/powerpc/mm/Makefile                |    4 +---
 arch/powerpc/oprofile/Makefile          |    4 +---
 arch/powerpc/platforms/iseries/Makefile |    2 +-
 arch/powerpc/platforms/pseries/Makefile |   11 +++
 arch/powerpc/sysdev/Makefile            |    4 +---
 arch/powerpc/xmon/Makefile              |    4 +---
 11 files changed, 17 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile
index 51ead52..9a7946c 100644
--- a/arch/powerpc/kernel/vdso32/Makefile
+++ b/arch/powerpc/kernel/vdso32/Makefile
@@ -14,10 +14,10 @@ obj-vdso32 := $(addprefix $(obj)/, $(obj-vdso32))
 
 GCOV_PROFILE := n
 
-EXTRA_CFLAGS := -shared -fno-common -fno-builtin
-EXTRA_CFLAGS += -nostdlib -Wl,-soname=linux-vdso32.so.1 \
+ccflags-y := -shared -fno-common -fno-builtin
+ccflags-y += -nostdlib -Wl,-soname=linux-vdso32.so.1 \
$(call cc-ldoption, -Wl$(comma)--hash-style=sysv)
-EXTRA_AFLAGS := -D__VDSO32__ -s
+asflags-y := -D__VDSO32__ -s
 
 obj-y += vdso32_wrapper.o
 extra-y += vdso32.lds
diff --git a/arch/powerpc/kernel/vdso64/Makefile b/arch/powerpc/kernel/vdso64/Makefile
index 79da65d..8c500d8 100644
--- a/arch/powerpc/kernel/vdso64/Makefile
+++ b/arch/powerpc/kernel/vdso64/Makefile
@@ -9,10 +9,10 @@ obj-vdso64 := $(addprefix $(obj)/, $(obj-vdso64))
 
 GCOV_PROFILE := n
 
-EXTRA_CFLAGS := -shared -fno-common -fno-builtin
-EXTRA_CFLAGS += -nostdlib -Wl,-soname=linux-vdso64.so.1 \
+ccflags-y := -shared -fno-common -fno-builtin
+ccflags-y += -nostdlib -Wl,-soname=linux-vdso64.so.1 \
$(call cc-ldoption, -Wl$(comma)--hash-style=sysv)
-EXTRA_AFLAGS := -D__VDSO64__ -s
+asflags-y := -D__VDSO64__ -s
 
 obj-y += vdso64_wrapper.o
 extra-y += vdso64.lds
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index d45c818..4d68638 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -4,7 +4,7 @@
 
 subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
 
-EXTRA_CFLAGS += -Ivirt/kvm -Iarch/powerpc/kvm
+ccflags-y := -Ivirt/kvm -Iarch/powerpc/kvm
 
 common-objs-y = $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o)
 
diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 5bb89c8..e4b0c07 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -4,9 +4,7 @@
 
 subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
 
-ifeq ($(CONFIG_PPC64),y)
-EXTRA_CFLAGS   += -mno-minimal-toc
-endif
+ccflags-$(CONFIG_PPC64):= -mno-minimal-toc
 
 CFLAGS_REMOVE_code-patching.o = -pg
 CFLAGS_REMOVE_feature-fixups.o = -pg
diff --git a/arch/powerpc/math-emu/Makefile b/arch/powerpc/math-emu/Makefile
index 0c16ab9..7d1dba0 100644
--- a/arch/powerpc/math-emu/Makefile
+++ b/arch/powerpc/math-emu/Makefile
@@ -15,4 +15,4 @@ obj-$(CONFIG_SPE) += math_efp.o
 CFLAGS_fabs.o = -fno-builtin-fabs
 CFLAGS_math.o = -fno-builtin-fabs
 
-EXTRA_CFLAGS = -I. -Iinclude/math-emu -w
+ccflags-y = -I. -Iinclude/math-emu -w
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index ce68708..53102f3 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -4,9 +4,7 @@
 
 subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
 
-ifeq ($(CONFIG_PPC64),y)
-EXTRA_CFLAGS   += -mno-minimal-toc
-endif
+ccflags-$(CONFIG_PPC64):= -mno-minimal-toc
 
 obj-y  := fault.o mem.o pgtable.o gup.o \
   init_$(CONFIG_WORD_SIZE).o \
diff --git a/arch/powerpc/oprofile/Makefile b/arch/powerpc/oprofile/Makefile
index e219ca4..73456c4 100644
--- a/arch/powerpc/oprofile/Makefile
+++ b/arch/powerpc/oprofile/Makefile
@@ -1,8 +1,6 @@
 subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
 
-ifeq ($(CONFIG_PPC64),y)
-EXTRA_CFLAGS   += -mno-minimal-toc
-endif
+ccflags-$(CONFIG_PPC64):= -mno-minimal-toc
 
 obj-$(CONFIG_OPROFILE) += oprofile.o
 
diff --git a/arch/powerpc/platforms/iseries/Makefile b/arch/powerpc/platforms/iseries/Makefile
index ce01492..a7602b1 100644
--- a/arch/powerpc/platforms/iseries/Makefile
+++ b/arch/powerpc/platforms/iseries/Makefile
@@ -1,4 +1,4 @@
-EXTRA_CFLAGS   += -mno-minimal-toc
+ccflags-y  := -mno-minimal-toc
 
 obj-y += exception.o
 obj-y += hvlog.o hvlpconfig.o lpardata.o setup.o dt.o mf.o lpevents.o \
diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
index 046ace9..7ee1599 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -1,10 +1,5 @@
-ifeq ($(CONFIG_PPC64),y)
-EXTRA_CFLAGS   += -mno-minimal-toc
-endif
-
-ifeq 

RE: Modifying mpc8308rdb.dts

2010-09-23 Thread Maria Johansen

 posting patches beats waiting for an indefinite amount of time :)

Kim

Well, yes, I suppose so. It's just that I have noticed that only 1-2
people post these kinds of patches, so I thought maybe there was some
kind of unwritten agreement that no one else was to tamper with their
work.

I will dig into the documentation and figure out how to post patches
then.
Thanks :)

--
Maria


Re: [PATCH] irqbalance, powerpc: add IRQs without settable SMP affinity to banned list

2010-09-23 Thread Michael Ellerman
On Wed, 2010-09-22 at 16:04 +1000, Michael Neuling wrote:
 When irqbalance attempts to write to the IPI smp_affinity (i.e.
 /proc/irq/16/smp_affinity in the above example), the write fails, but
 irqbalance currently ignores this.
 
 This patch catches these write fails and in this case adds that IRQ
 number to the banned IRQ list.  This will catch the above IPI case and
 any other IRQ where the SMP affinity can't be set.

Cool!

 Index: irqbalance/irqlist.c
 ===
 --- irqbalance.orig/irqlist.c
 +++ irqbalance/irqlist.c
 @@ -67,7 +67,7 @@
   DIR *dir;
   struct dirent *entry;
   char *c, *c2;
 - int nr , count = 0;
 + int nr , count = 0, can_set = 1;
   char buf[PATH_MAX];
   sprintf(buf, /proc/irq/%i, number);
   dir = opendir(buf);
 @@ -80,7 +80,7 @@
   size_t size = 0;
   FILE *file;
   sprintf(buf, /proc/irq/%i/smp_affinity, number);
 - file = fopen(buf, r);
 + file = fopen(buf, r+);
   if (!file)
   continue;
   if (getline(line, size, file)==0) {
 @@ -89,7 +89,14 @@
   continue;
   }
   cpumask_parse_user(line, strlen(line), irq-mask);
 - fclose(file);
 + /*
 +  * Check that we can write the affinity, if
 +  * not take it out of the list.
 +  */
 + if (fwrite(line, strlen(line) - 1, 1, file) == 0)

if (fputs(line, file) == EOF)

?

cheers




Re: [PATCH 12/20] powerpc: change to new flag variables

2010-09-23 Thread Stephen Rothwell
Hi Matt,

On Wed, 22 Sep 2010 23:51:09 -0700 matt mooney m...@muteddisk.com wrote:

 Replace EXTRA_CFLAGS with ccflags-y and EXTRA_AFLAGS with asflags-y.

This looks good.  One comment below ...

 --- a/arch/powerpc/platforms/pseries/Makefile
 +++ b/arch/powerpc/platforms/pseries/Makefile
 @@ -1,10 +1,5 @@
 -ifeq ($(CONFIG_PPC64),y)
 -EXTRA_CFLAGS += -mno-minimal-toc
 -endif
 -
 -ifeq ($(CONFIG_PPC_PSERIES_DEBUG),y)
 -EXTRA_CFLAGS += -DDEBUG
 -endif
 +ccflags-$(CONFIG_PPC64)  := -mno-minimal-toc
 +ccflags-$(CONFIG_PPC_PSERIES_DEBUG)  += -DDEBUG
  
  obj-y:= lpar.o hvCall.o nvram.o reconfig.o \
  setup.o iommu.o event_sources.o ras.o \
 @@ -23,7 +18,7 @@ obj-$(CONFIG_MEMORY_HOTPLUG)+= hotplug-memory.o
  obj-$(CONFIG_HVC_CONSOLE)+= hvconsole.o
  obj-$(CONFIG_HVCS)   += hvcserver.o
  obj-$(CONFIG_HCALL_STATS)+= hvCall_inst.o
 -obj-$(CONFIG_PHYP_DUMP)  += phyp_dump.o
 +obj-$(CONFIG_PHYP_DUMP)  += phyp_dump.o
  obj-$(CONFIG_CMM)+= cmm.o
  obj-$(CONFIG_DTL)+= dtl.o

This looks like a spurious extra hunk.

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/



Re: [PATCH] irqbalance, powerpc: add IRQs without settable SMP affinity to banned list

2010-09-23 Thread Michael Neuling

  +   if (fwrite(line, strlen(line) - 1, 1, file) == 0)
 
 if (fputs(line, file) == EOF)

Good point thanks... new patch below

Mikey

irqbalance, powerpc: add IRQs without settable SMP affinity to banned list

On pseries powerpc, IPIs are registered with an IRQ number so
/proc/interrupts looks like this on a 2 core/2 thread machine:

           CPU0       CPU1       CPU2       CPU3
 16:    3164282    3290514    1138794     983121   XICS      Level     IPI
 18:    2605674          0     304994          0   XICS      Level     lan0
 30:     400057          0     169209          0   XICS      Level     ibmvscsi
LOC:     133734      77250     106425      91951   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
CNT:          0          0          0          0   Performance monitoring interrupts
MCE:          0          0          0          0   Machine check exceptions

Unfortunately this means irqbalance attempts to set the affinity of IPIs
which is not possible.  So in the above case, when irqbalance is in
performance mode due to heavy IPI, lan0 and ibmvscsi activity, it
sometimes attempts to put the IPIs on one core (CPU0&1) and lan0 and
ibmvscsi on the other core (CPU2&3).  This is suboptimal as we want lan0
and ibmvscsi to be on separate cores and IPIs to be ignored.

When irqbalance attempts to write to the IPI smp_affinity (i.e.
/proc/irq/16/smp_affinity in the above example), the write fails, but
irqbalance currently ignores this.

This patch catches these write failures and in this case adds that IRQ
number to the banned IRQ list.  This will catch the above IPI case and
any other IRQ where the SMP affinity can't be set.

Tested on POWER6, POWER7 and x86.

Signed-off-by: Michael Neuling mi...@neuling.org

Index: irqbalance/irqlist.c
===================================================================
--- irqbalance.orig/irqlist.c
+++ irqbalance/irqlist.c
@@ -67,7 +67,7 @@
 	DIR *dir;
 	struct dirent *entry;
 	char *c, *c2;
-	int nr , count = 0;
+	int nr , count = 0, can_set = 1;
 	char buf[PATH_MAX];
 	sprintf(buf, "/proc/irq/%i", number);
 	dir = opendir(buf);
@@ -80,7 +80,7 @@
 			size_t size = 0;
 			FILE *file;
 			sprintf(buf, "/proc/irq/%i/smp_affinity", number);
-			file = fopen(buf, "r");
+			file = fopen(buf, "r+");
 			if (!file)
 				continue;
 			if (getline(&line, &size, file)==0) {
@@ -89,7 +89,14 @@
 				continue;
 			}
 			cpumask_parse_user(line, strlen(line), irq->mask);
-			fclose(file);
+			/*
+			 * Check that we can write the affinity, if
+			 * not take it out of the list.
+			 */
+			if (fputs(line, file) == EOF)
+				can_set = 0;
+			if (fclose(file))
+				can_set = 0;
 			free(line);
 		} else if (strcmp(entry->d_name,"allowed_affinity")==0) {
 			char *line = NULL;
@@ -122,7 +129,7 @@
 		count++;
 
 	/* if there is no choice in the allowed mask, don't bother to balance */
-	if (count<2)
+	if ((count<2) || (can_set == 0))
 		irq->balance_level = BALANCE_NONE;
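
The probe in the patch above (read the current value, write it back, and
treat a failed write or close as "affinity cannot be set") can be sketched
as a standalone helper. The function name can_rewrite is hypothetical, and
the sketch departs from the patch in two labeled ways: it reads with
fgets() into a fixed buffer instead of getline(), and it calls rewind()
before writing, because C requires a file-positioning call when an update
stream switches from reading to writing unless the read hit end-of-file.

```c
#include <stdio.h>

/*
 * Hypothetical helper: returns 1 if the first line of `path` can be
 * written back, 0 if the write-back fails, and -1 if the file cannot be
 * opened for update or read at all.
 */
int can_rewrite(const char *path)
{
	char line[256];
	int ok = 1;
	FILE *f = fopen(path, "r+");

	if (!f)
		return -1;
	if (!fgets(line, sizeof(line), f)) {
		fclose(f);
		return -1;
	}
	/* Reposition before switching the stream from reading to writing. */
	rewind(f);
	if (fputs(line, f) == EOF)
		ok = 0;
	/* Buffered data is flushed here, so a write error may only show up now. */
	if (fclose(f) != 0)
		ok = 0;
	return ok;
}
```

Checking the fclose() result matters for the same reason it does in the
patch: fputs() usually only fills the stdio buffer, and the kernel sees
the write when the buffer is flushed.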

 


RE: MPC8641D PEX: programming OWBAR in Endpoint mode?

2010-09-23 Thread David Hagood
On Thu, 2010-09-23 at 05:21 +0200, Chen, Tiejun wrote:
  I can get the device to show up on the host's PCI bus, I can 
 
 This only ensures you can access the PCIe configuration space.
Not quite: I can also read the BARs that I program, and the memory
behind them on the PPC.
 
  program the inbound ATMUs such that the BARS are updated when 
  the host (re-)scans them, but I cannot for the life of me get
 
 What values are configured in the inbound registers?
I can program them at run time via sysfs on the PPC's side, so there is
no single set of values. However, I am pointing them at the PPC's RAM
space, and as I stated above, I can read the PPC's RAM from the host
side via the BARs.

 How do you configure OWS of PEXOWAR?
 
 I mean you can still access it if the OWS matches the whole target
 memory size, even when '0' is used as the internal platform address.
As I understand it, not if the OWS is not correctly mapped on the PPC
side - the PEX outbound ATMU's OWBAR must be mapped to a region of the
PPCs address space that is also mapped to PEX in the LAW. The LAW does
NOT indicate that PPC address 0 is mapped to the PEX.

 
 out_be32 should be fine for the ATMU registers. You can also refer to
 the functions setup_pci_atmu & setup_one_atmu in the file
 arch/powerpc/sysdev/fsl_pci.c to see how to access the ATMU registers.
 Usually you should disable them, configure them, then enable the ATMU
 entry, as in the normal configuration sequence.
I have tried disabling the outbound ATMU when I program it, with no
change.
I have looked at the functions you mention, and that is a part of my
confusion, as they aren't doing anything different than I am.
 
 Additionally, I'm a bit worried about your initialization phase :) As
 you know, PCIe is used in RC mode by the Freescale PowerPC kernel, so I
 don't know whether you have also dropped that path from your kernel so
 that the two don't conflict :)
I have tried doing this under a kernel built without PCI support with no
change.




[BUG 2.6.36-rc5] of_i2c.ko - i2c-core.ko dependency loop

2010-09-23 Thread Mikael Pettersson
Running modules_install from a newly built 2.6.36-rc5 kernel
on my 32-bit PowerMac results in:

WARNING: Module /lib/modules/2.6.36-rc5/kernel/drivers/i2c/busses/i2c-powermac.ko ignored, due to loop
WARNING: Loop detected: /lib/modules/2.6.36-rc5/kernel/drivers/i2c/i2c-core.ko needs of_i2c.ko which needs i2c-core.ko again!
WARNING: Module /lib/modules/2.6.36-rc5/kernel/drivers/i2c/i2c-core.ko ignored, due to loop
WARNING: Module /lib/modules/2.6.36-rc5/kernel/drivers/i2c/i2c-dev.ko ignored, due to loop
WARNING: Module /lib/modules/2.6.36-rc5/kernel/drivers/of/of_i2c.ko ignored, due to loop
WARNING: Module /lib/modules/2.6.36-rc5/kernel/sound/ppc/snd-powermac.ko ignored, due to loop

 grep '.*I2C.*=' .config
CONFIG_OF_I2C=m
CONFIG_I2C=m
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_CHARDEV=m
CONFIG_I2C_POWERMAC=m

I can't say exactly when this started, haven't built kernels on this
box in a while.

/Mikael


RE: [U-Boot] cuImage and multi image?

2010-09-23 Thread Chen, Tiejun
 -Original Message-
 From: Shawn Jin [mailto:shawnx...@gmail.com] 
 Sent: Thursday, September 23, 2010 4:23 AM
 To: Chen, Tiejun
 Cc: Scott Wood; ppcdev; uboot
 Subject: Re: [U-Boot] cuImage and multi image?
 
  I have a large ramdisk image. The size of the image itself 
 (i.e. the
  *.gz) is about 4MB. When the ramdisk was being decompressed
 
  Did you try to change link_address on the file, 
 arch/powerpc/boot/wrapper?
 
 No. I don't have to. Right? The link_address is still 0x40.

I mean you can change link_address to another value according to the
image size. Try setting link_address='0x500'.

 
  Did you try boot the uImage and the ramdisk separately? For 
 example, you can boot this as the following command:
  # bootm ${kernel_addr} ${ramdisk_addr} ${fdt_addr}
 
 Mine is a cuImage. I'm pretty sure that my ramdisk is valid 
 when it's a separate image. I used bootm kernel_addr 
 ramdisk_addr to boot.
 
  Can you paste the whole log from the u-boot prompt?
 
 In the previous run the ramdisk image was corrupted because
 the single image was loaded at 0x800000. But the boot message
 showed that the initrd image was at 0x0066c000-0x009ae825, so
 it extended past the 8MB mark.
 
 However, after the load address was changed to 0x04000000
 (64MB), the ramdisk still seemed corrupted, but with different
 error messages.

This should be the same reason, 'uncompression error'.

Cheers
Tiejun

 
 = bootm
 ## Booting image at 0400 ...
Image Name:   Linux-2.6.33.5
Image Type:   PowerPC Linux Kernel Image (gzip compressed)
Data Size:4424922 Bytes =  4.2 MB
Load Address: 0040
Entry Point:  00400554
Verifying Checksum ... OK
Uncompressing Kernel Image ... OK
 Memory - 0x0 0x800 (128MB)
 ENET0: local-mac-address - 00:09:9b:01:58:64 CPU 
 clock-frequency - 0x7270e00 (120MHz) CPU timebase-frequency 
 - 0x7270e0 (8MHz) CPU bus-frequency - 0x3938700 (60MHz)
 
 zImage starting: loaded at 0x0040 (sp: 0x07d1cbd0) 
 Allocating 0x22a1e1 bytes for kernel ...
 gunzipping (0x - 0x0040c000:0x0066b0ac)...done 
 0x21c6c8 bytes Attached initrd image at 0x0066c000-0x009ae825 
 initrd head: 0x1f8b0808
 
 Linux/PowerPC load: root=/dev/ram
 Finalizing device tree... flat tree at 0x9bb300 Using my870 
 machine description Linux version 2.6.33.5 (sh...@ubuntu) 
 (gcc version 4.2.2) #4 Tue Sep
 21 09:23:51 PDT 2010
 Found initrd at 0xc066c000:0xc09ae825
 Zone PFN ranges:
   DMA  0x - 0x8000
   Normal   0x8000 - 0x8000
 Movable zone start PFN for each node
 early_node_map[1] active PFN ranges
 0: 0x - 0x8000
 MMU: Allocated 72 bytes of context maps for 16 contexts Built 
 1 zonelists in Zone order, mobility grouping on.  Total 
 pages: 32512 Kernel command line: root=/dev/ram PID hash 
 table entries: 512 (order: -1, 2048 bytes) Dentry cache hash 
 table entries: 16384 (order: 4, 65536 bytes) Inode-cache hash 
 table entries: 8192 (order: 3, 32768 bytes)
 Memory: 124072k/131072k available (2080k kernel code, 6836k 
 reserved, 84k data, 52k bss, 104k init) Kernel virtual memory layout:
   * 0xfffdf000..0xf000  : fixmap
   * 0xfde0..0xfe00  : consistent mem
   * 0xfddfa000..0xfde0  : early ioremap
   * 0xc900..0xfddfa000  : vmalloc  ioremap
 SLUB: Genslabs=12, HWalign=16, Order=0-3, MinObjects=0, 
 CPUs=1, Nodes=1 
 snipped
 
 RAMDISK: gzip image found at block 0
 uncompression error
 VFS: Mounted root (ext2 filesystem) readonly on device 1:0.
 Freeing unused kernel memory: 104k init
 EXT2-fs (ram0): error: ext2_check_page: bad entry in directory #336: :
 unaligned directory entry - offset=0, inode=74187384, rec_len=2081,
 name_len=126
 EXT2-fs (ram0): error: remounting filesystem read-only 
 attempt to access beyond end of device
 ram0: rw=0, want=156831968, limit=32768
 Buffer I/O error on device ram0, logical block 78415983 
 attempt to access beyond end of device
 ram0: rw=0, want=112233212, limit=32768
 Buffer I/O error on device ram0, logical block 56116605 
 attempt to access beyond end of device
 ram0: rw=0, want=6626681482, limit=32768 Buffer I/O error on 
 device ram0, logical block 3313340740 attempt to access 
 beyond end of device
 ram0: rw=0, want=184684282, limit=32768
 Buffer I/O error on device ram0, logical block 92342140 
 Kernel panic - not syncing: No init found.  Try passing init= 
 option to kernel.
 Call Trace:
 [c7821f30] [c0006cd8] show_stack+0x40/0x168 (unreliable) 
 [c7821f70] [c001cefc] panic+0x8c/0x178 [c7821fc0] [c00026d4] 
 init_post+0xe4/0xf4 [c7821fd0] [c01ee224] 
 kernel_init+0x108/0x130 [c7821ff0] [c000dcc0] 
 kernel_thread+0x4c/0x68 Rebooting in 180 seconds..
 
 Thanks,
 -Shawn.
 


Re: [PATCH] irqbalance, powerpc: add IRQs without settable SMP affinity to banned list

2010-09-23 Thread Neil Horman
On Thu, Sep 23, 2010 at 08:57:20PM +1000, Michael Neuling wrote:
 
   + if (fwrite(line, strlen(line) - 1, 1, file) == 0)
  
  if (fputs(line, file) == EOF)
 
 Good point thanks... new patch below
 
 Mikey
 
 irqbalance, powerpc: add IRQs without settable SMP affinity to banned list
 
 On pseries powerpc, IPIs are registered with an IRQ number so
 /proc/interrupts looks like this on a 2 core/2 thread machine:
 
            CPU0       CPU1       CPU2       CPU3
  16:    3164282    3290514    1138794     983121   XICS      Level     IPI
  18:    2605674          0     304994          0   XICS      Level     lan0
  30:     400057          0     169209          0   XICS      Level     ibmvscsi
 LOC:     133734      77250     106425      91951   Local timer interrupts
 SPU:          0          0          0          0   Spurious interrupts
 CNT:          0          0          0          0   Performance monitoring interrupts
 MCE:          0          0          0          0   Machine check exceptions
 
 Unfortunately this means irqbalance attempts to set the affinity of IPIs
 which is not possible.  So in the above case, when irqbalance is in
 performance mode due to heavy IPI, lan0 and ibmvscsi activity, it
 sometimes attempts to put the IPIs on one core (CPU0&1) and lan0 and
 ibmvscsi on the other core (CPU2&3).  This is suboptimal as we want lan0
 and ibmvscsi to be on separate cores and IPIs to be ignored.
 
 When irqbalance attempts to write to the IPI smp_affinity (i.e.
 /proc/irq/16/smp_affinity in the above example), the write fails, but
 irqbalance currently ignores this.
 
 This patch catches these write failures and in this case adds that IRQ
 number to the banned IRQ list.  This will catch the above IPI case and
 any other IRQ where the SMP affinity can't be set.
 
 Tested on POWER6, POWER7 and x86.
 
 Signed-off-by: Michael Neuling mi...@neuling.org
 
 Index: irqbalance/irqlist.c
 ===
 --- irqbalance.orig/irqlist.c
 +++ irqbalance/irqlist.c
 @@ -67,7 +67,7 @@
   DIR *dir;
   struct dirent *entry;
   char *c, *c2;
 - int nr , count = 0;
 + int nr , count = 0, can_set = 1;
   char buf[PATH_MAX];
   sprintf(buf, /proc/irq/%i, number);
   dir = opendir(buf);
 @@ -80,7 +80,7 @@
   size_t size = 0;
   FILE *file;
   sprintf(buf, /proc/irq/%i/smp_affinity, number);
 - file = fopen(buf, r);
 + file = fopen(buf, r+);
   if (!file)
   continue;
   if (getline(line, size, file)==0) {
 @@ -89,7 +89,14 @@
   continue;
   }
   cpumask_parse_user(line, strlen(line), irq-mask);
 - fclose(file);
 + /*
 +  * Check that we can write the affinity, if
 +  * not take it out of the list.
 +  */
 + if (fputs(line, file) == EOF)
 + can_set = 0;
This is maybe a nit, but writing to the affinity file can fail for a few
different reasons, some of them permanent, some transient.  For instance, if
we're in a memory constrained condition temporarily irq_affinity_proc_write
might return -ENOMEM.  Might it be better to modify this code so that, instead
of using fputs to merge the various errors into an EOF, we use some other write
method that lets us better determine the error and selectively ban the interrupt
only for those errors which we consider permanent?
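
One way to get at the underlying error, along the lines suggested above,
is to call write(2) directly and inspect errno rather than going through
stdio. The sketch below is only an illustration: the function and enum
names are hypothetical, and the choice of which errno values count as
transient (ENOMEM, EAGAIN, EINTR) is an assumption made here, not
irqbalance policy.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

enum write_result { WRITE_OK, WRITE_TRANSIENT, WRITE_PERMANENT };

/* Assumed policy: these errors may go away on retry, all others are final. */
enum write_result classify_errno(int err)
{
	switch (err) {
	case 0:
		return WRITE_OK;
	case ENOMEM:
	case EAGAIN:
	case EINTR:
		return WRITE_TRANSIENT;
	default:
		return WRITE_PERMANENT;
	}
}

/* Try to write `value` to `path` and classify the outcome. */
enum write_result try_write(const char *path, const char *value)
{
	int err = 0;
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return classify_errno(errno);
	/* A short write is treated as success in this sketch. */
	if (write(fd, value, strlen(value)) < 0)
		err = errno;
	close(fd);
	return classify_errno(err);
}
```

A caller could then ban the interrupt only on WRITE_PERMANENT and simply
try again on a later pass for WRITE_TRANSIENT.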

Otherwise this looks fine to me.

Thanks
Neil



RE: MPC8641D PEX: programming OWBAR in Endpoint mode?

2010-09-23 Thread Chen, Tiejun
 -Original Message-
 From: David Hagood [mailto:david.hag...@gmail.com] 
 Sent: Thursday, September 23, 2010 7:11 PM
 To: Chen, Tiejun; linuxppc-...@ozlabs.org
 Subject: RE: MPC8641D PEX: programming OWBAR in Endpoint mode?
 
 On Thu, 2010-09-23 at 05:21 +0200, Chen, Tiejun wrote:
   I can get the device to show up on the host's PCI bus, I can
  
  This only ensure you can access the PCIe configure space.
 Not quite: I can also read the BARs that I program, and the 
 memory behind them on the PPC.

Absolutely. 

  
   program the inbound ATMUs such that the BARS are updated when the 
   host (re-)scans them, but I cannot for the life of me get
  
  What value are configured to IntBound REGs?
 I can program them at run time via sysfs on the PPC's side, 
 so there is no single set of values. However, I am pointing 
 them at the PPC's RAM space, and as I stated above, I can 
 read the PPC's RAM from the host side via the BARs.

I read your email again and something struck me. I notice you say you
have already configured the inbound windows successfully, right? If so,
I'm a bit confused. In PCIe EP mode the PEXIWBARs are not implemented in
the memory-mapped space. If you read any PEXIWBAR, the register always
returns zero, regardless of any value written to it beforehand.

You can only program the 4 inbound BARs through type 0 configuration
accesses, like a normal PCIe device.

 
  How do you configure OWS of PEXOWAR?
  
  I means you still access that if OWS is match the whole 
 target memory 
  size even when '0' is as the internal platform address.
 As I understand it, not if the OWS is not correctly mapped on 
 the PPC side - the PEX outbound ATMU's OWBAR must be mapped 
 to a region of the PPCs address space that is also mapped to 
 PEX in the LAW. The LAW does NOT indicate that PPC address 0 
 is mapped to the PEX.

If there is no LAW entry for PCIe, the kernel should take a machine
check when you access the PCIe space. And, as in my comment above, I'm
afraid you may be mixing up inbound and outbound in EP mode. Do you
always read zero from what you call the outbound registers? I mean that
could in fact be a PEXIWBAR. I'm not sure, but you can check this.

 
  
  Out_be32 should be fine for atmu REGs. And also you can refe to the 
  function, setup_pci_atmu  setup_one_atmu, on the file, 
  arch/powerpc/sysdev/fsl_pci.c, to know how to access atmu 
 REGs. Often 
  you should disable them, configure then enable/invoke atmu antry as 
  normal configuring sequent.
 I have tried disabling the outbound ATMU when I program it, 
 with no change.
 I have looked at the functions you mention, and that is a 
 part of my confusion, as they aren't doing anything different 
 than I am.

I only meant that you can refer to them to see how to access these
registers.

  
  Additionally I'm a bit afraid your initial phase :) As you 
 know PCIe 
  would be used as RC mode on Freescale PowerPC kernel. So I 
 don't know 
  if you also drop this path on your kernel to conflict each other :)
 I have tried doing this under a kernel built without PCI 
 support with no change.

Good.

Tiejun

 
 
 


RE: MPC8641D PEX: programming OWBAR in Endpoint mode?

2010-09-23 Thread david . hagood
 -Original Message-
 via the BARs.

 I read your email again and something struck me. I notice you say you
 have already configured the inbound windows successfully.

I am programming BOTH the inbound ATMUs to make PPC memory available to
the root complex, AND programming outbound ATMUs to enable the PPC to bus
master to the root complex's memory space on PCIe.

I am NOT attempting to program the IWBARs - as you noted, they get
programmed by the root complex via PCI config operations.


  And as my above comment I'm afraid you
 mix up InBound and OutBound on EP mode?

No, I am NOT confusing the two - that is why I am being VERY EXPLICIT
about accessing the OUTBOUND ATMUs.

The only reason I mention the inbound ATMUs is to demonstrate that the
physical layer is working.




ppc44x - how do i optimize driver for tlb hits

2010-09-23 Thread Ayman El-Khashab
I've implemented a working driver on my 460EX.  it allocates a couple
of buffers of 4MB each.  I have a custom memcmp algorithm in asm that
is extremely fast in user space, but 1/2 as fast when run on these
buffers.

my tests are showing that the algorithm seems to be memory bandwidth
bound.  my guess is that i am having tlb or cache misses (my algo
uses dcbt) that are slowing performance.  curiously, when in user
space, i can affect the performance by small changes in the size of
the buffer, i.e. 4MB + 32B is fast, 4MB + 4K is much worse.

Can i adjust my driver code that is using kmalloc to make sure that
the ppc44x has 4MB tlb entries for these and that they stay put?

thanks
ayman


[PATCH v6 0/8] ptp: IEEE 1588 hardware clock support

2010-09-23 Thread Richard Cochran
Here is the sixth version of my patch set adding PTP hardware clock
support to the Linux kernel. The main difference to v5 is that the
character device interface has been replaced with one based on the
posix clock system calls.

The first three patches add necessary background support in the posix
clock code. The last five add the new PTP hardware clock features.
Previously, I had tried to present the posix clock changes all by
themselves, but commentators asked to see the whole context.

What follows is a rather lengthy discussion of the various design
issues.

Table of Contents
=================
1 Introduction 
2 Previous Discussions 
3 Design Issues 
3.1 Clock Operations 
3.2 Character Device vs System Calls 
3.2.1 Using the POSIX Clock API 
3.2.2 Tuning a POSIX Clock 
3.2.3 Dynamic POSIX Clock IDs 
3.3 Synchronizing the Linux System Time 
3.4 Ancillary PHC Operations 
3.5 User timers 
4 Drivers 
4.1 Supported Hardware Clocks 
4.2 Open Driver Issues 
4.2.1 DP83640 
4.2.2 IXP465 


1 Introduction
~~~~~~~~~~~~~~

  The aim of this patch set is to add support for PTP hardware clocks
  into the Linux kernel. In the following description, we use the
  abbreviation PHC to mean PTP hardware clock. 

  Support for obtaining timestamps from a PHC already exists via the
  SO_TIMESTAMPING socket option, integrated in kernel version 2.6.30.
  This patch set completes the picture by allowing user space programs
  to adjust the PHC and to control its ancillary features.

2 Previous Discussions
~~~~~~~~~~~~~~~~~~~~~~

  This patch set previously appeared on the netdev list. Since V5 of
  the character device patch set, the discussion has moved to the
  lkml.

  - PTP hardware clock as a character device V5
[http://lkml.org/lkml/2010/8/16/90]

  - POSIX clock tuning syscall with static clock ids
[http://lkml.org/lkml/2010/8/23/49]

  - POSIX clock tuning syscall with dynamic clock ids
[http://lkml.org/lkml/2010/9/3/119]

3 Design Issues
~~~~~~~~~~~~~~~

3.1 Clock Operations
====================

   Based on experience with several commercially available PHCs, we
   identified a set of essential operations and a set of ancillary
   operations.

   - Basic clock operations

     1. Set time
     2. Get time
     3. Shift the clock by a given offset atomically
     4. Adjust clock frequency

   - Ancillary clock features

     1. Time stamp external events
     2. Enable Linux PPS subsystem events
     3. Periodic output signals
     4. One shot or periodic alarms, with CPU interrupt

   The patch set includes examples of the first two ancillary
   features, and implementing the third point for a particular PHC is
   fairly straightforward. The fourth point is discussed below.

3.2 Character Device vs System Calls
====================================

   This patch set started out as a class driver that exposes the PHC
   as a character device with standardized ioctls. Since several clock
   operations in the ioctl interface mimic the POSIX clock API, the
   suggestion was made to expose the PHC as a new clockid_t.

   POSIX defines the CLOCK_REALTIME, CLOCK_MONOTONIC,
   CLOCK_PROCESS_CPUTIME_ID, and CLOCK_THREAD_CPUTIME_ID clock ids.
   As to other possible clock ids, the standard offers the following
   hint:

  An implementation may also support additional clocks. The
  interpretation of time values for these clocks is unspecified.

   So as far as the POSIX standard is concerned, offering a clock id
   to represent the PHC would be acceptable.

   From discussions on the lkml, a repeated wish was to ensure that
   any changes in the POSIX clock code would be general enough to
   support other new hardware clocks that might appear in the future,
   not just the particulars of PHCs.

3.2.1 Using the POSIX Clock API
-------------------------------

Looking at the mapping from PHC operation to the POSIX clock API,
we see that two of the basic clock operations, marked with *, have
no POSIX equivalent. The items marked NA are peculiar to PHCs and
will be discussed separately, below.

     Clock Operation               POSIX function
    -----------------------------+-----------------------------
     Set time                      clock_settime
     Get time                      clock_gettime
     Shift the clock               *
     Adjust clock frequency        *
    -----------------------------+-----------------------------
     Time stamp external events    NA
     Enable PPS events             NA
     Periodic output signals       NA
     One shot or periodic alarms   timer_create, timer_settime
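
    To illustrate the "Get time" row of the mapping, here is a small
    user space sketch that reads a clock purely through the POSIX
    calls. The helper names read_clock_ns and read_clock_res_ns are
    made up for this example, and CLOCK_MONOTONIC merely stands in
    for whatever clock id a PHC driver would provide.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: read any POSIX clock as a nanosecond count. */
int read_clock_ns(clockid_t id, int64_t *ns)
{
	struct timespec ts;

	if (clock_gettime(id, &ts) != 0)
		return -1;	/* errno is set, e.g. EINVAL for an unknown id */
	*ns = (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
	return 0;
}

/* Hypothetical helper: report the resolution of a clock in nanoseconds. */
int read_clock_res_ns(clockid_t id, int64_t *ns)
{
	struct timespec ts;

	if (clock_getres(id, &ts) != 0)
		return -1;
	*ns = (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
	return 0;
}
```

    Setting the time would use clock_settime() with the same clock
    id; it is left out here because it needs privileges on the
    system clocks.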

In contrast to the standard Linux system clock, a PHC is
adjustable in hardware, for example using 

[PATCH 2/8] posix clocks: dynamic clock ids.

2010-09-23 Thread Richard Cochran
This patch augments the POSIX clock code to offer a dynamic clock
creation method. Instead of registering a hard coded clock ID, modules
may call create_posix_clock(), which returns a new clock ID.
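
For readers who want to picture the two registration paths, the
following is a user space sketch only, not the kernel code. It keeps
a table of used ids, lets a caller either claim a fixed id or ask for
the lowest free one, and omits all locking. The names register_slot,
create_slot, MAX_SLOTS and SLOT_INVALID are invented for this example.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_SLOTS 16		/* stand-in for the kernel's MAX_CLOCKS */
#define SLOT_INVALID (-1)	/* stand-in for CLOCK_INVALID */

static bool slot_used[MAX_SLOTS];

/* Claim a caller-chosen id; fails if out of range or already taken. */
int register_slot(int id)
{
	if (id < 0 || id >= MAX_SLOTS)
		return -EINVAL;
	if (slot_used[id])
		return -EBUSY;
	slot_used[id] = true;
	return 0;
}

/* Claim the lowest free id, or return SLOT_INVALID when all are in use. */
int create_slot(void)
{
	for (int id = 0; id < MAX_SLOTS; id++) {
		if (!slot_used[id]) {
			slot_used[id] = true;
			return id;
		}
	}
	return SLOT_INVALID;
}
```

In this sketch the search and the claim happen in one step; a
multi-threaded version would need to hold a lock across both.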

Signed-off-by: Richard Cochran richard.coch...@omicron.at
---
 include/linux/posix-timers.h |7 ++-
 include/linux/time.h |2 ++
 kernel/posix-timers.c|   41 ++---
 3 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index abf61cc..08aa4da 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -68,6 +68,7 @@ struct k_itimer {
 };
 
 struct k_clock {
+   clockid_t id;
int res;/* in nanoseconds */
int (*clock_getres) (const clockid_t which_clock, struct timespec *tp);
int (*clock_set) (const clockid_t which_clock, struct timespec * tp);
@@ -86,7 +87,11 @@ struct k_clock {
   struct itimerspec * cur_setting);
 };
 
-void register_posix_clock(const clockid_t clock_id, struct k_clock *new_clock);
+/* Register a posix clock with a well known clock id. */
+int register_posix_clock(const clockid_t id, struct k_clock *clock);
+
+/* Create a new posix clock with a dynamic clock id. */
+clockid_t create_posix_clock(struct k_clock *clock);
 
 /* error handlers for timer_create, nanosleep and settime */
 int do_posix_clock_nonanosleep(const clockid_t, int flags, struct timespec *,
diff --git a/include/linux/time.h b/include/linux/time.h
index 9f15ac7..914c48d 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -299,6 +299,8 @@ struct itimerval {
 #define CLOCKS_MASK			(CLOCK_REALTIME | CLOCK_MONOTONIC)
 #define CLOCKS_MONO			CLOCK_MONOTONIC
 
+#define CLOCK_INVALID  -1
+
 /*
  * The various flags for setting POSIX.1b interval timers:
  */
diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c
index 446b566..67fba5c 100644
--- a/kernel/posix-timers.c
+++ b/kernel/posix-timers.c
@@ -132,6 +132,8 @@ static DEFINE_SPINLOCK(idr_lock);
  */
 
 static struct k_clock posix_clocks[MAX_CLOCKS];
+static DECLARE_BITMAP(clocks_map, MAX_CLOCKS);
+static DEFINE_MUTEX(clocks_mux); /* protects 'posix_clocks' and 'clocks_map' */
 
 /*
  * These ones are defined below.
@@ -484,18 +486,43 @@ static struct pid *good_sigevent(sigevent_t * event)
return task_pid(rtn);
 }
 
-void register_posix_clock(const clockid_t clock_id, struct k_clock *new_clock)
+int register_posix_clock(const clockid_t id, struct k_clock *clock)
 {
-	if ((unsigned) clock_id >= MAX_CLOCKS) {
-		printk("POSIX clock register failed for clock_id %d\n",
-		       clock_id);
-		return;
-	}
+	struct k_clock *kc;
+	int err = 0;
 
-	posix_clocks[clock_id] = *new_clock;
+	mutex_lock(&clocks_mux);
+	if (test_bit(id, clocks_map)) {
+		pr_err("clock_id %d already registered\n", id);
+		err = -EBUSY;
+		goto out;
+	}
+	kc = &posix_clocks[id];
+	*kc = *clock;
+	kc->id = id;
+	set_bit(id, clocks_map);
+out:
+	mutex_unlock(&clocks_mux);
+	return err;
 }
 EXPORT_SYMBOL_GPL(register_posix_clock);
 
+clockid_t create_posix_clock(struct k_clock *clock)
+{
+	clockid_t id;
+
+	mutex_lock(&clocks_mux);
+	id = find_first_zero_bit(clocks_map, MAX_CLOCKS);
+	mutex_unlock(&clocks_mux);
+
+	if (id < MAX_CLOCKS) {
+		register_posix_clock(id, clock);
+		return id;
+	}
+	return CLOCK_INVALID;
+}
+EXPORT_SYMBOL_GPL(create_posix_clock);
+
 static struct k_itimer * alloc_posix_timer(void)
 {
struct k_itimer *tmr;
-- 
1.7.0.4



[PATCH 1/8] posix clocks: introduce a syscall for clock tuning.

2010-09-23 Thread Richard Cochran
A new syscall is introduced that allows tuning of a POSIX clock. The
syscall is implemented for four architectures: arm, blackfin, powerpc,
and x86.

The new syscall, clock_adjtime, takes two parameters: the clock ID
and a pointer to a struct timex. The semantics of the timex struct
have been expanded by one additional mode flag, which allows an
absolute offset correction. When specified, the clock offset is
immediately corrected by adding the given time value to the current
time value.
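
The arithmetic behind such an offset correction can be sketched in
user space as follows. This is only an illustration of "current time
plus a signed offset" with the nanosecond field kept in range; the
helper name timespec_shift_ns is invented here, and the sketch assumes
the times involved fit into 64 bit nanosecond counts.

```c
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical helper: return 'now' shifted by a signed offset in ns. */
struct timespec timespec_shift_ns(struct timespec now, int64_t offset_ns)
{
	int64_t total = (int64_t)now.tv_sec * NSEC_PER_SEC
			+ now.tv_nsec + offset_ns;
	struct timespec out;

	out.tv_sec = total / NSEC_PER_SEC;
	out.tv_nsec = total % NSEC_PER_SEC;
	if (out.tv_nsec < 0) {
		/* C division truncates toward zero, so borrow one second. */
		out.tv_nsec += NSEC_PER_SEC;
		out.tv_sec -= 1;
	}
	return out;
}
```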

Signed-off-by: Richard Cochran richard.coch...@omicron.at
---
 arch/arm/include/asm/unistd.h  |1 +
 arch/arm/kernel/calls.S|1 +
 arch/blackfin/include/asm/unistd.h |3 +-
 arch/blackfin/mach-common/entry.S  |1 +
 arch/powerpc/include/asm/systbl.h  |1 +
 arch/powerpc/include/asm/unistd.h  |3 +-
 arch/x86/ia32/ia32entry.S  |1 +
 arch/x86/include/asm/unistd_32.h   |3 +-
 arch/x86/include/asm/unistd_64.h   |2 +
 arch/x86/kernel/syscall_table_32.S |1 +
 include/linux/posix-timers.h   |3 +
 include/linux/syscalls.h   |2 +
 include/linux/timex.h  |3 +-
 kernel/compat.c|  136 +++-
 kernel/posix-cpu-timers.c  |4 +
 kernel/posix-timers.c  |   17 +
 16 files changed, 130 insertions(+), 52 deletions(-)

diff --git a/arch/arm/include/asm/unistd.h b/arch/arm/include/asm/unistd.h
index c891eb7..f58d881 100644
--- a/arch/arm/include/asm/unistd.h
+++ b/arch/arm/include/asm/unistd.h
@@ -396,6 +396,7 @@
 #define __NR_fanotify_init (__NR_SYSCALL_BASE+367)
 #define __NR_fanotify_mark (__NR_SYSCALL_BASE+368)
 #define __NR_prlimit64 (__NR_SYSCALL_BASE+369)
+#define __NR_clock_adjtime (__NR_SYSCALL_BASE+370)
 
 /*
  * The following SWIs are ARM private.
diff --git a/arch/arm/kernel/calls.S b/arch/arm/kernel/calls.S
index 5c26ecc..430de4c 100644
--- a/arch/arm/kernel/calls.S
+++ b/arch/arm/kernel/calls.S
@@ -379,6 +379,7 @@
CALL(sys_fanotify_init)
CALL(sys_fanotify_mark)
CALL(sys_prlimit64)
+/* 370 */  CALL(sys_clock_adjtime)
 #ifndef syscalls_counted
 .equ syscalls_padding, ((NR_syscalls + 3) & ~3) - NR_syscalls
 #define syscalls_counted
diff --git a/arch/blackfin/include/asm/unistd.h 
b/arch/blackfin/include/asm/unistd.h
index 14fcd25..79ad99b 100644
--- a/arch/blackfin/include/asm/unistd.h
+++ b/arch/blackfin/include/asm/unistd.h
@@ -392,8 +392,9 @@
 #define __NR_fanotify_init 371
 #define __NR_fanotify_mark 372
 #define __NR_prlimit64 373
+#define __NR_clock_adjtime 374
 
-#define __NR_syscall   374
+#define __NR_syscall   375
 #define NR_syscalls		__NR_syscall
 
 /* Old optional stuff no one actually uses */
diff --git a/arch/blackfin/mach-common/entry.S 
b/arch/blackfin/mach-common/entry.S
index af1bffa..ee68730 100644
--- a/arch/blackfin/mach-common/entry.S
+++ b/arch/blackfin/mach-common/entry.S
@@ -1631,6 +1631,7 @@ ENTRY(_sys_call_table)
.long _sys_fanotify_init
.long _sys_fanotify_mark
.long _sys_prlimit64
+   .long _sys_clock_adjtime
 
.rept NR_syscalls-(.-_sys_call_table)/4
.long _sys_ni_syscall
diff --git a/arch/powerpc/include/asm/systbl.h 
b/arch/powerpc/include/asm/systbl.h
index 3d21266..2485d8f 100644
--- a/arch/powerpc/include/asm/systbl.h
+++ b/arch/powerpc/include/asm/systbl.h
@@ -329,3 +329,4 @@ COMPAT_SYS(rt_tgsigqueueinfo)
 SYSCALL(fanotify_init)
 COMPAT_SYS(fanotify_mark)
 SYSCALL_SPU(prlimit64)
+COMPAT_SYS_SPU(clock_adjtime)
diff --git a/arch/powerpc/include/asm/unistd.h 
b/arch/powerpc/include/asm/unistd.h
index 597e6f9..85d5067 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -348,10 +348,11 @@
 #define __NR_fanotify_init 323
 #define __NR_fanotify_mark 324
 #define __NR_prlimit64 325
+#define __NR_clock_adjtime 326
 
 #ifdef __KERNEL__
 
-#define __NR_syscalls  326
+#define __NR_syscalls  327
 
 #define __NR__exit __NR_exit
 #define NR_syscalls		__NR_syscalls
diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 518bb99..0ed7896 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -851,4 +851,5 @@ ia32_sys_call_table:
.quad sys_fanotify_init
.quad sys32_fanotify_mark
.quad sys_prlimit64 /* 340 */
+   .quad compat_sys_clock_adjtime
 ia32_syscall_end:
diff --git a/arch/x86/include/asm/unistd_32.h b/arch/x86/include/asm/unistd_32.h
index b766a5e..b6f73f1 100644
--- a/arch/x86/include/asm/unistd_32.h
+++ b/arch/x86/include/asm/unistd_32.h
@@ -346,10 +346,11 @@
 #define __NR_fanotify_init 338
 #define __NR_fanotify_mark 339
 #define __NR_prlimit64 340
+#define __NR_clock_adjtime 341
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 341
+#define NR_syscalls 342
 
 #define 

[PATCH 3/8] posix clocks: introduce a sysfs presence.

2010-09-23 Thread Richard Cochran
This patch adds a 'timesource' class into sysfs. Each registered POSIX
clock appears by name under /sys/class/timesource. The idea is to
expose to user space the dynamic mapping between clock devices and
clock IDs.

Signed-off-by: Richard Cochran richard.coch...@omicron.at
---
 Documentation/ABI/testing/sysfs-timesource |   24 
 drivers/char/mmtimer.c |1 +
 include/linux/posix-timers.h   |4 +++
 kernel/posix-cpu-timers.c  |2 +
 kernel/posix-timers.c  |   40 
 5 files changed, 71 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-timesource

diff --git a/Documentation/ABI/testing/sysfs-timesource 
b/Documentation/ABI/testing/sysfs-timesource
new file mode 100644
index 0000000..f991de2
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-timesource
@@ -0,0 +1,24 @@
+What:  /sys/class/timesource/
+Date:  September 2010
+Contact:   Richard Cochran richardcoch...@gmail.com
+Description:
+   This directory contains files and directories
+   providing a standardized interface to the available
+   time sources.
+
+What:  /sys/class/timesource/<name>/
+Date:  September 2010
+Contact:   Richard Cochran richardcoch...@gmail.com
+Description:
+   This directory contains the attributes of a time
+   source registered with the POSIX clock subsystem.
+
+What:  /sys/class/timesource/<name>/id
+Date:  September 2010
+Contact:   Richard Cochran richardcoch...@gmail.com
+Description:
+   This file contains the clock ID (a non-negative
+   integer) of the named time source registered with the
+   POSIX clock subsystem. This value may be passed as the
+   first argument to the POSIX clock and timer system
+   calls. See man CLOCK_GETRES(2) and TIMER_CREATE(2).
diff --git a/drivers/char/mmtimer.c b/drivers/char/mmtimer.c
index ea7c99f..e9173e3 100644
--- a/drivers/char/mmtimer.c
+++ b/drivers/char/mmtimer.c
@@ -758,6 +758,7 @@ static int sgi_timer_set(struct k_itimer *timr, int flags,
 }
 
 static struct k_clock sgi_clock = {
+   .name = "sgi_cycle",
.res = 0,
.clock_set = sgi_clock_set,
.clock_get = sgi_clock_get,
diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index 08aa4da..64e6fee 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -67,7 +67,11 @@ struct k_itimer {
} it;
 };
 
+#define KCLOCK_MAX_NAME 32
+
 struct k_clock {
+   char name[KCLOCK_MAX_NAME];
+   struct device *dev;
clockid_t id;
int res;/* in nanoseconds */
int (*clock_getres) (const clockid_t which_clock, struct timespec *tp);
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index e1c2e7b..df9cbab 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -1611,6 +1611,7 @@ static long thread_cpu_nsleep_restart(struct 
restart_block *restart_block)
 static __init int init_posix_cpu_timers(void)
 {
struct k_clock process = {
+   .name = "process_cputime",
.clock_getres = process_cpu_clock_getres,
.clock_get = process_cpu_clock_get,
.clock_set = do_posix_clock_nosettime,
@@ -1619,6 +1620,7 @@ static __init int init_posix_cpu_timers(void)
.nsleep_restart = process_cpu_nsleep_restart,
};
struct k_clock thread = {
+   .name = "thread_cputime",
.clock_getres = thread_cpu_clock_getres,
.clock_get = thread_cpu_clock_get,
.clock_set = do_posix_clock_nosettime,
diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c
index 67fba5c..719aa11 100644
--- a/kernel/posix-timers.c
+++ b/kernel/posix-timers.c
@@ -46,6 +46,7 @@
 #include <linux/wait.h>
 #include <linux/workqueue.h>
 #include <linux/module.h>
+#include <linux/device.h>
 
 /*
  * Management arrays for POSIX timers.  Timers are kept in slab memory
@@ -135,6 +136,8 @@ static struct k_clock posix_clocks[MAX_CLOCKS];
 static DECLARE_BITMAP(clocks_map, MAX_CLOCKS);
 static DEFINE_MUTEX(clocks_mux); /* protects 'posix_clocks' and 'clocks_map' */
 
+static struct class *timesource_class;
+
 /*
  * These ones are defined below.
  */
@@ -271,20 +274,40 @@ static int posix_get_coarse_res(const clockid_t 
which_clock, struct timespec *tp
*tp = ktime_to_timespec(KTIME_LOW_RES);
return 0;
 }
+
+/*
+ * sysfs attributes
+ */
+
+static ssize_t show_clock_id(struct device *dev,
+struct device_attribute *attr, char *page)
+{
+   struct k_clock *kc = dev_get_drvdata(dev);
+   return snprintf(page, PAGE_SIZE-1, "%d\n", kc->id);
+}
+
+static struct device_attribute timesource_dev_attrs[] = {
+   __ATTR(id,   0444, show_clock_id,   NULL),
+   

[PATCH 4/8] ptp: Added a brand new class driver for ptp clocks.

2010-09-23 Thread Richard Cochran
This patch adds an infrastructure for hardware clocks that implement
IEEE 1588, the Precision Time Protocol (PTP). A class driver offers a
registration method to particular hardware clock drivers. Each clock is
presented as a standard POSIX clock.

The ancillary clock features are exposed in two different ways, via
the sysfs and by a character device.

Signed-off-by: Richard Cochran richard.coch...@omicron.at
---
 Documentation/ABI/testing/sysfs-ptp |  107 ++
 Documentation/ptp/ptp.txt   |   94 +
 Documentation/ptp/testptp.c |  358 
 Documentation/ptp/testptp.mk|   33 +++
 drivers/Kconfig |2 +
 drivers/Makefile|1 +
 drivers/ptp/Kconfig |   27 +++
 drivers/ptp/Makefile|6 +
 drivers/ptp/ptp_chardev.c   |  178 
 drivers/ptp/ptp_clock.c |  382 +++
 drivers/ptp/ptp_private.h   |   64 ++
 drivers/ptp/ptp_sysfs.c |  235 +
 include/linux/Kbuild|1 +
 include/linux/ptp_clock.h   |   79 +++
 include/linux/ptp_clock_kernel.h|  139 +
 15 files changed, 1706 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-ptp
 create mode 100644 Documentation/ptp/ptp.txt
 create mode 100644 Documentation/ptp/testptp.c
 create mode 100644 Documentation/ptp/testptp.mk
 create mode 100644 drivers/ptp/Kconfig
 create mode 100644 drivers/ptp/Makefile
 create mode 100644 drivers/ptp/ptp_chardev.c
 create mode 100644 drivers/ptp/ptp_clock.c
 create mode 100644 drivers/ptp/ptp_private.h
 create mode 100644 drivers/ptp/ptp_sysfs.c
 create mode 100644 include/linux/ptp_clock.h
 create mode 100644 include/linux/ptp_clock_kernel.h

diff --git a/Documentation/ABI/testing/sysfs-ptp 
b/Documentation/ABI/testing/sysfs-ptp
new file mode 100644
index 0000000..47142ce
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-ptp
@@ -0,0 +1,107 @@
+What:  /sys/class/ptp/
+Date:  September 2010
+Contact:   Richard Cochran richardcoch...@gmail.com
+Description:
+   This directory contains files and directories
+   providing a standardized interface to the ancillary
+   features of PTP hardware clocks.
+
+What:  /sys/class/ptp/ptpN/
+Date:  September 2010
+Contact:   Richard Cochran richardcoch...@gmail.com
+Description:
+   This directory contains the attributes of the Nth PTP
+   hardware clock registered into the PTP class driver
+   subsystem.
+
+What:  /sys/class/ptp/ptpN/clock_id
+Date:  September 2010
+Contact:   Richard Cochran richardcoch...@gmail.com
+Description:
+   This file contains the POSIX clock ID (a non-negative
+   integer) corresponding to the PTP hardware clock. This
+   value may be passed as the first argument to the POSIX
+   clock and timer system calls. See man CLOCK_GETRES(2)
+   and TIMER_CREATE(2).
+
+What:  /sys/class/ptp/ptpN/clock_name
+Date:  September 2010
+Contact:   Richard Cochran richardcoch...@gmail.com
+Description:
+   This file contains the name of the PTP hardware clock
+   as a human readable string.
+
+What:  /sys/class/ptp/ptpN/max_adjustment
+Date:  September 2010
+Contact:   Richard Cochran richardcoch...@gmail.com
+Description:
+   This file contains the PTP hardware clock's maximum
+   frequency adjustment value (a positive integer) in
+   parts per billion.
+
+What:  /sys/class/ptp/ptpN/n_alarms
+Date:  September 2010
+Contact:   Richard Cochran richardcoch...@gmail.com
+Description:
+   This file contains the number of periodic or one shot
+   alarms offered by the PTP hardware clock.
+
+What:  /sys/class/ptp/ptpN/n_external_timestamps
+Date:  September 2010
+Contact:   Richard Cochran richardcoch...@gmail.com
+Description:
+   This file contains the number of external timestamp
+   channels offered by the PTP hardware clock.
+
+What:  /sys/class/ptp/ptpN/n_periodic_outputs
+Date:  September 2010
+Contact:   Richard Cochran richardcoch...@gmail.com
+Description:
+   This file contains the number of programmable periodic
+   output channels offered by the PTP hardware clock.
+
+What:  /sys/class/ptp/ptpN/pps_available
+Date:  September 2010
+Contact:   Richard Cochran richardcoch...@gmail.com
+Description:
+   This file indicates whether the PTP hardware clock
+   supports a Pulse Per Second to the host CPU. Reading
+   1 means that the PPS is supported, while 0 means
+   not supported.
+

[PATCH 5/8] ptp: Added a simulated PTP hardware clock.

2010-09-23 Thread Richard Cochran
This patch adds a driver that simulates a PTP hardware clock. The
driver serves as a simple example for writing a real clock driver and
can be used for testing the PTP clock API.

The basic clock operations are implemented using the system clock,
and the ancillary clock operations are simulated.
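
As a side note on units: the frequency adjustment callback receives
parts per billion, while the freq field of struct timex is documented
in adjtimex(2) as ppm with a 16 bit fractional part. A stand-alone
sketch of that conversion is shown below; the function name
ppb_to_scaled_ppm is chosen for this example only.

```c
#include <stdint.h>

/* Convert parts per billion to ppm with a 16 bit binary fraction. */
int64_t ppb_to_scaled_ppm(int32_t ppb)
{
	/* ppm = ppb / 1000; scaling by 2^16 first keeps the fraction. */
	return ((int64_t)ppb * 65536) / 1000;
}
```

Multiplying instead of shifting keeps the result well defined for
negative adjustments in plain C.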

Signed-off-by: Richard Cochran richard.coch...@omicron.at
---
 drivers/ptp/Kconfig |   14 
 drivers/ptp/Makefile|1 +
 drivers/ptp/ptp_linux.c |  165 +++
 kernel/time/ntp.c   |2 +
 4 files changed, 182 insertions(+), 0 deletions(-)
 create mode 100644 drivers/ptp/ptp_linux.c

diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig
index 17be208..94f329f 100644
--- a/drivers/ptp/Kconfig
+++ b/drivers/ptp/Kconfig
@@ -24,4 +24,18 @@ config PTP_1588_CLOCK
  To compile this driver as a module, choose M here: the module
  will be called ptp.
 
+config PTP_1588_CLOCK_LINUX
+   tristate "Simulated PTP clock"
+   depends on PTP_1588_CLOCK
+   help
+ This driver adds support for a simulated PTP clock. It
+ implements the basic clock operations by using the standard
+ Linux system time. The driver simulates the ancillary clock
+ operations. This clock can be used to test PTP programs
+ provided they use software time stamps for the PTP Ethernet
+ packets.
+
+ To compile this driver as a module, choose M here: the module
+ will be called ptp_linux.
+
 endmenu
diff --git a/drivers/ptp/Makefile b/drivers/ptp/Makefile
index 480e2af..266d4f2 100644
--- a/drivers/ptp/Makefile
+++ b/drivers/ptp/Makefile
@@ -4,3 +4,4 @@
 
 ptp-y  := ptp_clock.o ptp_chardev.o ptp_sysfs.o
 obj-$(CONFIG_PTP_1588_CLOCK)   += ptp.o
+obj-$(CONFIG_PTP_1588_CLOCK_LINUX) += ptp_linux.o
diff --git a/drivers/ptp/ptp_linux.c b/drivers/ptp/ptp_linux.c
new file mode 100644
index 0000000..57b3da4
--- /dev/null
+++ b/drivers/ptp/ptp_linux.c
@@ -0,0 +1,165 @@
+/*
+ * PTP 1588 clock using the Linux system clock
+ *
+ * Copyright (C) 2010 OMICRON electronics GmbH
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, write to the Free Software
+ *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/hrtimer.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/timex.h>
+
+#include <linux/ptp_clock_kernel.h>
+
+static struct ptp_clock *linux_clock;
+
+DEFINE_SPINLOCK(adjtime_lock);
+
+static int ptp_linux_adjfreq(void *priv, s32 ppb)
+{
+	struct timex txc;
+	s64 tmp = ppb;
+	int err;
+	pr_debug("ptp_linux: adjfreq ppb=%d\n", ppb);
+	txc.freq = div_s64(tmp<<16, 1000);
+	txc.modes = ADJ_FREQUENCY;
+	err = do_adjtimex(&txc);
+	return err < 0 ? err : 0;
+}
+
+static int ptp_linux_adjtime(void *priv, struct timespec *ts)
+{
+	s64 delta;
+	ktime_t now;
+	struct timespec t2;
+	unsigned long flags;
+	int err;
+
+	delta = 1000000000LL * ts->tv_sec + ts->tv_nsec;
+
+	spin_lock_irqsave(&adjtime_lock, flags);
+
+	now = ktime_get_real();
+
+	now = delta < 0 ? ktime_sub_ns(now, -delta) : ktime_add_ns(now, delta);
+
+	t2 = ktime_to_timespec(now);
+
+	err = do_settimeofday(&t2);
+
+	spin_unlock_irqrestore(&adjtime_lock, flags);
+
+	return err;
+}
+
+static int ptp_linux_gettime(void *priv, struct timespec *ts)
+{
+   getnstimeofday(ts);
+   return 0;
+}
+
+static int ptp_linux_settime(void *priv, struct timespec *ts)
+{
+   return do_settimeofday(ts);
+}
+
+#define sim(x...) pr_warn("ptp_linux simulation: " x)
+
+static int ptp_linux_enable(void *priv, struct ptp_clock_request *rq, int on)
+{
+	struct ptp_clock_event event;
+	ktime_t kt;
+	int i;
+
+	switch (rq->type) {
+
+	case PTP_CLK_REQ_EXTTS:
+		if (on) {
+			sim("enable external timestamped events\n");
+			for (i = 0; i < 100; i++) {
+				kt = ktime_get_real();
+				event.type = PTP_CLOCK_EXTTS;
+				event.index = 0;
+				event.timestamp = ktime_to_ns(kt);
+				ptp_clock_event(linux_clock, &event);
+			}
+

[PATCH 6/8] ptp: Added a clock that uses the eTSEC found on the MPC85xx.

2010-09-23 Thread Richard Cochran
The eTSEC includes a PTP clock with quite a few features. This patch adds
support for the basic clock adjustment functions, plus two external time
stamps, one alarm, and the PPS callback.
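
The binding text in this patch gives formulas for tmr-add, tmr-fiper1
and max-adj. As a worked example, the following stand-alone sketch
evaluates them with integer arithmetic. It is not part of the driver.
The struct and function names are invented for the example, and it
assumes that the scale factor in FiperDiv1 is 1000000 (MHz to Hz) and
the one in max_adj is 1000000000 (parts per billion). Integer division
may round max_adj slightly differently than a floating point
evaluation would.

```c
#include <stdint.h>

struct etsec_ptp_params {	/* hypothetical container for the results */
	uint32_t tmr_add;
	uint32_t tmr_fiper1;
	uint32_t max_adj;
};

/*
 * timer_osc_hz   - timer oscillator (system clock) frequency in Hz
 * tclk_period_ns - desired clock period in nanoseconds
 * tmr_prsc       - prescaler dividing the output clock
 * fiper_hz       - desired pulse frequency in Hz
 * Requires timer_osc_hz > 1e9 / tclk_period_ns (FreqDivRatio > 1.0).
 */
struct etsec_ptp_params etsec_ptp_calc(uint64_t timer_osc_hz,
				       uint32_t tclk_period_ns,
				       uint32_t tmr_prsc,
				       uint32_t fiper_hz)
{
	struct etsec_ptp_params p;
	uint64_t nominal_hz = 1000000000ULL / tclk_period_ns;
	uint64_t output_hz = nominal_hz / tmr_prsc;
	uint64_t fiper_div = output_hz / fiper_hz;
	uint64_t num = (1ULL << 32) * nominal_hz;

	/* tmr_add = ceil(2^32 / FreqDivRatio), FreqDivRatio = osc / nominal */
	p.tmr_add = (uint32_t)((num + timer_osc_hz - 1) / timer_osc_hz);
	/* tmr_fiper1 = tmr_prsc * tclk_period * FiperDiv1 - tclk_period */
	p.tmr_fiper1 = (uint32_t)((uint64_t)tmr_prsc * tclk_period_ns
				  * fiper_div - tclk_period_ns);
	/* max_adj = 1e9 * (FreqDivRatio - 1.0) - 1, in parts per billion */
	p.max_adj = (uint32_t)(1000000000ULL * (timer_osc_hz - nominal_hz)
			       / nominal_hz - 1);
	return p;
}
```

With tclk-period 10, tmr-prsc 100 and a 1 Hz pulse this yields
tmr-fiper1 = 999999990, which is the 0x3B9AC9F6 used in the example
device tree nodes.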

Signed-off-by: Richard Cochran richard.coch...@omicron.at
---
 Documentation/powerpc/dts-bindings/fsl/tsec.txt |   57 +++
 arch/powerpc/boot/dts/mpc8313erdb.dts   |   14 +
 arch/powerpc/boot/dts/mpc8572ds.dts |   14 +
 arch/powerpc/boot/dts/p2020ds.dts   |   14 +
 arch/powerpc/boot/dts/p2020rdb.dts  |   14 +
 drivers/net/Makefile|1 +
 drivers/net/gianfar_ptp.c   |  447 +++
 drivers/net/gianfar_ptp_reg.h   |  113 ++
 drivers/ptp/Kconfig |   13 +
 9 files changed, 687 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/gianfar_ptp.c
 create mode 100644 drivers/net/gianfar_ptp_reg.h

diff --git a/Documentation/powerpc/dts-bindings/fsl/tsec.txt 
b/Documentation/powerpc/dts-bindings/fsl/tsec.txt
index edb7ae1..f6edbb8 100644
--- a/Documentation/powerpc/dts-bindings/fsl/tsec.txt
+++ b/Documentation/powerpc/dts-bindings/fsl/tsec.txt
@@ -74,3 +74,60 @@ Example:
 		interrupt-parent = <&mpic>;
 		phy-handle = <&phy0>
 	};
+
+* Gianfar PTP clock nodes
+
+General Properties:
+
+  - compatible   Should be "fsl,etsec-ptp"
+  - reg          Offset and length of the register set for the device
+  - interrupts   There should be at least two interrupts. Some devices
+                 have as many as four PTP related interrupts.
+
+Clock Properties:
+
+  - tclk-period  Timer reference clock period in nanoseconds.
+  - tmr-prsc     Prescaler, divides the output clock.
+  - tmr-add      Frequency compensation value.
+  - cksel        0= external clock, 1= eTSEC system clock, 3= RTC clock input.
+                 Currently the driver only supports choice 1.
+  - tmr-fiper1   Fixed interval period pulse generator.
+  - tmr-fiper2   Fixed interval period pulse generator.
+  - max-adj      Maximum frequency adjustment in parts per billion.
+
+  These properties set the operational parameters for the PTP
+  clock. You must choose these carefully for the clock to work right.
+  Here is how to figure good values:
+
+  TimerOsc     = system clock               MHz
+  tclk_period  = desired clock period       nanoseconds
+  NominalFreq  = 1000 / tclk_period         MHz
+  FreqDivRatio = TimerOsc / NominalFreq     (must be greater than 1.0)
+  tmr_add      = ceil(2^32 / FreqDivRatio)
+  OutputClock  = NominalFreq / tmr_prsc     MHz
+  PulseWidth   = 1 / OutputClock            microseconds
+  FiperFreq1   = desired frequency in Hz
+  FiperDiv1    = 1000000 * OutputClock / FiperFreq1
+  tmr_fiper1   = tmr_prsc * tclk_period * FiperDiv1 - tclk_period
+  max_adj      = 1000000000 * (FreqDivRatio - 1.0) - 1
+
+  The calculation for tmr_fiper2 is the same as for tmr_fiper1. The
+  driver expects that tmr_fiper1 will be correctly set to produce a 1
+  Pulse Per Second (PPS) signal, since this will be offered to the PPS
+  subsystem to synchronize the Linux clock.
+
+Example:
+
+	ptp_cl...@24e00 {
+		compatible = "fsl,etsec-ptp";
+		reg = <0x24E00 0xB0>;
+		interrupts = <12 0x8 13 0x8>;
+		interrupt-parent = < &ipic >;
+		tclk-period = <10>;
+		tmr-prsc    = <100>;
+		tmr-add     = <0x999999A4>;
+		cksel       = <0x1>;
+		tmr-fiper1  = <0x3B9AC9F6>;
+		tmr-fiper2  = <0x00018696>;
+		max-adj     = <659999998>;
+	};
diff --git a/arch/powerpc/boot/dts/mpc8313erdb.dts 
b/arch/powerpc/boot/dts/mpc8313erdb.dts
index 183f2aa..85a7eaa 100644
--- a/arch/powerpc/boot/dts/mpc8313erdb.dts
+++ b/arch/powerpc/boot/dts/mpc8313erdb.dts
@@ -208,6 +208,20 @@
sleep = pmc 0x0030;
};
 
+		ptp_cl...@24e00 {
+			compatible = "fsl,etsec-ptp";
+			reg = <0x24E00 0xB0>;
+			interrupts = <12 0x8 13 0x8>;
+			interrupt-parent = < &ipic >;
+			tclk-period = <10>;
+			tmr-prsc    = <100>;
+			tmr-add     = <0x999999A4>;
+			cksel       = <0x1>;
+			tmr-fiper1  = <0x3B9AC9F6>;
+			tmr-fiper2  = <0x00018696>;
+			max-adj     = <659999998>;
+		};
+
enet0: ether...@24000 {
#address-cells = 1;
#size-cells = 1;
diff --git a/arch/powerpc/boot/dts/mpc8572ds.dts 
b/arch/powerpc/boot/dts/mpc8572ds.dts
index cafc128..74208cd 100644
--- a/arch/powerpc/boot/dts/mpc8572ds.dts
+++ b/arch/powerpc/boot/dts/mpc8572ds.dts
@@ -324,6 +324,20 @@
};
};
 
+   ptp_cl...@24e00 {
+   compatible 

[PATCH 7/8] ptp: Added a clock driver for the IXP46x.

2010-09-23 Thread Richard Cochran
This patch adds a driver for the hardware time stamping unit found on the
IXP465. The basic clock operations and an external trigger are implemented.

Signed-off-by: Richard Cochran richard.coch...@omicron.at
---
 arch/arm/mach-ixp4xx/include/mach/ixp46x_ts.h |   78 ++
 drivers/net/arm/ixp4xx_eth.c  |  191 ++
 drivers/ptp/Kconfig   |   13 +
 drivers/ptp/Makefile  |1 +
 drivers/ptp/ptp_ixp46x.c  |  345 +
 5 files changed, 628 insertions(+), 0 deletions(-)
 create mode 100644 arch/arm/mach-ixp4xx/include/mach/ixp46x_ts.h
 create mode 100644 drivers/ptp/ptp_ixp46x.c

diff --git a/arch/arm/mach-ixp4xx/include/mach/ixp46x_ts.h 
b/arch/arm/mach-ixp4xx/include/mach/ixp46x_ts.h
new file mode 100644
index 0000000..729a6b2
--- /dev/null
+++ b/arch/arm/mach-ixp4xx/include/mach/ixp46x_ts.h
@@ -0,0 +1,78 @@
+/*
+ * PTP 1588 clock using the IXP46X
+ *
+ * Copyright (C) 2010 OMICRON electronics GmbH
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, write to the Free Software
+ *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef _IXP46X_TS_H_
+#define _IXP46X_TS_H_
+
+#define DEFAULT_ADDEND 0xF029
+#define TICKS_NS_SHIFT 4
+
+struct ixp46x_channel_ctl {
+   u32 Ch_Control; /* 0x40 Time Synchronization Channel Control */
+   u32 Ch_Event;   /* 0x44 Time Synchronization Channel Event */
+   u32 TxSnapLo;   /* 0x48 Transmit Snapshot Low Register */
+   u32 TxSnapHi;   /* 0x4C Transmit Snapshot High Register */
+   u32 RxSnapLo;   /* 0x50 Receive Snapshot Low Register */
+   u32 RxSnapHi;   /* 0x54 Receive Snapshot High Register */
+   u32 SrcUUIDLo;  /* 0x58 Source UUID0 Low Register */
+   u32 SrcUUIDHi;  /* 0x5C Sequence Identifier/Source UUID0 High */
+};
+
+struct ixp46x_ts_regs {
+   u32 Control; /* 0x00 Time Sync Control Register */
+   u32 Event;   /* 0x04 Time Sync Event Register */
+   u32 Addend;  /* 0x08 Time Sync Addend Register */
+   u32 Accum;   /* 0x0C Time Sync Accumulator Register */
+   u32 Test;/* 0x10 Time Sync Test Register */
+   u32 Unused;  /* 0x14 */
+   u32 RSysTime_Lo; /* 0x18 RawSystemTime_Low Register */
+   u32 RSysTimeHi;  /* 0x1C RawSystemTime_High Register */
+   u32 SysTimeLo;   /* 0x20 SystemTime_Low Register */
+   u32 SysTimeHi;   /* 0x24 SystemTime_High Register */
+   u32 TrgtLo;  /* 0x28 TargetTime_Low Register */
+   u32 TrgtHi;  /* 0x2C TargetTime_High Register */
+   u32 ASMSLo;  /* 0x30 Auxiliary Slave Mode Snapshot Low  */
+   u32 ASMSHi;  /* 0x34 Auxiliary Slave Mode Snapshot High */
+   u32 AMMSLo;  /* 0x38 Auxiliary Master Mode Snapshot Low */
+   u32 AMMSHi;  /* 0x3C Auxiliary Master Mode Snapshot High */
+
+   struct ixp46x_channel_ctl channel[3];
+};
+
+/* 0x00 Time Sync Control Register Bits */
+#define TSCR_AMM (13)
+#define TSCR_ASM (12)
+#define TSCR_TTM (11)
+#define TSCR_RST (10)
+
+/* 0x04 Time Sync Event Register Bits */
+#define TSER_SNM (13)
+#define TSER_SNS (12)
+#define TTIPEND  (11)
+
+/* 0x40 Time Synchronization Channel Control Register Bits */
+#define MASTER_MODE   (10)
+#define TIMESTAMP_ALL (11)
+
+/* 0x44 Time Synchronization Channel Event Register Bits */
+#define TX_SNAPSHOT_LOCKED (10)
+#define RX_SNAPSHOT_LOCKED (11)
+
+#endif
diff --git a/drivers/net/arm/ixp4xx_eth.c b/drivers/net/arm/ixp4xx_eth.c
index 6028226..eaff9dd 100644
--- a/drivers/net/arm/ixp4xx_eth.c
+++ b/drivers/net/arm/ixp4xx_eth.c
@@ -30,9 +30,12 @@
 #include <linux/etherdevice.h>
 #include <linux/io.h>
 #include <linux/kernel.h>
+#include <linux/net_tstamp.h>
 #include <linux/phy.h>
 #include <linux/platform_device.h>
+#include <linux/ptp_classify.h>
 #include <linux/slab.h>
+#include <mach/ixp46x_ts.h>
 #include <mach/npe.h>
 #include <mach/qmgr.h>
 
@@ -67,6 +70,14 @@
 #define RXFREE_QUEUE(port_id)  (NPE_ID(port_id) + 26)
 #define TXDONE_QUEUE   31
 
+#define PTP_SLAVE_MODE 1
+#define PTP_MASTER_MODE 2
+#define PORT2CHANNEL(p) 1
+/*
+ * PHYSICAL_ID(p-id) ?
+ * TODO - Figure out correct mapping.
+ */
+
 /* TX Control Registers */
 #define TX_CNTRL0_TX_EN 0x01
 #define TX_CNTRL0_HALFDUPLEX   0x02
@@ -171,6 +182,8 @@ struct port {
int id; /* 

[PATCH 8/8] ptp: Added a clock driver for the National Semiconductor PHYTER.

2010-09-23 Thread Richard Cochran
This patch adds support for the PTP clock found on the DP83640.
The basic clock operations and one external time stamp have
been implemented.

Signed-off-by: Richard Cochran richard.coch...@omicron.at
---
 drivers/net/phy/Kconfig   |   29 ++
 drivers/net/phy/Makefile  |1 +
 drivers/net/phy/dp83640.c |  887 +
 drivers/net/phy/dp83640_reg.h |  261 
 4 files changed, 1178 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/phy/dp83640.c
 create mode 100644 drivers/net/phy/dp83640_reg.h

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index eb799b3..2e6463d 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -77,6 +77,35 @@ config NATIONAL_PHY
---help---
  Currently supports the DP83865 PHY.
 
+config DP83640_PHY
+   tristate "Driver for the National Semiconductor DP83640 PHYTER"
+   depends on PTP_1588_CLOCK
+   depends on NETWORK_PHY_TIMESTAMPING
+   ---help---
+ Supports the DP83640 PHYTER with IEEE 1588 features.
+
+ This driver adds support for using the DP83640 as a PTP
+ clock. This clock is only useful if your PTP programs are
+ getting hardware time stamps on the PTP Ethernet packets
+ using the SO_TIMESTAMPING API.
+
+ In order for this to work, your MAC driver must also
+ implement the skb_tx_timestamp() function.
+
+config DP83640_PHY_STATUS_FRAMES
+   bool "DP83640 Status Frames"
+   default y
+   depends on DP83640_PHY
+   ---help---
+ This option allows the DP83640 PHYTER driver to obtain time
+ stamps from the PHY via special status frames, rather than
+ reading over the MDIO bus. Using status frames is therefore
+ more efficient. However, if enabled, this option will cause
+ the driver to add a multicast address to the MAC.
+
+ Say Y here, unless your MAC does not support multicast
+ destination addresses.
+
 config STE10XP
depends on PHYLIB
tristate Driver for STMicroelectronics STe10Xp PHYs
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 13bebab..2333215 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -19,6 +19,7 @@ obj-$(CONFIG_FIXED_PHY)   += fixed.o
 obj-$(CONFIG_MDIO_BITBANG) += mdio-bitbang.o
 obj-$(CONFIG_MDIO_GPIO)+= mdio-gpio.o
 obj-$(CONFIG_NATIONAL_PHY) += national.o
+obj-$(CONFIG_DP83640_PHY)  += dp83640.o
 obj-$(CONFIG_STE10XP)  += ste10Xp.o
 obj-$(CONFIG_MICREL_PHY)   += micrel.o
 obj-$(CONFIG_MDIO_OCTEON)  += mdio-octeon.o
diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c
new file mode 100644
index 0000000..4cabd0d
--- /dev/null
+++ b/drivers/net/phy/dp83640.c
@@ -0,0 +1,887 @@
+/*
+ * Driver for the National Semiconductor DP83640 PHYTER
+ *
+ * Copyright (C) 2010 OMICRON electronics GmbH
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, write to the Free Software
+ *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+#include <linux/ethtool.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/mii.h>
+#include <linux/module.h>
+#include <linux/net_tstamp.h>
+#include <linux/netdevice.h>
+#include <linux/phy.h>
+#include <linux/ptp_classify.h>
+#include <linux/ptp_clock_kernel.h>
+
+#include "dp83640_reg.h"
+
+#ifdef CONFIG_DP83640_PHY_STATUS_FRAMES
+#define USE_STATUS_FRAMES
+#endif
+
+#define DP83640_PHY_ID 0x20005ce1
+#define PAGESEL 0x13
+#define LAYER4 0x02
+#define LAYER2 0x01
+#define MAX_RXTS   4
+#define MAX_TXTS   4
+#define N_EXT_TS   1
+#define PSF_PTPVER 2
+#define PSF_EVNT   0x4000
+#define PSF_RX 0x2000
+#define PSF_TX 0x1000
+#define EXT_EVENT  1
+#define EXT_GPIO   1
+
+#if defined(__BIG_ENDIAN)
+#define ENDIAN_FLAG 0
+#elif defined(__LITTLE_ENDIAN)
+#define ENDIAN_FLAG PSF_ENDIAN
+#endif
+
+#define SKB_PTP_TYPE(__skb) (*(unsigned int *)((__skb)->cb))
+
+struct phy_rxts {
+   u16 ns_lo;   /* ns[15:0] */
+   u16 ns_hi;   /* overflow[1:0], ns[29:16] */
+   u16 sec_lo;  /* sec[15:0] */
+   u16 sec_hi;  /* sec[31:16] */
+   u16 seqid;   /* sequenceId[15:0] */
+   u16 msgtype; /* messageType[3:0], hash[11:0] */
+};
+
+struct phy_txts {
+   u16 ns_lo;   /* ns[15:0] */
+   u16 

Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support

2010-09-23 Thread Christoph Lameter
On Thu, 23 Sep 2010, Richard Cochran wrote:

   Support for obtaining timestamps from a PHC already exists via the
   SO_TIMESTAMPING socket option, integrated in kernel version 2.6.30.
   This patch set completes the picture by allowing user space programs to
   adjust the PHC and to control its ancillary features.

Is there a way to use the PHC as a system clock? I think the main benefit
of PTP is to have synchronized time on multiple machines in a cluster. That
may mean getting rid of ntp and using an in kernel PHC based way to sync time.

So as far as the POSIX standard is concerned, offering a clock id
to represent the PHC would be acceptable.

Sure but what would you do with it? HPET timer support has no such need.

 3.2.1 Using the POSIX Clock API
 

 Looking at the mapping from PHC operation to the POSIX clock API,
 we see that two of the basic clock operations, marked with *, have
 no POSIX equivalent. The items marked NA are peculiar to PHCs and
 will be discussed separately, below.

   Clock Operation               POSIX function
  -------------------------------+-----------------------------
   Set time                      clock_settime
   Get time                      clock_gettime
   Shift the clock               *
   Adjust clock frequency        *
  -------------------------------+-----------------------------
   Time stamp external events    NA
   Enable PPS events             NA
   Periodic output signals       NA
   One shot or periodic alarms   timer_create, timer_settime

 In contrast to the standard Linux system clock, a PHC is
 adjustable in hardware, for example using frequency compensation
 registers or a VCO. The ability to directly tune the PHC is
 essential to reap the benefit of hardware timestamping.
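
The get/set rows of the table can be sketched with the standard calls. This is only a minimal illustration: the helper name read_clock_ns is hypothetical, and CLOCK_MONOTONIC or CLOCK_REALTIME stands in for a PHC clock id, since no particular id is assumed here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: read any POSIX clock as a nanosecond count.
 * Returns 0 on success, -1 if clock_gettime() fails for this id. */
static int read_clock_ns(clockid_t id, int64_t *ns_out)
{
	struct timespec ts;

	assert(ns_out != NULL);
	if (clock_gettime(id, &ts) != 0)
		return -1;
	*ns_out = (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
	return 0;
}
```

Setting the time goes through clock_settime() with the same clock id and a struct timespec, and normally needs privileges. Shifting the clock and adjusting its frequency are the two operations for which POSIX offers no call.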

There is a reason for not being able to shift posix clocks: The system has
one time base. The various clocks are contributing to maintaining that
system-wide time.

I do not understand why you want to maintain different clocks running at
different speeds. Certainly interesting for some uses I guess that I
do not have the energy to imagine right now. But can we get the PTP killer
feature of synchronized accurate system time first?

 3.3 Synchronizing the Linux System Time
 

One could offer a PHC as a combined clock source and clock event
device. The advantage of this approach would be that it obviates
the need for synchronization when the PHC is selected as the system
timer. However, some PHCs, namely the PHY based clocks, cannot be
used in this way.

Why not? Do PHY based clocks not at least provide a counter that increments
in synchronized intervals throughout the network?

Instead, the patch set provides a way to offer a Pulse Per Second
(PPS) event from the PHC to the Linux PPS subsystem. A user space
application can read the PPS events and tune the system clock, just
like when using other external time sources like radio clocks or
GPS.

User space is subject to various latencies created by the OS etc. I would
think that in order to have fine-grained (read: microsecond) accuracy we would
have to run the portions that are relevant to obtaining the desired
accuracy in the kernel.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 12/20] powerpc: change to new flag variables

2010-09-23 Thread matt mooney
On 20:19 Thu 23 Sep , Stephen Rothwell wrote:
 Hi Matt,
 
 On Wed, 22 Sep 2010 23:51:09 -0700 matt mooney m...@muteddisk.com wrote:
 
  Replace EXTRA_CFLAGS with ccflags-y and EXTRA_AFLAGS with asflags-y.
 
 This looks good.  One comment below ...
 
  --- a/arch/powerpc/platforms/pseries/Makefile
  +++ b/arch/powerpc/platforms/pseries/Makefile
  @@ -1,10 +1,5 @@
  -ifeq ($(CONFIG_PPC64),y)
  -EXTRA_CFLAGS   += -mno-minimal-toc
  -endif
  -
  -ifeq ($(CONFIG_PPC_PSERIES_DEBUG),y)
  -EXTRA_CFLAGS   += -DDEBUG
  -endif
  +ccflags-$(CONFIG_PPC64) := -mno-minimal-toc
  +ccflags-$(CONFIG_PPC_PSERIES_DEBUG) += -DDEBUG
   
   obj-y  := lpar.o hvCall.o nvram.o reconfig.o \
 setup.o iommu.o event_sources.o ras.o \
  @@ -23,7 +18,7 @@ obj-$(CONFIG_MEMORY_HOTPLUG)  += hotplug-memory.o
   obj-$(CONFIG_HVC_CONSOLE)  += hvconsole.o
   obj-$(CONFIG_HVCS) += hvcserver.o
   obj-$(CONFIG_HCALL_STATS)  += hvCall_inst.o
  -obj-$(CONFIG_PHYP_DUMP)+= phyp_dump.o
  +obj-$(CONFIG_PHYP_DUMP)+= phyp_dump.o
   obj-$(CONFIG_CMM)  += cmm.o
   obj-$(CONFIG_DTL)  += dtl.o
 
 This looks like a spurious extra hunk.

Hi Stephen,

Yeah, you're right; logically it doesn't follow from my changeset. I should
have left it alone, but it was the only line in the file that didn't align
properly with its surrounding area.

-mfm



Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support

2010-09-23 Thread Christoph Lameter
On Thu, 23 Sep 2010, Jacob Keller wrote:

  There is a reason for not being able to shift posix clocks: The system has
  one time base. The various clocks are contributing to maintaining that
  system-wide time.
 
  Adjusting clocks is absolutely essential for proper functioning of the PTP
 protocol. The slave obtains and calculates the offset from master and uses
 that in order to adjust the clock properly. The problem is that the
 timestamps are done via the hardware. We need a method to expose that
 hardware so that the ptp software can properly adjust those clocks.

There is no way to use that clock directly to avoid all the user space
tuning etc? There are already tuning mechanisms in the kernel that do this
with system time based on periodic clocks. If you calculate the
nanoseconds since the epoch then you should be able to use that to tune
system time.

  I do not understand why you want to maintain different clocks running at
  different speeds. Certainly interesting for some uses I guess that I
  do not have the energy to imagine right now. But can we get the PTP killer
  feature of synchronized accurate system time first?
 

 The problem is maintaining a hardware clock at the correct speed/frequency
 and time. The timestamping is done via hardware, and that hardware clock
 needs to be accurate. We need to be able to modify that clock. Yes, having
 the system time be the same value would be nice, but the problem comes
 because we don't want to jump through hoops to keep that hardware clock
 accurate to the ptp protocol running on the network.

Then allow system time == hardware clock?

 All of the necessary features for microsecond or better accuracy are done
 via the hardware. You can get accuracy to within 10 microseconds while only
 sending sync packets and such once per second. The reason is because the
 hardware timestamps are very accurate. But if we can't properly adjust the
 clock's time and frequency, we cannot maintain the accuracy of the
 timestamps.

You can already adjust the system time with the existing APIs. Tuning
hardware clocks is currently done using device specific controls. But I
would think that you do not need to expose this to user space if you can
do it all in kernel.


Re: [BUG 2.6.36-rc5] of_i2c.ko - i2c-core.ko dependency loop

2010-09-23 Thread Randy Dunlap
On Thu, 23 Sep 2010 13:53:18 +0200 Mikael Pettersson wrote:

 Running modules_install from a newly built 2.6.36-rc5 kernel
 on my 32-bit PowerMac results in:
 
 WARNING: Module 
 /lib/modules/2.6.36-rc5/kernel/drivers/i2c/busses/i2c-powermac.ko ignored, 
 due to loop
 WARNING: Loop detected: 
 /lib/modules/2.6.36-rc5/kernel/drivers/i2c/i2c-core.ko needs of_i2c.ko which 
 needs i2c-core.ko again!
 WARNING: Module /lib/modules/2.6.36-rc5/kernel/drivers/i2c/i2c-core.ko 
 ignored, due to loop
 WARNING: Module /lib/modules/2.6.36-rc5/kernel/drivers/i2c/i2c-dev.ko 
 ignored, due to loop
 WARNING: Module /lib/modules/2.6.36-rc5/kernel/drivers/of/of_i2c.ko ignored, 
 due to loop
 WARNING: Module /lib/modules/2.6.36-rc5/kernel/sound/ppc/snd-powermac.ko 
 ignored, due to loop
 
  grep '.*I2C.*=' .config
 CONFIG_OF_I2C=m
 CONFIG_I2C=m
 CONFIG_I2C_BOARDINFO=y
 CONFIG_I2C_CHARDEV=m
 CONFIG_I2C_POWERMAC=m
 
 I can't say exactly when this started, haven't built kernels on this
 box in a while.


No kconfig warnings?  Please post your full .config file.

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: [PATCH 0/8] De-couple sysfs memory directories from memory sections

2010-09-23 Thread Balbir Singh
* Nathan Fontenot nf...@austin.ibm.com [2010-09-22 09:15:43]:

 This set of patches decouples the concept that a single memory
 section corresponds to a single directory in 
 /sys/devices/system/memory/.  On systems
 with large amounts of memory (1+ TB) there are performance issues
 related to creating the large number of sysfs directories.  For
 a powerpc machine with 1 TB of memory we are creating 63,000+
 directories.  This is resulting in boot times of around 45-50
 minutes for systems with 1 TB of memory and 8 hours for systems
 with 2 TB of memory.  With this patch set applied I am now seeing
 boot times of 5 minutes or less.
 
 The root of this issue is in sysfs directory creation. Every time
 a directory is created a string compare is done against all sibling
 directories to ensure we do not create duplicates.  The list of
 directory nodes in sysfs is kept as an unsorted list which results
 in this being an exponentially longer operation as the number of
 directories grows.
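
A rough sketch of the cost described above, assuming one name comparison per existing sibling for every directory created (function names are hypothetical). With a linear scan on each creation, the total number of comparisons grows quadratically with the number of directories.

```c
#include <assert.h>
#include <stdint.h>

/* Count the name comparisons made when n directories are created one
 * by one and each new name is checked against every existing sibling. */
static uint64_t sibling_compares(uint64_t n)
{
	uint64_t total = 0;

	for (uint64_t existing = 0; existing < n; existing++)
		total += existing; /* new entry is compared with all earlier ones */
	return total;
}

/* Closed form of the same sum: n * (n - 1) / 2. */
static uint64_t sibling_compares_closed(uint64_t n)
{
	assert(n < (1ULL << 32)); /* keep n * (n - 1) within 64 bits */
	return n == 0 ? 0 : n * (n - 1) / 2;
}
```

For 63,000 directories this comes to roughly two billion string comparisons.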
 
 The solution implemented by this patch set is to allow a single
 directory in sysfs to span multiple memory sections.  This is
 controlled by an optional architecturally defined function
 memory_block_size_bytes().  The default definition of this
 routine returns a memory block size equal to the memory section
 size. This keeps the layout of sysfs memory directories, as seen
 from userspace, the same as it is today.
 
 For architectures that define their own version of this routine,
 as is done for powerpc in this patchset, the view in userspace
 would change such that each memoryXXX directory would span
 multiple memory sections.  The number of sections spanned would
 depend on the value reported by memory_block_size_bytes.
 
 In both cases a new file 'end_phys_index' is created in each
 memoryXXX directory.  This file will contain the physical id
 of the last memory section covered by the sysfs directory.  For
 the default case, the value in 'end_phys_index' will be the same
 as in the existing 'phys_index' file.
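
The relation between 'phys_index' and 'end_phys_index' can be sketched like this. The helper names are hypothetical, and the 16 MB section size and 256 MB block size used in the note below are example values, not figures taken from the patch set.

```c
#include <assert.h>
#include <stdint.h>

/* Number of memory sections covered by one sysfs memory block. */
static uint64_t sections_per_block(uint64_t block_size, uint64_t section_size)
{
	assert(section_size != 0 && block_size % section_size == 0);
	return block_size / section_size;
}

/* Physical index of the last section covered by a block that starts
 * at section start_index and spans nsections sections. */
static uint64_t end_phys_index(uint64_t start_index, uint64_t nsections)
{
	assert(nsections >= 1);
	return start_index + nsections - 1;
}
```

With 256 MB blocks over 16 MB sections, a block starting at section 32 would end at section 47. In the default case (one section per block) the two indices are equal.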


What does this mean for memory hotplug or hotunplug? 

-- 
Three Cheers,
Balbir


Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support

2010-09-23 Thread john stultz
On Thu, 2010-09-23 at 12:53 -0500, Christoph Lameter wrote:
 On Thu, 23 Sep 2010, Richard Cochran wrote:
  In contrast to the standard Linux system clock, a PHC is
  adjustable in hardware, for example using frequency compensation
  registers or a VCO. The ability to directly tune the PHC is
  essential to reap the benefit of hardware timestamping.
 
 There is a reason for not being able to shift posix clocks: The system has
 one time base. The various clocks are contributing to maintaining that
 system-wide time.
 
 I do not understand why you want to maintain different clocks running at
 different speeds. Certainly interesting for some uses I guess that I
 do not have the energy to imagine right now. But can we get the PTP killer
 feature of synchronized accurate system time first?

This was my initial gut reaction as well, but in the end, I agree with
Richard that in the case of one or multiple PTP hardware clocks, we
really can't abstract over the different time domains.



  3.3 Synchronizing the Linux System Time
  
 
 One could offer a PHC as a combined clock source and clock event
 device. The advantage of this approach would be that it obviates
 the need for synchronization when the PHC is selected as the system
 timer. However, some PHCs, namely the PHY based clocks, cannot be
 used in this way.
 
 Why not? Do PHY based clocks not at least provide a counter that increments
 in synchronized intervals throughout the network?

I really don't think the PTP clock can be used as a clocksource sanely.

First, the hardware access is much too slow for system timekeeping.

Second, there is the problem that the system time is a software clock,
and adjustments made (like freq) are made in the layer that interprets
the underlying hardware cycle counter. Adjustments made in PTP (in order
to sync the network timestamps) are made at the hardware level. 

This would cause a disconnect between the hardware freq understood by
the system time management code and the actual hardware freq.
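
To illustrate the disconnect, here is a minimal sketch of the mult/shift scaling that turns raw counter cycles into nanoseconds. The function name and the numbers are illustrative assumptions; frequency corrections to system time are applied by changing mult, not by touching the counter.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a raw cycle count to nanoseconds: ns = (cycles * mult) >> shift.
 * The product is computed in 64 bits, so cycles must stay small enough
 * for cycles * mult not to overflow. */
static uint64_t cycles_to_ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	assert(shift < 32);
	return (cycles * (uint64_t)mult) >> shift;
}
```

Take a counter that nominally runs at 1 MHz, so mult = 1000 << 8 with shift = 8. If the hardware is retuned to run 1% fast, it delivers 1,010,000 cycles in one real second, and software still using the old mult reports 1.01 s for that second.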

Richard, I'd actually strike this paragraph from the rationale, as I feel
it has the tendency to confuse as it suggests having the PHC as a
clocksource is feasible when really it isn't. Or alternatively, maybe
express more clearly why it's not feasible, so it doesn't just seem like
a minor design choice.

thanks
-john



Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support

2010-09-23 Thread john stultz
On Thu, 2010-09-23 at 19:30 +0200, Richard Cochran wrote:
 Here is the sixth version of my patch set adding PTP hardware clock
 support to the Linux kernel. The main difference to v5 is that the
 character device interface has been replaced with one based on the
 posix clock system calls.
 
 The first three patches add necessary background support in the posix
 clock code. The last five add the new PTP hardware clock features.
 Previously, I had tried to present the posix clock changes all by
 themselves, but commentators asked to see the whole context.

Richard,
It's great to see this work continue and the patch set is shaping up
nicely! There are still a few details to work out, but I think the
remaining issues are relatively small.


 3.2.3 Dynamic POSIX Clock IDs 
 --
 
 The reaction on the list to having a static id like CLOCK_PTP was
 mostly negative. However, the idea of generating a clock id
 dynamically seems to have gained acceptance. The general idea is
 to advertise the available clock ids to user space via sysfs. This
 patch set implements two different ways:
 
 /sys/class/timesource/name/id
 /sys/class/ptp/ptp_clock_X/id
 
 Note: I am not too sure that this is exactly what people imagined,
   but it is my best understanding so far. I gleaned two
   different ideas about where to offer the clock id. In order
   to keep just one way, I will be happy to remove the less
   popular one.

So yea, I'm not a fan of the timesource sysfs interface. One, I think
the name is poor (posix_clocks or something a little more specific would
be an improvement), and second, I don't like the dictionary interface,
where one looks up the clock by name.

Instead, I think having the id hanging off the class driver is much
better, as it allows mapping the actual hardware to the id more clearly.

So I'd drop the timesource listing. And maybe change id to
clock_id so it's a little clearer what the id is for.



 3.3 Synchronizing the Linux System Time 
 
 
One could offer a PHC as a combined clock source and clock event
device. The advantage of this approach would be that it obviates
the need for synchronization when the PHC is selected as the system
timer. However, some PHCs, namely the PHY based clocks, cannot be
used in this way.

Again, I'd scratch this. 

What I think you might want to mention is that an application like NTP
could use the PTP clockid much like NTP currently can be configured to
use the RTC to steer the system time.

Possibly the PTPd could just do this, reducing the number of daemons and
avoiding mixing NTP up in what is really a different sync algorithm.

Instead, the patch set provides a way to offer a Pulse Per Second
(PPS) event from the PHC to the Linux PPS subsystem. A user space
application can read the PPS events and tune the system clock, just
like when using other external time sources like radio clocks or
GPS.

Forgive me for a bit of a tangent here:
So while I think this PPS method is a neat idea, I'm a little curious
how much of a difference the PPS method for syncing the clock would be
over just a simple reading of the two clocks and correcting the offset.

It seems much of it depends on the read latency of the PTP hardware vs
the interrupt latency. Also the PTP clock granularity would affect the
read accuracy (like on the RTC, you don't really know how close to the
second boundary you are).
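
The "simple reading of the two clocks" could look like the sketch below: read the system clock, read the other clock, read the system clock again, and take the midpoint. The names are hypothetical, and how the three readings are obtained is left open.

```c
#include <assert.h>
#include <stdint.h>

struct offset_estimate {
	int64_t offset_ns;      /* other clock minus system clock */
	int64_t uncertainty_ns; /* half the width of the read window */
};

/* Estimate the offset from three readings, all in nanoseconds. */
static struct offset_estimate estimate_offset(int64_t sys_before,
					      int64_t other,
					      int64_t sys_after)
{
	struct offset_estimate e;

	assert(sys_after >= sys_before);
	e.uncertainty_ns = (sys_after - sys_before) / 2;
	e.offset_ns = other - (sys_before + e.uncertainty_ns);
	return e;
}
```

The uncertainty term is where the read latency of the PTP hardware shows up: a slow register access widens the window between the two system clock readings.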

Have you done any such measurements between the two methods? I just
wonder if it would actually be something noticeable, and if it's not, how
much lighter this patch-set would be without the PPS connection.

Again, this isn't super critical, just trying to make sure we don't end
up adding a bunch of code that doesn't end up being used. Also PPS
interrupts are awfully frequent, so systems concerned with power-saving
and deep idles probably would like something that could be done at a
more coarse interval.


 3.5 User timers 
 
 
Using the POSIX clock API gives user space the possibility to
create and use timers with timer_create and timer_settime. In the
current patch set the kernel functionality is not implemented,
since there are some issues to consider first. I see two ways to go
about this.
 
1. Implement the functionality anew. This approach might end up
   duplicating similar code that already exists. Also, looking at
   the hrtimer code, getting user timers right seems to have a
   number of gotchas and thorny issues.
 
2. Reuse the hrtimer code. Since the hrtimer code uses a clock
   event device under the hood, it might be possible (in theory) to
   offer capable PHCs as clock event devices. However, the current
   hrtimers are hard-coded to the event device via a per-cpu
   global. Perhaps one could associate an event device with a
  

Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support

2010-09-23 Thread Christoph Lameter
On Thu, 23 Sep 2010, john stultz wrote:

 This was my initial gut reaction as well, but in the end, I agree with
 Richard that in the case of one or multiple PTP hardware clocks, we
 really can't abstract over the different time domains.

My (arguably still superficial) review of the source does not show
anything that would make me reach that conclusion.

 I really don't think the PTP clock can be used as a clocksource sanely.

 First, the hardware access is much too slow for system timekeeping.

The HPET or pit timesource are also quite slow these days. You only need
access periodically to essentially tune the TSC ratio.

 Second, there is the problem that the system time is a software clock,
 and adjustments made (like freq) are made in the layer that interprets
 the underlying hardware cycle counter. Adjustments made in PTP (in order
 to sync the network timestamps) are made at the hardware level.

From what I can see the PTP clocks are periodic hardware cycle counters
like any other clock that we currently support. If it's configurable enough
then setup a hardware cycle counter that mimics nanoseconds since the
epoch as closely as possible and use that to sync the TSC rate to. Makes
it very easy.

 This would cause a disconnect between the hardware freq understood by
 the system time management code and the actual hardware freq.

We can switch underlying clocks for system time already. We can adapt to a
different hw frequency. But then I do not know why adjust the freq? I
thought the point was that the periodic clock was network synchronized and
can be used as the master clock for multiple machines?

 Richard, I'd actually strike this paragraph from the rationale, as I feel
 it has the tendency to confuse as it suggests having the PHC as a
 clocksource is feasible when really it isn't. Or alternatively, maybe
 express more clearly why it's not feasible, so it doesn't just seem like
 a minor design choice.

Sorry but I still feel that this is pretty much a misguided approach that
creates unnecessary layers in the kernel. The trivial easy approach was
not done (copy a driver from drivers/clocksource, modify so that it
programs access to a centralized periodic ptp signal and uses it for
system sync).


Re: [PATCH 6/8] ptp: Added a clock that uses the eTSEC found on the MPC85xx.

2010-09-23 Thread Christoph Lameter
On Thu, 23 Sep 2010, Richard Cochran wrote:

 +* Gianfar PTP clock nodes
 +
 +General Properties:
 +
 +  - compatible   Should be fsl,etsec-ptp
 +  - reg  Offset and length of the register set for the device
 +  - interrupts   There should be at least two interrupts. Some devices
 + have as many as four PTP related interrupts.
 +
 +Clock Properties:
 +
 +  - tclk-period  Timer reference clock period in nanoseconds.
 +  - tmr-prsc Prescaler, divides the output clock.
 +  - tmr-add  Frequency compensation value.
 +  - cksel        0= external clock, 1= eTSEC system clock, 3= RTC clock input.
 +                 Currently the driver only supports choice 1.
 +  - tmr-fiper1   Fixed interval period pulse generator.
 +  - tmr-fiper2   Fixed interval period pulse generator.
 +  - max-adj  Maximum frequency adjustment in parts per billion.
 +
 +  These properties set the operational parameters for the PTP
 +  clock. You must choose these carefully for the clock to work right.
 +  Here is how to figure good values:
 +
 +  TimerOsc     = system clock               MHz
 +  tclk_period  = desired clock period       nanoseconds
 +  NominalFreq  = 1000 / tclk_period         MHz
 +  FreqDivRatio = TimerOsc / NominalFreq     (must be greater than 1.0)
 +  tmr_add      = ceil(2^32 / FreqDivRatio)
 +  OutputClock  = NominalFreq / tmr_prsc     MHz
 +  PulseWidth   = 1 / OutputClock            microseconds
 +  FiperFreq1   = desired frequency in Hz
 +  FiperDiv1    = 1000000 * OutputClock / FiperFreq1
 +  tmr_fiper1   = tmr_prsc * tclk_period * FiperDiv1 - tclk_period
 +  max_adj      = 1000000000 * (FreqDivRatio - 1.0) - 1
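
As a worked example, the formulas above can be evaluated with integer arithmetic as sketched here. The struct and function names are hypothetical, and the input values in the note below are example numbers, not taken from any board. Frequencies are passed in Hz, so the scale factors of 1,000,000 (MHz to Hz) and 1,000,000,000 (parts per billion) are assumptions that follow from the stated units.

```c
#include <assert.h>
#include <stdint.h>

struct etsec_ptp_params {
	uint32_t tmr_add;
	uint32_t tmr_fiper1;
	uint32_t max_adj;
};

static struct etsec_ptp_params etsec_ptp_calc(uint64_t timer_osc_hz,
					      uint32_t tclk_period_ns,
					      uint32_t tmr_prsc,
					      uint32_t fiper_freq_hz)
{
	const uint64_t ns_per_s = 1000000000ULL;
	struct etsec_ptp_params p;
	uint64_t nominal_hz, output_hz, fiper_div, v;

	assert(tclk_period_ns != 0 && tmr_prsc != 0 && fiper_freq_hz != 0);
	assert(ns_per_s % tclk_period_ns == 0);  /* keep NominalFreq exact */
	nominal_hz = ns_per_s / tclk_period_ns;  /* NominalFreq, in Hz */
	assert(timer_osc_hz > nominal_hz);       /* FreqDivRatio > 1.0 */

	/* tmr_add = ceil(2^32 / FreqDivRatio) = ceil(2^32 * nominal / osc) */
	v = ((1ULL << 32) * nominal_hz + timer_osc_hz - 1) / timer_osc_hz;
	assert(v <= UINT32_MAX);
	p.tmr_add = (uint32_t)v;

	output_hz = nominal_hz / tmr_prsc;       /* OutputClock, in Hz */
	fiper_div = output_hz / fiper_freq_hz;   /* FiperDiv1 */
	v = (uint64_t)tmr_prsc * tclk_period_ns * fiper_div - tclk_period_ns;
	assert(v <= UINT32_MAX);
	p.tmr_fiper1 = (uint32_t)v;

	/* max_adj = 10^9 * (FreqDivRatio - 1.0) - 1, in parts per billion */
	v = ns_per_s * (timer_osc_hz - nominal_hz) / nominal_hz - 1;
	assert(v <= UINT32_MAX);
	p.max_adj = (uint32_t)v;
	return p;
}
```

With a 200 MHz timer oscillator, tclk_period = 10 ns, tmr_prsc = 100 and a 1 Hz fiper output, FreqDivRatio is 2.0, so tmr_add = 0x80000000, tmr_fiper1 = 999999990 and max_adj = 999999999.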

Great stuff for clock synchronization...

 +  The calculation for tmr_fiper2 is the same as for tmr_fiper1. The
 +  driver expects that tmr_fiper1 will be correctly set to produce a 1
 +  Pulse Per Second (PPS) signal, since this will be offered to the PPS
 +  subsystem to synchronize the Linux clock.

Argh. And conceptually completely screwed up. Why go through the PPS
subsystem if you can directly tune the system clock based on a number of
the cool periodic clock features that you have above? See how the other
clocks do that easily? Look into drivers/clocksource. Add it there.

Please do not introduce useless additional layers for clock sync. Load
these ptp clocks like the other regular clock modules and make them sync
system time like any other clock.

Really guys: I want a PTP solution! Now! And not some idiotic additional
kernel layers that just pass bits around because it's so much fun and
screw up clock accuracy due to the latency noise introduced while
having so much fun with the bits.


Re: [PATCH 1/8] posix clocks: introduce a syscall for clock tuning.

2010-09-23 Thread john stultz
On Thu, 2010-09-23 at 19:31 +0200, Richard Cochran wrote:
 A new syscall is introduced that allows tuning of a POSIX clock. The
 syscall is implemented for four architectures: arm, blackfin, powerpc,
 and x86.
 
 The new syscall, clock_adjtime, takes two parameters, the clock ID,
 and a pointer to a struct timex. The semantics of the timex struct
 have been expanded by one additional mode flag, which allows an
 absolute offset correction. When specified, the clock offset is
 immediately corrected by adding the given time value to the current
 time value.
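
The effect of the offset correction can be pictured as a plain timespec addition. This is a user-space sketch with a hypothetical helper name, not the kernel implementation; it assumes a negative offset is expressed with a negative tv_sec and a non-negative tv_nsec.

```c
#include <assert.h>
#include <time.h>

#define SKETCH_NSEC_PER_SEC 1000000000L

/* Return now + delta with tv_nsec normalized to [0, 1e9).
 * Both inputs are expected to be normalized already. */
static struct timespec timespec_add_offset(struct timespec now,
					   struct timespec delta)
{
	struct timespec out;

	assert(now.tv_nsec >= 0 && now.tv_nsec < SKETCH_NSEC_PER_SEC);
	assert(delta.tv_nsec >= 0 && delta.tv_nsec < SKETCH_NSEC_PER_SEC);
	out.tv_sec = now.tv_sec + delta.tv_sec;
	out.tv_nsec = now.tv_nsec + delta.tv_nsec;
	if (out.tv_nsec >= SKETCH_NSEC_PER_SEC) {
		out.tv_nsec -= SKETCH_NSEC_PER_SEC; /* carry into seconds */
		out.tv_sec += 1;
	}
	return out;
}
```

Under that convention an offset of minus half a second is { -1, 500000000 }.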


So I'd still split this patch up a little bit more.

1) Patch that implements the ADJ_SETOFFSET  (*and its implementation*)
in do_adjtimex.

2) Patch that adds the new syscall and clock_id multiplexing.

3) Patches that wire it up to the rest of the architectures (there's
still a bunch missing here).



And one little nit in the code:

 diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c
 index 9ca4973..446b566 100644
 --- a/kernel/posix-timers.c
 +++ b/kernel/posix-timers.c
 @@ -197,6 +197,14 @@ static int common_timer_create(struct k_itimer *new_timer)
   return 0;
  }
 
 +static inline int common_clock_adj(const clockid_t which_clock, struct timex *t)
 +{
 +	if (CLOCK_REALTIME == which_clock)
 +		return do_adjtimex(t);
 +	else
 +		return -EOPNOTSUPP;
 +}


Would it make sense to point to the do_adjtimex() in the k_clock
definition for CLOCK_REALTIME rather than conditionalizing it here?
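
Roughly what I have in mind, as a self-contained sketch. Every name ending in _sketch is a stand-in and not the kernel's own type or function; the point is only that the per-clock table carries the adjust hook, so the common path needs no special case for CLOCK_REALTIME.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-ins for the kernel types; all names here are hypothetical. */
struct timex_sketch {
	long offset;
	int applied;
};

struct k_clock_sketch {
	int (*clock_adj)(struct timex_sketch *t); /* NULL if unsupported */
};

enum {
	SKETCH_CLOCK_REALTIME = 0,
	SKETCH_CLOCK_MONOTONIC = 1,
	SKETCH_MAX_CLOCKS = 2
};

/* Plays the role of do_adjtimex() for the realtime clock. */
static int do_adjtimex_sketch(struct timex_sketch *t)
{
	t->applied = 1;
	return 0;
}

static const struct k_clock_sketch clocks_sketch[SKETCH_MAX_CLOCKS] = {
	[SKETCH_CLOCK_REALTIME]  = { .clock_adj = do_adjtimex_sketch },
	[SKETCH_CLOCK_MONOTONIC] = { .clock_adj = NULL },
};

/* Dispatch through the table instead of comparing clock ids. */
static int clock_adjtime_sketch(int which_clock, struct timex_sketch *t)
{
	assert(t != NULL);
	if (which_clock < 0 || which_clock >= SKETCH_MAX_CLOCKS)
		return -EINVAL;
	if (clocks_sketch[which_clock].clock_adj == NULL)
		return -EOPNOTSUPP;
	return clocks_sketch[which_clock].clock_adj(t);
}
```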



thanks
-john



[PATCH] powerpc: Fix invalid page flags in create TLB CAM path for PTE_64BIT

2010-09-23 Thread Paul Gortmaker
From: Tiejun Chen tiejun.c...@windriver.com

There exists a four line chunk of code, which when configured for
64 bit address space, can incorrectly set certain page flags during
the TLB creation.  It turns out that this is legacy code that is no
longer required, but since it isn't obvious why this is legacy code
or why it causes problems, the below description covers both in detail.

For powerpc bootstrap, the physical memory (at most 768M), is mapped
into the kernel space via the following path:

MMU_init()
|
+ adjust_total_lowmem()
|
+ map_mem_in_cams()
|
+ settlbcam(i, virt, phys, cam_sz, PAGE_KERNEL_X, 0);

In settlbcam(), the kernel creates TLB entries according to the flag,
PAGE_KERNEL_X.

settlbcam()
{
...
TLBCAM[index].MAS1 = MAS1_VALID
| MAS1_IPROT | MAS1_TSIZE(tsize) | MAS1_TID(pid);
^
These entries cannot be invalidated by the
kernel since MAS1_IPROT is set on TLB property.
...
if (flags & _PAGE_USER) {
   TLBCAM[index].MAS3 |= MAS3_UX | MAS3_UR;
   TLBCAM[index].MAS3 |= ((flags & _PAGE_RW) ? MAS3_UW : 0);
}

For classic BookE, (flags & _PAGE_USER) is zero, so this is fine.
But on boards like the Freescale P4080, we want to support 36-bit
physical addresses, so the following options may be set:

CONFIG_FSL_BOOKE=y
CONFIG_PTE_64BIT=y
CONFIG_PHYS_64BIT=y

As a result, boards like the P4080 use the Book3E PTE format,
as per the file arch/powerpc/include/asm/pgtable-ppc32.h:

  * #elif defined(CONFIG_FSL_BOOKE) && defined(CONFIG_PTE_64BIT)
  * #include <asm/pte-book3e.h>

So PAGE_KERNEL_X is __pgprot(_PAGE_BASE | _PAGE_KERNEL_RWX) and the
book3E version of _PAGE_KERNEL_RWX is defined with:

  (_PAGE_BAP_SW | _PAGE_BAP_SR | _PAGE_DIRTY | _PAGE_BAP_SX)

Note the _PAGE_BAP_SR, which is also defined in the book3E _PAGE_USER:

  #define _PAGE_USER (_PAGE_BAP_UR | _PAGE_BAP_SR) /* Can be read */

So the possibility exists to wrongly assign the user MAS3_URWX bits
to kernel (PAGE_KERNEL_X) address space via the following code fragment:

if (flags & _PAGE_USER) {
   TLBCAM[index].MAS3 |= MAS3_UX | MAS3_UR;
   TLBCAM[index].MAS3 |= ((flags & _PAGE_RW) ? MAS3_UW : 0);
}
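
The overlap can be reproduced in a few lines of userspace C. This is only a sketch: the DEMO_* names and bit values are placeholders chosen for illustration (the real definitions are in asm/pte-book3e.h), and the stricter all_user_bits() test is an assumption shown for contrast, not something this patch adds.

```c
#include <stdbool.h>

/* Illustrative bit positions only; the kernel's values may differ. */
#define DEMO_BAP_SR   0x004u   /* supervisor read  */
#define DEMO_BAP_UR   0x008u   /* user read        */
#define DEMO_BAP_SW   0x010u   /* supervisor write */
#define DEMO_BAP_SX   0x040u   /* supervisor exec  */
#define DEMO_DIRTY    0x1000u

/* Mirrors the shape of the Book3E definitions quoted above. */
#define DEMO_PAGE_USER        (DEMO_BAP_UR | DEMO_BAP_SR)
#define DEMO_PAGE_KERNEL_RWX  (DEMO_BAP_SW | DEMO_BAP_SR | DEMO_DIRTY | DEMO_BAP_SX)

/* The style of test used in settlbcam(): true if ANY bit of the mask is set. */
static bool any_user_bit(unsigned int flags)
{
    return (flags & DEMO_PAGE_USER) != 0;
}

/* A stricter test (hypothetical): true only if ALL bits of the mask are set. */
static bool all_user_bits(unsigned int flags)
{
    return (flags & DEMO_PAGE_USER) == DEMO_PAGE_USER;
}
```

Because the kernel RWX flags carry the shared supervisor-read bit, any_user_bit() reports true for a kernel mapping while all_user_bits() does not.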

Here is a dump of the TLB info from Simics with the above code present:
--
L2 TLB1
GT   SSS UUU V I
 Row  Logical   PhysicalSS TLPID  TID  WIMGE XWR XWR F P   V
- - --- -- - - - --- --- - -   -
  0   c000-cfff 0-00fff 00 0 0   M   XWR XWR 0 1   1
  1   d000-dfff 01000-01fff 00 0 0   M   XWR XWR 0 1   1
  2   e000-efff 02000-02fff 00 0 0   M   XWR XWR 0 1   1

Actually this conditional code was only used for two legacy functions:

  1: support KGDB to set break point.
 KGDB already dropped this; now uses its core write to set break point.

  2: io_block_mapping() to create TLB in segmentation size (not PAGE_SIZE)
 for device IO space.
 This use case is also removed from the latest PowerPC kernel.

So it looks like the deletion of these 4 lines of code was simply
overlooked when the above two cases went away.

With the code deleted, the TLB appears without U having XWR as below:

---
L2 TLB1
GT   SSS UUU V I
 Row  Logical   PhysicalSS TLPID  TID  WIMGE XWR XWR F P   V
- - --- -- - - - --- --- - -   -
  0   c000-cfff 0-00fff 00 0 0   M   XWR 0 1   1
  1   d000-dfff 01000-01fff 00 0 0   M   XWR 0 1   1
  2   e000-efff 02000-02fff 00 0 0   M   XWR 0 1   1

Signed-off-by: Tiejun Chen tiejun.c...@windriver.com
Signed-off-by: Paul Gortmaker paul.gortma...@windriver.com
---
 arch/powerpc/mm/fsl_booke_mmu.c |5 -
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/mm/fsl_booke_mmu.c b/arch/powerpc/mm/fsl_booke_mmu.c
index d5fa5f2..9de7e1b 100644
--- a/arch/powerpc/mm/fsl_booke_mmu.c
+++ b/arch/powerpc/mm/fsl_booke_mmu.c
@@ -136,11 +136,6 @@ static void settlbcam(int index, unsigned long virt, phys_addr_t phys,
     if (mmu_has_feature(MMU_FTR_BIG_PHYS))
         TLBCAM[index].MAS7 = (u64)phys >> 32;
 
-    if (flags & _PAGE_USER) {
-       TLBCAM[index].MAS3 |= MAS3_UX | MAS3_UR;
-       TLBCAM[index].MAS3 |= ((flags & _PAGE_RW) ? MAS3_UW : 0);
-    }
-
     tlbcam_addrs[index].start = virt;
     tlbcam_addrs[index].limit = virt + size - 1;
     tlbcam_addrs[index].phys = phys;
-- 
1.7.2.1


Re: [BUG 2.6.36-rc5] of_i2c.ko - i2c-core.ko dependency loop

2010-09-23 Thread Mikael Pettersson
Randy Dunlap writes:
  On Thu, 23 Sep 2010 13:53:18 +0200 Mikael Pettersson wrote:
  
   Running modules_install from a newly built 2.6.36-rc5 kernel
   on my 32-bit PowerMac results in:
   
   WARNING: Module 
   /lib/modules/2.6.36-rc5/kernel/drivers/i2c/busses/i2c-powermac.ko ignored, 
   due to loop
   WARNING: Loop detected: 
   /lib/modules/2.6.36-rc5/kernel/drivers/i2c/i2c-core.ko needs of_i2c.ko 
   which needs i2c-core.ko again!
   WARNING: Module /lib/modules/2.6.36-rc5/kernel/drivers/i2c/i2c-core.ko 
   ignored, due to loop
   WARNING: Module /lib/modules/2.6.36-rc5/kernel/drivers/i2c/i2c-dev.ko 
   ignored, due to loop
   WARNING: Module /lib/modules/2.6.36-rc5/kernel/drivers/of/of_i2c.ko 
   ignored, due to loop
   WARNING: Module /lib/modules/2.6.36-rc5/kernel/sound/ppc/snd-powermac.ko 
   ignored, due to loop
   
grep '.*I2C.*=' .config
   CONFIG_OF_I2C=m
   CONFIG_I2C=m
   CONFIG_I2C_BOARDINFO=y
   CONFIG_I2C_CHARDEV=m
   CONFIG_I2C_POWERMAC=m
   
   I can't say exactly when this started, haven't built kernels on this
   box in a while.
  
  
  No kconfig warnings?

Not that I recall.  I can check tomorrow if necessary.

  Please post your full .config file.

#
# Automatically generated make config: don't edit
#
# CONFIG_PPC64 is not set

#
# Processor support
#
CONFIG_PPC_BOOK3S_32=y
# CONFIG_PPC_85xx is not set
# CONFIG_PPC_8xx is not set
# CONFIG_40x is not set
# CONFIG_44x is not set
# CONFIG_E200 is not set
CONFIG_PPC_BOOK3S=y
CONFIG_6xx=y
CONFIG_PPC_FPU=y
CONFIG_ALTIVEC=y
CONFIG_PPC_STD_MMU=y
CONFIG_PPC_STD_MMU_32=y
# CONFIG_PPC_MM_SLICES is not set
CONFIG_PPC_HAVE_PMU_SUPPORT=y
# CONFIG_SMP is not set
CONFIG_PPC32=y
CONFIG_WORD_SIZE=32
# CONFIG_ARCH_PHYS_ADDR_T_64BIT is not set
CONFIG_MMU=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_HARDIRQS_NO__DO_IRQ=y
# CONFIG_HAVE_SETUP_PER_CPU_AREA is not set
# CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK is not set
CONFIG_IRQ_PER_CPU=y
CONFIG_NR_IRQS=64
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_ARCH_HAS_ILOG2_U32=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_FIND_NEXT_BIT=y
# CONFIG_ARCH_NO_VIRT_TO_BUS is not set
CONFIG_PPC=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_NVRAM=y
CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_PPC_OF=y
# CONFIG_PPC_UDBG_16550 is not set
# CONFIG_GENERIC_TBSYNC is not set
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
# CONFIG_DEFAULT_UIMAGE is not set
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
# CONFIG_PPC_DCR_NATIVE is not set
# CONFIG_PPC_DCR_MMIO is not set
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_DEFCONFIG_LIST=/lib/modules/$UNAME_RELEASE/.config
CONFIG_CONSTRUCTORS=y

#
# General setup
#
# CONFIG_EXPERIMENTAL is not set
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=
CONFIG_LOCALVERSION=
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_AUDIT is not set

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_TINY_RCU is not set
# CONFIG_RCU_TRACE is not set
CONFIG_RCU_FANOUT=32
# CONFIG_RCU_FANOUT_EXACT is not set
# CONFIG_TREE_RCU_TRACE is not set
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=14
# CONFIG_SYSFS_DEPRECATED_V2 is not set
# CONFIG_RELAY is not set
# CONFIG_NAMESPACES is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=
CONFIG_RD_GZIP=y
# CONFIG_RD_BZIP2 is not set
# CONFIG_RD_LZMA is not set
# CONFIG_RD_LZO is not set
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
CONFIG_EMBEDDED=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
# CONFIG_SIGNALFD is not set
# CONFIG_TIMERFD is not set
# CONFIG_EVENTFD is not set
CONFIG_SHMEM=y
# CONFIG_AIO is not set
CONFIG_HAVE_PERF_EVENTS=y

#
# Kernel Performance Events And Counters
#
# CONFIG_PERF_EVENTS is not set
# CONFIG_PERF_COUNTERS is not set
# CONFIG_VM_EVENT_COUNTERS is not set
CONFIG_PCI_QUIRKS=y
# CONFIG_SLUB_DEBUG is not set
# CONFIG_COMPAT_BRK is not set
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
# CONFIG_PROFILING is not set
CONFIG_HAVE_OPROFILE=y
# CONFIG_KPROBES is not set
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_ATTRS=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_DMA_API_DEBUG=y

#
# GCOV-based kernel profiling
#
# CONFIG_HAVE_GENERIC_DMA_COHERENT is not set
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
# CONFIG_MODULE_FORCE_LOAD is not set
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_BLOCK=y
# CONFIG_LBDAF 

Re: [PATCH 6/8] ptp: Added a clock that uses the eTSEC found on the MPC85xx.

2010-09-23 Thread Alan Cox
 Please do not introduce useless additional layers for clock sync. Load
 these ptp clocks like the other regular clock modules and make them sync
 system time like any other clock.

I don't think you understand PTP. PTP has masters; a system may need to
honour multiple conflicting masters at once.

 Really guys: I want a PTP solution! Now! And not some idiotic additional
 kernel layers that just pass bits around because it's so much fun and
 screws up clock accuracy due to the latency noise introduced while
 having so much fun with the bits.

There are some interesting complications in putting a PTP sync
interface in kernel.


Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support

2010-09-23 Thread john stultz
On Thu, 2010-09-23 at 14:15 -0500, Christoph Lameter wrote:
 On Thu, 23 Sep 2010, john stultz wrote:
 
  This was my initial gut reaction as well, but in the end, I agree with
  Richard that in the case of one or multiple PTP hardware clocks, we
  really can't abstract over the different time domains.
 
 My (arguably still superficial) review of the source does not show
 anything that would make me reach that conclusion.
 
  I really don't think the PTP clock can be used as a clocksource sanely.
 
  First, the hardware access is much too slow for system timekeeping.
 
 The HPET or pit timesource are also quite slow these days. You only need
 access periodically to essentially tune the TSC ratio.

If we're using the TSC, then we're not using the PTP clock as you
suggest. Further, the HPET and PIT aren't used to steer the system time
when we are using the TSC as a clocksource. They are only used to calibrate
the initial constant freq used by the timekeeping code (and if it's
non-constant, we throw it out).

  Second, there is the problem that the system time is a software clock,
  and adjustments made (like freq) are made in the layer that interprets
  the underlying hardware cycle counter. Adjustments made in PTP (in order
  to sync the network timestamps) are made at the hardware level.
 
 From what I can see the PTP clocks are periodic hardware cycle counters
 like any other clock that we currently support. If its configurable enough
 then setup a hardware cycle counter that mimics nanoseconds since the
 epoch as closely as possible and use that to sync the TSC rate to. Makes
 it very easy.

I guess I'm confused by what you're suggesting.
If we're using the TSC, then that's the clocksource timekeeping uses.
The original issue seemed to be around the suggestion of using the PTP
clock as a clocksource, which I don't think is really feasible.

Again, that's because
1) The PTP access latency is slow (so is the PIT, true enough, but no
one should be using the PIT as a clocksource unless they really have no
better hardware - it's really only useful for 486s and old freq scaling
laptops that have no other stable clocksource).

2) The way PTP clocks are steered to sync with network time causes their
hardware freq to actually change. Since these adjustments are done on
the hardware clock level, and not on the system time level, the
adjustments to sync the system time/freq would then be made incorrect by
PTP hardware adjustments. 

3) Further, the PTP hardware counter can be simply set to a new offset
to put it in line with the network time. This could cause trouble with
timekeeping much like unsynced TSCs do.


Now, what you seem to be suggesting is to use the TSC (or whatever
clocksource the system time is using) but to steer the system time using
the PTP clock. This is actually what is being proposed, however, the
steering is done in userland. This is due to the fact that there are two
components to the steering, 1) adjusting the PTP clock hardware to
network time and 2) adjusting the system time to the PTP hardware. By
exposing the PTP clock to userland via the posix clocks interface, we
allow this to easily be done.


  This would cause a disconnect between the hardware freq understood by
  the system time management code and the actual hardware freq.
 
 We can switch underlying clocks for system time already. We can adapt to a
 different hw frequency.

Actually no. The timekeeping code requires a fixed freq counter. Dealing
with hardware freq changes is difficult, because error is introduced by
the latency between when the freq changes and when the timekeeping code
is notified of it. So the system treats the hardware counters as fixed
freq. Now, hardware does vary freq ever so slightly as thermal
conditions change, but this is addressed in userland and corrected via
adjtimex.
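
To make the "fixed freq counter, steered in software" point concrete, here is a simplified sketch of the multiply-and-shift conversion from raw cycles to nanoseconds. It is an illustration under assumptions, not the kernel's timekeeping code: struct demo_clocksource and the demo_* helpers are hypothetical names, and real code has to worry about overflow and accumulated error.

```c
#include <stdint.h>

/* Hypothetical, simplified model of a fixed-frequency clocksource. */
struct demo_clocksource {
    uint32_t mult;   /* scaled nanoseconds per cycle */
    uint32_t shift;  /* scaling exponent: ns = (cycles * mult) >> shift */
};

/* Pick mult so that one cycle at freq_hz maps to 1e9 / freq_hz ns. */
static struct demo_clocksource demo_clocksource_init(uint32_t freq_hz,
                                                     uint32_t shift)
{
    struct demo_clocksource cs;

    cs.shift = shift;
    cs.mult = (uint32_t)((((uint64_t)1000000000u) << shift) / freq_hz);
    return cs;
}

/* Interpret a raw cycle count using the current software scaling. */
static uint64_t demo_cycles_to_ns(const struct demo_clocksource *cs,
                                  uint64_t cycles)
{
    return (cycles * cs->mult) >> cs->shift;
}

/* Software steering: nudge the multiplier by parts per million.
 * The hardware counter itself is never touched. */
static void demo_adjust_ppm(struct demo_clocksource *cs, int32_t ppm)
{
    int64_t delta = ((int64_t)cs->mult * ppm) / 1000000;

    cs->mult = (uint32_t)((int64_t)cs->mult + delta);
}
```

If the hardware frequency were changed underneath this model (as PTP steering does to a PHC), the stored multiplier would silently become wrong, which is the disconnect described above.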

  But then I do not know why adjust the freq? I
 thought the point was that the periodic clock was network synchronized and
 can be used as the master clock for multiple machines?

Not parsing that. What do you mean by periodic clock?

  Richard, I'd actually strike this paragraph from the rationale, as I feel
  it has the tendency to confuse as it suggests having the PHC as a
  clocksource is feasible when really it isn't. Or alternatively, maybe
  express more clearly why it's not feasible, so it doesn't just seem like
  a minor design choice.
 
 Sorry but I still feel that this is pretty much a misguided approach that
 creates unnecessary layers in the kernel.

Unnecessary layers? Where? This approach has fewer in-kernel layers, as
it exposes the PTP clock to userland, instead of trying to layer things
on top of it and stretching the system time abstraction to cover it.

  The trivial easy approach was
 not done (copy a driver from drivers/clocksource, modify so that it
 programs access to a centralized periodic ptp signal and uses it for
 system sync).

I disagree.

I've argued through the approach trying to keep it all internal to the
kernel, 

Re: [PATCH 2/2] PPC4xx: Merge xor.h and dma.h into onefile ppc440spe-dma.h

2010-09-23 Thread Dan Williams
On Fri, Sep 17, 2010 at 6:42 PM,  tma...@apm.com wrote:
 From: Tirumala Marri tma...@apm.com
 This patch combines drivers/dma/ppc4xx/xor.h and drivers/dma/ppc4xx/dma.h
 into drivers/dma/ppc4xx/ppc440spe-dma.h.


Is this just code churn, or do we gain anything by combining these
header files?  Don't add ppc440spe- back to the prefix, we're
already in the ppc4xx directory, unless the file will only contain
definitions that are relevant to ppc440spe.

--
Dan


Re: [PATCH 1/2] PPC4xx: Generelizing drivers/dma/ppc4xx/adma.c

2010-09-23 Thread Dan Williams

On 9/17/2010 6:42 PM, tma...@apm.com wrote:

From: Tirumala Marri tma...@apm.com

This patch generalizes the existing drivers/dma/ppc4xx/adma.c, so that
common code can be shared between different similar DMA engine
drivers in other SoCs.

Signed-off-by: Tirumala R Marri tma...@apm.com
---
  drivers/dma/ppc4xx/adma.c| 4370 +++---
  drivers/dma/ppc4xx/adma.h|  116 +-
  drivers/dma/ppc4xx/ppc4xx-adma.h | 4020 +++
  3 files changed, 4357 insertions(+), 4149 deletions(-)
  create mode 100644 drivers/dma/ppc4xx/ppc4xx-adma.h



Will both versions of this driver exist in the same kernel build?  For 
example, the iop-adma driver supports iop13xx and iop3xx, but we select 
the architecture at build time.  Or, as I assume in this case, will the 
two (maybe more?) ppc4xx adma drivers all be built into the same image, 
more like ioatdma?


In the latter case I would recommend a file structure like:

drivers/dma/ppc4xx/adma.c
drivers/dma/ppc4xx/adma_440spe.c
drivers/dma/ppc4xx/adma_460ex.c

This would come with patches that move the chipset-specific pieces to 
their own files, minimizing the code churn in adma.c, or at least 
showing a progression of what is unique and needs to be moved.


This would be similar to how ioatdma is structured and compiles a single 
driver to cover the three major hardware revisions.


--
Dan


Re: [PATCH 6/8] ptp: Added a clock that uses the eTSEC found on the MPC85xx.

2010-09-23 Thread Christoph Lameter
On Thu, 23 Sep 2010, Alan Cox wrote:

  Please do not introduce useless additional layers for clock sync. Load
  these ptp clocks like the other regular clock modules and make them sync
  system time like any other clock.

 I don't think you understand PTP. PTP has masters, a system can need to
 be honouring multiple conflicting masters at once.

The upshot of it all has to be some synchronized notion of time regardless
of how many other things are going on under the hood. And the spec here
suggests hardware able to generate periodic, accurate events that can be
used to sync system time.

  Really guys: I want a PTP solution! Now! And not some idiotic additional
  kernel layers that just pass bits around because it's so much fun and
  screws up clock accuracy due to the latency noise introduced while
  having so much fun with the bits.

 There are some interesting complications in putting a PTP sync
 interface in kernel.

If the PTP logic internally has to juggle multiple clocks then that is a
complication for the driver, OK. In any case the driver ultimately has to
provide *one* source of time for the system to sync to.


Re: [PATCH] powerpc: Fix invalid page flags in create TLB CAM path for PTE_64BIT

2010-09-23 Thread Scott Wood
On Thu, 23 Sep 2010 16:10:15 -0400
Paul Gortmaker paul.gortma...@windriver.com wrote:

 So the possibility exists to wrongly assign the user MAS3_URWX bits
 to kernel (PAGE_KERNEL_X) address space via the following code fragment:
 
 if (flags & _PAGE_USER) {
    TLBCAM[index].MAS3 |= MAS3_UX | MAS3_UR;
    TLBCAM[index].MAS3 |= ((flags & _PAGE_RW) ? MAS3_UW : 0);
 }
 
 Here is a dump of the TLB info from Simics with the above code present:
 --
 L2 TLB1
 GT   SSS UUU V I
  Row  Logical   PhysicalSS TLPID  TID  WIMGE XWR XWR F P   V
 - - --- -- - - - --- --- - -   -
   0   c000-cfff 0-00fff 00 0 0   M   XWR XWR 0 1   1
   1   d000-dfff 01000-01fff 00 0 0   M   XWR XWR 0 1   1
   2   e000-efff 02000-02fff 00 0 0   M   XWR XWR 0 1   1
 
 Actually this conditional code was only used for two legacy functions:
 
   1: support KGDB to set break point.
  KGDB already dropped this; now uses its core write to set break point.
 
   2: io_block_mapping() to create TLB in segmentation size (not PAGE_SIZE)
  for device IO space.
  This use case is also removed from the latest PowerPC kernel.

io_block_mapping() went away, but the feature itself is still useful
and might come back with something like this:

http://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg33851.html

...though I'm not sure why such mappings would ever have user access.

This could end up being used for large user pages by something like
hugetlbfs or KVM, though.  I don't think we want to make large user
pages fail, especially if it just happens with the 32-bit page table
format (which may not be what the person adding such a feature tests
with).

I don't see a generic accessor that can test PTE flags for user
access -- in the absence of one, I guess we need an ifdef here.  Or at
least put in a comment so anyone who adds a userspace use knows they
need to fix it.

-Scott



Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support

2010-09-23 Thread john stultz
On Thu, 2010-09-23 at 21:36 +0100, Alan Cox wrote:
 So as far as the POSIX standard is concerned, offering a clock id
 to represent the PHC would be acceptable.
 
 But completely useless as you may have more than one entirely different
 time managed by PTP and in which you are not master but must work with
 the timebases provided.

I don't see how this is a problem, as it exposes the multiple hardware
clocks via different posix clock ids. So in the boundary clock case, you
can configure which side is the client and which side is the master in a
config file and the PTPd will appropriately steer them individually.


 
  /sys/class/timesource/name/id
  /sys/class/ptp/ptp_clock_X/id
  
  Note: I am not too sure that this is exactly what people imagined,
but it is my best understanding so far. I gleaned two
different ideas about where to offer the clock id. In order
to keep just one way, I will be happy to remove the less
popular one.
 
 I see no fix proposed for the race condition I pointed out. This doesn't
 work.

So, if I recall this was: How do you keep the module from unloading
while it's being used?

There may need to be proper locking for unregistering the posix clock_id
on module unload, but I don't think we need a use-count to prevent the
module from being unloaded.

My question would be: How do we handle a USB network device ($14.99 now
with PTP!) being unplugged? We can't say "Sorry! That's in use!". So we
note the hardware is gone, and return the proper error code.

Or am I missing something else?


 If the Linux system time is synchronized to the PHC via the PPS
 
 To which PHC? We can have several.
 
 + Intel IXP465
   - Auxiliary Slave/Master Mode Snapshot (optional interrupt)
   - Target Time (optional interrupt)
 
 And about 40 already supported by char driver interface clocks and rtcs
 in the kernel...

And those char driver interfaces are all subtly different.

I actually recently submitted an RFC to expose the RTC devices via the
posix clock/timer interface, because working with the RTC hardware
device directly is terrible for managing alarm interrupts. 

For instance, you easily run into the case where your TV recording
application programs an alarm to record your favorite show at 8pm. Then
your backup script programs an alarm to wake up at 2am to do your
nightly backups. Your box suspends and the next morning, you're missing
your favorite show!


 I'd say the inability to have multiple clocks and the race condition
 because of the clockid stuff leaves the proposal dead in the water.
 
 It also ignores the existing APIs we have floating around attached to
 devices.
 
 You need to make one small important change. You need to take the POSIX
 crap about enumerating things out and shoot it, bury it at a crossroads
 and sprinkle holy water on it.

We agree the list-by-name stuff isn't the way to go. :)


 Drop the clockid_t and swap it for a file handle like a proper Unix or
 Linux interface. The rest is much the same
 
   fd = open /sys/class/timesource/[whatever]
 
   various queries you may want to do to check the name etc
 
   fclock_adjtime(fd, ...)
   
 
 The posix interface is fundamentally flawed. It only works for statically
 enumerable objects. Unix avoided that forty years ago by making the
 identifier a handle which immediately cures all your object lifetime
 problems in one swoop.

So, I don't really see how that's so different from what is being
proposed. The clock_id is dynamically assigned per registered clock, and
exposed via the sysfs interface from ptp hardware entry.

The only difference is the open/close reference counting, which I don't
think is necessary here (since we can't always keep the hardware from
going away).

thanks
-john




Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support

2010-09-23 Thread Christoph Lameter
On Thu, 23 Sep 2010, john stultz wrote:

  The HPET or pit timesource are also quite slow these days. You only need
  access periodically to essentially tune the TSC ratio.

 If we're using the TSC, then we're not using the PTP clock as you
 suggest. Further the HPET and PIT aren't used to steer the system time
 when we are using the TSC as a clocksource. Its only used to calibrate
 the initial constant freq used by the timekeeping code (and if its
 non-constant, we throw it out).

There is no other scalable time source available for fast timer access
than the time stamp counter in the cpu. Other time sources require
memory accesses, which are inherently slower.

Another, more accurate time source is used to adjust this clock. NTP does
that via the clock interfaces from user space, which has its problems with
accuracy. PTP can provide network-synced time access
that would allow a more accurate calibration of the time.

 2) The way PTP clocks are steered to sync with network time causes their
 hardware freq to actually change. Since these adjustments are done on
 the hardware clock level, and not on the system time level, the
 adjustments to sync the system time/freq would then be made incorrect by
 PTP hardware adjustments.

Right. So use these as a way to fine tune the TSC clock (and thereby the
system time).

 3) Further, the PTP hardware counter can be simply set to a new offset
 to put it in line with the network time. This could cause trouble with
 timekeeping much like unsynced TSCs do.

You can do the same for system time.

 Now, what you seem to be suggesting is to use the TSC (or whatever
 clocksource the system time is using) but to steer the system time using
 the PTP clock. This is actually what is being proposed, however, the
 steering is done in userland. This is due to the fact that there are two
 components to the steering, 1) adjusting the PTP clock hardware to
 network time and 2) adjusting the system time to the PTP hardware. By
 exposing the PTP clock to userland via the posix clocks interface, we
 allow this to easily be done.

Userland code would introduce latencies that would make sub microsecond
time sync very difficult.

  We can switch underlying clocks for system time already. We can adapt to a
  different hw frequency.

 Actually no. The timekeeping code requires a fixed freq counter. Dealing
 with hardware freq changes is difficult, because error is introduced by
 the latency between when the freq changes and when the timekeeping code
 is notified of it. So the system treats the hardware counters as fixed
 freq. Now, hardware does vary freq ever so slightly as thermal
 conditions change, but this is addressed in userland and corrected via
 adjtimex.

Academic hair splitting? I have repeatedly switched between different
clocks on various systems. So it's difficult but we do it?

 Unnecessary layers? Where? This approach has less in-kernel layers, as
 it exposes the PTP clock to userland, instead of trying to layer things
 on top of it and stretching the system time abstraction to cover it.

You don't need the user APIs if you directly use the PTP time source to
steer the system clock. In fact I think you have to do it in kernel space
since user space latencies will degrade accuracy otherwise.

 I've argued through the approach trying to keep it all internal to the
 kernel, but to do so would be anything but trivial. Further, there's the
 case of master-clocks, where the PTP hardware must be synced to system
 time, instead of the other way around. And then there's the case of
 boundary-clocks, which may have multiple PTP hardware clocks that have
 to be synced.

Ok maybe we need some sort of control interface to manage the clock like
the others have.

 I think exposing this through the posix clock interface is really the
 best approach. Its not a static clockid, so its not something most apps
 will ever have to deal with, but it allows the few apps that really need
 to have access to the PTP clock hardware can do so in a clean way.

It implies clock tuning in userspace for a potential sub microsecond
accurate clock. The clock accuracy will be limited by user space
latencies and noise. You won't be able to discipline the system clock
accurately.

The posix clocks today assume one notion of real time in the kernel.
All clocks increase in lockstep (aside from offset updates). This approach
results in multiple notions of time increasing at various speeds.
And it implies that someone in user space is trying to tinker around with
extremely low latencies using system call APIs that take much longer than
these intervals to process the data.




Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support

2010-09-23 Thread Jacob Keller

  In contrast to the standard Linux system clock, a PHC is
  adjustable in hardware, for example using frequency compensation
  registers or a VCO. The ability to directly tune the PHC is
  essential to reap the benefit of hardware timestamping.

 There is a reason for not being able to shift posix clocks: The system has
 one time base. The various clocks are contributing to maintaining that
 system-wide time.

Adjusting clocks is absolutely essential for proper functioning of the PTP
protocol. The slave calculates its offset from the master and uses
that to adjust the clock properly. The problem is that the
timestamps are taken by the hardware. We need a method to expose that
hardware so that the PTP software can properly adjust those clocks.
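
For readers unfamiliar with the calculation, the offset the slave derives can be sketched from the four timestamps of a standard IEEE 1588 delay request-response exchange. This is a minimal illustration assuming a symmetric network path; the struct and function names are hypothetical and do not come from the patch set.

```c
#include <stdint.h>

/* Timestamps of one Sync / Delay_Req exchange, in nanoseconds.
 * t1 and t4 are read from the master's clock, t2 and t3 from the slave's. */
struct demo_ptp_exchange {
    int64_t t1_ns;  /* master sends Sync         */
    int64_t t2_ns;  /* slave receives Sync       */
    int64_t t3_ns;  /* slave sends Delay_Req     */
    int64_t t4_ns;  /* master receives Delay_Req */
};

/* Slave clock minus master clock, assuming equal delay in both directions. */
static int64_t demo_ptp_offset_ns(const struct demo_ptp_exchange *x)
{
    return ((x->t2_ns - x->t1_ns) - (x->t4_ns - x->t3_ns)) / 2;
}

/* One-way path delay under the same symmetry assumption. */
static int64_t demo_ptp_path_delay_ns(const struct demo_ptp_exchange *x)
{
    return ((x->t2_ns - x->t1_ns) + (x->t4_ns - x->t3_ns)) / 2;
}
```

The result is only as good as the timestamps fed in, which is why t2 and t3 need to come from the same hardware clock that the PTP software is then allowed to adjust.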


 I do not understand why you want to maintain different clocks running at
 different speeds. Certainly interesting for some uses I guess that I
 do not have the energy to imagine right now. But can we get the PTP killer
 feature of synchronized accurate system time first?


The problem is maintaining a hardware clock at the correct speed/frequency
and time. The timestamping is done via hardware, and that hardware clock
needs to be accurate. We need to be able to modify that clock. Yes, having
the system time be the same value would be nice, but the problem comes
because we don't want to jump through hoops to keep that hardware clock
accurate to the ptp protocol running on the network.




 Instead, the patch set provides a way to offer a Pulse Per Second
 (PPS) event from the PHC to the Linux PPS subsystem. A user space
 application can read the PPS events and tune the system clock, just
 like when using other external time sources like radio clocks or
 GPS.

 User space is subject to various latencies created by the OS etc. I would
 think that in order to have fine grained (read microsecond) accuracy we would
 have to run the portions that are relevant to obtaining the desired
 accuracy in the kernel.


All of the necessary features for microsecond or better accuracy are done
via the hardware. You can get accuracy to within 10 microseconds while only
sending sync packets and such once per second. The reason is that the
hardware timestamps are very accurate. But if we can't properly adjust the
clock's time and frequency, we cannot maintain the accuracy of the
timestamps.



 --
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support

2010-09-23 Thread Alan Cox
 I don't see how this is a problem, as it exposes the multiple hardware
 clocks via different posix clock ids. So in the boundary clock case, you
 can configure which side is the client and which side is the master in a
 config file and the PTPd will appropriately steer them individually.

They may all be slaves - that means you can't treat them as part of
system time.
 

 on module unload, but I don't think we need a use-count to prevent the
 module from being unloaded.
 
 My question would be: How do we handle a USB network device ($14.99 now
 with PTP!) being unplugged? We can't say Sorry! That's in use!. So we
 note the hardware is gone, and return the proper error code.
 
 Or am I missing something else?

Open list
Oh number 31 appears to be the device I want
Close list

USB unplugged
Random other device plugged

clock_op(31, )

Oh bugger I've just reprogrammed the wrong time source.

We don't have to stop the device being removed; instead of a disaster you get

clock_op(fd, blah)
-ENODEV

which btw is how just about everything else USB works when you pull the
hardware.

  And about 40 already supported by char driver interface clocks and rtcs
  in the kernel...
 
 And those char driver interfaces are all subtly different.
 
 I actually recently submitted an RFC to expose the RTC devices via the
 posix clock/timer interface, because working with the RTC hardware
 device directly is terrible for managing alarm interrupts. 

Given that driver interfaces are sane and posix clock/timer interfaces
have totally broken enumeration maybe you have it backwards. But if you
follow through to my proposal maybe there is a saner answer still
 
 For instance, you easily run into the case where your TV recording
 application programs an alarm to record your favorite show at 8pm. Then
 your backup script programs an alarm to wake up at 2am to do your
 nightly backups. Your box suspends and the next morning, you're missing
 your favorite show!

Poor resource management, and yes I'd agree you want a sensible interface.


  Drop the clockid_t and swap it for a file handle like a proper Unix or
  Linux interface. The rest is much the same
  
  fd = open /sys/class/timesource/[whatever]
  
  various queries you may want to do to check the name etc
  
  fclock_adjtime(fd, ...)
  
  
  The posix interface is fundamentally flawed. It only works for statically
  enumerable objects. Unix avoided that forty years ago by making the
  identifier a handle which immediately cures all your object lifetime
  problems in one swoop.
 
 So, I don't really see how that's so different from what is being
 proposed. The clock_id is dynamically assigned per registered clock, and
 exposed via the sysfs interface from ptp hardware entry.
 
 The only difference is the open/close reference counting, which I don't
 think is necessary here (since we can't always keep the hardware from
 going away).

It is absolutely necessary in order that you can be sure that two calls
actually relate to the *same* device. It's as fundamental as the
difference between chmod and fchmod, although with the added ugliness of
some random numeric identifier stuck in the middle.

It also btw makes it much easier to fix up the existing random collection
of /dev/rtc devices - because you can open them and issue fclock_adjtime
if we are careful how we do it and it makes sense.

Alan


Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support

2010-09-23 Thread Alan Cox
 There is no other scalable time source available for fast timer access
 than the time stamp counter in the cpu. Other time sources require
 memory accesses, which are inherently slower.

On what hardware ?

 An accurate other time source is used to adjust this clock. NTP does that
 via the clock interfaces from user space which has its problems with
 accuracy. PTP can provide the network synced time access
 that would allow a more accurate calibration of the time.

Accuracy does not require speed of access. Accuracy requires predictable
latency of access.

 Userland code would introduce latencies that would make sub microsecond
 time sync very difficult.

You can take a multiple micro-second I/O stall or SMI trap on a PC so you
already lost the battle on the platform you seem to be discussing.

 You don't need the user APIs if you directly use the PTP time source to
 steer the system clock. In fact I think you have to do it in kernel space
 since user space latencies will degrade accuracy otherwise.

PTP is not a 'time source'; it is one or more sources of time. The
distinction is rather important.

 It implies clock tuning in userspace for a potential sub microsecond
 accurate clock. The clock accuracy will be limited by user space
 latencies and noise. You won't be able to discipline the system clock
 accurately.

Noise matters, latency doesn't. And the kernel is getting more and more
real time support all the time.



Re: [PATCH 6/8] ptp: Added a clock that uses the eTSEC found on the MPC85xx.

2010-09-23 Thread Christian Riesch

Alan Cox wrote:

Please do not introduce useless additional layers for clock sync. Load
these ptp clocks like the other regular clock modules and make them sync
system time like any other clock.


I don't think you understand PTP. PTP has masters, a system can need to
be honouring multiple conflicting masters at once.


AFAIK the masters should not be conflicting. The Best Master Clock 
algorithm (BMC) defined in IEEE1588 selects the best master clock. This 
clock distributes its notion of time on the network while the other 
masters, that is the other clocks/nodes that are configured to 
potentially become a master, keep quiet. So usually we will only have 
one source of time (the master clock selected by the BMC) and we will 
steer our single PHC (PTP hardware clock) to follow this master (Of 
course there may be use-cases that require more than one PTP clock, 
e.g., for research purposes).


However, if the clock selected by the BMC is switched off, loses its 
network connection..., the second best clock is selected by the BMC and 
becomes master. This clock may be less accurate and thus our slave clock 
has to switch from one notion of time to another. Is that the conflict 
you mentioned?


Christian


Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support

2010-09-23 Thread Christian Riesch

Alan Cox wrote:

It implies clock tuning in userspace for a potential sub microsecond
accurate clock. The clock accuracy will be limited by user space
latencies and noise. You won't be able to discipline the system clock
accurately.


Noise matters, latency doesn't. 


Well put! That's why we need hardware support for PTP timestamping to 
reduce the noise, but can get along well with a clock servo that steers 
the PHC from user space.


Christian


Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support

2010-09-23 Thread john stultz
On Thu, 2010-09-23 at 15:49 -0500, Christoph Lameter wrote:
 On Thu, 23 Sep 2010, john stultz wrote:
 
   The HPET or pit timesource are also quite slow these days. You only need
   access periodically to essentially tune the TSC ratio.
 
  If we're using the TSC, then we're not using the PTP clock as you
  suggest. Further the HPET and PIT aren't used to steer the system time
  when we are using the TSC as a clocksource. It's only used to calibrate
  the initial constant freq used by the timekeeping code (and if it's
  non-constant, we throw it out).
 
 There is no other scalable time source available for fast timer access
 than the time stamp counter in the cpu. Other time sources require
 memory accesses, which are inherently slower.

Right, but no one likes the HPET or ACPI PM for a clocksource; it's just
that the TSC isn't usable in some cases, so they have to be used.

We don't want to force folks to decide between closely synced time and
fast time reads. So that is part of the reason why PTP as a clocksource
isn't a good idea.

 An accurate other time source is used to adjust this clock. NTP does that
 via the clock interfaces from user space which has its problems with
 accuracy. PTP can provide the network synced time access
 that would allow a more accurate calibration of the time.

Calibration isn't what's needed here (it is an issue, but a separate one
- and I've got some patches if you're interested!) as it's a one-time
source of error and can be corrected by ntp today without trouble.
Adjustments to the system time are something that has to be done
continuously to handle variable thermal drift over time.

  2) The way PTP clocks are steered to sync with network time causes their
  hardware freq to actually change. Since these adjustments are done on
  the hardware clock level, and not on the system time level, the
  adjustments to sync the system time/freq would then be made incorrect by
  PTP hardware adjustments.
 
 Right. So use these as a way to fine tune the TSC clock (and thereby the
 system time).

So you're then not suggesting to use the PTP as a clocksource.

Using the PTP hardware to adjust the system time freq is exactly what's
being proposed.

  3) Further, the PTP hardware counter can be simply set to a new offset
  to put it in line with the network time. This could cause trouble with
  timekeeping much like unsynced TSCs do.
 
 You can do the same for system time.

Settimeofday does allow CLOCK_REALTIME to jump, but the CLOCK_MONOTONIC
time cannot jump around. Having a clocksource that is non-monotonic
would break this.

  Now, what you seem to be suggesting is to use the TSC (or whatever
  clocksource the system time is using) but to steer the system time using
  the PTP clock. This is actually what is being proposed, however, the
  steering is done in userland. This is due to the fact that there are two
  components to the steering, 1) adjusting the PTP clock hardware to
  network time and 2) adjusting the system time to the PTP hardware. By
  exposing the PTP clock to userland via the posix clocks interface, we
  allow this to easily be done.
 
 Userland code would introduce latencies that would make sub microsecond
 time sync very difficult.

The design actually avoids most userland induced latency.

1) On the PTP hardware syncing point, the reference packet gets
timestamped with the PTP hardware time on arrival. This allows the
offset calculation to be done in userland without introducing latency.

2) On the system syncing side, the proposal for the PPS interrupt allows
the PTP hardware to trigger an interrupt on the second boundary that
would take a timestamp of the system time. Then the pps interface allows
for the timestamp to be read from userland allowing the offset to be
calculated without introducing additional latency.


   We can switch underlying clocks for system time already. We can adapt to a
   different hw frequency.
 
  Actually no. The timekeeping code requires a fixed freq counter. Dealing
  with hardware freq changes is difficult, because error is introduced by
  the latency between when the freq changes and when the timekeeping code
  is notified of it. So the system treats the hardware counters as fixed
  freq. Now, hardware does vary freq ever so slightly as thermal
  conditions change, but this is addressed in userland and corrected via
  adjtimex.
 
 Academic hair splitting? I have repeatedly switched between different
 clocks on various systems. So it's difficult but we do it?

Sure, we handle the fairly-rare case of switching clocksources. And that
introduces a bit of error each time. But one doesn't expect to be
switching clock-sources every second and still keep synced time.


  Unnecessary layers? Where? This approach has less in-kernel layers, as
  it exposes the PTP clock to userland, instead of trying to layer things
  on top of it and stretching the system time abstraction to cover it.
 
 You dont need the user APIs if you directly use the PTP time source to

Re: [PATCH] powerpc: Fix invalid page flags in create TLB CAM path for PTE_64BIT

2010-09-23 Thread Benjamin Herrenschmidt
On Thu, 2010-09-23 at 15:33 -0500, Scott Wood wrote:
 I don't see a generic accessor that can test PTE flags for user
 access -- in the absence of one, I guess we need an ifdef here.  Or at
 least put in a comment so anyone who adds a userspace use knows they
 need to fix it. 

We could make up one in powerpc arch at least

#define pte_user(val) ((val & _PAGE_USER) == _PAGE_USER)

would do

Cheers,
Ben.




Re: ppc44x - how do i optimize driver for tlb hits

2010-09-23 Thread Benjamin Herrenschmidt
On Thu, 2010-09-23 at 10:12 -0500, Ayman El-Khashab wrote:
 I've implemented a working driver on my 460EX.  it allocates a couple
 of buffers of 4MB each.  I have a custom memcmp algorithm in asm that
 is extremely fast in user space, but 1/2 as fast when run on these
 buffers.
 
 my tests are showing that the algorithm seems to be memory bandwidth
 bound.  my guess is that i am having tlb or cache misses (my algo
 uses dcbt) that are slowing performance.  curiously when in user
 space, i can affect the performance by small changes in the size of
 the buffer, i.e. 4MB + 32B is fast, 4MB + 4K is much worse.
 
 Can i adjust my driver code that is using kmalloc to make sure that
 the ppc44x has 4MB tlb entries for these and that they stay put?

Anything you allocate with kmalloc() is going to be mapped by bolted
256M TLB entries, so there should be no TLB misses happening in the
kernel case.

Cheers,
Ben.




Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support

2010-09-23 Thread john stultz
On Thu, 2010-09-23 at 22:30 +0100, Alan Cox wrote:
  I don't see how this is a problem, as it exposes the multiple hardware
  clocks via different posix clock ids. So in the boundary clock case, you
  can configure which side is the client and which side is the master in a
  config file and the PTPd will appropriately steer them individually.
 
 They may all be slaves - that means you can't treat them as part of
 system time.

Sure, and that's something one would configure. So I'm not sure I see
how exposing the different hardware bits via a clock_id is problematic.
They're just clocks that are being exposed. The steering of system time
to PTP, or PTP to system time (or just PTP to other PTP clocks), is then
a matter of configuration.


  on module unload, but I don't think we need a use-count to prevent the
  module from being unloaded.
  
  My question would be: How do we handle a USB network device ($14.99 now
  with PTP!) being unplugged? We can't say Sorry! That's in use!. So we
  note the hardware is gone, and return the proper error code.
  
  Or am I missing something else?
 
 Open list
 Oh number 31 appears to be the device I want
 Close list
 
   USB unplugged
   Random other device plugged
 
 clock_op(31, )
 
 Oh bugger I've just reprogrammed the wrong time source.

Ok. So it's just the issue of clock_id reuse. I was confusing it with
some sort of module use counting issue.  And yeah, I can see how it might
be easier to re-use the file descriptor than to re-implement the reuse
logic in the posix-clock registration.


 We don't have to stop the device being removed; instead of a disaster you get
 
   clock_op(fd, blah)
   -ENODEV
 
 which btw is how just about everything else USB works when you pull the
 hardware.

Right, which was what I was thinking as well, but assuming we didn't
re-use clockids quickly.
 
  So, I don't really see how that's so different from what is being
  proposed. The clock_id is dynamically assigned per registered clock, and
  exposed via the sysfs interface from ptp hardware entry.
  
  The only difference is the open/close reference counting, which I don't
  think is necessary here (since we can't always keep the hardware from
  going away).
 
 It is absolutely necessary in order that you can be sure that two calls
 actually relate to the *same* device. It's as fundamental as the
 difference between chmod and fchmod, although with the added ugliness of
 some random numeric identifier stuck in the middle.
 
 It also btw makes it much easier to fix up the existing random collection
 of /dev/rtc devices - because you can open them and issue fclock_adjtime
 if we are careful how we do it and it makes sense.

Wait, you're suggesting we add new fclock_* calls that duplicate the
posix interface? That doesn't sound great to me.

What did you think of Kyle Moffett's suggestion of utilizing the fd to
map to the clock_id which could then be used by the posix clocks
interface?

Although I still wonder whether it would be that hard to simply
increment the id on each registration and index to a clock through a
reasonably small hash table. I suspect that would solve the
enumeration/reuse issue without much trouble (but again, I'm open to
being corrected if I'm missing something larger).

But yes, in summary, this is an issue to be addressed one way or
another.

thanks
-john




Re: [PATCH 1/8] posix clocks: introduce a syscall for clock tuning.

2010-09-23 Thread Benjamin Herrenschmidt
On Thu, 2010-09-23 at 19:31 +0200, Richard Cochran wrote:
 A new syscall is introduced that allows tuning of a POSIX clock. The
 syscall is implemented for four architectures: arm, blackfin, powerpc,
 and x86.
 
 The new syscall, clock_adjtime, takes two parameters, the clock ID,
 and a pointer to a struct timex. The semantics of the timex struct
 have been expanded by one additional mode flag, which allows an
 absolute offset correction. When specified, the clock offset is
 immediately corrected by adding the given time value to the current
 time value.

Any reason why you CC'ed device-tree discuss ?

This list is getting way too much unrelated stuff, which I find
annoying, it would be nice if we were all a bit more careful here with
our CC lists.

Cheers,
Ben.

 Signed-off-by: Richard Cochran richard.coch...@omicron.at
 ---
  arch/arm/include/asm/unistd.h      |    1 +
  arch/arm/kernel/calls.S            |    1 +
  arch/blackfin/include/asm/unistd.h |    3 +-
  arch/blackfin/mach-common/entry.S  |    1 +
  arch/powerpc/include/asm/systbl.h  |    1 +
  arch/powerpc/include/asm/unistd.h  |    3 +-
  arch/x86/ia32/ia32entry.S          |    1 +
  arch/x86/include/asm/unistd_32.h   |    3 +-
  arch/x86/include/asm/unistd_64.h   |    2 +
  arch/x86/kernel/syscall_table_32.S |    1 +
  include/linux/posix-timers.h       |    3 +
  include/linux/syscalls.h           |    2 +
  include/linux/timex.h              |    3 +-
  kernel/compat.c                    |  136 +++-
  kernel/posix-cpu-timers.c          |    4 +
  kernel/posix-timers.c              |   17 +
  16 files changed, 130 insertions(+), 52 deletions(-)
 
 diff --git a/arch/arm/include/asm/unistd.h b/arch/arm/include/asm/unistd.h
 index c891eb7..f58d881 100644
 --- a/arch/arm/include/asm/unistd.h
 +++ b/arch/arm/include/asm/unistd.h
 @@ -396,6 +396,7 @@
  #define __NR_fanotify_init   (__NR_SYSCALL_BASE+367)
  #define __NR_fanotify_mark   (__NR_SYSCALL_BASE+368)
  #define __NR_prlimit64   (__NR_SYSCALL_BASE+369)
 +#define __NR_clock_adjtime   (__NR_SYSCALL_BASE+370)
  
  /*
   * The following SWIs are ARM private.
 diff --git a/arch/arm/kernel/calls.S b/arch/arm/kernel/calls.S
 index 5c26ecc..430de4c 100644
 --- a/arch/arm/kernel/calls.S
 +++ b/arch/arm/kernel/calls.S
 @@ -379,6 +379,7 @@
   CALL(sys_fanotify_init)
   CALL(sys_fanotify_mark)
   CALL(sys_prlimit64)
 +/* 370 */CALL(sys_clock_adjtime)
  #ifndef syscalls_counted
  .equ syscalls_padding, ((NR_syscalls + 3) & ~3) - NR_syscalls
  #define syscalls_counted
 diff --git a/arch/blackfin/include/asm/unistd.h b/arch/blackfin/include/asm/unistd.h
 index 14fcd25..79ad99b 100644
 --- a/arch/blackfin/include/asm/unistd.h
 +++ b/arch/blackfin/include/asm/unistd.h
 @@ -392,8 +392,9 @@
  #define __NR_fanotify_init   371
  #define __NR_fanotify_mark   372
  #define __NR_prlimit64   373
 +#define __NR_clock_adjtime   374
  
 -#define __NR_syscall 374
 +#define __NR_syscall 375
  #define NR_syscalls  __NR_syscall
  
  /* Old optional stuff no one actually uses */
 diff --git a/arch/blackfin/mach-common/entry.S b/arch/blackfin/mach-common/entry.S
 index af1bffa..ee68730 100644
 --- a/arch/blackfin/mach-common/entry.S
 +++ b/arch/blackfin/mach-common/entry.S
 @@ -1631,6 +1631,7 @@ ENTRY(_sys_call_table)
   .long _sys_fanotify_init
   .long _sys_fanotify_mark
   .long _sys_prlimit64
 + .long _sys_clock_adjtime
  
   .rept NR_syscalls-(.-_sys_call_table)/4
   .long _sys_ni_syscall
 diff --git a/arch/powerpc/include/asm/systbl.h b/arch/powerpc/include/asm/systbl.h
 index 3d21266..2485d8f 100644
 --- a/arch/powerpc/include/asm/systbl.h
 +++ b/arch/powerpc/include/asm/systbl.h
 @@ -329,3 +329,4 @@ COMPAT_SYS(rt_tgsigqueueinfo)
  SYSCALL(fanotify_init)
  COMPAT_SYS(fanotify_mark)
  SYSCALL_SPU(prlimit64)
 +COMPAT_SYS_SPU(clock_adjtime)
 diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h
 index 597e6f9..85d5067 100644
 --- a/arch/powerpc/include/asm/unistd.h
 +++ b/arch/powerpc/include/asm/unistd.h
 @@ -348,10 +348,11 @@
  #define __NR_fanotify_init   323
  #define __NR_fanotify_mark   324
  #define __NR_prlimit64   325
 +#define __NR_clock_adjtime   326
  
  #ifdef __KERNEL__
  
 -#define __NR_syscalls 326
 +#define __NR_syscalls 327
  
  #define __NR__exit __NR_exit
  #define NR_syscalls  __NR_syscalls
 diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
 index 518bb99..0ed7896 100644
 --- a/arch/x86/ia32/ia32entry.S
 +++ b/arch/x86/ia32/ia32entry.S
 @@ -851,4 +851,5 @@ ia32_sys_call_table:
   .quad sys_fanotify_init
   .quad sys32_fanotify_mark
   .quad sys_prlimit64 /* 340 */
 + .quad compat_sys_clock_adjtime
  ia32_syscall_end:
 diff --git a/arch/x86/include/asm/unistd_32.h b/arch/x86/include/asm/unistd_32.h

Re: [BUG 2.6.36-rc5] of_i2c.ko - i2c-core.ko dependency loop

2010-09-23 Thread Randy Dunlap
On Thu, 23 Sep 2010 22:16:32 +0200 Mikael Pettersson wrote:

 Randy Dunlap writes:
   On Thu, 23 Sep 2010 13:53:18 +0200 Mikael Pettersson wrote:
   
Running modules_install from a newly built 2.6.36-rc5 kernel
on my 32-bit PowerMac results in:

WARNING: Module /lib/modules/2.6.36-rc5/kernel/drivers/i2c/busses/i2c-powermac.ko ignored, due to loop
WARNING: Loop detected: /lib/modules/2.6.36-rc5/kernel/drivers/i2c/i2c-core.ko needs of_i2c.ko which needs i2c-core.ko again!
WARNING: Module /lib/modules/2.6.36-rc5/kernel/drivers/i2c/i2c-core.ko ignored, due to loop
WARNING: Module /lib/modules/2.6.36-rc5/kernel/drivers/i2c/i2c-dev.ko ignored, due to loop
WARNING: Module /lib/modules/2.6.36-rc5/kernel/drivers/of/of_i2c.ko ignored, due to loop
WARNING: Module /lib/modules/2.6.36-rc5/kernel/sound/ppc/snd-powermac.ko ignored, due to loop

 grep '.*I2C.*=' .config
CONFIG_OF_I2C=m
CONFIG_I2C=m
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_CHARDEV=m
CONFIG_I2C_POWERMAC=m

I can't say exactly when this started, haven't built kernels on this
box in a while.
   
   
   No kconfig warnings?
 
 Not that I recall.  I can check tomorrow if necessary.

No kconfig warnings.  I checked with your .config file.

   Please post your full .config file.

It's just a matter of module i2c-core calling of_ functions and module
of_i2c calling i2c_ functions.  Hmph.  Something for Grant, Jean, and Ben
to work out.

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: [PATCH] fsldma: add support to 36-bit physical address

2010-09-23 Thread Dan Williams
On Tue, Sep 21, 2010 at 10:41 PM, Kumar Gala ga...@kernel.crashing.org wrote:

 On Sep 21, 2010, at 5:34 PM, Timur Tabi wrote:

 On Tue, Sep 21, 2010 at 5:17 PM, Scott Wood scottw...@freescale.com wrote:

 It needs to be the actual device that is performing the DMA -- the
 platform may need to do things such as IOMMU manipulation where
 knowing the device matters.

 Ok, this all makes sense.  So it appears that the patch is valid, at
 least in theory.  I would like to see some testing of it, but I
 realize that may be too difficult.  There's no easy way to force an
 allocation above 4GB.

 I think the patch is pretty safe w/o testing.  However I agree we need a 
 better solution to testing 36-bit addressing.

I'll take that as an acked-by, but I'll wait for the next version of
the patch with the completed changelog before acting on it.

--
Dan


[PATCH v1 3/4] PPC4xx: New file with SoC specific functions

2010-09-23 Thread tmarri
From: Tirumala Marri tma...@apm.com

This patch creates a new file with SoC-dependent functions.

Signed-off-by: Tirumala R Marri tma...@apm.com
---
V1:
  * Remove all 440SPe specific references.
  * Move some of the code from header file to c file.
---
 drivers/dma/ppc4xx/ppc4xx-adma.c | 1658 ++
 1 files changed, 1658 insertions(+), 0 deletions(-)
 create mode 100644 drivers/dma/ppc4xx/ppc4xx-adma.c

diff --git a/drivers/dma/ppc4xx/ppc4xx-adma.c b/drivers/dma/ppc4xx/ppc4xx-adma.c
new file mode 100644
index 000..5a5da23
--- /dev/null
+++ b/drivers/dma/ppc4xx/ppc4xx-adma.c
@@ -0,0 +1,1658 @@
+/*
+ * Copyright (C) 2006-2009 DENX Software Engineering.
+ *
+ * Author: Yuri Tikhonov y...@emcraft.com
+ *
+ * Further porting to arch/powerpc by
+ * Anatolij Gustschin ag...@denx.de
+ * Tirumala R Marri tma...@apm.com
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59
+ * Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+ *
+ * The full GNU General Public License is included in this distribution in the
+ * file called COPYING.
+ */
+
+/*
+ * This driver supports the asynchrounous DMA copy and RAID engines available
+ * on the AMCC PPC440SPe Processors.
+ * Based on the Intel Xscale(R) family of I/O Processors (IOP 32x, 33x, 134x)
+ * ADMA driver written by D.Williams.
+ */
+
+#include <linux/of.h>
+#include <linux/of_platform.h>
+#include <asm/dcr.h>
+#include <asm/dcr-regs.h>
+#include <linux/async_tx.h>
+#include <linux/dma-mapping.h>
+#include <linux/slab.h>
+#include "adma.h"
+#if defined(CONFIG_440SPe) || defined(CONFIG_440SP)
+#include "ppc440spe-dma.h"
+#endif
+#include "ppc4xx-adma.h"
+
+/* This array is used in data-check operations for storing a pattern */
+static char ppc4xx_qword[16];
+static atomic_t ppc4xx_adma_err_irq_ref;
+static unsigned int ppc4xx_mq_dcr_len;
+
+/* These are used in enable & check routines
+ */
+static u32 ppc4xx_r6_enabled;
+static struct completion ppc4xx_r6_test_comp;
+
+static struct page *ppc4xx_rxor_srcs[32];
+
+static dcr_host_t ppc4xx_mq_dcr_host;
+/* Pointer to DMA0, DMA1 CP/CS FIFO */
+static void *ppc4xx_dma_fifo_buf;
+
+static char *ppc_adma_errors[] = {
+	[PPC_ADMA_INIT_OK] = "ok",
+	[PPC_ADMA_INIT_MEMRES] = "failed to get memory resource",
+	[PPC_ADMA_INIT_MEMREG] = "failed to request memory region",
+	[PPC_ADMA_INIT_ALLOC] = "failed to allocate memory for adev "
+				"structure",
+	[PPC_ADMA_INIT_COHERENT] = "failed to allocate coherent memory for "
+				   "hardware descriptors",
+	[PPC_ADMA_INIT_CHANNEL] = "failed to allocate memory for channel",
+	[PPC_ADMA_INIT_IRQ1] = "failed to request first irq",
+	[PPC_ADMA_INIT_IRQ2] = "failed to request second irq",
+	[PPC_ADMA_INIT_REGISTER] = "failed to register dma async device",
+};
+
+static void ppc4xx_adma_dma2rxor_set_mult(struct ppc4xx_adma_desc_slot *desc,
+ int index, u8 mult);
+static void print_cb_list(struct ppc4xx_adma_chan *chan,
+ struct ppc4xx_adma_desc_slot *iter);
+/**
+ * ppc4xx_can_rxor - check if the operands may be processed with RXOR
+ */
+static int ppc4xx_can_rxor(struct page **srcs, int src_cnt, size_t len)
+{
+	int i, order = 0, state = 0;
+	int idx = 0;
+
+	if (unlikely(!(src_cnt > 1)))
+		return 0;
+
+	BUG_ON(src_cnt > ARRAY_SIZE(ppc4xx_rxor_srcs));
+
+	/* Skip holes in the source list before checking */
+	for (i = 0; i < src_cnt; i++) {
+		if (!srcs[i])
+			continue;
+		ppc4xx_rxor_srcs[idx++] = srcs[i];
+	}
+	src_cnt = idx;
+
+	for (i = 1; i < src_cnt; i++) {
+		char *cur_addr = page_address(ppc4xx_rxor_srcs[i]);
+		char *old_addr = page_address(ppc4xx_rxor_srcs[i - 1]);
+
+		switch (state) {
+		case 0:
+			if (cur_addr == old_addr + len) {
+				/* direct RXOR */
+				order = 1;
+				state = 1;
+			} else if (old_addr == cur_addr + len) {
+				/* reverse RXOR */
+				order = -1;
+				state = 1;
+			} else
+				goto out;
+   

[PATCH v1 4/4] PPC4xx: Merge files to create single 440spe header

2010-09-23 Thread tmarri
From: Tirumala Marri tma...@apm.com

This patch merges dma.h and xor.h to create ppc440spe-dma.h

Signed-off-by: Tirumala R Marri tma...@apm.com
---
V1:
  * No change.
---
 drivers/dma/ppc4xx/dma.h   |  223 -
 drivers/dma/ppc4xx/ppc440spe-dma.h |  318 
 drivers/dma/ppc4xx/xor.h   |  110 -
 3 files changed, 318 insertions(+), 333 deletions(-)
 delete mode 100644 drivers/dma/ppc4xx/dma.h
 create mode 100644 drivers/dma/ppc4xx/ppc440spe-dma.h
 delete mode 100644 drivers/dma/ppc4xx/xor.h

diff --git a/drivers/dma/ppc4xx/dma.h b/drivers/dma/ppc4xx/dma.h
deleted file mode 100644
index bcde2df..000
--- a/drivers/dma/ppc4xx/dma.h
+++ /dev/null
@@ -1,223 +0,0 @@
-/*
- * 440SPe's DMA engines support header file
- *
- * 2006-2009 (C) DENX Software Engineering.
- *
- * Author: Yuri Tikhonov y...@emcraft.com
- *
- * This file is licensed under the term of  the GNU General Public License
- * version 2. The program licensed as is without any warranty of any
- * kind, whether express or implied.
- */
-
-#ifndef_PPC440SPE_DMA_H
-#define _PPC440SPE_DMA_H
-
-#include linux/types.h
-
-/* Number of elements in the array with statical CDBs */
-#defineMAX_STAT_DMA_CDBS   16
-/* Number of DMA engines available on the contoller */
-#define DMA_ENGINES_NUM2
-
-/* Maximum h/w supported number of destinations */
-#define DMA_DEST_MAX_NUM   2
-
-/* FIFO's params */
-#define DMA0_FIFO_SIZE 0x1000
-#define DMA1_FIFO_SIZE 0x1000
-#define DMA_FIFO_ENABLE(112)
-
-/* DMA Configuration Register. Data Transfer Engine PLB Priority: */
-#define DMA_CFG_DXEPR_LP   (026)
-#define DMA_CFG_DXEPR_HP   (326)
-#define DMA_CFG_DXEPR_HHP  (226)
-#define DMA_CFG_DXEPR_HHHP (126)
-
-/* DMA Configuration Register. DMA FIFO Manager PLB Priority: */
-#define DMA_CFG_DFMPP_LP   (023)
-#define DMA_CFG_DFMPP_HP   (323)
-#define DMA_CFG_DFMPP_HHP  (223)
-#define DMA_CFG_DFMPP_HHHP (123)
-
-/* DMA Configuration Register. Force 64-byte Alignment */
-#define DMA_CFG_FALGN  (1  19)
-
-/*UIC0:*/
-#define D0CPF_INT  (112)
-#define D0CSF_INT  (111)
-#define D1CPF_INT  (110)
-#define D1CSF_INT  (19)
-/*UIC1:*/
-#define DMAE_INT   (19)
-
-/* I2O IOP Interrupt Mask Register */
-#define I2O_IOPIM_P0SNE(13)
-#define I2O_IOPIM_P0EM (15)
-#define I2O_IOPIM_P1SNE(16)
-#define I2O_IOPIM_P1EM (18)
-
-/* DMA CDB fields */
-#define DMA_CDB_MSK(0xF)
-#define DMA_CDB_64B_ADDR   (12)
-#define DMA_CDB_NO_INT (13)
-#define DMA_CDB_STATUS_MSK (0x3)
-#define DMA_CDB_ADDR_MSK   (0xFFF0)
-
-/* DMA CDB OpCodes */
-#define DMA_CDB_OPC_NO_OP  (0x00)
-#define DMA_CDB_OPC_MV_SG1_SG2 (0x01)
-#define DMA_CDB_OPC_MULTICAST  (0x05)
-#define DMA_CDB_OPC_DFILL128   (0x24)
-#define DMA_CDB_OPC_DCHECK128  (0x23)
-
-#define DMA_CUED_XOR_BASE  (0x1000)
-#define DMA_CUED_XOR_HB (0x0008)
-
-#ifdef CONFIG_440SP
-#define DMA_CUED_MULT1_OFF 0
-#define DMA_CUED_MULT2_OFF 8
-#define DMA_CUED_MULT3_OFF 16
-#define DMA_CUED_REGION_OFF 24
-#define DMA_CUED_XOR_WIN_MSK   (0xFC00)
-#else
-#define DMA_CUED_MULT1_OFF 2
-#define DMA_CUED_MULT2_OFF 10
-#define DMA_CUED_MULT3_OFF 18
-#define DMA_CUED_REGION_OFF 26
-#define DMA_CUED_XOR_WIN_MSK   (0xF000)
-#endif
-
-#define DMA_CUED_REGION_MSK 0x3
-#define DMA_RXOR123 0x0
-#define DMA_RXOR124 0x1
-#define DMA_RXOR125 0x2
-#define DMA_RXOR12 0x3
-
-/* S/G addresses */
-#define DMA_CDB_SG_SRC 1
-#define DMA_CDB_SG_DST1 2
-#define DMA_CDB_SG_DST2 3
-
-/*
- * DMAx engines Command Descriptor Block Type
- */
-struct dma_cdb {
-   /*
-* Basic CDB structure (Table 20-17, p.499, 440spe_um_1_22.pdf)
-*/
-   u8  pad0[2];    /* reserved */
-   u8  attr;       /* attributes */
-   u8  opc;        /* opcode */
-   u32 sg1u;       /* upper SG1 address */
-   u32 sg1l;       /* lower SG1 address */
-   u32 cnt;        /* SG count, 3B used */
-   u32 sg2u;       /* upper SG2 address */
-   u32 sg2l;       /* lower SG2 address */
-   u32 sg3u;       /* upper SG3 address */
-   u32 sg3l;       /* lower SG3 address */
-};
-
-/*
- * DMAx hardware registers (p.515 in 440SPe UM 1.22)
- */
-struct dma_regs {
-   u32 cpfpl;
-   u32 cpfph;
-   u32 csfpl;
-   u32 csfph;
-   u32 dsts;
-   u32 cfg;
-   u8  pad0[0x8];
-   u16 cpfhp;
-   u16 cpftp;
-   u16 csfhp;
-   u16 csftp;
-   u8  pad1[0x8];
-   u32 acpl;
-   u32 acph;
-   u32 s1bpl;
-   u32

Re: [PATCH 1/8] posix clocks: introduce a syscall for clock tuning.

2010-09-23 Thread Thomas Gleixner
On Fri, 24 Sep 2010, Benjamin Herrenschmidt wrote:
 On Thu, 2010-09-23 at 19:31 +0200, Richard Cochran wrote:
  The new syscall, clock_adjtime, takes two parameters, the clock ID,
  and a pointer to a struct timex. The semantics of the timex struct
  have been expanded by one additional mode flag, which allows an
  absolute offset correction. When specified, the clock offset is
  immediately corrected by adding the given time value to the current
  time value.
 
 Any reason why you CC'ed device-tree discuss ?
 
 This list is getting way too much unrelated stuff, which I find
 annoying, it would be nice if we were all a bit more careful here with
 our CC lists.

Says the guy who missed to trim the useless context of the original
mail, which made me scroll down all the way just to find out that
there is nothing to see.

Thanks,

tglx
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: ppc44x - how do i optimize driver for tlb hits

2010-09-23 Thread Ayman El-Khashab
On Fri, Sep 24, 2010 at 08:01:04AM +1000, Benjamin Herrenschmidt wrote:
 On Thu, 2010-09-23 at 10:12 -0500, Ayman El-Khashab wrote:
  I've implemented a working driver on my 460EX.  it allocates a couple
  of buffers of 4MB each.  I have a custom memcmp algorithm in asm that
  is extremely fast in user space, but 1/2 as fast when run on these
  buffers.
  
  my tests are showing that the algorithm seems to be memory bandwidth
  bound.  my guess is that i am having tlb or cache misses (my algo
  uses the dcbt) that is slowing performance.  curiously when in user
  space, i can affect the performance by small changes in the size of
  the buffer, i.e. 4MB + 32B is fast, 4MB + 4K is much worse.
  
  Can i adjust my driver code that is using kmalloc to make sure that
  the ppc44x has 4MB tlb entries for these and that they stay put?
 
 Anything you allocate with kmalloc() is going to be mapped by bolted
 256M TLB entries, so there should be no TLB misses happening in the
 kernel case.
 

Hi Ben, can you or somebody elaborate?  I saw the pinned tlb in 44x_mmu.c.
Perhaps I don't understand the code fully, but it appears to map 256MB
of lowmem into a pinned tlb.  I am not sure what phys address lowmem
means, but I assumed (possibly incorrectly) that it is 0-256MB.  When I
get the physical addresses for my buffers after kmalloc, they all have
addresses that are within my DRAM but start at about the 440MB mark. I
end up passing those phys addresses to my DMA engine.

When my compare runs it takes a huge amount of time in the assembly code
doing memory fetches which makes me think that there are either tons of
cache misses (despite the prefetching) or the entries have been purged
from the TLB and must be obtained again.  As an experiment, I disabled
my cache prefetch code and the algo took forever.  Next I altered the
asm to do the same amount of data but a smaller amount over and over 
so that less is fetched from main memory.  That executed very quickly.
From that I drew the conclusion that the algorithm is memory bandwidth
limited.
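
The small-window experiment described above can be sketched in portable C. This is only an illustration, not the assembly routine from the driver: `read_in_windows` and its parameters are hypothetical names. It reads the same total number of words while cycling over a window of configurable size, so timing it once with a cache-sized window and once with the full 4MB buffer helps separate cache-resident behaviour from memory-bound behaviour.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Read total_words 32-bit words in all, cycling over the first
 * window_words words of buf. The volatile qualifier keeps the
 * compiler from optimising the loads away. Returns the sum of
 * everything read so the result can be checked.
 */
uint64_t read_in_windows(const volatile uint32_t *buf,
			 size_t window_words, size_t total_words)
{
	uint64_t sum = 0;
	size_t pos = 0;

	for (size_t n = 0; n < total_words; n++) {
		sum += buf[pos];
		if (++pos == window_words)
			pos = 0;	/* wrap back to the start of the window */
	}
	return sum;
}
```

If the small-window run is much faster for the same total amount of data, the loads that miss the cache are the dominant cost.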

In a standalone configuration (i.e. algorithm just using user memory,
everything else identical), the speedup is 2-3x.  So the limitation 
is not a hardware limit, it must be something that is happening when
I execute the loads.  (it is a compare algorithm, so it only does
loads).

Thanks
Ayman




RE: [PATCH 1/2] PPC4xx: Generelizing drivers/dma/ppc4xx/adma.c

2010-09-23 Thread Tirumala Marri

 Will both versions of this driver exist in the same kernel build?  For
 example the iop-adma driver supports iop13xx and iop3xx, but we select
 the architecture at build time?  Or, as I assume in this case, will the
 two (maybe more?) ppc4xx adma drivers all be built in the same image,
 more like ioatdma?


[Marri] We select the architecture at build time.




 In the latter case I would recommend a file structure like:

 drivers/dma/ppc4xx/adma.c
 drivers/dma/ppc4xx/adma_440spe.c
 drivers/dma/ppc4xx/adma_460ex.c

 With patches to move the chipset specific pieces to their own file.
 Minimizing the code churn in adma.c, or at least showing a progression
 of what is unique and needs to be moved.

 This would be similar to how ioatdma is structured and compiles a
 single driver to cover the three major hardware revisions.


[Marri] Looks like this driver is similar to the iop-adma driver.


RE: [PATCH v1 1/4] PPC4xx: Generalizing ADMA driver modifications

2010-09-23 Thread Tirumala Marri

 Did you look at this changelog before sending?  It just deletes 4000
 lines of code??
[Marri] The reason I have to send it in a different file is the size of the
patch. There seems to be an issue with patch sizes of 200k or more.


Re: [PATCH v1 1/4] PPC4xx: Generalizing ADMA driver modifications

2010-09-23 Thread Dan Williams

On 9/23/2010 3:10 PM, tma...@apm.com wrote:

From: Tirumala Marri <tma...@apm.com>

This patch generalizes the existing drivers/dma/ppc4xx/adma.c, so that
common code can be shared between different similar DMA engine
drivers in other SoCs. Also Makefile and Kconfig changed to accommodate
PPC4XX.

Signed-off-by: Tirumala R Marri <tma...@apm.com>
---
V1:
   * No change.
---
  arch/powerpc/include/asm/async_tx.h |    4 +-
  drivers/dma/Kconfig                 |    6 +-
  drivers/dma/Makefile                |    2 +-
  drivers/dma/ppc4xx/Makefile         |    2 +-
  drivers/dma/ppc4xx/adma.c           | 4437 +++
  drivers/dma/ppc4xx/adma.h           |   92 +-
  6 files changed, 354 insertions(+), 4189 deletions(-)


Did you look at this changelog before sending?  It just deletes 4000 
lines of code??


Moving and renaming code in one patch makes it very difficult to verify 
the result.  When generalizing code the first thing I want to see with a 
very quick glance at the patch(es) is that the existing implementation 
is not harmed.  One way to go about this is to first identify the 
portions of existing code that you want to reuse in your driver and the 
pieces that are truly ppc440spe specific.  Move the ppc440spe pieces to 
their own file (get this reviewed and approved by the ppc440spe 
authors).  The remaining code in adma.c will be assumed generic.  You 
can then have another patch to do a simple s/ppc440spe/ppc4xx/ in adma.c 
(no other logic changes or code movement).  Then you can introduce your 
ppc460ex unique implementation that calls into adma.c.


I don't want to see patches along the lines of rename 
drivers/dma/ppc4xx/adma.c to drivers/dma/ppc4xx/ppc4xx-adma.c because 
that is just redundant.  Assume that the existing generic file names are 
where the common code will lie and then add hw-implementation specific 
files to call into that base.


Another rule is that the conversion should be bisectable at every step, 
I should be able to apply each patch in the series and still have a 
functional/runnable result.


--
Dan


Re: [PATCH v1 3/4] PPC4xx: New file with SoC specific functions

2010-09-23 Thread Dan Williams

On 9/23/2010 3:11 PM, tma...@apm.com wrote:

From: Tirumala Marri <tma...@apm.com>

This patch creates a new file with SoC dependent functions.

Signed-off-by: Tirumala R Marri <tma...@apm.com>
---
V1:
   * Remove all 440SPe specific references.


Maybe it renames ppc440spe to ppc4xx but it adds things like...


+#if defined(CONFIG_440SPe) || defined(CONFIG_440SP)
+   np = of_find_compatible_node(NULL, NULL, "ibm,i2o-440spe");
+#endif


...in the code.  Which is 1) not generic and 2) I suspect causes a 
compile warning for using an uninitialized variable.



+   if (!np) {
+   pr_err("%s: can't find I2O device tree node\n",
+  __func__);
+   ret = -ENODEV;
+   goto err_req2;
+   }


It looks to me like the common code will need to have a few build 
dependent helper routines as it appears one instance of the driver 
cannot simultaneously support 440sp, 440spe, and 460ex.
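
One possible shape for such a build-dependent helper is sketched below. It is a userspace mock-up under stated assumptions, not code from the patch: `adma_i2o_compatible()`, `adma_find_i2o()` and `fake_find_compatible_node()` are hypothetical names, and the small `struct device_node` only stands in for the kernel type so the example compiles on its own. The point is that the SoC name lives in one helper, and `np` is initialised on every configuration, so the `!np` check never reads an uninitialised variable.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Minimal stand-in for the kernel's struct device_node (assumption). */
struct device_node {
	const char *compatible;
};

/* Stand-in for of_find_compatible_node(): knows a single fake node. */
struct device_node *fake_find_compatible_node(const char *compatible)
{
	static struct device_node i2o = { "ibm,i2o-440spe" };

	if (compatible && strcmp(compatible, i2o.compatible) == 0)
		return &i2o;
	return NULL;
}

/* Build-dependent helper: the only place that names a specific SoC. */
const char *adma_i2o_compatible(void)
{
#if defined(CONFIG_440SPe) || defined(CONFIG_440SP)
	return "ibm,i2o-440spe";
#else
	return NULL;	/* no I2O node known for this build */
#endif
}

/* Common code: np has a defined value whatever the configuration. */
int adma_find_i2o(struct device_node **out)
{
	struct device_node *np = NULL;
	const char *compat = adma_i2o_compatible();

	if (compat)
		np = fake_find_compatible_node(compat);
	if (!np)
		return -ENODEV;
	*out = np;
	return 0;
}
```

Built without either CONFIG symbol defined, `adma_find_i2o()` returns `-ENODEV` cleanly instead of testing an indeterminate pointer.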


--
Dan


Re: [PATCH v1 1/4] PPC4xx: Generalizing ADMA driver modifications

2010-09-23 Thread Dan Williams

On 9/23/2010 3:44 PM, Tirumala Marri wrote:


Did you look at this changelog before sending?  It just deletes 4000
lines of code??

[Marri] The reason I have to send it in a different file is the size of the
patch. There seems to be an issue with patch sizes of 200k or more.


Read the rest of what I wrote:


Moving and renaming code in one patch makes it very difficult to verify
the result.  When generalizing code the first thing I want to see with a
very quick glance at the patch(es) is that the existing implementation
is not harmed.  One way to go about this is to first identify the
portions of existing code that you want to reuse in your driver and the
pieces that are truly ppc440spe specific.  Move the ppc440spe pieces to
their own file (get this reviewed and approved by the ppc440spe
authors).  The remaining code in adma.c will be assumed generic.  You
can then have another patch to do a simple s/ppc440spe/ppc4xx/ in adma.c
(no other logic changes or code movement).  Then you can introduce your
ppc460ex unique implementation that calls into adma.c.


The patch would not be so large if you leave the existing code where it 
is and move the implementation specific pieces to their own file.


--
Dan


Re: ppc44x - how do i optimize driver for tlb hits

2010-09-23 Thread Benjamin Herrenschmidt
On Thu, 2010-09-23 at 17:35 -0500, Ayman El-Khashab wrote:
 Anything you allocate with kmalloc() is going to be mapped by bolted
  256M TLB entries, so there should be no TLB misses happening in the
  kernel case.
  
 
 Hi Ben, can you or somebody elaborate?  I saw the pinned tlb in
 44x_mmu.c.
 Perhaps I don't understand the code fully, but it appears to map 256MB
 of lowmem into a pinned tlb.  I am not sure what phys address lowmem
 means, but I assumed (possibly incorrectly) that it is 0-256MB. 

No. The first pinned entry (0...256M) is inserted by the asm code in
head_44x.S. The code in 44x_mmu.c will later map the rest of lowmem
(typically up to 768M but various settings can change that) using more
256M entries.

Basically, all of lowmem is permanently mapped with such entries. 

 When I get the physical addresses for my buffers after kmalloc, they
 all have addresses that are within my DRAM but start at about the
 440MB mark. I end up passing those phys addresses to my DMA engine.

Anything you get from kmalloc is going to come from lowmem, and thus be
covered by those bolted TLB entries.

 When my compare runs it takes a huge amount of time in the assembly
 code doing memory fetches which makes me think that there are either
 tons of cache misses (despite the prefetching) or the entries have
 been purged

What prefetching ? IE. The DMA operation -will- flush things out of the
cache due to the DMA being not cache coherent on 44x. The 440 also
doesn't have a working HW prefetch engine afaik (it should be disabled
in FW or early asm on 440 cores and fused out in HW on 460 cores afaik).

So only explicit SW prefetching will help.

 from the TLB and must be obtained again.  As an experiment, I disabled
 my cache prefetch code and the algo took forever.  Next I altered the
 asm to do the same amount of data but a smaller amount over and over 
 so that less is fetched from main memory.  That executed very quickly.
 From that I drew the conclusion that the algorithm is memory
 bandwidth limited.

I don't know what exactly is going on, maybe your prefetch stride isn't
right for the HW setup, or something like that. You can use xmon 'u'
command to look at the TLB content. Check that we have the 256M entries
mapping your data, they should be there.
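
For experimenting with the prefetch stride, a compare loop with an adjustable prefetch distance could look like the sketch below. It is an illustration only: `count_mismatches` and `CACHE_LINE_BYTES` are hypothetical names, the 32-byte line size is an assumption to be checked against the core manual, and `__builtin_prefetch` is the GCC/Clang builtin, which the compiler typically lowers to a data cache touch instruction such as dcbt on PowerPC.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 32	/* assumed L1 line size, verify for your core */

/*
 * Compare two word buffers and count differing words, issuing an
 * explicit software prefetch prefetch_ahead_lines cache lines ahead
 * of the current position, once per cache line.
 */
size_t count_mismatches(const uint32_t *a, const uint32_t *b,
			size_t nwords, size_t prefetch_ahead_lines)
{
	const size_t words_per_line = CACHE_LINE_BYTES / sizeof(uint32_t);
	const size_t ahead = prefetch_ahead_lines * words_per_line;
	size_t mismatches = 0;

	for (size_t i = 0; i < nwords; i++) {
		if (i % words_per_line == 0 && i + ahead < nwords) {
			__builtin_prefetch(&a[i + ahead], 0, 0);
			__builtin_prefetch(&b[i + ahead], 0, 0);
		}
		if (a[i] != b[i])
			mismatches++;
	}
	return mismatches;
}
```

Timing the same buffers with several values of `prefetch_ahead_lines` shows whether the distance is too short to hide memory latency or so long that lines are evicted before use.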

 In a standalone configuration (i.e. algorithm just using user memory,
 everything else identical), the speedup is 2-3x.  So the limitation 
 is not a hardware limit, it must be something that is happening when
 I execute the loads.  (it is a compare algorithm, so it only does
 loads). 

Cheers,
Ben.




Re: [PATCH 1/8] posix clocks: introduce a syscall for clock tuning.

2010-09-23 Thread Benjamin Herrenschmidt
On Fri, 2010-09-24 at 00:12 +0200, Thomas Gleixner wrote:
  This list is getting way too much unrelated stuff, which I find
  annoying, it would be nice if we were all a bit more careful here
  with our CC lists.
 
 Says the guy who missed to trim the useless context of the original
 mail, which made me scroll down all the way just to find out that
 there is nothing to see. 

Heh, you can usually ignore what's after my signature :-) At least I
didn't put my reply all the way down the bottom !

Cheers,
Ben.




Re: ppc44x - how do i optimize driver for tlb hits

2010-09-23 Thread Ayman El-Khashab
On Fri, Sep 24, 2010 at 11:07:24AM +1000, Benjamin Herrenschmidt wrote:
 On Thu, 2010-09-23 at 17:35 -0500, Ayman El-Khashab wrote:
  Anything you allocate with kmalloc() is going to be mapped by bolted
   256M TLB entries, so there should be no TLB misses happening in the
   kernel case.
   
  
  Hi Ben, can you or somebody elaborate?  I saw the pinned tlb in
  44x_mmu.c.
  Perhaps I don't understand the code fully, but it appears to map 256MB
  of lowmem into a pinned tlb.  I am not sure what phys address lowmem
  means, but I assumed (possibly incorrectly) that it is 0-256MB. 
 
 No. The first pinned entry (0...256M) is inserted by the asm code in
 head_44x.S. The code in 44x_mmu.c will later map the rest of lowmem
 (typically up to 768M but various settings can change that) using more
 256M entries.

Thanks Ben, appreciate all your wisdom and insight.

Ok, so my 460ex board has 512MB total, so how does that figure into 
the 768M?  Is there some other heuristic that determines how these
are mapped? 

 Basically, all of lowmem is permanently mapped with such entries. 
 
  When I get the physical addresses for my buffers after kmalloc, they
  all have addresses that are within my DRAM but start at about the
  440MB mark. I end up passing those phys addresses to my DMA engine.
 
 Anything you get from kmalloc is going to come from lowmem, and thus be
 covered by those bolted TLB entries.

So is it reasonable to assume that everything on my system will come from
pinned TLB entries?

 
  When my compare runs it takes a huge amount of time in the assembly
  code doing memory fetches which makes me think that there are either
  tons of cache misses (despite the prefetching) or the entries have
  been purged
 
 What prefetching ? IE. The DMA operation -will- flush things out of the
 cache due to the DMA being not cache coherent on 44x. The 440 also
 doesn't have a working HW prefetch engine afaik (it should be disabled
 in FW or early asm on 440 cores and fused out in HW on 460 cores afaik).

 So only explicit SW prefetching will help.
 

The DMA is what I use in the real world case to get data into and out 
of these buffers.  However, I can disable the DMA completely and do only
the kmalloc.  In this case I still see the same poor performance.  My
prefetching is part of my algo using the dcbt instructions.  I know the
instructions are effective b/c without them the algo is much less 
performant.  So yes, my prefetches are explicit.

  from the TLB and must be obtained again.  As an experiment, I disabled
  my cache prefetch code and the algo took forever.  Next I altered the
  asm to do the same amount of data but a smaller amount over and over 
  so that less is fetched from main memory.  That executed very quickly.
  From that I drew the conclusion that the algorithm is memory
  bandwidth limited.
 
 I don't know what exactly is going on, maybe your prefetch stride isn't
 right for the HW setup, or something like that. You can use xmon 'u'
 command to look at the TLB content. Check that we have the 256M entries
 mapping your data, they should be there.

Ok, I will give that a try ... in addition, is there an easy way to use
any sort of gprof like tool to see the system performance?  What about
looking at the 44x performance counters in some meaningful way?  All
the experiments point to the fetching being slower in the full program
as opposed to the algo in a testbench, so I want to determine what it is
that could cause that.

thanks
ayman


Re: ppc44x - how do i optimize driver for tlb hits

2010-09-23 Thread Benjamin Herrenschmidt

  No. The first pinned entry (0...256M) is inserted by the asm code in
  head_44x.S. The code in 44x_mmu.c will later map the rest of lowmem
  (typically up to 768M but various settings can change that) using more
  256M entries.
 
 Thanks Ben, appreciate all your wisdom and insight.
 
 Ok, so my 460ex board has 512MB total, so how does that figure into 
 the 768M?  Is there some other heuristic that determines how these
 are mapped? 

Not really, it all fits in lowmem so it will be mapped with two pinned
256M entries.

Basically, we try to map all memory with those entries in the linear
mapping. But since we only have 1G of address space available when
PAGE_OFFSET is 0xc0000000, and we need some of that for vmalloc, ioremap,
etc... we thus limit that mapping to 768M currently.

If you have more memory, you will see only 768M unless you use
CONFIG_HIGHMEM, which allows the kernel to exploit more physical
memory. 

In this case, only the first 768M are permanently mapped (and
accessible), but you can allocate pages in highmem which can still be
mapped into user space and need kmap/kunmap calls to be accessed by the
kernel.

However, in your case you don't need highmem, everything fits in lowmem,
so the kernel will just use 2x256M of bolted TLB entries to map that
permanently.

Note also that kmalloc() always returns lowmem.
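
The arithmetic above can be written down as a small sketch. The 768M cap and the 256M entry size come from this thread; the function names are hypothetical, and rounding a partial last chunk up to a whole entry is an assumption made for illustration, not a statement about what 44x_mmu.c does.

```c
#include <stdint.h>

#define MB			(1024ULL * 1024ULL)
#define BOLTED_ENTRY_SIZE	(256 * MB)
#define LOWMEM_LIMIT		(768 * MB)	/* typical value, settings can change it */

/* Amount of RAM covered by the linear lowmem mapping. */
uint64_t lowmem_bytes(uint64_t ram_bytes)
{
	return ram_bytes < LOWMEM_LIMIT ? ram_bytes : LOWMEM_LIMIT;
}

/* Number of 256M bolted entries needed to cover lowmem (rounded up). */
unsigned int bolted_entries(uint64_t ram_bytes)
{
	uint64_t low = lowmem_bytes(ram_bytes);

	return (unsigned int)((low + BOLTED_ENTRY_SIZE - 1) / BOLTED_ENTRY_SIZE);
}

/* Index of the bolted entry covering phys, or -1 if phys is not lowmem. */
int bolted_entry_for(uint64_t phys, uint64_t ram_bytes)
{
	if (phys >= lowmem_bytes(ram_bytes))
		return -1;
	return (int)(phys / BOLTED_ENTRY_SIZE);
}
```

For the 512MB board discussed here this gives two entries, and a buffer at about the 440MB physical mark falls in the second one (index 1).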

 So is it reasonable to assume that everything on my system will come from
 pinned TLB entries?

Yes.

 The DMA is what I use in the real world case to get data into and out 
 of these buffers.  However, I can disable the DMA completely and do only
 the kmalloc.  In this case I still see the same poor performance.  My
 prefetching is part of my algo using the dcbt instructions.  I know the
 instructions are effective b/c without them the algo is much less 
 performant.  So yes, my prefetches are explicit.

Could be some effect of the cache structure, L2 cache, cache geometry
(number of ways etc...). You might be able to alleviate that by changing
the stride of your prefetch.

Unfortunately, I'm not familiar enough with the 440 micro architecture
and its caches to be able to help you much here.

 Ok, I will give that a try ... in addition, is there an easy way to use
 any sort of gprof like tool to see the system performance?  What about
 looking at the 44x performance counters in some meaningful way?  All
 the experiments point to the fetching being slower in the full program
 as opposed to the algo in a testbench, so I want to determine what it is
 that could cause that.

Does it have any useful performance counters ? I didn't think it did but
I may be mistaken.

Cheers,
Ben.




RE: [PATCH] powerpc: Fix invalid page flags in create TLB CAM pathfor PTE_64BIT

2010-09-23 Thread Chen, Tiejun
 -Original Message-
 From: linuxppc-dev-bounces+tiejun.chen=windriver@lists.ozlabs.org
 [mailto:linuxppc-dev-bounces+tiejun.chen=windriver@lists.ozlabs.org]
 On Behalf Of Scott Wood
 Sent: Friday, September 24, 2010 4:34 AM
 To: Gortmaker, Paul
 Cc: linuxppc-dev@lists.ozlabs.org
 Subject: Re: [PATCH] powerpc: Fix invalid page flags in create TLB CAM pathfor PTE_64BIT
 
 On Thu, 23 Sep 2010 16:10:15 -0400
 Paul Gortmaker <paul.gortma...@windriver.com> wrote:
 
  So the possibility exists to wrongly assign the user MAS3_URWX bits
  to kernel (PAGE_KERNEL_X) address space via the following code fragment:
  
  if (flags & _PAGE_USER) {
     TLBCAM[index].MAS3 |= MAS3_UX | MAS3_UR;
     TLBCAM[index].MAS3 |= ((flags & _PAGE_RW) ? MAS3_UW : 0);
  }
  
  Here is a dump of the TLB info from Simics with the above code present:
  --
  L2 TLB1
                               GT                     SSS  UUU  V  I
  Row  Logical    Physical     SS  TLPID  TID  WIMGE  XWR  XWR  F  P  V
  ---  ---------  -----------  --  -----  ---  -----  ---  ---  -  -  -
  0    c000-cfff  0-00fff      00  0      0      M    XWR  XWR  0  1  1
  1    d000-dfff  01000-01fff  00  0      0      M    XWR  XWR  0  1  1
  2    e000-efff  02000-02fff  00  0      0      M    XWR  XWR  0  1  1
  
  Actually this conditional code was only used for two legacy functions:
  
    1: support KGDB to set break point.
       KGDB already dropped this; now uses its core write to set break point.
  
    2: io_block_mapping() to create TLB in segmentation size (not PAGE_SIZE)
       for device IO space.
       This use case is also removed from the latest PowerPC kernel.
 
 io_block_mapping() went away, but the feature itself is still
 useful and might come back with something like this:
 
 http://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg33851.html
 
 ...though I'm not sure why such mappings would ever have user access.
 
 This could end up being used for large user pages by
 something like hugetlbfs or KVM, though.  I don't think we
 want to make large user pages fail, especially if it just

Understood.

Actually the following is my original modification.
==
+#if defined(CONFIG_FSL_BOOKE) && defined(CONFIG_PTE_64BIT)
+   /* On there _PAGE_BAP_UR is always integrated into flag, _PAGE_KERNEL_RWX
+    * and _PAGE_USER here. So we have to only check _PAGE_BAP_UR as the
+    * condition.
+    */
+   if (flags & _PAGE_BAP_UR) {
+#else
    if (flags & _PAGE_USER) {
+#endif

But I find there is no usage of this, except for the above #1 KGDB
and #2 io_block_mapping(). So I think it's possible to remove this
completely :)

 happens with the 32-bit page table format (which may not be
 what the person adding such a feature tests with).
 
 I don't see a generic accessor that can test PTE flags for 
 user access -- in the absence of one, I guess we need an 
 ifdef here.  Or at least put in a comment so anyone who adds 
 a userspace use knows they need to fix it.

I already noticed Ben's advice and it looks fine to us.

Tiejun

 
 -Scott
 


Re: [PATCH] irqbalance, powerpc: add IRQs without settable SMP affinity to banned list

2010-09-23 Thread Michael Ellerman
On Thu, 2010-09-23 at 09:13 -0400, Neil Horman wrote:
 On Thu, Sep 23, 2010 at 08:57:20PM +1000, Michael Neuling wrote:
  
+   if (fwrite(line, strlen(line) - 1, 1, file) == 0)
   
   if (fputs(line, file) == EOF)
  
  Good point thanks... new patch below
  
  Mikey
  
  irqbalance, powerpc: add IRQs without settable SMP affinity to banned list
  
  On pseries powerpc, IPIs are registered with an IRQ number so
  /proc/interrupts looks like this on a 2 core/2 thread machine:
  
          CPU0     CPU1     CPU2     CPU3
    16:316428232905141138794 983121   XICS Level   IPI
    18:2605674  0 304994  0   XICS Level   lan0
    30: 400057  0 169209  0   XICS Level   ibmvscsi
   LOC: 133734  77250 106425  91951   Local timer interrupts
   SPU:  0  0  0  0   Spurious interrupts
   CNT:  0  0  0  0   Performance monitoring interrupts
   MCE:  0  0  0  0   Machine check exceptions
  
  Unfortunately this means irqbalance attempts to set the affinity of IPIs
  which is not possible.  So in the above case, when irqbalance is in
  performance mode due to heavy IPI, lan0 and ibmvscsi activity, it
  sometimes attempts to put the IPIs on one core (CPU01) and lan0 and
  ibmvscsi on the other core (CPU23).  This is suboptimal as we want lan0
  and ibmvscsi to be on separate cores and IPIs to be ignored.
  
  When irqbalance attempts writes to the IPI smp_affinity (ie.
  /proc/irq/16/smp_affinity in the above example) it fails but irqbalance
  currently ignores this.
  
  This patch catches these write fails and in this case adds that IRQ
  number to the banned IRQ list.  This will catch the above IPI case and
  any other IRQ where the SMP affinity can't be set.
  
  Tested on POWER6, POWER7 and x86.
  
  Signed-off-by: Michael Neuling <mi...@neuling.org>
  
  Index: irqbalance/irqlist.c
  ===
  --- irqbalance.orig/irqlist.c
  +++ irqbalance/irqlist.c
  @@ -67,7 +67,7 @@
  DIR *dir;
  struct dirent *entry;
  char *c, *c2;
  -   int nr , count = 0;
  +   int nr , count = 0, can_set = 1;
  char buf[PATH_MAX];
  sprintf(buf, "/proc/irq/%i", number);
  dir = opendir(buf);
  @@ -80,7 +80,7 @@
  size_t size = 0;
  FILE *file;
  sprintf(buf, "/proc/irq/%i/smp_affinity", number);
  -   file = fopen(buf, "r");
  +   file = fopen(buf, "r+");
  if (!file)
  continue;
  if (getline(&line, &size, file)==0) {
  @@ -89,7 +89,14 @@
  continue;
  }
  cpumask_parse_user(line, strlen(line), irq->mask);
  -   fclose(file);
  +   /*
  +* Check that we can write the affinity, if
  +* not take it out of the list.
  +*/
  +   if (fputs(line, file) == EOF)
  +   can_set = 0;

 This is maybe a nit, but writing to the affinity file can fail for a few
 different reasons, some of them permanent, some transient.  For instance, if
 we're in a memory constrained condition temporarily irq_affinity_proc_write
 might return -ENOMEM.  

Yeah true, usually followed shortly by your kernel going so far into
swap you never get it back, or OOMing, but I guess it's possible.

 Might it be better to modify this code so that, instead
 of using fputs to merge the various errors into an EOF, we use some other
 write method that lets us better determine the error and selectively ban
 the interrupt only for those errors which we consider permanent?

Yep. It seems fputs() gives you no way to get the actual error from
write(), so it looks like we'll need to switch to open/write, but that's
probably not so terrible.
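
A minimal sketch of that open/write approach, under stated assumptions: `write_affinity()` and `affinity_error_is_transient()` are hypothetical names, not irqbalance functions, and which errno values count as transient is a policy guess for illustration (ENOMEM is the case raised above).

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/*
 * Write mask to an smp_affinity file with open()/write() so the
 * caller sees the real errno. Returns 0 on success, otherwise the
 * errno value from the failing call. A short write is reported as
 * EIO (an assumption made for this sketch).
 */
int write_affinity(const char *path, const char *mask)
{
	size_t len = strlen(mask);
	ssize_t n;
	int err;
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return errno;

	n = write(fd, mask, len);
	if (n < 0)
		err = errno;
	else
		err = ((size_t)n == len) ? 0 : EIO;

	close(fd);
	return err;
}

/* Errors worth retrying later rather than banning the IRQ outright. */
bool affinity_error_is_transient(int err)
{
	return err == ENOMEM || err == EAGAIN || err == EINTR;
}
```

A caller would add the IRQ to the banned list only when `write_affinity()` returns a non-zero value for which `affinity_error_is_transient()` is false.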

cheers




RE: [PATCH] powerpc: Fix invalid page flags in create TLB CAM pathfor PTE_64BIT

2010-09-23 Thread Chen, Tiejun
 -Original Message-
 From: linuxppc-dev-bounces+tiejun.chen=windriver@lists.ozlabs.org
 [mailto:linuxppc-dev-bounces+tiejun.chen=windriver@lists.ozlabs.org]
 On Behalf Of Benjamin Herrenschmidt
 Sent: Friday, September 24, 2010 5:59 AM
 To: Scott Wood
 Cc: Gortmaker, Paul; linuxppc-dev@lists.ozlabs.org
 Subject: Re: [PATCH] powerpc: Fix invalid page flags in create TLB CAM pathfor PTE_64BIT
 
 On Thu, 2010-09-23 at 15:33 -0500, Scott Wood wrote:
  I don't see a generic accessor that can test PTE flags for user access
  -- in the absence of one, I guess we need an ifdef here.  Or at least
  put in a comment so anyone who adds a userspace use knows they need to
  fix it.
 
 We could make up one in powerpc arch at least
 
 #define pte_user(val) ((val & _PAGE_USER) == _PAGE_USER)
 

Looks good. 

Ben and Scott,

But for the issue this patch addresses, we still need the #ifdef as in
my original modification, right? Or do you have another suggestion? Then
I can improve that as v2.
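
The difference between the existing test and the suggested pte_user() accessor can be shown with a small sketch. The bit values below are invented for illustration and are not the kernel's real PTE layout; the sketch assumes the user mask is a combination of a supervisor-read bit and a user-read bit, which is what makes a plain "any bit set" test fire for kernel mappings.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only, NOT the real PTE layout. */
#define PAGE_BAP_SR	0x1u	/* supervisor read */
#define PAGE_BAP_UR	0x2u	/* user read */
#define PAGE_USER	(PAGE_BAP_SR | PAGE_BAP_UR)	/* assumed multi-bit mask */
#define PAGE_KERNEL_RO	PAGE_BAP_SR			/* kernel-only mapping */

/* Existing style of check: true if ANY bit of the mask is set. */
bool any_user_bit(uint32_t flags)
{
	return (flags & PAGE_USER) != 0;
}

/* Suggested accessor: true only if ALL bits of the mask are set. */
bool pte_user(uint32_t flags)
{
	return (flags & PAGE_USER) == PAGE_USER;
}
```

With these definitions `any_user_bit(PAGE_KERNEL_RO)` is true while `pte_user(PAGE_KERNEL_RO)` is false, so only the second form keeps user permissions away from a kernel mapping.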

Thanks
Tiejun 

 would do
 
 Cheers,
 Ben.
 
 


RE: MPC8641D PEX: programming OWBAR in Endpoint mode?

2010-09-23 Thread Chen, Tiejun
 -Original Message-
 From: david.hag...@gmail.com [mailto:david.hag...@gmail.com] 
 Sent: Thursday, September 23, 2010 10:44 PM
 To: Chen, Tiejun
 Cc: David Hagood; linuxppc-...@ozlabs.org
 Subject: RE: MPC8641D PEX: programming OWBAR in Endpoint mode?
 
  -Original Message-
  via the BARs.
 
  I read your email again and something struck me. I notice you clarified
  that you have already configured InBound successfully.
 
 I am programming BOTH the inbound ATMUs to make PPC memory available
 to the root complex, AND programming outbound ATMUs to enable the PPC
 to bus master to the root complex's memory space on PCIe.
 

Right, but this should be done for RC mode, not for the EP mode we're
discussing.

Tiejun

 I am NOT attempting to program the IWBARs - as you noted, they get
 programmed by the root complex via PCI config operations.
 
 
   And as in my comment above, I'm afraid you are mixing up InBound and
  OutBound in EP mode?
 
 No, I am NOT confusing the two - that is why I am being VERY EXPLICIT
 about accessing the OUTBOUND ATMUs.
 
 The only reason I mention the inbound ATMUs is to demonstrate that the
 physical layer is working.
 
 
 
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev