Re: [U-Boot] cuImage and multi image?
> Can you paste the whole log from the u-boot prompt?

In the previous run the ramdisk image was corrupted because the single image was loaded at 0x80. But the boot message showed that the initrd image was at 0x0066c000-0x009ae825, so it extended past the 8MB area. However, after the load address was changed to 0x0400 (64MB), the ramdisk still seemed corrupted, but with different error messages.

=> bootm
## Booting image at 0400 ...
   Image Name:   Linux-2.6.33.5
   Image Type:   PowerPC Linux Kernel Image (gzip compressed)
   Data Size:    4424922 Bytes = 4.2 MB
   Load Address: 0040
   Entry Point:  00400554
   Verifying Checksum ... OK
   Uncompressing Kernel Image ... OK
Memory - 0x0 0x800 (128MB)
ENET0: local-mac-address - 00:09:9b:01:58:64
CPU clock-frequency - 0x7270e00 (120MHz)
CPU timebase-frequency - 0x7270e0 (8MHz)
CPU bus-frequency - 0x3938700 (60MHz)

zImage starting: loaded at 0x0040 (sp: 0x07d1cbd0)
Allocating 0x22a1e1 bytes for kernel ...
gunzipping (0x - 0x0040c000:0x0066b0ac)...done 0x21c6c8 bytes
Attached initrd image at 0x0066c000-0x009ae825
initrd head: 0x1f8b0808

Linux/PowerPC load: root=/dev/ram
Finalizing device tree... flat tree at 0x9bb300
Using my870 machine description
Linux version 2.6.33.5 (sh...@ubuntu) (gcc version 4.2.2) #4 Tue Sep 21 09:23:51 PDT 2010
Found initrd at 0xc066c000:0xc09ae825

The log further below shows the boot messages when the same kernel and the same ramdisk were loaded separately. The difference is that when booting from two separate images, the ramdisk is loaded to the top of RAM (0x79d9000-0x7d1b825), whereas when booting from the single image, the ramdisk is placed immediately after the uncompressed kernel image (0x0066c000-0x009ae825).

I'm not familiar with how the kernel uses the memory, but it seems clear from this failure that the kernel overwrites the area where the initrd is located. Can anyone shed some light on why the kernel would overwrite the initrd area? BTW, if the initrd is small enough, the single image method works well.
Maybe we should have relocated the initrd to the top of available RAM, just like u-boot's bootm does?

=> bootm 100 200
## Booting image at 0100 ...
   Image Name:   Linux-2.6.33.5
   Image Type:   PowerPC Linux Kernel Image (gzip compressed)
   Data Size:    1040228 Bytes = 1015.8 kB
   Load Address: 0040
   Entry Point:  00400554
   Verifying Checksum ... OK
   Uncompressing Kernel Image ... OK
## Loading RAMDisk Image at 0200 ...
   Image Name:   16MB Ramdisk
   Image Type:   PowerPC Linux RAMDisk Image (gzip compressed)
   Data Size:    3418149 Bytes = 3.3 MB
   Load Address:
   Entry Point:
   Verifying Checksum ... OK
   Loading Ramdisk to 079d9000, end 07d1b825 ... OK
Memory - 0x0 0x800 (128MB)
ENET0: local-mac-address - 00:09:9b:01:58:64
CPU clock-frequency - 0x7270e00 (120MHz)
CPU timebase-frequency - 0x7270e0 (8MHz)
CPU bus-frequency - 0x3938700 (60MHz)

zImage starting: loaded at 0x0040 (sp: 0x07d1cbd0)
Allocating 0x22a1e1 bytes for kernel ...
gunzipping (0x - 0x0040c000:0x0066b0ac)...done 0x21c6c8 bytes
Using loader supplied ramdisk at 0x79d9000-0x7d1b825
initrd head: 0x1f8b0808

Linux/PowerPC load: root=/dev/ram
Finalizing device tree... flat tree at 0x678300
Using my870 machine description
Linux version 2.6.33.5 (sh...@ubuntu) (gcc version 4.2.2) #4 Tue Sep 21 09:23:51 PDT 2010
Found initrd at 0xc79d9000:0xc7d1b825

Thanks,
-Shawn.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
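
As a rough cross-check of the two layouts reported in the logs, the arithmetic can be sketched in C. The addresses are copied from the boot messages; the helper names and the reading of "the 8MB area" as the first 8 MiB of RAM (0x00800000) are assumptions made for this illustration, not anything taken from the kernel or u-boot sources.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: "the 8MB area" means the first 8 MiB of physical RAM. */
#define EIGHT_MIB 0x00800000u

/* Size in bytes of the range [start, end), with end exclusive. */
static uint32_t range_size(uint32_t start, uint32_t end)
{
	assert(end >= start);
	return end - start;
}

/* Returns 1 if [start, end) begins below `limit` and ends above it. */
static int straddles(uint32_t start, uint32_t end, uint32_t limit)
{
	return start < limit && end > limit;
}

/* Single image: initrd attached right after the uncompressed kernel. */
static const uint32_t single_start = 0x0066c000u, single_end = 0x009ae825u;

/* Separate images: u-boot relocated the ramdisk to the top of RAM. */
static const uint32_t split_start = 0x079d9000u, split_end = 0x07d1b825u;
```

Both ranges work out to 0x342825 bytes, which matches the 3418149-byte ramdisk size u-boot prints, and only the single-image placement straddles the 8 MiB mark. That is consistent with the observation in the message, though it does not by itself show what overwrites the initrd.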
[PATCH 12/20] powerpc: change to new flag variables
Replace EXTRA_CFLAGS with ccflags-y and EXTRA_AFLAGS with asflags-y.

Signed-off-by: matt mooney <m...@muteddisk.com>
---
 arch/powerpc/kernel/vdso32/Makefile     |    6 +++---
 arch/powerpc/kernel/vdso64/Makefile     |    6 +++---
 arch/powerpc/kvm/Makefile               |    2 +-
 arch/powerpc/lib/Makefile               |    4 +---
 arch/powerpc/math-emu/Makefile          |    2 +-
 arch/powerpc/mm/Makefile                |    4 +---
 arch/powerpc/oprofile/Makefile          |    4 +---
 arch/powerpc/platforms/iseries/Makefile |    2 +-
 arch/powerpc/platforms/pseries/Makefile |   11 +++
 arch/powerpc/sysdev/Makefile            |    4 +---
 arch/powerpc/xmon/Makefile              |    4 +---
 11 files changed, 17 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile
index 51ead52..9a7946c 100644
--- a/arch/powerpc/kernel/vdso32/Makefile
+++ b/arch/powerpc/kernel/vdso32/Makefile
@@ -14,10 +14,10 @@ obj-vdso32 := $(addprefix $(obj)/, $(obj-vdso32))
 
 GCOV_PROFILE := n
 
-EXTRA_CFLAGS := -shared -fno-common -fno-builtin
-EXTRA_CFLAGS += -nostdlib -Wl,-soname=linux-vdso32.so.1 \
+ccflags-y := -shared -fno-common -fno-builtin
+ccflags-y += -nostdlib -Wl,-soname=linux-vdso32.so.1 \
 		$(call cc-ldoption, -Wl$(comma)--hash-style=sysv)
-EXTRA_AFLAGS := -D__VDSO32__ -s
+asflags-y := -D__VDSO32__ -s
 
 obj-y += vdso32_wrapper.o
 extra-y += vdso32.lds
diff --git a/arch/powerpc/kernel/vdso64/Makefile b/arch/powerpc/kernel/vdso64/Makefile
index 79da65d..8c500d8 100644
--- a/arch/powerpc/kernel/vdso64/Makefile
+++ b/arch/powerpc/kernel/vdso64/Makefile
@@ -9,10 +9,10 @@ obj-vdso64 := $(addprefix $(obj)/, $(obj-vdso64))
 
 GCOV_PROFILE := n
 
-EXTRA_CFLAGS := -shared -fno-common -fno-builtin
-EXTRA_CFLAGS += -nostdlib -Wl,-soname=linux-vdso64.so.1 \
+ccflags-y := -shared -fno-common -fno-builtin
+ccflags-y += -nostdlib -Wl,-soname=linux-vdso64.so.1 \
 		$(call cc-ldoption, -Wl$(comma)--hash-style=sysv)
-EXTRA_AFLAGS := -D__VDSO64__ -s
+asflags-y := -D__VDSO64__ -s
 
 obj-y += vdso64_wrapper.o
 extra-y += vdso64.lds
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index d45c818..4d68638 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -4,7 +4,7 @@
 
 subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
 
-EXTRA_CFLAGS += -Ivirt/kvm -Iarch/powerpc/kvm
+ccflags-y := -Ivirt/kvm -Iarch/powerpc/kvm
 
 common-objs-y = $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o)
 
diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 5bb89c8..e4b0c07 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -4,9 +4,7 @@
 
 subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
 
-ifeq ($(CONFIG_PPC64),y)
-EXTRA_CFLAGS += -mno-minimal-toc
-endif
+ccflags-$(CONFIG_PPC64) := -mno-minimal-toc
 
 CFLAGS_REMOVE_code-patching.o = -pg
 CFLAGS_REMOVE_feature-fixups.o = -pg
diff --git a/arch/powerpc/math-emu/Makefile b/arch/powerpc/math-emu/Makefile
index 0c16ab9..7d1dba0 100644
--- a/arch/powerpc/math-emu/Makefile
+++ b/arch/powerpc/math-emu/Makefile
@@ -15,4 +15,4 @@ obj-$(CONFIG_SPE) += math_efp.o
 CFLAGS_fabs.o = -fno-builtin-fabs
 CFLAGS_math.o = -fno-builtin-fabs
 
-EXTRA_CFLAGS = -I. -Iinclude/math-emu -w
+ccflags-y = -I. -Iinclude/math-emu -w
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index ce68708..53102f3 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -4,9 +4,7 @@
 
 subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
 
-ifeq ($(CONFIG_PPC64),y)
-EXTRA_CFLAGS += -mno-minimal-toc
-endif
+ccflags-$(CONFIG_PPC64) := -mno-minimal-toc
 
 obj-y := fault.o mem.o pgtable.o gup.o \
 	init_$(CONFIG_WORD_SIZE).o \
diff --git a/arch/powerpc/oprofile/Makefile b/arch/powerpc/oprofile/Makefile
index e219ca4..73456c4 100644
--- a/arch/powerpc/oprofile/Makefile
+++ b/arch/powerpc/oprofile/Makefile
@@ -1,8 +1,6 @@
 subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
 
-ifeq ($(CONFIG_PPC64),y)
-EXTRA_CFLAGS += -mno-minimal-toc
-endif
+ccflags-$(CONFIG_PPC64) := -mno-minimal-toc
 
 obj-$(CONFIG_OPROFILE) += oprofile.o
 
diff --git a/arch/powerpc/platforms/iseries/Makefile b/arch/powerpc/platforms/iseries/Makefile
index ce01492..a7602b1 100644
--- a/arch/powerpc/platforms/iseries/Makefile
+++ b/arch/powerpc/platforms/iseries/Makefile
@@ -1,4 +1,4 @@
-EXTRA_CFLAGS += -mno-minimal-toc
+ccflags-y := -mno-minimal-toc
 
 obj-y += exception.o
 obj-y += hvlog.o hvlpconfig.o lpardata.o setup.o dt.o mf.o lpevents.o \
diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
index 046ace9..7ee1599 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -1,10 +1,5 @@
-ifeq ($(CONFIG_PPC64),y)
-EXTRA_CFLAGS += -mno-minimal-toc
-endif
-
-ifeq
RE: Modifying mpc8308rdb.dts
> posting patches beats waiting for an indefinite amount of time :)
>
> Kim

Well, yes, I suppose so. I had noticed that only one or two people post these kinds of patches, so I thought there might be some kind of unwritten agreement that no one else should tamper with their work.

I will dig into the documentation and figure out how to post patches, then.

Thanks :)

--
Maria
Re: [PATCH] irqbalance, powerpc: add IRQs without settable SMP affinity to banned list
On Wed, 2010-09-22 at 16:04 +1000, Michael Neuling wrote:
> When irqbalance attempts writes to the IPI smp_affinity (ie.
> /proc/irq/16/smp_affinity in the above example) it fails but irqbalance
> currently ignores this. This patch catches these write fails and in this
> case adds that IRQ number to the banned IRQ list. This will catch the
> above IPI case and any other IRQ where the SMP affinity can't be set.

Cool!

> Index: irqbalance/irqlist.c
> ===
> --- irqbalance.orig/irqlist.c
> +++ irqbalance/irqlist.c
> @@ -67,7 +67,7 @@
>  	DIR *dir;
>  	struct dirent *entry;
>  	char *c, *c2;
> -	int nr , count = 0;
> +	int nr , count = 0, can_set = 1;
>  	char buf[PATH_MAX];
>  	sprintf(buf, "/proc/irq/%i", number);
>  	dir = opendir(buf);
> @@ -80,7 +80,7 @@
>  			size_t size = 0;
>  			FILE *file;
>  			sprintf(buf, "/proc/irq/%i/smp_affinity", number);
> -			file = fopen(buf, "r");
> +			file = fopen(buf, "r+");
>  			if (!file)
>  				continue;
>  			if (getline(&line, &size, file)==0) {
> @@ -89,7 +89,14 @@
>  				continue;
>  			}
>  			cpumask_parse_user(line, strlen(line), irq->mask);
> -			fclose(file);
> +			/*
> +			 * Check that we can write the affinity, if
> +			 * not take it out of the list.
> +			 */
> +			if (fwrite(line, strlen(line) - 1, 1, file) == 0)

			if (fputs(line, file) == EOF)

?

cheers
Re: [PATCH 12/20] powerpc: change to new flag variables
Hi Matt,

On Wed, 22 Sep 2010 23:51:09 -0700 matt mooney <m...@muteddisk.com> wrote:
>
> Replace EXTRA_CFLAGS with ccflags-y and EXTRA_AFLAGS with asflags-y.

This looks good. One comment below ...

> --- a/arch/powerpc/platforms/pseries/Makefile
> +++ b/arch/powerpc/platforms/pseries/Makefile
> @@ -1,10 +1,5 @@
> -ifeq ($(CONFIG_PPC64),y)
> -EXTRA_CFLAGS += -mno-minimal-toc
> -endif
> -
> -ifeq ($(CONFIG_PPC_PSERIES_DEBUG),y)
> -EXTRA_CFLAGS += -DDEBUG
> -endif
> +ccflags-$(CONFIG_PPC64) := -mno-minimal-toc
> +ccflags-$(CONFIG_PPC_PSERIES_DEBUG) += -DDEBUG
>
>  obj-y := lpar.o hvCall.o nvram.o reconfig.o \
>  	setup.o iommu.o event_sources.o ras.o \
> @@ -23,7 +18,7 @@ obj-$(CONFIG_MEMORY_HOTPLUG) += hotplug-memory.o
>  obj-$(CONFIG_HVC_CONSOLE) += hvconsole.o
>  obj-$(CONFIG_HVCS) += hvcserver.o
>  obj-$(CONFIG_HCALL_STATS) += hvCall_inst.o
> -obj-$(CONFIG_PHYP_DUMP) += phyp_dump.o
> +obj-$(CONFIG_PHYP_DUMP) += phyp_dump.o
>  obj-$(CONFIG_CMM) += cmm.o
>  obj-$(CONFIG_DTL) += dtl.o

This looks like a spurious extra hunk.

--
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
Re: [PATCH] irqbalance, powerpc: add IRQs without settable SMP affinity to banned list
> > +			if (fwrite(line, strlen(line) - 1, 1, file) == 0)
>
> 			if (fputs(line, file) == EOF)

Good point, thanks... new patch below.

Mikey

irqbalance, powerpc: add IRQs without settable SMP affinity to banned list

On pseries powerpc, IPIs are registered with an IRQ number so /proc/interrupts looks like this on a 2 core/2 thread machine:

           CPU0       CPU1       CPU2       CPU3
 16: 316428232905141138794     983121   XICS   Level   IPI
 18:    2605674          0     304994          0   XICS   Level   lan0
 30:     400057          0     169209          0   XICS   Level   ibmvscsi
LOC:     133734      77250     106425      91951   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
CNT:          0          0          0          0   Performance monitoring interrupts
MCE:          0          0          0          0   Machine check exceptions

Unfortunately this means irqbalance attempts to set the affinity of IPIs, which is not possible. So in the above case, when irqbalance is in performance mode due to heavy IPI, lan0 and ibmvscsi activity, it sometimes attempts to put the IPIs on one core (CPU01) and lan0 and ibmvscsi on the other core (CPU23). This is suboptimal as we want lan0 and ibmvscsi to be on separate cores and IPIs to be ignored.

When irqbalance attempts writes to the IPI smp_affinity (ie. /proc/irq/16/smp_affinity in the above example) it fails, but irqbalance currently ignores this. This patch catches these write failures and in this case adds that IRQ number to the banned IRQ list. This will catch the above IPI case and any other IRQ where the SMP affinity can't be set.

Tested on POWER6, POWER7 and x86.

Signed-off-by: Michael Neuling <mi...@neuling.org>

Index: irqbalance/irqlist.c
===
--- irqbalance.orig/irqlist.c
+++ irqbalance/irqlist.c
@@ -67,7 +67,7 @@
 	DIR *dir;
 	struct dirent *entry;
 	char *c, *c2;
-	int nr , count = 0;
+	int nr , count = 0, can_set = 1;
 	char buf[PATH_MAX];
 	sprintf(buf, "/proc/irq/%i", number);
 	dir = opendir(buf);
@@ -80,7 +80,7 @@
 			size_t size = 0;
 			FILE *file;
 			sprintf(buf, "/proc/irq/%i/smp_affinity", number);
-			file = fopen(buf, "r");
+			file = fopen(buf, "r+");
 			if (!file)
 				continue;
 			if (getline(&line, &size, file)==0) {
@@ -89,7 +89,14 @@
 				continue;
 			}
 			cpumask_parse_user(line, strlen(line), irq->mask);
-			fclose(file);
+			/*
+			 * Check that we can write the affinity, if
+			 * not take it out of the list.
+			 */
+			if (fputs(line, file) == EOF)
+				can_set = 0;
+			if (fclose(file))
+				can_set = 0;
 			free(line);
 		} else if (strcmp(entry->d_name,"allowed_affinity")==0) {
 			char *line = NULL;
@@ -122,7 +129,7 @@
 	count++;

 	/* if there is no choice in the allowed mask, don't bother to balance */
-	if (count<2)
+	if ((count<2) || (can_set == 0))
 		irq->balance_level = BALANCE_NONE;
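
The patch checks the return value of fclose() as well as fputs(). One likely reason is that stdio buffers small writes, so an error from the underlying write(2) may only surface when the stream is flushed or closed. The following is a minimal sketch of that behaviour; the struct and function names are hypothetical and are not part of irqbalance.

```c
#include <assert.h>
#include <stdio.h>

/* Outcome of one buffered write attempt (names are illustrative only). */
struct write_check {
	int opened;   /* 1 if fopen() succeeded */
	int puts_ok;  /* 1 if fputs() did not return EOF */
	int close_ok; /* 1 if fclose() returned 0 */
};

/* Open `path` for writing, write `text`, close, and record each step. */
static struct write_check check_buffered_write(const char *path,
					       const char *text)
{
	struct write_check r = { 0, 0, 0 };
	FILE *f;

	assert(path != NULL && text != NULL);

	f = fopen(path, "w");
	if (!f)
		return r;
	r.opened = 1;

	/* With a buffered stream this usually only copies into the buffer. */
	r.puts_ok = (fputs(text, f) != EOF);

	/* fclose() flushes the buffer, so a failed write(2) shows up here. */
	r.close_ok = (fclose(f) == 0);
	return r;
}
```

On a typical Linux system, calling check_buffered_write("/dev/full", "f\n") would be expected to report puts_ok set and close_ok clear, which is the case the extra fclose() check in the patch is guarding against.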
RE: MPC8641D PEX: programming OWBAR in Endpoint mode?
On Thu, 2010-09-23 at 05:21 +0200, Chen, Tiejun wrote:
> > I can get the device to show up on the host's PCI bus, I can
>
> This only ensure you can access the PCIe configure space.

Not quite: I can also read the BARs that I program, and the memory behind them on the PPC.

> > program the inbound ATMUs such that the BARS are updated when the
> > host (re-)scans them, but I cannot for the life of me get
>
> What value are configured to IntBound REGs?

I can program them at run time via sysfs on the PPC's side, so there is no single set of values. However, I am pointing them at the PPC's RAM space, and as I stated above, I can read the PPC's RAM from the host side via the BARs.

> How do you configure OWS of PEXOWAR? I means you still access that if
> OWS is match the whole target memory size even when '0' is as the
> internal platform address.

As I understand it, not if the OWS is not correctly mapped on the PPC side - the PEX outbound ATMU's OWBAR must be mapped to a region of the PPC's address space that is also mapped to PEX in the LAW. The LAW does NOT indicate that PPC address 0 is mapped to the PEX.

> Out_be32 should be fine for atmu REGs. And also you can refe to the
> function, setup_pci_atmu setup_one_atmu, on the file,
> arch/powerpc/sysdev/fsl_pci.c, to know how to access atmu REGs. Often
> you should disable them, configure then enable/invoke atmu antry as
> normal configuring sequent.

I have tried disabling the outbound ATMU when I program it, with no change. I have looked at the functions you mention, and that is a part of my confusion, as they aren't doing anything different than I am.

> Additionally I'm a bit afraid your initial phase :) As you know PCIe
> would be used as RC mode on Freescale PowerPC kernel. So I don't know
> if you also drop this path on your kernel to conflict each other :)

I have tried doing this under a kernel built without PCI support with no change.
[BUG 2.6.36-rc5] of_i2c.ko - i2c-core.ko dependency loop
Running modules_install from a newly built 2.6.36-rc5 kernel on my 32-bit PowerMac results in:

WARNING: Module /lib/modules/2.6.36-rc5/kernel/drivers/i2c/busses/i2c-powermac.ko ignored, due to loop
WARNING: Loop detected: /lib/modules/2.6.36-rc5/kernel/drivers/i2c/i2c-core.ko needs of_i2c.ko which needs i2c-core.ko again!
WARNING: Module /lib/modules/2.6.36-rc5/kernel/drivers/i2c/i2c-core.ko ignored, due to loop
WARNING: Module /lib/modules/2.6.36-rc5/kernel/drivers/i2c/i2c-dev.ko ignored, due to loop
WARNING: Module /lib/modules/2.6.36-rc5/kernel/drivers/of/of_i2c.ko ignored, due to loop
WARNING: Module /lib/modules/2.6.36-rc5/kernel/sound/ppc/snd-powermac.ko ignored, due to loop

grep '.*I2C.*=' .config
CONFIG_OF_I2C=m
CONFIG_I2C=m
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_CHARDEV=m
CONFIG_I2C_POWERMAC=m

I can't say exactly when this started; I haven't built kernels on this box in a while.

/Mikael
RE: [U-Boot] cuImage and multi image?
> -----Original Message-----
> From: Shawn Jin [mailto:shawnx...@gmail.com]
> Sent: Thursday, September 23, 2010 4:23 AM
> To: Chen, Tiejun
> Cc: Scott Wood; ppcdev; uboot
> Subject: Re: [U-Boot] cuImage and multi image?
>
> >> I have a large ramdisk image. The size of the image itself (i.e. the
> >> *.gz) is about 4MB. When the ramdisk was being decompressed
> >
> > Did you try to change link_address on the file,
> > arch/powerpc/boot/wrapper?
>
> No. I don't have to. Right? The link_address is still 0x40.

I mean you can change link_address to another value according to the image size. Try setting link_address='0x500'.

> > Did you try boot the uImage and the ramdisk separately? For example,
> > you can boot this as the following command:
> > # bootm ${kernel_addr} ${ramdisk_addr} ${fdt_addr}
>
> Mine is a cuImage. I'm pretty sure that my ramdisk is valid when it's
> a separate image. I used bootm kernel_addr ramdisk_addr to boot.
>
> > Can you paste the whole log from the u-boot prompt?
>
> In the previous run the ramdisk image was corrupted because the single
> image was loaded at 0x80. But the boot message showed that the initrd
> image was at 0x0066c000-0x009ae825. So it was over the 8MB area.
> However after the load address was changed to 0x0400 (64MB), the
> ramdisk still seemed corrupted but with different error messages.

This should be the same reason, 'uncompression error'.

Cheers
Tiejun

> => bootm
> ## Booting image at 0400 ...
>    Image Name:   Linux-2.6.33.5
>    Image Type:   PowerPC Linux Kernel Image (gzip compressed)
>    Data Size:    4424922 Bytes = 4.2 MB
>    Load Address: 0040
>    Entry Point:  00400554
>    Verifying Checksum ... OK
>    Uncompressing Kernel Image ... OK
> Memory - 0x0 0x800 (128MB)
> ENET0: local-mac-address - 00:09:9b:01:58:64
> CPU clock-frequency - 0x7270e00 (120MHz)
> CPU timebase-frequency - 0x7270e0 (8MHz)
> CPU bus-frequency - 0x3938700 (60MHz)
>
> zImage starting: loaded at 0x0040 (sp: 0x07d1cbd0)
> Allocating 0x22a1e1 bytes for kernel ...
> gunzipping (0x - 0x0040c000:0x0066b0ac)...done 0x21c6c8 bytes
> Attached initrd image at 0x0066c000-0x009ae825
> initrd head: 0x1f8b0808
>
> Linux/PowerPC load: root=/dev/ram
> Finalizing device tree... flat tree at 0x9bb300
> Using my870 machine description
> Linux version 2.6.33.5 (sh...@ubuntu) (gcc version 4.2.2) #4 Tue Sep 21 09:23:51 PDT 2010
> Found initrd at 0xc066c000:0xc09ae825
> Zone PFN ranges:
>   DMA      0x - 0x8000
>   Normal   0x8000 - 0x8000
> Movable zone start PFN for each node
> early_node_map[1] active PFN ranges
>     0: 0x - 0x8000
> MMU: Allocated 72 bytes of context maps for 16 contexts
> Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 32512
> Kernel command line: root=/dev/ram
> PID hash table entries: 512 (order: -1, 2048 bytes)
> Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
> Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
> Memory: 124072k/131072k available (2080k kernel code, 6836k reserved, 84k data, 52k bss, 104k init)
> Kernel virtual memory layout:
>   * 0xfffdf000..0xf000  : fixmap
>   * 0xfde0..0xfe00  : consistent mem
>   * 0xfddfa000..0xfde0  : early ioremap
>   * 0xc900..0xfddfa000  : vmalloc ioremap
> SLUB: Genslabs=12, HWalign=16, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
> snipped
> RAMDISK: gzip image found at block 0
> uncompression error
> VFS: Mounted root (ext2 filesystem) readonly on device 1:0.
> Freeing unused kernel memory: 104k init
> EXT2-fs (ram0): error: ext2_check_page: bad entry in directory #336: : unaligned directory entry - offset=0, inode=74187384, rec_len=2081, name_len=126
> EXT2-fs (ram0): error: remounting filesystem read-only
> attempt to access beyond end of device
> ram0: rw=0, want=156831968, limit=32768
> Buffer I/O error on device ram0, logical block 78415983
> attempt to access beyond end of device
> ram0: rw=0, want=112233212, limit=32768
> Buffer I/O error on device ram0, logical block 56116605
> attempt to access beyond end of device
> ram0: rw=0, want=6626681482, limit=32768
> Buffer I/O error on device ram0, logical block 3313340740
> attempt to access beyond end of device
> ram0: rw=0, want=184684282, limit=32768
> Buffer I/O error on device ram0, logical block 92342140
> Kernel panic - not syncing: No init found.  Try passing init= option to kernel.
> Call Trace:
> [c7821f30] [c0006cd8] show_stack+0x40/0x168 (unreliable)
> [c7821f70] [c001cefc] panic+0x8c/0x178
> [c7821fc0] [c00026d4] init_post+0xe4/0xf4
> [c7821fd0] [c01ee224] kernel_init+0x108/0x130
> [c7821ff0] [c000dcc0] kernel_thread+0x4c/0x68
> Rebooting in 180 seconds..
>
> Thanks,
> -Shawn.
Re: [PATCH] irqbalance, powerpc: add IRQs without settable SMP affinity to banned list
On Thu, Sep 23, 2010 at 08:57:20PM +1000, Michael Neuling wrote:
> > > +			if (fwrite(line, strlen(line) - 1, 1, file) == 0)
> >
> > 			if (fputs(line, file) == EOF)
>
> Good point thanks... new patch below
>
> Mikey
>
> irqbalance, powerpc: add IRQs without settable SMP affinity to banned list
>
> On pseries powerpc, IPIs are registered with an IRQ number so
> /proc/interrupts looks like this on a 2 core/2 thread machine:
>
>            CPU0       CPU1       CPU2       CPU3
>  16: 316428232905141138794     983121   XICS   Level   IPI
>  18:    2605674          0     304994          0   XICS   Level   lan0
>  30:     400057          0     169209          0   XICS   Level   ibmvscsi
> LOC:     133734      77250     106425      91951   Local timer interrupts
> SPU:          0          0          0          0   Spurious interrupts
> CNT:          0          0          0          0   Performance monitoring interrupts
> MCE:          0          0          0          0   Machine check exceptions
>
> Unfortunately this means irqbalance attempts to set the affinity of IPIs
> which is not possible. So in the above case, when irqbalance is in
> performance mode due to heavy IPI, lan0 and ibmvscsi activity, it
> sometimes attempts to put the IPIs on one core (CPU01) and lan0 and
> ibmvscsi on the other core (CPU23). This is suboptimal as we want lan0
> and ibmvscsi to be on separate cores and IPIs to be ignored.
>
> When irqbalance attempts writes to the IPI smp_affinity (ie.
> /proc/irq/16/smp_affinity in the above example) it fails but irqbalance
> currently ignores this. This patch catches these write fails and in this
> case adds that IRQ number to the banned IRQ list. This will catch the
> above IPI case and any other IRQ where the SMP affinity can't be set.
>
> Tested on POWER6, POWER7 and x86.
>
> Signed-off-by: Michael Neuling <mi...@neuling.org>
>
> Index: irqbalance/irqlist.c
> ===
> --- irqbalance.orig/irqlist.c
> +++ irqbalance/irqlist.c
> @@ -67,7 +67,7 @@
>  	DIR *dir;
>  	struct dirent *entry;
>  	char *c, *c2;
> -	int nr , count = 0;
> +	int nr , count = 0, can_set = 1;
>  	char buf[PATH_MAX];
>  	sprintf(buf, "/proc/irq/%i", number);
>  	dir = opendir(buf);
> @@ -80,7 +80,7 @@
>  			size_t size = 0;
>  			FILE *file;
>  			sprintf(buf, "/proc/irq/%i/smp_affinity", number);
> -			file = fopen(buf, "r");
> +			file = fopen(buf, "r+");
>  			if (!file)
>  				continue;
>  			if (getline(&line, &size, file)==0) {
> @@ -89,7 +89,14 @@
>  				continue;
>  			}
>  			cpumask_parse_user(line, strlen(line), irq->mask);
> -			fclose(file);
> +			/*
> +			 * Check that we can write the affinity, if
> +			 * not take it out of the list.
> +			 */
> +			if (fputs(line, file) == EOF)
> +				can_set = 0;

This is maybe a nit, but writing to the affinity file can fail for a few different reasons, some of them permanent, some transient. For instance, if we're temporarily in a memory-constrained condition, irq_affinity_proc_write might return -ENOMEM. Might it be better to modify this code so that, instead of using fputs to merge the various errors into an EOF, we use some other write method that lets us better determine the error and selectively ban the interrupt only for those errors which we consider permanent?

Otherwise this looks fine to me.

Thanks
Neil
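
One way to read the suggestion above is to write with write(2) directly, so that errno is available, and to ban the IRQ only for errors judged permanent. The following is a sketch under stated assumptions: the enum, the function names and in particular the split of errno values into transient and permanent are illustrative choices, not irqbalance policy or anything proposed in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

enum write_result { WRITE_OK, WRITE_TRANSIENT, WRITE_PERMANENT };

/*
 * Assumption: ENOMEM, EINTR and EAGAIN are treated as transient and
 * everything else as permanent. A real policy would need review.
 */
static enum write_result classify_errno(int err)
{
	switch (err) {
	case 0:
		return WRITE_OK;
	case ENOMEM:
	case EINTR:
	case EAGAIN:
		return WRITE_TRANSIENT;
	default:
		return WRITE_PERMANENT;
	}
}

/* Write `mask` to `path` unbuffered so the kernel's error reaches errno. */
static enum write_result try_write_affinity(const char *path,
					    const char *mask)
{
	ssize_t n;
	int fd, err;

	assert(path != NULL && mask != NULL);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return classify_errno(errno);

	n = write(fd, mask, strlen(mask));
	err = (n < 0) ? errno : 0; /* short writes are not examined here */
	close(fd);
	return classify_errno(err);
}
```

A caller could then set can_set = 0 only when the result is WRITE_PERMANENT and retry on a later scan when it is WRITE_TRANSIENT.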
RE: MPC8641D PEX: programming OWBAR in Endpoint mode?
> -----Original Message-----
> From: David Hagood [mailto:david.hag...@gmail.com]
> Sent: Thursday, September 23, 2010 7:11 PM
> To: Chen, Tiejun; linuxppc-...@ozlabs.org
> Subject: RE: MPC8641D PEX: programming OWBAR in Endpoint mode?
>
> On Thu, 2010-09-23 at 05:21 +0200, Chen, Tiejun wrote:
> > > I can get the device to show up on the host's PCI bus, I can
> >
> > This only ensure you can access the PCIe configure space.
>
> Not quite: I can also read the BARs that I program, and the memory
> behind them on the PPC.

Absolutely.

> > > program the inbound ATMUs such that the BARS are updated when the
> > > host (re-)scans them, but I cannot for the life of me get
> >
> > What value are configured to IntBound REGs?
>
> I can program them at run time via sysfs on the PPC's side, so there
> is no single set of values. However, I am pointing them at the PPC's
> RAM space, and as I stated above, I can read the PPC's RAM from the
> host side via the BARs.

I read your email again and something occurred to me. I notice you say you have already configured InBound successfully. Right? If so I'm a bit confused. In PCIe EP mode the PEXIWBARs are not implemented in the memory-mapped space. If you read any PEXIWBAR, these registers always return zero regardless of any value written first. You can only program the 4 inbound BARs through type 0 configuration accesses, like a normal PCIe device.

> > How do you configure OWS of PEXOWAR? I means you still access that
> > if OWS is match the whole target memory size even when '0' is as the
> > internal platform address.
>
> As I understand it, not if the OWS is not correctly mapped on the PPC
> side - the PEX outbound ATMU's OWBAR must be mapped to a region of the
> PPCs address space that is also mapped to PEX in the LAW. The LAW does
> NOT indicate that PPC address 0 is mapped to the PEX.

If there is no LAW entry for PCIe, the kernel should take a machine check when you access PCIe space. And as per my comment above, I'm afraid you are mixing up InBound and OutBound in EP mode? So you always read zero from your so-called OutBound? I mean that should in fact be PEXIWBAR. I'm not sure, but you can check this.

> > Out_be32 should be fine for atmu REGs. And also you can refe to the
> > function, setup_pci_atmu setup_one_atmu, on the file,
> > arch/powerpc/sysdev/fsl_pci.c, to know how to access atmu REGs.
> > Often you should disable them, configure then enable/invoke atmu
> > antry as normal configuring sequent.
>
> I have tried disabling the outbound ATMU when I program it, with no
> change. I have looked at the functions you mention, and that is a part
> of my confusion, as they aren't doing anything different than I am.

I only meant that you can refer to them to see how to access these registers.

> > Additionally I'm a bit afraid your initial phase :) As you know PCIe
> > would be used as RC mode on Freescale PowerPC kernel. So I don't
> > know if you also drop this path on your kernel to conflict each
> > other :)
>
> I have tried doing this under a kernel built without PCI support with
> no change.

Good.

Tiejun
RE: MPC8641D PEX: programming OWBAR in Endpoint mode?
> -----Original Message-----
> via the BARs.
>
> I read your email again and something hint me. I notice you clarify
> you already condigure InBound successfully.

I am programming BOTH the inbound ATMUs to make PPC memory available to the root complex, AND programming outbound ATMUs to enable the PPC to bus master to the root complex's memory space on PCIe.

I am NOT attempting to program the IWBARs - as you noted, they get programmed by the root complex via PCI config operations.

> And as my above comment I'm afraid you mix up InBound and OutBound on
> EP mode?

No, I am NOT confusing the two - that is why I am being VERY EXPLICIT about accessing the OUTBOUND ATMUs. The only reason I mention the inbound ATMUs is to demonstrate that the physical layer is working.
ppc44x - how do i optimize driver for tlb hits
I've implemented a working driver on my 460EX. It allocates a couple of buffers of 4MB each. I have a custom memcmp algorithm in asm that is extremely fast in user space, but only half as fast when run on these buffers. My tests show that the algorithm seems to be memory bandwidth bound. My guess is that I am getting TLB or cache misses (my algorithm uses dcbt) that are slowing performance.

Curiously, in user space I can affect the performance with small changes in the size of the buffer, e.g. 4MB + 32B is fast, 4MB + 4K is much worse.

Can I adjust my driver code, which uses kmalloc, to make sure that the ppc44x has 4MB TLB entries for these buffers and that they stay put?

thanks
ayman
[PATCH v6 0/8] ptp: IEEE 1588 hardware clock support
Here is the sixth version of my patch set adding PTP hardware clock support to the Linux kernel. The main difference to v5 is that the character device interface has been replaced with one based on the posix clock system calls.

The first three patches add necessary background support in the posix clock code. The last five add the new PTP hardware clock features. Previously, I had tried to present the posix clock changes all by themselves, but commentators asked to see the whole context.

What follows is a rather lengthy discussion of the various design issues.

Table of Contents
=================
1 Introduction
2 Previous Discussions
3 Design Issues
    3.1 Clock Operations
    3.2 Character Device vs System Calls
        3.2.1 Using the POSIX Clock API
        3.2.2 Tuning a POSIX Clock
        3.2.3 Dynamic POSIX Clock IDs
    3.3 Synchronizing the Linux System Time
    3.4 Ancillary PHC Operations
    3.5 User timers
4 Drivers
    4.1 Supported Hardware Clocks
    4.2 Open Driver Issues
        4.2.1 DP83640
        4.2.2 IXP465

1 Introduction
~~~~~~~~~~~~~~~

The aim of this patch set is to add support for PTP hardware clocks into the Linux kernel. In the following description, we use the abbreviation PHC to mean PTP hardware clock.

Support for obtaining timestamps from a PHC already exists via the SO_TIMESTAMPING socket option, integrated in kernel version 2.6.30. This patch set completes the picture by allowing user space programs to adjust the PHC and to control its ancillary features.

2 Previous Discussions
~~~~~~~~~~~~~~~~~~~~~~~

This patch set previously appeared on the netdev list. Since V5 of the character device patch set, the discussion has moved to the lkml.
- PTP hardware clock as a character device V5
  [http://lkml.org/lkml/2010/8/16/90]

- POSIX clock tuning syscall with static clock ids
  [http://lkml.org/lkml/2010/8/23/49]

- POSIX clock tuning syscall with dynamic clock ids
  [http://lkml.org/lkml/2010/9/3/119]

3 Design Issues
~~~~~~~~~~~~~~~~

3.1 Clock Operations
====================

Based on experience with several commercially available PHCs, we identified a set of essential operations and a set of ancillary operations.

- Basic clock operations

  1. Set time
  2. Get time
  3. Shift the clock by a given offset atomically
  4. Adjust clock frequency

- Ancillary clock features

  1. Time stamp external events
  2. Enable Linux PPS subsystem events
  3. Periodic output signals
  4. One shot or periodic alarms, with CPU interrupt

The patch set includes examples of the first two ancillary features, and implementing the third point for a particular PHC is fairly straightforward. The fourth point is discussed below.

3.2 Character Device vs System Calls
====================================

This patch set started out as a class driver that exposes the PHC as a character device with standardized ioctls. Since several clock operations in the ioctl interface mimic the POSIX clock API, the suggestion was made to expose the PHC as a new clockid_t.

POSIX defines the CLOCK_REALTIME, CLOCK_MONOTONIC, CLOCK_PROCESS_CPUTIME_ID, and CLOCK_THREAD_CPUTIME_ID clock ids. As to other possible clock ids, the standard offers the following hint:

   An implementation may also support additional clocks. The
   interpretation of time values for these clocks is unspecified.

So as far as the POSIX standard is concerned, offering a clock id to represent the PHC would be acceptable.

From discussions on the lkml, a repeated wish was to ensure that any changes in the POSIX clock code would be general enough to support other new hardware clocks that might appear in the future, not just the particulars of PHCs.
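
For orientation, this is roughly how user space reads the standard clock ids named above through the POSIX clock API. It is a minimal sketch: the helper names are made up for this example, and whether a PHC clock id would be usable in exactly the same way depends on the interface the patch set finally settles on.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Read one POSIX clock; returns 0 on success, -1 on error (errno set). */
static int read_clock(clockid_t id, struct timespec *ts)
{
	assert(ts != NULL);
	return clock_gettime(id, ts);
}

/* Print the current time and resolution of one clock; 0 on success. */
static int print_clock(const char *name, clockid_t id)
{
	struct timespec now, res;
	long long res_ns;

	if (read_clock(id, &now) != 0 || clock_getres(id, &res) != 0)
		return -1;

	res_ns = (long long)res.tv_sec * 1000000000LL + res.tv_nsec;
	printf("%-16s %lld.%09ld s (resolution %lld ns)\n",
	       name, (long long)now.tv_sec, now.tv_nsec, res_ns);
	return 0;
}
```

Calling print_clock("CLOCK_REALTIME", CLOCK_REALTIME) and print_clock("CLOCK_MONOTONIC", CLOCK_MONOTONIC) shows that the same two calls serve any clock id, which is the property that makes a clockid_t attractive for a PHC.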

3.2.1 Using the POSIX Clock API

Looking at the mapping from PHC operation to the POSIX clock API, we
see that two of the basic clock operations, marked with *, have no
POSIX equivalent. The items marked NA are peculiar to PHCs and will
be discussed separately, below.

   Clock Operation              POSIX function
   ----------------------------+-----------------------------
   Set time                     clock_settime
   Get time                     clock_gettime
   Shift the clock              *
   Adjust clock frequency       *
   ----------------------------+-----------------------------
   Time stamp external events   NA
   Enable PPS events            NA
   Periodic output signals      NA
   One shot or periodic alarms  timer_create, timer_settime

In contrast to the standard Linux system clock, a PHC is adjustable
in hardware, for example using
[PATCH 2/8] posix clocks: dynamic clock ids.
This patch augments the POSIX clock code to offer a dynamic clock
creation method. Instead of registering a hard coded clock ID, modules
may call create_posix_clock(), which returns a new clock ID.

Signed-off-by: Richard Cochran <richard.coch...@omicron.at>
---
 include/linux/posix-timers.h |    7 ++-
 include/linux/time.h         |    2 ++
 kernel/posix-timers.c        |   41 ++---
 3 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index abf61cc..08aa4da 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -68,6 +68,7 @@ struct k_itimer {
 };
 
 struct k_clock {
+	clockid_t id;
 	int res;		/* in nanoseconds */
 	int (*clock_getres) (const clockid_t which_clock, struct timespec *tp);
 	int (*clock_set) (const clockid_t which_clock, struct timespec * tp);
@@ -86,7 +87,11 @@ struct k_clock {
 			   struct itimerspec * cur_setting);
 };
 
-void register_posix_clock(const clockid_t clock_id, struct k_clock *new_clock);
+/* Register a posix clock with a well known clock id. */
+int register_posix_clock(const clockid_t id, struct k_clock *clock);
+
+/* Create a new posix clock with a dynamic clock id. */
+clockid_t create_posix_clock(struct k_clock *clock);
 
 /* error handlers for timer_create, nanosleep and settime */
 int do_posix_clock_nonanosleep(const clockid_t, int flags, struct timespec *,
diff --git a/include/linux/time.h b/include/linux/time.h
index 9f15ac7..914c48d 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -299,6 +299,8 @@ struct itimerval {
 #define CLOCKS_MASK			(CLOCK_REALTIME | CLOCK_MONOTONIC)
 #define CLOCKS_MONO			CLOCK_MONOTONIC
 
+#define CLOCK_INVALID			-1
+
 /*
  * The various flags for setting POSIX.1b interval timers:
  */
diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c
index 446b566..67fba5c 100644
--- a/kernel/posix-timers.c
+++ b/kernel/posix-timers.c
@@ -132,6 +132,8 @@ static DEFINE_SPINLOCK(idr_lock);
  */
 
 static struct k_clock posix_clocks[MAX_CLOCKS];
+static DECLARE_BITMAP(clocks_map, MAX_CLOCKS);
+static DEFINE_MUTEX(clocks_mux); /* protects 'posix_clocks' and 'clocks_map' */
 
 /*
  * These ones are defined below.
@@ -484,18 +486,43 @@ static struct pid *good_sigevent(sigevent_t * event)
 	return task_pid(rtn);
 }
 
-void register_posix_clock(const clockid_t clock_id, struct k_clock *new_clock)
+int register_posix_clock(const clockid_t id, struct k_clock *clock)
 {
-	if ((unsigned) clock_id >= MAX_CLOCKS) {
-		printk("POSIX clock register failed for clock_id %d\n",
-		       clock_id);
-		return;
-	}
+	struct k_clock *kc;
+	int err = 0;
 
-	posix_clocks[clock_id] = *new_clock;
+	mutex_lock(&clocks_mux);
+	if (test_bit(id, clocks_map)) {
+		pr_err("clock_id %d already registered\n", id);
+		err = -EBUSY;
+		goto out;
+	}
+	kc = &posix_clocks[id];
+	*kc = *clock;
+	kc->id = id;
+	set_bit(id, clocks_map);
+out:
+	mutex_unlock(&clocks_mux);
+	return err;
 }
 EXPORT_SYMBOL_GPL(register_posix_clock);
 
+clockid_t create_posix_clock(struct k_clock *clock)
+{
+	clockid_t id;
+
+	mutex_lock(&clocks_mux);
+	id = find_first_zero_bit(clocks_map, MAX_CLOCKS);
+	mutex_unlock(&clocks_mux);
+
+	if (id < MAX_CLOCKS) {
+		register_posix_clock(id, clock);
+		return id;
+	}
+	return CLOCK_INVALID;
+}
+EXPORT_SYMBOL_GPL(create_posix_clock);
+
 static struct k_itimer * alloc_posix_timer(void)
 {
 	struct k_itimer *tmr;
-- 
1.7.0.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
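
For readers following the allocation scheme in patch 2/8, here is a minimal user-space sketch of handing out clock IDs from a bitmap. It is not kernel code and not the patch's implementation: MAX_CLOCKS_DEMO, CLOCK_INVALID_DEMO, demo_register_clock() and demo_create_clock() are hypothetical names, a plain bool array stands in for clocks_map, and locking is left out entirely. One deliberate difference: the sketch claims the ID in the same step that finds it, whereas create_posix_clock() in the patch appears to release clocks_mux between the search and the registration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_CLOCKS_DEMO 16       /* hypothetical table size */
#define CLOCK_INVALID_DEMO (-1)  /* mirrors CLOCK_INVALID from the patch */

/* One flag per clock ID; stands in for the kernel's clocks_map bitmap. */
static bool demo_clocks_map[MAX_CLOCKS_DEMO];

/* Claim a well known ID; fails with -EBUSY if it is already taken. */
int demo_register_clock(int id)
{
	assert(id >= 0 && id < MAX_CLOCKS_DEMO); /* caller must pass a valid slot */
	if (demo_clocks_map[id])
		return -EBUSY;
	demo_clocks_map[id] = true;
	return 0;
}

/* Find the first free ID and claim it; CLOCK_INVALID_DEMO if the table is full. */
int demo_create_clock(void)
{
	for (int id = 0; id < MAX_CLOCKS_DEMO; id++) {
		if (!demo_clocks_map[id]) {
			demo_clocks_map[id] = true;
			return id;
		}
	}
	return CLOCK_INVALID_DEMO;
}
```

With IDs 0 and 1 registered as well known clocks, the first dynamic clock would receive ID 2, and a second attempt to register ID 1 would be rejected with -EBUSY.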
[PATCH 1/8] posix clocks: introduce a syscall for clock tuning.
A new syscall is introduced that allows tuning of a POSIX clock. The syscall is implemented for four architectures: arm, blackfin, powerpc, and x86. The new syscall, clock_adjtime, takes two parameters, the clock ID, and a pointer to a struct timex. The semantics of the timex struct have been expanded by one additional mode flag, which allows an absolute offset correction. When specificied, the clock offset is immediately corrected by adding the given time value to the current time value. Signed-off-by: Richard Cochran richard.coch...@omicron.at --- arch/arm/include/asm/unistd.h |1 + arch/arm/kernel/calls.S|1 + arch/blackfin/include/asm/unistd.h |3 +- arch/blackfin/mach-common/entry.S |1 + arch/powerpc/include/asm/systbl.h |1 + arch/powerpc/include/asm/unistd.h |3 +- arch/x86/ia32/ia32entry.S |1 + arch/x86/include/asm/unistd_32.h |3 +- arch/x86/include/asm/unistd_64.h |2 + arch/x86/kernel/syscall_table_32.S |1 + include/linux/posix-timers.h |3 + include/linux/syscalls.h |2 + include/linux/timex.h |3 +- kernel/compat.c| 136 +++- kernel/posix-cpu-timers.c |4 + kernel/posix-timers.c | 17 + 16 files changed, 130 insertions(+), 52 deletions(-) diff --git a/arch/arm/include/asm/unistd.h b/arch/arm/include/asm/unistd.h index c891eb7..f58d881 100644 --- a/arch/arm/include/asm/unistd.h +++ b/arch/arm/include/asm/unistd.h @@ -396,6 +396,7 @@ #define __NR_fanotify_init (__NR_SYSCALL_BASE+367) #define __NR_fanotify_mark (__NR_SYSCALL_BASE+368) #define __NR_prlimit64 (__NR_SYSCALL_BASE+369) +#define __NR_clock_adjtime (__NR_SYSCALL_BASE+370) /* * The following SWIs are ARM private. 
diff --git a/arch/arm/kernel/calls.S b/arch/arm/kernel/calls.S index 5c26ecc..430de4c 100644 --- a/arch/arm/kernel/calls.S +++ b/arch/arm/kernel/calls.S @@ -379,6 +379,7 @@ CALL(sys_fanotify_init) CALL(sys_fanotify_mark) CALL(sys_prlimit64) +/* 370 */ CALL(sys_clock_adjtime) #ifndef syscalls_counted .equ syscalls_padding, ((NR_syscalls + 3) ~3) - NR_syscalls #define syscalls_counted diff --git a/arch/blackfin/include/asm/unistd.h b/arch/blackfin/include/asm/unistd.h index 14fcd25..79ad99b 100644 --- a/arch/blackfin/include/asm/unistd.h +++ b/arch/blackfin/include/asm/unistd.h @@ -392,8 +392,9 @@ #define __NR_fanotify_init 371 #define __NR_fanotify_mark 372 #define __NR_prlimit64 373 +#define __NR_clock_adjtime 374 -#define __NR_syscall 374 +#define __NR_syscall 375 #define NR_syscalls__NR_syscall /* Old optional stuff no one actually uses */ diff --git a/arch/blackfin/mach-common/entry.S b/arch/blackfin/mach-common/entry.S index af1bffa..ee68730 100644 --- a/arch/blackfin/mach-common/entry.S +++ b/arch/blackfin/mach-common/entry.S @@ -1631,6 +1631,7 @@ ENTRY(_sys_call_table) .long _sys_fanotify_init .long _sys_fanotify_mark .long _sys_prlimit64 + .long _sys_clock_adjtime .rept NR_syscalls-(.-_sys_call_table)/4 .long _sys_ni_syscall diff --git a/arch/powerpc/include/asm/systbl.h b/arch/powerpc/include/asm/systbl.h index 3d21266..2485d8f 100644 --- a/arch/powerpc/include/asm/systbl.h +++ b/arch/powerpc/include/asm/systbl.h @@ -329,3 +329,4 @@ COMPAT_SYS(rt_tgsigqueueinfo) SYSCALL(fanotify_init) COMPAT_SYS(fanotify_mark) SYSCALL_SPU(prlimit64) +COMPAT_SYS_SPU(clock_adjtime) diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h index 597e6f9..85d5067 100644 --- a/arch/powerpc/include/asm/unistd.h +++ b/arch/powerpc/include/asm/unistd.h @@ -348,10 +348,11 @@ #define __NR_fanotify_init 323 #define __NR_fanotify_mark 324 #define __NR_prlimit64 325 +#define __NR_clock_adjtime 326 #ifdef __KERNEL__ -#define __NR_syscalls 326 +#define 
__NR_syscalls 327 #define __NR__exit __NR_exit #define NR_syscalls__NR_syscalls diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S index 518bb99..0ed7896 100644 --- a/arch/x86/ia32/ia32entry.S +++ b/arch/x86/ia32/ia32entry.S @@ -851,4 +851,5 @@ ia32_sys_call_table: .quad sys_fanotify_init .quad sys32_fanotify_mark .quad sys_prlimit64 /* 340 */ + .quad compat_sys_clock_adjtime ia32_syscall_end: diff --git a/arch/x86/include/asm/unistd_32.h b/arch/x86/include/asm/unistd_32.h index b766a5e..b6f73f1 100644 --- a/arch/x86/include/asm/unistd_32.h +++ b/arch/x86/include/asm/unistd_32.h @@ -346,10 +346,11 @@ #define __NR_fanotify_init 338 #define __NR_fanotify_mark 339 #define __NR_prlimit64 340 +#define __NR_clock_adjtime 341 #ifdef __KERNEL__ -#define NR_syscalls 341 +#define NR_syscalls 342 #define
[PATCH 3/8] posix clocks: introduce a sysfs presence.
This patch adds a 'timesource' class into sysfs. Each registered POSIX clock appears by name under /sys/class/timesource. The idea is to expose to user space the dynamic mapping between clock devices and clock IDs. Signed-off-by: Richard Cochran richard.coch...@omicron.at --- Documentation/ABI/testing/sysfs-timesource | 24 drivers/char/mmtimer.c |1 + include/linux/posix-timers.h |4 +++ kernel/posix-cpu-timers.c |2 + kernel/posix-timers.c | 40 5 files changed, 71 insertions(+), 0 deletions(-) create mode 100644 Documentation/ABI/testing/sysfs-timesource diff --git a/Documentation/ABI/testing/sysfs-timesource b/Documentation/ABI/testing/sysfs-timesource new file mode 100644 index 000..f991de2 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-timesource @@ -0,0 +1,24 @@ +What: /sys/class/timesource/ +Date: September 2010 +Contact: Richard Cochran richardcoch...@gmail.com +Description: + This directory contains files and directories + providing a standardized interface to the available + time sources. + +What: /sys/class/timesource/name/ +Date: September 2010 +Contact: Richard Cochran richardcoch...@gmail.com +Description: + This directory contains the attributes of a time + source registered with the POSIX clock subsystem. + +What: /sys/class/timesource/name/id +Date: September 2010 +Contact: Richard Cochran richardcoch...@gmail.com +Description: + This file contains the clock ID (a non-negative + integer) of the named time source registered with the + POSIX clock subsystem. This value may be passed as the + first argument to the POSIX clock and timer system + calls. See man CLOCK_GETRES(2) and TIMER_CREATE(2). 
diff --git a/drivers/char/mmtimer.c b/drivers/char/mmtimer.c index ea7c99f..e9173e3 100644 --- a/drivers/char/mmtimer.c +++ b/drivers/char/mmtimer.c @@ -758,6 +758,7 @@ static int sgi_timer_set(struct k_itimer *timr, int flags, } static struct k_clock sgi_clock = { + .name = sgi_cycle, .res = 0, .clock_set = sgi_clock_set, .clock_get = sgi_clock_get, diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h index 08aa4da..64e6fee 100644 --- a/include/linux/posix-timers.h +++ b/include/linux/posix-timers.h @@ -67,7 +67,11 @@ struct k_itimer { } it; }; +#define KCLOCK_MAX_NAME 32 + struct k_clock { + char name[KCLOCK_MAX_NAME]; + struct device *dev; clockid_t id; int res;/* in nanoseconds */ int (*clock_getres) (const clockid_t which_clock, struct timespec *tp); diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c index e1c2e7b..df9cbab 100644 --- a/kernel/posix-cpu-timers.c +++ b/kernel/posix-cpu-timers.c @@ -1611,6 +1611,7 @@ static long thread_cpu_nsleep_restart(struct restart_block *restart_block) static __init int init_posix_cpu_timers(void) { struct k_clock process = { + .name = process_cputime, .clock_getres = process_cpu_clock_getres, .clock_get = process_cpu_clock_get, .clock_set = do_posix_clock_nosettime, @@ -1619,6 +1620,7 @@ static __init int init_posix_cpu_timers(void) .nsleep_restart = process_cpu_nsleep_restart, }; struct k_clock thread = { + .name = thread_cputime, .clock_getres = thread_cpu_clock_getres, .clock_get = thread_cpu_clock_get, .clock_set = do_posix_clock_nosettime, diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c index 67fba5c..719aa11 100644 --- a/kernel/posix-timers.c +++ b/kernel/posix-timers.c @@ -46,6 +46,7 @@ #include linux/wait.h #include linux/workqueue.h #include linux/module.h +#include linux/device.h /* * Management arrays for POSIX timers. 
Timers are kept in slab memory @@ -135,6 +136,8 @@ static struct k_clock posix_clocks[MAX_CLOCKS]; static DECLARE_BITMAP(clocks_map, MAX_CLOCKS); static DEFINE_MUTEX(clocks_mux); /* protects 'posix_clocks' and 'clocks_map' */ +static struct class *timesource_class; + /* * These ones are defined below. */ @@ -271,20 +274,40 @@ static int posix_get_coarse_res(const clockid_t which_clock, struct timespec *tp *tp = ktime_to_timespec(KTIME_LOW_RES); return 0; } + +/* + * sysfs attributes + */ + +static ssize_t show_clock_id(struct device *dev, +struct device_attribute *attr, char *page) +{ + struct k_clock *kc = dev_get_drvdata(dev); + return snprintf(page, PAGE_SIZE-1, %d\n, kc-id); +} + +static struct device_attribute timesource_dev_attrs[] = { + __ATTR(id, 0444, show_clock_id, NULL), +
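
The sysfs description in patch 3/8 says the value of the id attribute may be passed as the first argument to the POSIX clock calls. A minimal user-space sketch of that flow follows, assuming the attribute text is a decimal integer followed by a newline. The helper names demo_parse_clock_id() and demo_read_clock() are hypothetical; reading the file itself is omitted so that the sketch does not depend on a kernel carrying this patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/*
 * Parse the text of an "id" attribute (for example "5\n") into a clockid_t.
 * Returns 0 on success, -1 if the text is not a non-negative integer.
 */
int demo_parse_clock_id(const char *text, clockid_t *id)
{
	char *end;
	long val;

	assert(text != NULL && id != NULL);

	errno = 0;
	val = strtol(text, &end, 10);
	if (end == text || errno != 0 || val < 0 || val > INT_MAX)
		return -1;
	if (*end != '\0' && *end != '\n')
		return -1;	/* trailing garbage after the number */
	*id = (clockid_t)val;
	return 0;
}

/* Read the given clock through the ordinary POSIX call. */
int demo_read_clock(clockid_t id, struct timespec *ts)
{
	return clock_gettime(id, ts);
}
```

On a kernel with this patch set, the parsed ID of a dynamic clock would be used exactly like CLOCK_MONOTONIC is used with demo_read_clock() on any system.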
[PATCH 4/8] ptp: Added a brand new class driver for ptp clocks.
This patch adds an infrastructure for hardware clocks that implement IEEE 1588, the Precision Time Protocol (PTP). A class driver offers a registration method to particular hardware clock drivers. Each clock is presented as a standard POSIX clock. The ancillary clock features are exposed in two different ways, via the sysfs and by a character device. Signed-off-by: Richard Cochran richard.coch...@omicron.at --- Documentation/ABI/testing/sysfs-ptp | 107 ++ Documentation/ptp/ptp.txt | 94 + Documentation/ptp/testptp.c | 358 Documentation/ptp/testptp.mk| 33 +++ drivers/Kconfig |2 + drivers/Makefile|1 + drivers/ptp/Kconfig | 27 +++ drivers/ptp/Makefile|6 + drivers/ptp/ptp_chardev.c | 178 drivers/ptp/ptp_clock.c | 382 +++ drivers/ptp/ptp_private.h | 64 ++ drivers/ptp/ptp_sysfs.c | 235 + include/linux/Kbuild|1 + include/linux/ptp_clock.h | 79 +++ include/linux/ptp_clock_kernel.h| 139 + 15 files changed, 1706 insertions(+), 0 deletions(-) create mode 100644 Documentation/ABI/testing/sysfs-ptp create mode 100644 Documentation/ptp/ptp.txt create mode 100644 Documentation/ptp/testptp.c create mode 100644 Documentation/ptp/testptp.mk create mode 100644 drivers/ptp/Kconfig create mode 100644 drivers/ptp/Makefile create mode 100644 drivers/ptp/ptp_chardev.c create mode 100644 drivers/ptp/ptp_clock.c create mode 100644 drivers/ptp/ptp_private.h create mode 100644 drivers/ptp/ptp_sysfs.c create mode 100644 include/linux/ptp_clock.h create mode 100644 include/linux/ptp_clock_kernel.h diff --git a/Documentation/ABI/testing/sysfs-ptp b/Documentation/ABI/testing/sysfs-ptp new file mode 100644 index 000..47142ce --- /dev/null +++ b/Documentation/ABI/testing/sysfs-ptp @@ -0,0 +1,107 @@ +What: /sys/class/ptp/ +Date: September 2010 +Contact: Richard Cochran richardcoch...@gmail.com +Description: + This directory contains files and directories + providing a standardized interface to the ancillary + features of PTP hardware clocks. 
+ +What: /sys/class/ptp/ptpN/ +Date: September 2010 +Contact: Richard Cochran richardcoch...@gmail.com +Description: + This directory contains the attributes of the Nth PTP + hardware clock registered into the PTP class driver + subsystem. + +What: /sys/class/ptp/ptpN/clock_id +Date: September 2010 +Contact: Richard Cochran richardcoch...@gmail.com +Description: + This file contains the POSIX clock ID (a non-negative + integer) corresponding to the PTP hardware clock. This + value may be passed as the first argument to the POSIX + clock and timer system calls. See man CLOCK_GETRES(2) + and TIMER_CREATE(2). + +What: /sys/class/ptp/ptpN/clock_name +Date: September 2010 +Contact: Richard Cochran richardcoch...@gmail.com +Description: + This file contains the name of the PTP hardware clock + as a human readable string. + +What: /sys/class/ptp/ptpN/max_adjustment +Date: September 2010 +Contact: Richard Cochran richardcoch...@gmail.com +Description: + This file contains the PTP hardware clock's maximum + frequency adjustment value (a positive integer) in + parts per billion. + +What: /sys/class/ptp/ptpN/n_alarms +Date: September 2010 +Contact: Richard Cochran richardcoch...@gmail.com +Description: + This file contains the number of periodic or one shot + alarms offer by the PTP hardware clock. + +What: /sys/class/ptp/ptpN/n_external_timestamps +Date: September 2010 +Contact: Richard Cochran richardcoch...@gmail.com +Description: + This file contains the number of external timestamp + channels offered by the PTP hardware clock. + +What: /sys/class/ptp/ptpN/n_periodic_outputs +Date: September 2010 +Contact: Richard Cochran richardcoch...@gmail.com +Description: + This file contains the number of programmable periodic + output channels offered by the PTP hardware clock. 
+ +What: /sys/class/ptp/ptpN/pps_avaiable +Date: September 2010 +Contact: Richard Cochran richardcoch...@gmail.com +Description: + This file indicates whether the PTP hardware clock + supports a Pulse Per Second to the host CPU. Reading + 1 means that the PPS is supported, while 0 means + not supported. +
[PATCH 5/8] ptp: Added a simulated PTP hardware clock.
This patch adds a driver that simulates a PTP hardware clock. The driver serves as a simple example for writing real clock driver and can be used for testing the PTP clock API. The basic clock operations are implemented using the system clock, and the ancillary clock operations are simulated. Signed-off-by: Richard Cochran richard.coch...@omicron.at --- drivers/ptp/Kconfig | 14 drivers/ptp/Makefile|1 + drivers/ptp/ptp_linux.c | 165 +++ kernel/time/ntp.c |2 + 4 files changed, 182 insertions(+), 0 deletions(-) create mode 100644 drivers/ptp/ptp_linux.c diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig index 17be208..94f329f 100644 --- a/drivers/ptp/Kconfig +++ b/drivers/ptp/Kconfig @@ -24,4 +24,18 @@ config PTP_1588_CLOCK To compile this driver as a module, choose M here: the module will be called ptp. +config PTP_1588_CLOCK_LINUX + tristate Simulated PTP clock + depends on PTP_1588_CLOCK + help + This driver adds support for a simulated PTP clock. It + implements the basic clock operations by using the standard + Linux system time. The driver simulates the ancillary clock + operations. This clock can be used to test PTP programs + provided they use software time stamps for the PTP Ethernet + packets. + + To compile this driver as a module, choose M here: the module + will be called ptp_linux. 
+ endmenu diff --git a/drivers/ptp/Makefile b/drivers/ptp/Makefile index 480e2af..266d4f2 100644 --- a/drivers/ptp/Makefile +++ b/drivers/ptp/Makefile @@ -4,3 +4,4 @@ ptp-y := ptp_clock.o ptp_chardev.o ptp_sysfs.o obj-$(CONFIG_PTP_1588_CLOCK) += ptp.o +obj-$(CONFIG_PTP_1588_CLOCK_LINUX) += ptp_linux.o diff --git a/drivers/ptp/ptp_linux.c b/drivers/ptp/ptp_linux.c new file mode 100644 index 000..57b3da4 --- /dev/null +++ b/drivers/ptp/ptp_linux.c @@ -0,0 +1,165 @@ +/* + * PTP 1588 clock using the Linux system clock + * + * Copyright (C) 2010 OMICRON electronics GmbH + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ +#include linux/device.h +#include linux/err.h +#include linux/hrtimer.h +#include linux/init.h +#include linux/kernel.h +#include linux/module.h +#include linux/timex.h + +#include linux/ptp_clock_kernel.h + +static struct ptp_clock *linux_clock; + +DEFINE_SPINLOCK(adjtime_lock); + +static int ptp_linux_adjfreq(void *priv, s32 ppb) +{ + struct timex txc; + s64 tmp = ppb; + int err; + pr_debug(ptp_linux: adjfreq ppb=%d\n, ppb); + txc.freq = div_s64(tmp16, 1000); + txc.modes = ADJ_FREQUENCY; + err = do_adjtimex(txc); + return err 0 ? 
err : 0; +} + +static int ptp_linux_adjtime(void *priv, struct timespec *ts) +{ + s64 delta; + ktime_t now; + struct timespec t2; + unsigned long flags; + int err; + + delta = 10LL * ts-tv_sec + ts-tv_nsec; + + spin_lock_irqsave(adjtime_lock, flags); + + now = ktime_get_real(); + + now = delta 0 ? ktime_sub_ns(now, -delta) : ktime_add_ns(now, delta); + + t2 = ktime_to_timespec(now); + + err = do_settimeofday(t2); + + spin_unlock_irqrestore(adjtime_lock, flags); + + return err; +} + +static int ptp_linux_gettime(void *priv, struct timespec *ts) +{ + getnstimeofday(ts); + return 0; +} + +static int ptp_linux_settime(void *priv, struct timespec *ts) +{ + return do_settimeofday(ts); +} + +#define sim(x...) pr_warn(ptp_linux simulation: x) + +static int ptp_linux_enable(void *priv, struct ptp_clock_request *rq, int on) +{ + struct ptp_clock_event event; + ktime_t kt; + int i; + + switch (rq-type) { + + case PTP_CLK_REQ_EXTTS: + if (on) { + sim(enable external timestamped events\n); + for (i = 0; i 100; i++) { + kt = ktime_get_real(); + event.type = PTP_CLOCK_EXTTS; + event.index = 0; + event.timestamp = ktime_to_ns(kt); + ptp_clock_event(linux_clock, event); + } +
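
The two conversions in ptp_linux_adjfreq() and ptp_linux_adjtime() can be checked in isolation. The sketch below is user-space C with hypothetical names, and it rests on two assumptions because the archived diff has lost some characters: that the frequency conversion is (ppb << 16) / 1000, matching the struct timex freq unit of parts per million with a 16 bit binary fraction, and that the seconds multiplier is 1000000000LL. The multiplication by 65536 replaces the shift to avoid shifting a negative value in portable C.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000LL

/*
 * Convert parts per billion to the struct timex 'freq' unit:
 * parts per million with a 16 bit binary fraction.
 */
int64_t demo_ppb_to_timex_freq(int32_t ppb)
{
	return ((int64_t)ppb * 65536) / 1000;
}

/* Collapse a (seconds, nanoseconds) offset into signed nanoseconds. */
int64_t demo_offset_to_ns(int64_t sec, long nsec)
{
	assert(nsec > -DEMO_NSEC_PER_SEC && nsec < DEMO_NSEC_PER_SEC);
	return DEMO_NSEC_PER_SEC * sec + nsec;
}
```

For instance, 1000 ppb is exactly 1 ppm, so the converted value is 1 << 16.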
[PATCH 6/8] ptp: Added a clock that uses the eTSEC found on the MPC85xx.
The eTSEC includes a PTP clock with quite a few features. This patch adds support for the basic clock adjustment functions, plus two external time stamps, one alarm, and the PPS callback. Signed-off-by: Richard Cochran richard.coch...@omicron.at --- Documentation/powerpc/dts-bindings/fsl/tsec.txt | 57 +++ arch/powerpc/boot/dts/mpc8313erdb.dts | 14 + arch/powerpc/boot/dts/mpc8572ds.dts | 14 + arch/powerpc/boot/dts/p2020ds.dts | 14 + arch/powerpc/boot/dts/p2020rdb.dts | 14 + drivers/net/Makefile|1 + drivers/net/gianfar_ptp.c | 447 +++ drivers/net/gianfar_ptp_reg.h | 113 ++ drivers/ptp/Kconfig | 13 + 9 files changed, 687 insertions(+), 0 deletions(-) create mode 100644 drivers/net/gianfar_ptp.c create mode 100644 drivers/net/gianfar_ptp_reg.h diff --git a/Documentation/powerpc/dts-bindings/fsl/tsec.txt b/Documentation/powerpc/dts-bindings/fsl/tsec.txt index edb7ae1..f6edbb8 100644 --- a/Documentation/powerpc/dts-bindings/fsl/tsec.txt +++ b/Documentation/powerpc/dts-bindings/fsl/tsec.txt @@ -74,3 +74,60 @@ Example: interrupt-parent = mpic; phy-handle = phy0 }; + +* Gianfar PTP clock nodes + +General Properties: + + - compatible Should be fsl,etsec-ptp + - reg Offset and length of the register set for the device + - interrupts There should be at least two interrupts. Some devices + have as many as four PTP related interrupts. + +Clock Properties: + + - tclk-period Timer reference clock period in nanoseconds. + - tmr-prsc Prescaler, divides the output clock. + - tmr-add Frequency compensation value. + - cksel0= external clock, 1= eTSEC system clock, 3= RTC clock input. + Currently the driver only supports choice 1. + - tmr-fiper1 Fixed interval period pulse generator. + - tmr-fiper2 Fixed interval period pulse generator. + - max-adj Maximum frequency adjustment in parts per billion. + + These properties set the operational parameters for the PTP + clock. You must choose these carefully for the clock to work right. 
+  Here is how to figure good values:
+
+  TimerOsc     = system clock               MHz
+  tclk_period  = desired clock period       nanoseconds
+  NominalFreq  = 1000 / tclk_period         MHz
+  FreqDivRatio = TimerOsc / NominalFreq     (must be greater than 1.0)
+  tmr_add      = ceil(2^32 / FreqDivRatio)
+  OutputClock  = NominalFreq / tmr_prsc     MHz
+  PulseWidth   = 1 / OutputClock            microseconds
+  FiperFreq1   = desired frequency in Hz
+  FiperDiv1    = 1000000 * OutputClock / FiperFreq1
+  tmr_fiper1   = tmr_prsc * tclk_period * FiperDiv1 - tclk_period
+  max_adj      = 1000000000 * (FreqDivRatio - 1.0) - 1
+
+  The calculation for tmr_fiper2 is the same as for tmr_fiper1. The
+  driver expects that tmr_fiper1 will be correctly set to produce a 1
+  Pulse Per Second (PPS) signal, since this will be offered to the PPS
+  subsystem to synchronize the Linux clock.
+
+Example:
+
+	ptp_cl...@24e00 {
+		compatible = "fsl,etsec-ptp";
+		reg = <0x24E00 0xB0>;
+		interrupts = <12 0x8 13 0x8>;
+		interrupt-parent = < &ipic >;
+		tclk-period = <10>;
+		tmr-prsc    = <100>;
+		tmr-add     = <0x999999A4>;
+		cksel       = <0x1>;
+		tmr-fiper1  = <0x3B9AC9F6>;
+		tmr-fiper2  = <0x00018696>;
+		max-adj     = <659999998>;
+	};
diff --git a/arch/powerpc/boot/dts/mpc8313erdb.dts b/arch/powerpc/boot/dts/mpc8313erdb.dts
index 183f2aa..85a7eaa 100644
--- a/arch/powerpc/boot/dts/mpc8313erdb.dts
+++ b/arch/powerpc/boot/dts/mpc8313erdb.dts
@@ -208,6 +208,20 @@
 			sleep = <&pmc 0x0030>;
 		};
 
+		ptp_cl...@24e00 {
+			compatible = "fsl,etsec-ptp";
+			reg = <0x24E00 0xB0>;
+			interrupts = <12 0x8 13 0x8>;
+			interrupt-parent = < &ipic >;
+			tclk-period = <10>;
+			tmr-prsc    = <100>;
+			tmr-add     = <0x999999A4>;
+			cksel       = <0x1>;
+			tmr-fiper1  = <0x3B9AC9F6>;
+			tmr-fiper2  = <0x00018696>;
+			max-adj     = <659999998>;
+		};
+
 		enet0: ether...@24000 {
 			#address-cells = <1>;
 			#size-cells = <1>;
diff --git a/arch/powerpc/boot/dts/mpc8572ds.dts b/arch/powerpc/boot/dts/mpc8572ds.dts
index cafc128..74208cd 100644
--- a/arch/powerpc/boot/dts/mpc8572ds.dts
+++ b/arch/powerpc/boot/dts/mpc8572ds.dts
@@ -324,6 +324,20 @@
 			};
 		};
 
+		ptp_cl...@24e00 {
+			compatible
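
The tmr_fiper formula from the binding text can be checked against the example node's register values. The sketch below is user-space C with a hypothetical helper name, demo_tmr_fiper(). It assumes the scale factor in the FiperDiv formula is 1000000 (Hz per MHz), which is the value that reproduces tmr-fiper1 = 0x3B9AC9F6 for a 1 Hz output. The 10 kHz frequency used for the second output is inferred from tmr-fiper2 = 0x00018696 and is not stated in the patch. The arithmetic is done in Hz so that it stays in integers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute a tmr_fiper register value following the binding's formulas.
 *   tclk_period_ns: timer reference clock period in nanoseconds
 *   tmr_prsc:       prescaler that divides the output clock
 *   fiper_freq_hz:  desired pulse frequency in Hz
 */
uint32_t demo_tmr_fiper(uint32_t tclk_period_ns, uint32_t tmr_prsc,
			uint32_t fiper_freq_hz)
{
	uint64_t output_clock_hz;
	uint64_t fiper_div;

	assert(tclk_period_ns > 0 && tmr_prsc > 0 && fiper_freq_hz > 0);

	/* NominalFreq = 1000 / tclk_period MHz; OutputClock = NominalFreq / tmr_prsc. */
	output_clock_hz = 1000000000ULL / tclk_period_ns / tmr_prsc;

	/* FiperDiv = OutputClock (in Hz) / FiperFreq. */
	fiper_div = output_clock_hz / fiper_freq_hz;

	/* tmr_fiper = tmr_prsc * tclk_period * FiperDiv - tclk_period. */
	return (uint32_t)((uint64_t)tmr_prsc * tclk_period_ns * fiper_div
			  - tclk_period_ns);
}
```

With tclk-period = 10 and tmr-prsc = 100 as in the example node, the output clock is 1 MHz, so a 1 Hz pulse gives 100 * 10 * 1000000 - 10 = 999999990, which is 0x3B9AC9F6.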
[PATCH 7/8] ptp: Added a clock driver for the IXP46x.
This patch adds a driver for the hardware time stamping unit found on the IXP465. The basic clock operations and an external trigger are implemented. Signed-off-by: Richard Cochran richard.coch...@omicron.at --- arch/arm/mach-ixp4xx/include/mach/ixp46x_ts.h | 78 ++ drivers/net/arm/ixp4xx_eth.c | 191 ++ drivers/ptp/Kconfig | 13 + drivers/ptp/Makefile |1 + drivers/ptp/ptp_ixp46x.c | 345 + 5 files changed, 628 insertions(+), 0 deletions(-) create mode 100644 arch/arm/mach-ixp4xx/include/mach/ixp46x_ts.h create mode 100644 drivers/ptp/ptp_ixp46x.c diff --git a/arch/arm/mach-ixp4xx/include/mach/ixp46x_ts.h b/arch/arm/mach-ixp4xx/include/mach/ixp46x_ts.h new file mode 100644 index 000..729a6b2 --- /dev/null +++ b/arch/arm/mach-ixp4xx/include/mach/ixp46x_ts.h @@ -0,0 +1,78 @@ +/* + * PTP 1588 clock using the IXP46X + * + * Copyright (C) 2010 OMICRON electronics GmbH + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 
+ */ + +#ifndef _IXP46X_TS_H_ +#define _IXP46X_TS_H_ + +#define DEFAULT_ADDEND 0xF029 +#define TICKS_NS_SHIFT 4 + +struct ixp46x_channel_ctl { + u32 Ch_Control; /* 0x40 Time Synchronization Channel Control */ + u32 Ch_Event; /* 0x44 Time Synchronization Channel Event */ + u32 TxSnapLo; /* 0x48 Transmit Snapshot Low Register */ + u32 TxSnapHi; /* 0x4C Transmit Snapshot High Register */ + u32 RxSnapLo; /* 0x50 Receive Snapshot Low Register */ + u32 RxSnapHi; /* 0x54 Receive Snapshot High Register */ + u32 SrcUUIDLo; /* 0x58 Source UUID0 Low Register */ + u32 SrcUUIDHi; /* 0x5C Sequence Identifier/Source UUID0 High */ +}; + +struct ixp46x_ts_regs { + u32 Control; /* 0x00 Time Sync Control Register */ + u32 Event; /* 0x04 Time Sync Event Register */ + u32 Addend; /* 0x08 Time Sync Addend Register */ + u32 Accum; /* 0x0C Time Sync Accumulator Register */ + u32 Test;/* 0x10 Time Sync Test Register */ + u32 Unused; /* 0x14 */ + u32 RSysTime_Lo; /* 0x18 RawSystemTime_Low Register */ + u32 RSysTimeHi; /* 0x1C RawSystemTime_High Register */ + u32 SysTimeLo; /* 0x20 SystemTime_Low Register */ + u32 SysTimeHi; /* 0x24 SystemTime_High Register */ + u32 TrgtLo; /* 0x28 TargetTime_Low Register */ + u32 TrgtHi; /* 0x2C TargetTime_High Register */ + u32 ASMSLo; /* 0x30 Auxiliary Slave Mode Snapshot Low */ + u32 ASMSHi; /* 0x34 Auxiliary Slave Mode Snapshot High */ + u32 AMMSLo; /* 0x38 Auxiliary Master Mode Snapshot Low */ + u32 AMMSHi; /* 0x3C Auxiliary Master Mode Snapshot High */ + + struct ixp46x_channel_ctl channel[3]; +}; + +/* 0x00 Time Sync Control Register Bits */ +#define TSCR_AMM (13) +#define TSCR_ASM (12) +#define TSCR_TTM (11) +#define TSCR_RST (10) + +/* 0x04 Time Sync Event Register Bits */ +#define TSER_SNM (13) +#define TSER_SNS (12) +#define TTIPEND (11) + +/* 0x40 Time Synchronization Channel Control Register Bits */ +#define MASTER_MODE (10) +#define TIMESTAMP_ALL (11) + +/* 0x44 Time Synchronization Channel Event Register Bits */ +#define TX_SNAPSHOT_LOCKED 
(10) +#define RX_SNAPSHOT_LOCKED (11) + +#endif diff --git a/drivers/net/arm/ixp4xx_eth.c b/drivers/net/arm/ixp4xx_eth.c index 6028226..eaff9dd 100644 --- a/drivers/net/arm/ixp4xx_eth.c +++ b/drivers/net/arm/ixp4xx_eth.c @@ -30,9 +30,12 @@ #include linux/etherdevice.h #include linux/io.h #include linux/kernel.h +#include linux/net_tstamp.h #include linux/phy.h #include linux/platform_device.h +#include linux/ptp_classify.h #include linux/slab.h +#include mach/ixp46x_ts.h #include mach/npe.h #include mach/qmgr.h @@ -67,6 +70,14 @@ #define RXFREE_QUEUE(port_id) (NPE_ID(port_id) + 26) #define TXDONE_QUEUE 31 +#define PTP_SLAVE_MODE 1 +#define PTP_MASTER_MODE2 +#define PORT2CHANNEL(p)1 +/* + * PHYSICAL_ID(p-id) ? + * TODO - Figure out correct mapping. + */ + /* TX Control Registers */ #define TX_CNTRL0_TX_EN0x01 #define TX_CNTRL0_HALFDUPLEX 0x02 @@ -171,6 +182,8 @@ struct port { int id; /*
[PATCH 8/8] ptp: Added a clock driver for the National Semiconductor PHYTER.
This patch adds support for the PTP clock found on the DP83640. The basic
clock operations and one external time stamp have been implemented.

Signed-off-by: Richard Cochran <richard.coch...@omicron.at>
---
 drivers/net/phy/Kconfig       |   29 ++
 drivers/net/phy/Makefile      |    1 +
 drivers/net/phy/dp83640.c     |  887 +
 drivers/net/phy/dp83640_reg.h |  261
 4 files changed, 1178 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/phy/dp83640.c
 create mode 100644 drivers/net/phy/dp83640_reg.h

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index eb799b3..2e6463d 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -77,6 +77,35 @@ config NATIONAL_PHY
 	---help---
 	  Currently supports the DP83865 PHY.
 
+config DP83640_PHY
+	tristate "Driver for the National Semiconductor DP83640 PHYTER"
+	depends on PTP_1588_CLOCK
+	depends on NETWORK_PHY_TIMESTAMPING
+	---help---
+	  Supports the DP83640 PHYTER with IEEE 1588 features.
+
+	  This driver adds support for using the DP83640 as a PTP
+	  clock. This clock is only useful if your PTP programs are
+	  getting hardware time stamps on the PTP Ethernet packets
+	  using the SO_TIMESTAMPING API.
+
+	  In order for this to work, your MAC driver must also
+	  implement the skb_tx_timestamp() function.
+
+config DP83640_PHY_STATUS_FRAMES
+	bool "DP83640 Status Frames"
+	default y
+	depends on DP83640_PHY
+	---help---
+	  This option allows the DP83640 PHYTER driver to obtain time
+	  stamps from the PHY via special status frames, rather than
+	  reading over the MDIO bus. Using status frames is therefore
+	  more efficient. However, if enabled, this option will cause
+	  the driver to add a multicast address to the MAC.
+
+	  Say Y here, unless your MAC does not support multicast
+	  destination addresses.
+
 config STE10XP
 	depends on PHYLIB
 	tristate "Driver for STMicroelectronics STe10Xp PHYs"
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 13bebab..2333215 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -19,6 +19,7 @@ obj-$(CONFIG_FIXED_PHY) += fixed.o
 obj-$(CONFIG_MDIO_BITBANG)	+= mdio-bitbang.o
 obj-$(CONFIG_MDIO_GPIO)		+= mdio-gpio.o
 obj-$(CONFIG_NATIONAL_PHY)	+= national.o
+obj-$(CONFIG_DP83640_PHY)	+= dp83640.o
 obj-$(CONFIG_STE10XP)		+= ste10Xp.o
 obj-$(CONFIG_MICREL_PHY)	+= micrel.o
 obj-$(CONFIG_MDIO_OCTEON)	+= mdio-octeon.o
diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c
new file mode 100644
index 000..4cabd0d
--- /dev/null
+++ b/drivers/net/phy/dp83640.c
@@ -0,0 +1,887 @@
+/*
+ * Driver for the National Semiconductor DP83640 PHYTER
+ *
+ * Copyright (C) 2010 OMICRON electronics GmbH
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+#include <linux/ethtool.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/mii.h>
+#include <linux/module.h>
+#include <linux/net_tstamp.h>
+#include <linux/netdevice.h>
+#include <linux/phy.h>
+#include <linux/ptp_classify.h>
+#include <linux/ptp_clock_kernel.h>
+
+#include "dp83640_reg.h"
+
+#ifdef CONFIG_DP83640_PHY_STATUS_FRAMES
+#define USE_STATUS_FRAMES
+#endif
+
+#define DP83640_PHY_ID	0x20005ce1
+#define PAGESEL		0x13
+#define LAYER4		0x02
+#define LAYER2		0x01
+#define MAX_RXTS	4
+#define MAX_TXTS	4
+#define N_EXT_TS	1
+#define PSF_PTPVER	2
+#define PSF_EVNT	0x4000
+#define PSF_RX		0x2000
+#define PSF_TX		0x1000
+#define EXT_EVENT	1
+#define EXT_GPIO	1
+
+#if defined(__BIG_ENDIAN)
+#define ENDIAN_FLAG	0
+#elif defined(__LITTLE_ENDIAN)
+#define ENDIAN_FLAG	PSF_ENDIAN
+#endif
+
+#define SKB_PTP_TYPE(__skb) (*(unsigned int *)((__skb)->cb))
+
+struct phy_rxts {
+	u16 ns_lo;   /* ns[15:0] */
+	u16 ns_hi;   /* overflow[1:0], ns[29:16] */
+	u16 sec_lo;  /* sec[15:0] */
+	u16 sec_hi;  /* sec[31:16] */
+	u16 seqid;   /* sequenceId[15:0] */
+	u16 msgtype; /* messageType[3:0], hash[11:0] */
+};
+
+struct phy_txts {
+	u16 ns_lo;   /* ns[15:0] */
+	u16
Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support
On Thu, 23 Sep 2010, Richard Cochran wrote:

> Support for obtaining timestamps from a PHC already exists via the SO_TIMESTAMPING socket option, integrated in kernel version 2.6.30. This patch set completes the picture by allowing user space programs to adjust the PHC and to control its ancillary features.

Is there a way to use the PHC as a system clock? I think the main benefit of PTP is to have synchronized time on multiple machines in a cluster. That may mean getting rid of ntp and using an in-kernel, PHC-based way to sync time.

> So as far as the POSIX standard is concerned, offering a clock id to represent the PHC would be acceptable.

Sure, but what would you do with it? HPET timer support has no such need.

> 3.2.1 Using the POSIX Clock API
>
> Looking at the mapping from PHC operation to the POSIX clock API, we see that two of the basic clock operations, marked with *, have no POSIX equivalent. The items marked NA are peculiar to PHCs and will be discussed separately, below.
>
>    Clock Operation               POSIX function
>   -----------------------------+-----------------------------
>    Set time                      clock_settime
>    Get time                      clock_gettime
>    Shift the clock               *
>    Adjust clock frequency        *
>   -----------------------------+-----------------------------
>    Time stamp external events    NA
>    Enable PPS events             NA
>    Periodic output signals       NA
>    One shot or periodic alarms   timer_create, timer_settime
>
> In contrast to the standard Linux system clock, a PHC is adjustable in hardware, for example using frequency compensation registers or a VCO. The ability to directly tune the PHC is essential to reap the benefit of hardware timestamping.

There is a reason for not being able to shift posix clocks: The system has one time base. The various clocks are contributing to maintaining that system wide time.

I do not understand why you want to maintain different clocks running at different speeds. Certainly interesting for some uses I guess that I do not have the energy to imagine right now. But can we get the PTP killer feature of synchronized accurate system time first?

> 3.3 Synchronizing the Linux System Time
>
> One could offer a PHC as a combined clock source and clock event device. The advantage of this approach would be that it obviates the need for synchronization when the PHC is selected as the system timer. However, some PHCs, namely the PHY based clocks, cannot be used in this way.

Why not? Do PHY based clocks not at least provide a counter that increments in synchronized intervals throughout the network?

> Instead, the patch set provides a way to offer a Pulse Per Second (PPS) event from the PHC to the Linux PPS subsystem. A user space application can read the PPS events and tune the system clock, just like when using other external time sources like radio clocks or GPS.

User space is subject to various latencies created by the OS etc. I would think that in order to have fine grained (read: microsecond) accuracy we would have to run the portions that are relevant to obtaining the desired accuracy in the kernel.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
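
As a rough user-space illustration of the POSIX mapping in section 3.2.1: the sketch below reads a clock through clock_gettime(). CLOCK_REALTIME stands in for a PHC, on the assumption that a dynamically assigned PHC clock id would be passed in the same way; the helper name read_clock_ns is made up for this example and is not part of the patch set.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper: read any POSIX clock and return its value in
 * nanoseconds, or -1 if clock_gettime() fails for that clock id.
 */
static int64_t read_clock_ns(clockid_t id)
{
	struct timespec ts;

	if (clock_gettime(id, &ts) != 0)
		return -1;
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

Shifting the clock and adjusting its frequency have no counterpart here, which is the gap the two entries marked * in the table refer to.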
Re: [PATCH 12/20] powerpc: change to new flag variables
On 20:19 Thu 23 Sep, Stephen Rothwell wrote:
> Hi Matt,
>
> On Wed, 22 Sep 2010 23:51:09 -0700 matt mooney <m...@muteddisk.com> wrote:
> >
> > Replace EXTRA_CFLAGS with ccflags-y and EXTRA_AFLAGS with asflags-y.
>
> This looks good. One comment below ...
>
> > --- a/arch/powerpc/platforms/pseries/Makefile
> > +++ b/arch/powerpc/platforms/pseries/Makefile
> > @@ -1,10 +1,5 @@
> > -ifeq ($(CONFIG_PPC64),y)
> > -EXTRA_CFLAGS += -mno-minimal-toc
> > -endif
> > -
> > -ifeq ($(CONFIG_PPC_PSERIES_DEBUG),y)
> > -EXTRA_CFLAGS += -DDEBUG
> > -endif
> > +ccflags-$(CONFIG_PPC64) := -mno-minimal-toc
> > +ccflags-$(CONFIG_PPC_PSERIES_DEBUG) += -DDEBUG
> >
> >  obj-y := lpar.o hvCall.o nvram.o reconfig.o \
> >  setup.o iommu.o event_sources.o ras.o \
> > @@ -23,7 +18,7 @@ obj-$(CONFIG_MEMORY_HOTPLUG) += hotplug-memory.o
> >  obj-$(CONFIG_HVC_CONSOLE) += hvconsole.o
> >  obj-$(CONFIG_HVCS) += hvcserver.o
> >  obj-$(CONFIG_HCALL_STATS) += hvCall_inst.o
> > -obj-$(CONFIG_PHYP_DUMP)+= phyp_dump.o
> > +obj-$(CONFIG_PHYP_DUMP) += phyp_dump.o
> >  obj-$(CONFIG_CMM) += cmm.o
> >  obj-$(CONFIG_DTL) += dtl.o
>
> This looks like a spurious extra hunk.

Hi Stephen,

Yeah, you're right; logically it doesn't follow from my changeset. I should have left it alone, but it was the only line in the file that didn't align properly with its surrounding area.

-mfm
Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support
On Thu, 23 Sep 2010, Jacob Keller wrote:

> > There is a reason for not being able to shift posix clocks: The system has one time base. The various clocks are contributing to maintaining that system wide time.
>
> Adjusting clocks is absolutely essential for proper functioning of the PTP protocol. The slave obtains and calculates the offset from the master and uses that in order to adjust the clock properly. The problem is that the timestamps are done via the hardware. We need a method to expose that hardware so that the ptp software can properly adjust those clocks.

There is no way to use that clock directly to avoid all the user space tuning etc? There are already tuning mechanisms in the kernel that do this with system time based on periodic clocks. If you calculate the nanoseconds since the epoch then you should be able to use that to tune system time.

> > I do not understand why you want to maintain different clocks running at different speeds. Certainly interesting for some uses I guess that I do not have the energy to imagine right now. But can we get the PTP killer feature of synchronized accurate system time first?
>
> The problem is maintaining a hardware clock at the correct speed/frequency and time. The timestamping is done via hardware, and that hardware clock needs to be accurate. We need to be able to modify that clock. Yes, having the system time be the same value would be nice, but the problem comes because we don't want to jump through hoops to keep that hardware clock accurate to the ptp protocol running on the network.

Then allow system time == hardware clock?

> All of the necessary features for microsecond or better accuracy are done via the hardware. You can get accuracy to within 10 microseconds while only sending sync packets and such once per second. The reason is because the hardware timestamps are very accurate. But if we can't properly adjust the clock's time and frequency, we cannot maintain the accuracy of the timestamps.

You can already adjust the system time with the existing APIs. Tuning hardware clocks is currently done using device specific controls. But I would think that you do not need to expose this to user space if you can do it all in kernel.
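
For readers unfamiliar with the offset calculation Jacob refers to, here is a minimal sketch of the standard IEEE 1588 delay request-response arithmetic. It is not code from the patch set; the struct and function names are invented for illustration, and the network path delay is assumed to be symmetric.

```c
#include <stdint.h>

/* Four timestamps of one Sync / Delay_Req exchange, in nanoseconds. */
struct ptp_exchange {
	int64_t t1; /* master sends Sync          (master clock) */
	int64_t t2; /* slave receives Sync        (slave clock)  */
	int64_t t3; /* slave sends Delay_Req      (slave clock)  */
	int64_t t4; /* master receives Delay_Req  (master clock) */
};

/* Slave clock minus master clock, assuming a symmetric path. */
static int64_t ptp_offset_ns(const struct ptp_exchange *x)
{
	return ((x->t2 - x->t1) - (x->t4 - x->t3)) / 2;
}

/* One-way path delay under the same symmetry assumption. */
static int64_t ptp_delay_ns(const struct ptp_exchange *x)
{
	return ((x->t2 - x->t1) + (x->t4 - x->t3)) / 2;
}
```

The slave subtracts the computed offset from the clock that produced t2 and t3. That clock is the one taking the hardware timestamps, which is why the discussion keeps returning to the need to adjust it directly.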
Re: [BUG 2.6.36-rc5] of_i2c.ko - i2c-core.ko dependency loop
On Thu, 23 Sep 2010 13:53:18 +0200 Mikael Pettersson wrote:

> Running modules_install from a newly built 2.6.36-rc5 kernel on my 32-bit PowerMac results in:
>
> WARNING: Module /lib/modules/2.6.36-rc5/kernel/drivers/i2c/busses/i2c-powermac.ko ignored, due to loop
> WARNING: Loop detected: /lib/modules/2.6.36-rc5/kernel/drivers/i2c/i2c-core.ko needs of_i2c.ko which needs i2c-core.ko again!
> WARNING: Module /lib/modules/2.6.36-rc5/kernel/drivers/i2c/i2c-core.ko ignored, due to loop
> WARNING: Module /lib/modules/2.6.36-rc5/kernel/drivers/i2c/i2c-dev.ko ignored, due to loop
> WARNING: Module /lib/modules/2.6.36-rc5/kernel/drivers/of/of_i2c.ko ignored, due to loop
> WARNING: Module /lib/modules/2.6.36-rc5/kernel/sound/ppc/snd-powermac.ko ignored, due to loop
>
> grep '.*I2C.*=' .config
> CONFIG_OF_I2C=m
> CONFIG_I2C=m
> CONFIG_I2C_BOARDINFO=y
> CONFIG_I2C_CHARDEV=m
> CONFIG_I2C_POWERMAC=m
>
> I can't say exactly when this started, haven't built kernels on this box in a while.

No kconfig warnings?

Please post your full .config file.

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
Re: [PATCH 0/8] De-couple sysfs memory directories from memory sections
* Nathan Fontenot <nf...@austin.ibm.com> [2010-09-22 09:15:43]:

> This set of patches decouples the concept that a single memory section corresponds to a single directory in /sys/devices/system/memory/. On systems with large amounts of memory (1+ TB) there are performance issues related to creating the large number of sysfs directories. For a powerpc machine with 1 TB of memory we are creating 63,000+ directories. This is resulting in boot times of around 45-50 minutes for systems with 1 TB of memory and 8 hours for systems with 2 TB of memory. With this patch set applied I am now seeing boot times of 5 minutes or less.
>
> The root of this issue is in sysfs directory creation. Every time a directory is created a string compare is done against all sibling directories to ensure we do not create duplicates. The list of directory nodes in sysfs is kept as an unsorted list which results in this being an exponentially longer operation as the number of directories grows.
>
> The solution provided by this patch set is to allow a single directory in sysfs to span multiple memory sections. This is controlled by an optional architecturally defined function memory_block_size_bytes(). The default definition of this routine returns a memory block size equal to the memory section size. This keeps the layout of sysfs memory directories, as it appears to userspace, the same as it is today.
>
> For architectures that define their own version of this routine, as is done for powerpc in this patchset, the view in userspace would change such that each memoryXXX directory would span multiple memory sections. The number of sections spanned would depend on the value reported by memory_block_size_bytes.
>
> In both cases a new file 'end_phys_index' is created in each memoryXXX directory. This file will contain the physical id of the last memory section covered by the sysfs directory. For the default case, the value in 'end_phys_index' will be the same as in the existing 'phys_index' file.

What does this mean for memory hotplug or hotunplug?

--
Three Cheers,
Balbir
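
To put rough numbers on the cost Nathan describes: if each new directory is compared against every existing sibling in an unsorted list, creating n directories takes n(n-1)/2 string compares in total, which is quadratic growth. The sketch below only counts those compares. The function names are illustrative, and the section and block sizes in the note that follows are assumed figures, not values taken from the patch set.

```c
#include <stdint.h>

/*
 * Total name comparisons needed to create n sibling directories when
 * every creation scans all existing siblings in an unsorted list.
 */
static uint64_t total_compares(uint64_t n)
{
	uint64_t total = 0;

	for (uint64_t existing = 0; existing < n; existing++)
		total += existing; /* the new entry is checked against each one */
	return total;
}

/* Number of memoryXXX directories for a given block size (rounded up). */
static uint64_t nr_memory_dirs(uint64_t mem_bytes, uint64_t block_bytes)
{
	return (mem_bytes + block_bytes - 1) / block_bytes;
}
```

With an assumed 16 MB section size, 1 TB of memory gives 65536 directories and about 2.1 billion compares. Letting each directory span a hypothetical 256 MB block would cut that to 4096 directories and about 8.4 million compares.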
Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support
On Thu, 2010-09-23 at 12:53 -0500, Christoph Lameter wrote:
> On Thu, 23 Sep 2010, Richard Cochran wrote:
> > In contrast to the standard Linux system clock, a PHC is adjustable in hardware, for example using frequency compensation registers or a VCO. The ability to directly tune the PHC is essential to reap the benefit of hardware timestamping.
>
> There is a reason for not being able to shift posix clocks: The system has one time base. The various clocks are contributing to maintaining that system wide time.
>
> I do not understand why you want to maintain different clocks running at different speeds. Certainly interesting for some uses I guess that I do not have the energy to imagine right now. But can we get the PTP killer feature of synchronized accurate system time first?

This was my initial gut reaction as well, but in the end, I agree with Richard that in the case of one or multiple PTP hardware clocks, we really can't abstract over the different time domains.

> > 3.3 Synchronizing the Linux System Time
> >
> > One could offer a PHC as a combined clock source and clock event device. The advantage of this approach would be that it obviates the need for synchronization when the PHC is selected as the system timer. However, some PHCs, namely the PHY based clocks, cannot be used in this way.
>
> Why not? Do PHY based clocks not at least provide a counter that increments in synchronized intervals throughout the network?

I really don't think the PTP clock can be used as a clocksource sanely.

First, the hardware access is much too slow for system timekeeping.

Second, there is the problem that the system time is a software clock, and adjustments made (like freq) are made in the layer that interprets the underlying hardware cycle counter. Adjustments made in PTP (in order to sync the network timestamps) are made at the hardware level. This would cause a disconnect between the hardware freq understood by the system time management code and the actual hardware freq.

Richard, I'd actually strike this paragraph from the rationale, as I feel it has the tendency to confuse as it suggests having the PHC as a clocksource is feasible when really it isn't. Or alternatively, maybe express more clearly why it's not feasible, so it doesn't just seem like a minor design choice.

thanks
-john
Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support
On Thu, 2010-09-23 at 19:30 +0200, Richard Cochran wrote:
> Here is the sixth version of my patch set adding PTP hardware clock support to the Linux kernel. The main difference to v5 is that the character device interface has been replaced with one based on the posix clock system calls.
>
> The first three patches add necessary background support in the posix clock code. The last five add the new PTP hardware clock features. Previously, I had tried to present the posix clock changes all by themselves, but commentators asked to see the whole context.

Richard,

It's great to see this work continue and the patch set is shaping up nicely! There are still a few details to work out, but I think the remaining issues are relatively small.

> 3.2.3 Dynamic POSIX Clock IDs
> -----------------------------
>
> The reaction on the list to having a static id like CLOCK_PTP was mostly negative. However, the idea of generating a clock id dynamically seems to have gained acceptance. The general idea is to advertise the available clock ids to user space via sysfs. This patch set implements two different ways:
>
>    /sys/class/timesource/<name>/id
>    /sys/class/ptp/ptp_clock_X/id
>
> Note: I am not too sure that this is exactly what people imagined, but it is my best understanding so far. I gleaned two different ideas about where to offer the clock id. In order to keep just one way, I will be happy to remove the less popular one.

So yea, I'm not a fan of the timesource sysfs interface. One, I think the name is poor (posix_clocks or something a little more specific would be an improvement), and second, I don't like the dictionary interface, where one looks up the clock by name. Instead, I think having the id hanging off the class driver is much better, as it allows mapping the actual hardware to the id more clearly.

So I'd drop the timesource listing. And maybe change id to clock_id so it's a little more clear what the id is for.

> 3.3 Synchronizing the Linux System Time
>
> One could offer a PHC as a combined clock source and clock event device. The advantage of this approach would be that it obviates the need for synchronization when the PHC is selected as the system timer. However, some PHCs, namely the PHY based clocks, cannot be used in this way.

Again, I'd scratch this. What I think you might want to mention is that an application like NTP could use the PTP clockid much like NTP currently can be configured to use the RTC to steer the system time. Possibly the PTPd could just do this, reducing the number of daemons and avoiding mixing NTP up in what is really a different sync algorithm.

> Instead, the patch set provides a way to offer a Pulse Per Second (PPS) event from the PHC to the Linux PPS subsystem. A user space application can read the PPS events and tune the system clock, just like when using other external time sources like radio clocks or GPS.

Forgive me for a bit of a tangent here:

So while I think this PPS method is a neat idea, I'm a little curious how much of a difference the PPS method for syncing the clock would be over just a simple reading of the two clocks and correcting the offset.

It seems much of it depends on the read latency of the PTP hardware vs the interrupt latency. Also the PTP clock granularity would affect the read accuracy (like on the RTC, you don't really know how close to the second boundary you are).

Have you done any such measurements between the two methods? I just wonder if it would actually be something noticeable, and if it's not, how much lighter this patch-set would be without the PPS connection.

Again, this isn't super critical, just trying to make sure we don't end up adding a bunch of code that doesn't end up being used. Also PPS interrupts are awfully frequent, so systems concerned with power-saving and deep idles probably would like something that could be done at a more coarse interval.

> 3.5 User timers
>
> Using the POSIX clock API gives user space the possibility to create and use timers with timer_create and timer_settime. In the current patch set the kernel functionality is not implemented, since there are some issues to consider first. I see two ways to go about this.
>
> 1. Implement the functionality anew. This approach might end up duplicating similar code that already exists. Also, looking at the hrtimer code, getting user timers right seems to have a number of gotchas and thorny issues.
>
> 2. Reuse the hrtimer code. Since the hrtimer code uses a clock event device under the hood, it might be possible (in theory) to offer capable PHCs as clock event devices. However, the current hrtimers are hard-coded to the event device via a per-cpu global. Perhaps one could associate an event device with a
Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support
On Thu, 23 Sep 2010, john stultz wrote:

> This was my initial gut reaction as well, but in the end, I agree with Richard that in the case of one or multiple PTP hardware clocks, we really can't abstract over the different time domains.

My (arguably still superficial) review of the source does not show anything that would make me reach that conclusion.

> I really don't think the PTP clock can be used as a clocksource sanely.
>
> First, the hardware access is much too slow for system timekeeping.

The HPET or pit timesource are also quite slow these days. You only need access periodically to essentially tune the TSC ratio.

> Second, there is the problem that the system time is a software clock, and adjustments made (like freq) are made in the layer that interprets the underlying hardware cycle counter. Adjustments made in PTP (in order to sync the network timestamps) are made at the hardware level.

From what I can see the PTP clocks are periodic hardware cycle counters like any other clock that we currently support. If it's configurable enough then set up a hardware cycle counter that mimics nanoseconds since the epoch as closely as possible and use that to sync the TSC rate to. Makes it very easy.

> This would cause a disconnect between the hardware freq understood by the system time management code and the actual hardware freq.

We can switch underlying clocks for system time already. We can adapt to a different hw frequency. But then I do not know why adjust the freq? I thought the point was that the periodic clock was network synchronized and can be used as the master clock for multiple machines?

> Richard, I'd actually strike this paragraph from the rationale, as I feel it has the tendency to confuse as it suggests having the PHC as a clocksource is feasible when really it isn't. Or alternatively, maybe express more clearly why it's not feasible, so it doesn't just seem like a minor design choice.

Sorry but I still feel that this is pretty much a misguided approach that creates unnecessary layers in the kernel. The trivial easy approach was not done (copy a driver from drivers/clocksource, modify so that it programs access to a centralized periodic ptp signal and uses it for system sync).
Re: [PATCH 6/8] ptp: Added a clock that uses the eTSEC found on the MPC85xx.
On Thu, 23 Sep 2010, Richard Cochran wrote:

> +* Gianfar PTP clock nodes
> +
> +General Properties:
> +
> +  - compatible   Should be "fsl,etsec-ptp"
> +  - reg          Offset and length of the register set for the device
> +  - interrupts   There should be at least two interrupts. Some devices
> +                 have as many as four PTP related interrupts.
> +
> +Clock Properties:
> +
> +  - tclk-period  Timer reference clock period in nanoseconds.
> +  - tmr-prsc     Prescaler, divides the output clock.
> +  - tmr-add      Frequency compensation value.
> +  - cksel        0= external clock, 1= eTSEC system clock, 3= RTC clock input.
> +                 Currently the driver only supports choice 1.
> +  - tmr-fiper1   Fixed interval period pulse generator.
> +  - tmr-fiper2   Fixed interval period pulse generator.
> +  - max-adj      Maximum frequency adjustment in parts per billion.
> +
> +  These properties set the operational parameters for the PTP
> +  clock. You must choose these carefully for the clock to work right.
> +  Here is how to figure good values:
> +
> +  TimerOsc     = system clock               MHz
> +  tclk_period  = desired clock period       nanoseconds
> +  NominalFreq  = 1000 / tclk_period         MHz
> +  FreqDivRatio = TimerOsc / NominalFreq     (must be greater than 1.0)
> +  tmr_add      = ceil(2^32 / FreqDivRatio)
> +  OutputClock  = NominalFreq / tmr_prsc     MHz
> +  PulseWidth   = 1 / OutputClock            microseconds
> +  FiperFreq1   = desired frequency in Hz
> +  FiperDiv1    = 1000000 * OutputClock / FiperFreq1
> +  tmr_fiper1   = tmr_prsc * tclk_period * FiperDiv1 - tclk_period
> +  max_adj      = 1000000000 * (FreqDivRatio - 1.0) - 1

Great stuff for clock synchronization...

> +  The calculation for tmr_fiper2 is the same as for tmr_fiper1. The
> +  driver expects that tmr_fiper1 will be correctly set to produce a 1
> +  Pulse Per Second (PPS) signal, since this will be offered to the PPS
> +  subsystem to synchronize the Linux clock.

Argh. And conceptually completely screwed up. Why go through the PPS subsystem if you can directly tune the system clock based on a number of the cool periodic clock features that you have above? See how the other clocks do that easily? Look into drivers/clocksource. Add it there.

Please do not introduce useless additional layers for clock sync. Load these ptp clocks like the other regular clock modules and make them sync system time like any other clock.

Really guys: I want a PTP solution! Now! And not some idiotic additional kernel layers that just pass bits around because it's so much fun and screw up clock accuracy due to the latency noise introduced while having so much fun with the bits.
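
The quoted recipe for the clock properties can be worked through in code. The following is only a sketch in integer arithmetic, not part of the patch: the struct and function names are invented, and it assumes OutputClock is in MHz and max_adj is in parts per billion, so the scale factors are 10^6 and 10^9. The example inputs (250 MHz system clock, 5 ns period, prescaler 200, 1 Hz pulse) are assumed figures chosen so that everything divides evenly.

```c
#include <stdint.h>

/* Hypothetical container for the three derived property values. */
struct etsec_ptp_params {
	uint32_t tmr_add;
	uint32_t tmr_fiper1;
	uint32_t max_adj;
};

/*
 * timer_osc_mhz : TimerOsc, system clock in MHz
 * tclk_period_ns: desired clock period in nanoseconds
 * tmr_prsc      : prescaler
 * fiper_hz      : desired pulse frequency in Hz
 * Requires FreqDivRatio > 1.0, i.e. timer_osc_mhz * tclk_period_ns > 1000.
 */
static struct etsec_ptp_params
etsec_ptp_calc(uint64_t timer_osc_mhz, uint64_t tclk_period_ns,
	       uint64_t tmr_prsc, uint64_t fiper_hz)
{
	struct etsec_ptp_params p;

	/* FreqDivRatio = TimerOsc / NominalFreq = TimerOsc * tclk_period / 1000,
	 * kept as the fraction ratio_num / 1000 to stay in integers. */
	uint64_t ratio_num = timer_osc_mhz * tclk_period_ns;

	/* tmr_add = ceil(2^32 / FreqDivRatio) */
	uint64_t num = (1ULL << 32) * 1000;
	p.tmr_add = (uint32_t)((num + ratio_num - 1) / ratio_num);

	/* FiperDiv1 = 10^6 * OutputClock / FiperFreq1, with
	 * OutputClock = 1000 / (tclk_period * tmr_prsc) MHz.
	 * The integer division truncates if the inputs do not divide evenly. */
	uint64_t fiper_div = 1000000000ULL /
			     (tclk_period_ns * tmr_prsc * fiper_hz);
	p.tmr_fiper1 = (uint32_t)(tmr_prsc * tclk_period_ns * fiper_div -
				  tclk_period_ns);

	/* max_adj = 10^9 * (FreqDivRatio - 1.0) - 1 */
	p.max_adj = (uint32_t)(1000000ULL * (ratio_num - 1000) - 1);

	return p;
}
```

For the assumed inputs, etsec_ptp_calc(250, 5, 200, 1) yields tmr_add = 0xCCCCCCCD, tmr_fiper1 = 999999995 and max_adj = 249999999.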
Re: [PATCH 1/8] posix clocks: introduce a syscall for clock tuning.
On Thu, 2010-09-23 at 19:31 +0200, Richard Cochran wrote:
> A new syscall is introduced that allows tuning of a POSIX clock. The syscall is implemented for four architectures: arm, blackfin, powerpc, and x86.
>
> The new syscall, clock_adjtime, takes two parameters, the clock ID, and a pointer to a struct timex. The semantics of the timex struct have been expanded by one additional mode flag, which allows an absolute offset correction. When specified, the clock offset is immediately corrected by adding the given time value to the current time value.

So I'd still split this patch up a little bit more.

1) Patch that implements the ADJ_SETOFFSET (*and its implementation*) in do_adjtimex.

2) Patch that adds the new syscall and clock_id multiplexing.

3) Patches that wire it up to the rest of the architectures (there's still a bunch missing here).

And one little nit in the code:

> diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c
> index 9ca4973..446b566 100644
> --- a/kernel/posix-timers.c
> +++ b/kernel/posix-timers.c
> @@ -197,6 +197,14 @@ static int common_timer_create(struct k_itimer *new_timer)
>  	return 0;
>  }
>
> +static inline int common_clock_adj(const clockid_t which_clock, struct timex *t)
> +{
> +	if (CLOCK_REALTIME == which_clock)
> +		return do_adjtimex(t);
> +	else
> +		return -EOPNOTSUPP;
> +}

Would it make sense to point to the do_adjtimex() in the k_clock definition for CLOCK_REALTIME rather than conditionalizing it here?

thanks
-john
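
John's suggestion can be pictured with a small user-space sketch. Everything below is a simplified stand-in, not the kernel's real struct k_clock or do_adjtimex(): the adjust operation is stored per clock in a table, and the common path only checks whether a handler is present instead of comparing clock ids.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; names are hypothetical. */
struct timex_sketch {
	long offset;
};

struct k_clock_sketch {
	int (*clock_adj)(struct timex_sketch *t); /* NULL if unsupported */
};

enum {
	SKETCH_CLOCK_REALTIME,
	SKETCH_CLOCK_MONOTONIC,
	SKETCH_MAX_CLOCKS
};

/* Stand-in for do_adjtimex(); a real one would tune the system clock. */
static int do_adjtimex_sketch(struct timex_sketch *t)
{
	(void)t;
	return 0;
}

/* Only the realtime clock registers an adjust handler. */
static const struct k_clock_sketch clocks[SKETCH_MAX_CLOCKS] = {
	[SKETCH_CLOCK_REALTIME] = { .clock_adj = do_adjtimex_sketch },
};

/* Dispatch through the table rather than testing for CLOCK_REALTIME. */
static int clock_adjtime_sketch(int which_clock, struct timex_sketch *t)
{
	if (which_clock < 0 || which_clock >= SKETCH_MAX_CLOCKS)
		return -EINVAL;
	if (!clocks[which_clock].clock_adj)
		return -EOPNOTSUPP;
	return clocks[which_clock].clock_adj(t);
}
```

With this shape, a new adjustable clock only has to fill in its own table entry; the common code does not change.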
[PATCH] powerpc: Fix invalid page flags in create TLB CAM path for PTE_64BIT
From: Tiejun Chen <tiejun.c...@windriver.com>

There exists a four line chunk of code, which when configured for 64 bit address space, can incorrectly set certain page flags during the TLB creation. It turns out that this is legacy code that is no longer required, but since it isn't obvious why this is legacy code or why it causes problems, the below description covers both in detail.

For powerpc bootstrap, the physical memory (at most 768M), is mapped into the kernel space via the following path:

MMU_init()
    |
    + adjust_total_lowmem()
        |
        + map_mem_in_cams()
            |
            + settlbcam(i, virt, phys, cam_sz, PAGE_KERNEL_X, 0);

On settlbcam(), the kernel will create TLB entries according to the flag, PAGE_KERNEL_X.

settlbcam()
{
	...
	TLBCAM[index].MAS1 = MAS1_VALID
			| MAS1_IPROT | MAS1_TSIZE(tsize) | MAS1_TID(pid);
			  ^
	These entries cannot be invalidated by the kernel since
	MAS1_IPROT is set on TLB property.
	...
	if (flags & _PAGE_USER) {
		TLBCAM[index].MAS3 |= MAS3_UX | MAS3_UR;
		TLBCAM[index].MAS3 |= ((flags & _PAGE_RW) ? MAS3_UW : 0);
	}

For classic BookE (flags & _PAGE_USER) is 'zero' so it's fine. But on boards like the Freescale P4080, we want to support 36-bit physical address on it. So the following options may be set:

CONFIG_FSL_BOOKE=y
CONFIG_PTE_64BIT=y
CONFIG_PHYS_64BIT=y

As a result, boards like the P4080 will introduce PTE format as Book3E. As per the file: arch/powerpc/include/asm/pgtable-ppc32.h

 * #elif defined(CONFIG_FSL_BOOKE) && defined(CONFIG_PTE_64BIT)
 * #include <asm/pte-book3e.h>

So PAGE_KERNEL_X is __pgprot(_PAGE_BASE | _PAGE_KERNEL_RWX) and the book3E version of _PAGE_KERNEL_RWX is defined with:

  (_PAGE_BAP_SW | _PAGE_BAP_SR | _PAGE_DIRTY | _PAGE_BAP_SX)

Note the _PAGE_BAP_SR, which is also defined in the book3E _PAGE_USER:

  #define _PAGE_USER	(_PAGE_BAP_UR | _PAGE_BAP_SR) /* Can be read */

So the possibility exists to wrongly assign the user MAS3_URWX bits to kernel (PAGE_KERNEL_X) address space via the following code fragment:

	if (flags & _PAGE_USER) {
		TLBCAM[index].MAS3 |= MAS3_UX | MAS3_UR;
		TLBCAM[index].MAS3 |= ((flags & _PAGE_RW) ? MAS3_UW : 0);
	}

Here is a dump of the TLB info from Simics with the above code present:
--
L2 TLB1
                          GT                    SSS UUU V I
 Row  Logical    Physical SS TLPID  TID  WIMGE  XWR XWR F P  V
 ---  ---------  -------- -- -----  ---  -----  --- --- - -  -
  0   c000-cfff  0-00fff  00   0     0     M    XWR XWR 0 1  1
  1   d000-dfff  01000-01fff  00   0     0     M    XWR XWR 0 1  1
  2   e000-efff  02000-02fff  00   0     0     M    XWR XWR 0 1  1

Actually this conditional code was only used for two legacy functions:

1: support KGDB to set break point.
   KGDB already dropped this; now uses its core write to set break point.

2: io_block_mapping() to create TLB in segmentation size (not PAGE_SIZE) for device IO space.
   This use case is also removed from the latest PowerPC kernel.

So it looks like the deletion of these 4 lines of code was simply overlooked when the above two cases went away. With the code deleted, the TLB appears without U having XWR as below:
---
L2 TLB1
                          GT                    SSS UUU V I
 Row  Logical    Physical SS TLPID  TID  WIMGE  XWR XWR F P  V
 ---  ---------  -------- -- -----  ---  -----  --- --- - -  -
  0   c000-cfff  0-00fff  00   0     0     M    XWR     0 1  1
  1   d000-dfff  01000-01fff  00   0     0     M    XWR     0 1  1
  2   e000-efff  02000-02fff  00   0     0     M    XWR     0 1  1

Signed-off-by: Tiejun Chen <tiejun.c...@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortma...@windriver.com>
---
 arch/powerpc/mm/fsl_booke_mmu.c |    5 -
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/mm/fsl_booke_mmu.c b/arch/powerpc/mm/fsl_booke_mmu.c
index d5fa5f2..9de7e1b 100644
--- a/arch/powerpc/mm/fsl_booke_mmu.c
+++ b/arch/powerpc/mm/fsl_booke_mmu.c
@@ -136,11 +136,6 @@ static void settlbcam(int index, unsigned long virt, phys_addr_t phys,
 	if (mmu_has_feature(MMU_FTR_BIG_PHYS))
 		TLBCAM[index].MAS7 = (u64)phys >> 32;
 
-	if (flags & _PAGE_USER) {
-		TLBCAM[index].MAS3 |= MAS3_UX | MAS3_UR;
-		TLBCAM[index].MAS3 |= ((flags & _PAGE_RW) ? MAS3_UW : 0);
-	}
-
 	tlbcam_addrs[index].start = virt;
 	tlbcam_addrs[index].limit = virt + size - 1;
 	tlbcam_addrs[index].phys = phys;
--
1.7.2.1
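
The overlap described in the commit message can be reproduced in isolation. The sketch below is an illustration only: the bit values are assumed for the example (the real definitions live in asm/pte-book3e.h), and the macro names drop the leading underscore to make clear that they are not the kernel's.

```c
#include <stdint.h>

/* Assumed, illustrative bit positions; not copied from the kernel. */
#define PAGE_BAP_SR 0x0004u /* supervisor read    */
#define PAGE_BAP_UR 0x0008u /* user read          */
#define PAGE_BAP_SW 0x0010u /* supervisor write   */
#define PAGE_BAP_SX 0x0040u /* supervisor execute */
#define PAGE_DIRTY  0x1000u

/* Same composition as quoted in the commit message. */
#define PAGE_USER       (PAGE_BAP_UR | PAGE_BAP_SR)
#define PAGE_KERNEL_RWX (PAGE_BAP_SW | PAGE_BAP_SR | PAGE_DIRTY | PAGE_BAP_SX)

/* The removed test: true if ANY bit of PAGE_USER is set in flags. */
static int old_user_test(uint32_t flags)
{
	return (flags & PAGE_USER) != 0;
}

/* For comparison: true only if the user-read bit itself is set. */
static int has_user_read(uint32_t flags)
{
	return (flags & PAGE_BAP_UR) != 0;
}
```

Because PAGE_USER is a two-bit mask that includes the supervisor-read bit, old_user_test(PAGE_KERNEL_RWX) is true even though the mapping carries no user permission bit. That is how the kernel entries ended up with U XWR in the first Simics dump.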
Re: [BUG 2.6.36-rc5] of_i2c.ko - i2c-core.ko dependency loop
Randy Dunlap writes:
> On Thu, 23 Sep 2010 13:53:18 +0200 Mikael Pettersson wrote:
>
> > Running modules_install from a newly built 2.6.36-rc5 kernel on my 32-bit PowerMac results in:
> >
> > WARNING: Module /lib/modules/2.6.36-rc5/kernel/drivers/i2c/busses/i2c-powermac.ko ignored, due to loop
> > WARNING: Loop detected: /lib/modules/2.6.36-rc5/kernel/drivers/i2c/i2c-core.ko needs of_i2c.ko which needs i2c-core.ko again!
> > WARNING: Module /lib/modules/2.6.36-rc5/kernel/drivers/i2c/i2c-core.ko ignored, due to loop
> > WARNING: Module /lib/modules/2.6.36-rc5/kernel/drivers/i2c/i2c-dev.ko ignored, due to loop
> > WARNING: Module /lib/modules/2.6.36-rc5/kernel/drivers/of/of_i2c.ko ignored, due to loop
> > WARNING: Module /lib/modules/2.6.36-rc5/kernel/sound/ppc/snd-powermac.ko ignored, due to loop
> >
> > grep '.*I2C.*=' .config
> > CONFIG_OF_I2C=m
> > CONFIG_I2C=m
> > CONFIG_I2C_BOARDINFO=y
> > CONFIG_I2C_CHARDEV=m
> > CONFIG_I2C_POWERMAC=m
> >
> > I can't say exactly when this started, haven't built kernels on this box in a while.
>
> No kconfig warnings?

Not that I recall. I can check tomorrow if necessary.

> Please post your full .config file.
#
# Automatically generated make config: don't edit
#
# CONFIG_PPC64 is not set

#
# Processor support
#
CONFIG_PPC_BOOK3S_32=y
# CONFIG_PPC_85xx is not set
# CONFIG_PPC_8xx is not set
# CONFIG_40x is not set
# CONFIG_44x is not set
# CONFIG_E200 is not set
CONFIG_PPC_BOOK3S=y
CONFIG_6xx=y
CONFIG_PPC_FPU=y
CONFIG_ALTIVEC=y
CONFIG_PPC_STD_MMU=y
CONFIG_PPC_STD_MMU_32=y
# CONFIG_PPC_MM_SLICES is not set
CONFIG_PPC_HAVE_PMU_SUPPORT=y
# CONFIG_SMP is not set
CONFIG_PPC32=y
CONFIG_WORD_SIZE=32
# CONFIG_ARCH_PHYS_ADDR_T_64BIT is not set
CONFIG_MMU=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_HARDIRQS_NO__DO_IRQ=y
# CONFIG_HAVE_SETUP_PER_CPU_AREA is not set
# CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK is not set
CONFIG_IRQ_PER_CPU=y
CONFIG_NR_IRQS=64
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_ARCH_HAS_ILOG2_U32=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_FIND_NEXT_BIT=y
# CONFIG_ARCH_NO_VIRT_TO_BUS is not set
CONFIG_PPC=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_NVRAM=y
CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_PPC_OF=y
# CONFIG_PPC_UDBG_16550 is not set
# CONFIG_GENERIC_TBSYNC is not set
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
# CONFIG_DEFAULT_UIMAGE is not set
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
# CONFIG_PPC_DCR_NATIVE is not set
# CONFIG_PPC_DCR_MMIO is not set
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_DEFCONFIG_LIST=/lib/modules/$UNAME_RELEASE/.config
CONFIG_CONSTRUCTORS=y

#
# General setup
#
# CONFIG_EXPERIMENTAL is not set
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=
CONFIG_LOCALVERSION=
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_AUDIT is not set

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_TINY_RCU is not set
# CONFIG_RCU_TRACE is not set
CONFIG_RCU_FANOUT=32
# CONFIG_RCU_FANOUT_EXACT is not set
# CONFIG_TREE_RCU_TRACE is not set
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=14
# CONFIG_SYSFS_DEPRECATED_V2 is not set
# CONFIG_RELAY is not set
# CONFIG_NAMESPACES is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=
CONFIG_RD_GZIP=y
# CONFIG_RD_BZIP2 is not set
# CONFIG_RD_LZMA is not set
# CONFIG_RD_LZO is not set
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
CONFIG_EMBEDDED=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
# CONFIG_SIGNALFD is not set
# CONFIG_TIMERFD is not set
# CONFIG_EVENTFD is not set
CONFIG_SHMEM=y
# CONFIG_AIO is not set
CONFIG_HAVE_PERF_EVENTS=y

#
# Kernel Performance Events And Counters
#
# CONFIG_PERF_EVENTS is not set
# CONFIG_PERF_COUNTERS is not set
# CONFIG_VM_EVENT_COUNTERS is not set
CONFIG_PCI_QUIRKS=y
# CONFIG_SLUB_DEBUG is not set
# CONFIG_COMPAT_BRK is not set
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
# CONFIG_PROFILING is not set
CONFIG_HAVE_OPROFILE=y
# CONFIG_KPROBES is not set
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_ATTRS=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_DMA_API_DEBUG=y

#
# GCOV-based kernel profiling
#
# CONFIG_HAVE_GENERIC_DMA_COHERENT is not set
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
# CONFIG_MODULE_FORCE_LOAD is not set
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_BLOCK=y
# CONFIG_LBDAF
Re: [PATCH 6/8] ptp: Added a clock that uses the eTSEC found on the MPC85xx.
> Please do not introduce useless additional layers for clock sync. Load these ptp clocks like the other regular clock modules and make them sync system time like any other clock.

I don't think you understand PTP. PTP has masters; a system may need to honour multiple conflicting masters at once.

> Really guys: I want a PTP solution! Now! And not some idiotic additional kernel layers that just pass bits around because it's so much fun and screw up clock accuracy due to the latency noise introduced while having so much fun with the bits.

There are some interesting complications in putting a PTP sync interface in the kernel.
___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support
On Thu, 2010-09-23 at 14:15 -0500, Christoph Lameter wrote: On Thu, 23 Sep 2010, john stultz wrote: This was my initial gut reaction as well, but in the end, I agree with Richard that in the case of one or multiple PTP hardware clocks, we really can't abstract over the different time domains. My (arguably still superficial) review of the source does not show anything that would make me reach that conclusion. I really don't think the PTP clock can be used as a clocksource sanely. First, the hardware access is much to slow for system timekeeping. The HPET or pit timesource are also quite slow these days. You only need access periodically to essentially tune the TSC ratio. If we're using the TSC, then we're not using the PTP clock as you suggest. Further the HPET and PIT aren't used to steer the system time when we are using the TSC as a clocksource. Its only used to calibrate the initial constant freq used by the timekeeping code (and if its non-constant, we throw it out). Second, there is the problem that the system time is a software clock, and adjustments made (like freq) are made in the layer that interprets the underlying hardware cycle counter. Adjustments made in PTP (in order to sync the network timestamps) are made at the hardware level. From what I can see the PTP clocks are periodic hardware cycle counters like any other clock that we currently support. If its configurable enough then setup a hardware cycle counter that mimics nanoseconds since the epoch as closely as possible and use that to sync the TSC rate to. Makes it very easy. I guess I'm confused by what you're suggesting. If we're using the TSC, then that's the clocksource timekeeping uses. The original issue seemed to be around the suggestion of using the PTP clock as a clocksource, which I don't think is really feasible. 
Again, that's because 1) The PTP access latency is slow (so is the PIT, true enough, but no one should be using the PIT as a clocksource unless they really have no better hardware - its really only useful for 486s and old freq scaling laptops that have no other stable clocksource). 2) The way PTP clocks are steered to sync with network time causes their hardware freq to actually change. Since these adjustments are done on the hardware clock level, and not on the system time level, the adjustments to sync the system time/freq would then be made incorrect by PTP hardware adjustments. 3) Further, the PTP hardware counter can be simply set to a new offset to put it in line with the network time. This could cause trouble with timekeeping much like unsynced TSCs do. Now, what you seem to be suggesting is to use the TSC (or whatever clocksource the system time is using) but to steer the system time using the PTP clock. This is actually what is being proposed, however, the steering is done in userland. This is due to the fact that there are two components to the steering, 1) adjusting the PTP clock hardware to network time and 2) adjusting the system time to the PTP hardware. By exposing the PTP clock to userland via the posix clocks interface, we allow this to easily be done. This would cause a disconnect between the hardware freq understood by the system time management code and the actual hardware freq. We can switch underlying clocks for system time already. We can adapt to a different hw frequency. Actually no. The timekeeping code requires a fixed freq counter. Dealing with hardware freq changes is difficult, because error is introduced by the latency between when the freq changes and when the timekeeping code is notified of it. So the system treats the hardware counters as fixed freq. Now, hardware does vary freq ever so slightly as thermal conditions change, but this is addressed in userland and corrected via adjtimex. But then I do not know why adjust the freq? 
I thought the point was that the periodic clock was network synchronized and can be used as the master clock for multiple machines? Not parsing that. What do you mean by periodic clock? Richard, I'd actually strike this paragraph from the rational, as I feel it has the tendency to confuse as it suggests having the PHC as a clocksource is feasible when really it isn't. Or alternatively, maybe express more clearly why its not feasible, so it doesn't just seem like a minor design choice. Sorry but I still feel that this is pretty much a misguided approach that creates unnecessary layers in the kernel. Unnecessary layers? Where? This approach has less in-kernel layers, as it exposes the PTP clock to userland, instead of trying to layer things on top of it and stretching the system time abstraction to cover it. The trivial easy approach was not done (copy a driver from drivers/clocksource, modify so that it programs access to a centralized periodic ptp signal and uses it for system sync). I disagree. I've argued through the approach trying to keep it all internal to the kernel,
Re: [PATCH 2/2] PPC4xx: Merge xor.h and dma.h into one file ppc440spe-dma.h
On Fri, Sep 17, 2010 at 6:42 PM, tma...@apm.com wrote:
> From: Tirumala Marri tma...@apm.com
> This patch combines drivers/dma/ppc4xx/xor.h and drivers/dma/ppc4xx/dma.h into drivers/dma/ppc4xx/ppc440spe-dma.h.

Is this just code churn, or do we gain anything by combining these header files? Don't add ppc440spe- back to the prefix, we're already in the ppc4xx directory, unless the file will only contain definitions that are relevant to ppc440spe.

-- Dan
Re: [PATCH 1/2] PPC4xx: Generalizing drivers/dma/ppc4xx/adma.c
On 9/17/2010 6:42 PM, tma...@apm.com wrote:
> From: Tirumala Marri tma...@apm.com
> This patch generalizes the existing drivers/dma/ppc4xx/adma.c, so that common code can be shared between different similar DMA engine drivers in other SoCs.
> Signed-off-by: Tirumala R Marri tma...@apm.com
> ---
> drivers/dma/ppc4xx/adma.c | 4370 +++---
> drivers/dma/ppc4xx/adma.h | 116 +-
> drivers/dma/ppc4xx/ppc4xx-adma.h | 4020 +++
> 3 files changed, 4357 insertions(+), 4149 deletions(-)
> create mode 100644 drivers/dma/ppc4xx/ppc4xx-adma.h

Will both versions of this driver exist in the same kernel build? For example, the iop-adma driver supports iop13xx and iop3xx, but we select the architecture at build time. Or, as I assume in this case, will the two (maybe more?) ppc4xx adma drivers all be built in the same image, more like ioatdma? In the latter case I would recommend a file structure like:

drivers/dma/ppc4xx/adma.c
drivers/dma/ppc4xx/adma_440spe.c
drivers/dma/ppc4xx/adma_460ex.c

with patches to move the chipset-specific pieces to their own files, minimizing the code churn in adma.c, or at least showing a progression of what is unique and needs to be moved. This would be similar to how ioatdma is structured: it compiles a single driver to cover the three major hardware revisions.

-- Dan
Re: [PATCH 6/8] ptp: Added a clock that uses the eTSEC found on the MPC85xx.
On Thu, 23 Sep 2010, Alan Cox wrote:
> > Please do not introduce useless additional layers for clock sync. Load these ptp clocks like the other regular clock modules and make them sync system time like any other clock.
> I don't think you understand PTP. PTP has masters, a system can need to be honouring multiple conflicting masters at once.

The upshot of it all has to be some synchronized notion of time regardless of how many other things are going on under the hood. And the spec here suggests hardware able to generate periodic accurate events that can be used to sync system time.

> > Really guys: I want a PTP solution! Now! And not some idiotic additional kernel layers that just pass bits around because it's so much fun and screw up clock accuracy due to the latency noise introduced while having so much fun with the bits.
> There are some interesting complications in putting a PTP sync interface in kernel.

If the PTP logic internally has to juggle multiple clocks then that is a complication for the driver, ok. In any case the driver ultimately has to provide *one* source of time for the system to sync to.
Re: [PATCH] powerpc: Fix invalid page flags in create TLB CAM path for PTE_64BIT
On Thu, 23 Sep 2010 16:10:15 -0400 Paul Gortmaker paul.gortma...@windriver.com wrote:
> So the possibility exists to wrongly assign the user MAS3_URWX bits to kernel (PAGE_KERNEL_X) address space via the following code fragment:
>
>     if (flags & _PAGE_USER) {
>         TLBCAM[index].MAS3 |= MAS3_UX | MAS3_UR;
>         TLBCAM[index].MAS3 |= ((flags & _PAGE_RW) ? MAS3_UW : 0);
>     }
>
> Here is a dump of the TLB info from Simics with the above code present:
>
> -- L2 TLB1
>                           GT               SSS UUU V I
> Row Logical   Physical    SS TLPID TID WIMGE XWR XWR F P V
> --- --------- ----------- -- ----- --- ----- --- --- - - -
> 0   c000-cfff 0-00fff     00 0     0   M     XWR XWR 0 1 1
> 1   d000-dfff 01000-01fff 00 0     0   M     XWR XWR 0 1 1
> 2   e000-efff 02000-02fff 00 0     0   M     XWR XWR 0 1 1
>
> Actually this conditional code was only used for two legacy functions:
> 1: support KGDB to set a breakpoint. KGDB already dropped this; it now uses its core write to set a breakpoint.
> 2: io_block_mapping() to create a TLB entry in segmentation size (not PAGE_SIZE) for device IO space. This use case is also removed from the latest PowerPC kernel.

io_block_mapping() went away, but the feature itself is still useful and might come back with something like this: http://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg33851.html ...though I'm not sure why such mappings would ever have user access.

This could end up being used for large user pages by something like hugetlbfs or KVM, though. I don't think we want to make large user pages fail, especially if it just happens with the 32-bit page table format (which may not be what the person adding such a feature tests with).

I don't see a generic accessor that can test PTE flags for user access -- in the absence of one, I guess we need an ifdef here. Or at least put in a comment so anyone who adds a userspace use knows they need to fix it.

-Scott
Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support
On Thu, 2010-09-23 at 21:36 +0100, Alan Cox wrote:
> > So as far as the POSIX standard is concerned, offering a clock id to represent the PHC would be acceptable.
> But completely useless as you may have more than one entirely different time managed by PTP and in which you are not master but must work with the timebases provided.

I don't see how this is a problem, as it exposes the multiple hardware clocks via different posix clock ids. So in the boundary clock case, you can configure which side is the client and which side is the master in a config file and the PTPd will appropriately steer them individually.

> > /sys/class/timesource/name/id
> > /sys/class/ptp/ptp_clock_X/id
> > Note: I am not too sure that this is exactly what people imagined, but it is my best understanding so far. I gleaned two different ideas about where to offer the clock id. In order to keep just one way, I will be happy to remove the less popular one.
> I see no fix proposed for the race condition I pointed out. This doesn't work.

So, if I recall this was: How do you keep the module from unloading while it's being used? There may need to be proper locking for unregistering the posix clock_id on module unload, but I don't think we need a use-count to prevent the module from being unloaded. My question would be: How do we handle a USB network device ($14.99 now with PTP!) being unplugged? We can't say "Sorry! That's in use!". So we note the hardware is gone, and return the proper error code. Or am I missing something else?

> > If the Linux system time is synchronized to the PHC via the PPS
> To which PHC? We can have several.
> > + Intel IXP465 - Auxiliary Slave/Master Mode Snapshot (optional interrupt) - Target Time (optional interrupt)
> And about 40 already supported by char driver interface clocks and rtcs in the kernel...

And those char driver interfaces are all subtly different. I actually recently submitted an RFC to expose the RTC devices via the posix clock/timer interface, because working with the RTC hardware device directly is terrible for managing alarm interrupts. For instance, you easily run into the case where your TV recording application programs an alarm to record your favorite show at 8pm. Then your backup script programs an alarm to wake up at 2am to do your nightly backups. Your box suspends and the next morning, you're missing your favorite show!

> I'd say the inability to have multiple clocks and the race condition because of the clockid stuff leaves the proposal dead in the water. It also ignores the existing APIs we have floating around attached to devices. You need to make one small important change. You need to take the POSIX crap about enumerating things out and shoot it, bury it at a crossroads and sprinkle holy water on it.

We agree the list-by-name stuff isn't the way to go. :)

> Drop the clockid_t and swap it for a file handle like a proper Unix or Linux interface. The rest is much the same:
>   fd = open /sys/class/timesource/[whatever]
>   various queries you may want to do to check the name etc
>   fclock_adjtime(fd, ...)
> The posix interface is fundamentally flawed. It only works for statically enumerable objects. Unix avoided that forty years ago by making the identifier a handle which immediately cures all your object lifetime problems in one swoop.

So, I don't really see how that's so different from what is being proposed. The clock_id is dynamically assigned per registered clock, and exposed via the sysfs interface from the ptp hardware entry. The only difference is the open/close reference counting, which I don't think is necessary here (since we can't always keep the hardware from going away).

thanks
-john
Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support
On Thu, 23 Sep 2010, john stultz wrote:
> > The HPET or pit timesource are also quite slow these days. You only need access periodically to essentially tune the TSC ratio.
> If we're using the TSC, then we're not using the PTP clock as you suggest. Further the HPET and PIT aren't used to steer the system time when we are using the TSC as a clocksource. It's only used to calibrate the initial constant freq used by the timekeeping code (and if it's non-constant, we throw it out).

There is no other scalable time source available for fast timer access than the time stamp counter in the cpu. Other time sources require memory accesses, which are inherently slower. An accurate other time source is used to adjust this clock. NTP does that via the clock interfaces from user space, which has its problems with accuracy. PTP can provide the network synced time access that would allow a more accurate calibration of the time.

> 2) The way PTP clocks are steered to sync with network time causes their hardware freq to actually change. Since these adjustments are done on the hardware clock level, and not on the system time level, the adjustments to sync the system time/freq would then be made incorrect by PTP hardware adjustments.

Right. So use these as a way to fine tune the TSC clock (and thereby the system time).

> 3) Further, the PTP hardware counter can be simply set to a new offset to put it in line with the network time. This could cause trouble with timekeeping much like unsynced TSCs do.

You can do the same for system time.

> Now, what you seem to be suggesting is to use the TSC (or whatever clocksource the system time is using) but to steer the system time using the PTP clock. This is actually what is being proposed, however, the steering is done in userland. This is due to the fact that there are two components to the steering, 1) adjusting the PTP clock hardware to network time and 2) adjusting the system time to the PTP hardware. By exposing the PTP clock to userland via the posix clocks interface, we allow this to easily be done.

Userland code would introduce latencies that would make sub microsecond time sync very difficult.

> > We can switch underlying clocks for system time already. We can adapt to a different hw frequency.
> Actually no. The timekeeping code requires a fixed freq counter. Dealing with hardware freq changes is difficult, because error is introduced by the latency between when the freq changes and when the timekeeping code is notified of it. So the system treats the hardware counters as fixed freq. Now, hardware does vary freq ever so slightly as thermal conditions change, but this is addressed in userland and corrected via adjtimex.

Academic hair splitting? I have repeatedly switched between different clocks on various systems. So it's difficult but we do it?

> Unnecessary layers? Where? This approach has less in-kernel layers, as it exposes the PTP clock to userland, instead of trying to layer things on top of it and stretching the system time abstraction to cover it.

You don't need the user APIs if you directly use the PTP time source to steer the system clock. In fact I think you have to do it in kernel space since user space latencies will degrade accuracy otherwise.

> I've argued through the approach trying to keep it all internal to the kernel, but to do so would be anything but trivial. Further, there's the case of master-clocks, where the PTP hardware must be synced to system time, instead of the other way around. And then there's the case of boundary-clocks, which may have multiple PTP hardware clocks that have to be synced.

Ok maybe we need some sort of control interface to manage the clock like the others have.

> I think exposing this through the posix clock interface is really the best approach. It's not a static clockid, so it's not something most apps will ever have to deal with, but it allows the few apps that really need access to the PTP clock hardware to get it in a clean way.

It implies clock tuning in userspace for a potential sub microsecond accurate clock. The clock accuracy will be limited by user space latencies and noise. You won't be able to discipline the system clock accurately.

The posix clocks today assume one notion of real time in the kernel. All clocks increase in lockstep (aside from offset updates). This approach here results in multiple notions of time increasing at various speeds. And it implies that someone in user space is trying to tinker around with extremely low latencies using system call APIs that take much longer than these intervals to process the data.
Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support
> > In contrast to the standard Linux system clock, a PHC is adjustable in hardware, for example using frequency compensation registers or a VCO. The ability to directly tune the PHC is essential to reap the benefit of hardware timestamping.
> There is a reason for not being able to shift posix clocks: The system has one time base. The various clocks are contributing to maintaining that system wide time.

Adjusting clocks is absolutely essential for proper functioning of the PTP protocol. The slave obtains and calculates the offset from the master and uses that in order to adjust the clock properly. The problem is that the timestamps are done via the hardware. We need a method to expose that hardware so that the ptp software can properly adjust those clocks.

> I do not understand why you want to maintain different clocks running at different speeds. Certainly interesting for some uses I guess that I do not have the energy to imagine right now. But can we get the PTP killer feature of synchronized accurate system time first?

The problem is maintaining a hardware clock at the correct speed/frequency and time. The timestamping is done via hardware, and that hardware clock needs to be accurate. We need to be able to modify that clock. Yes, having the system time be the same value would be nice, but the problem comes because we don't want to jump through hoops to keep that hardware clock accurate to the ptp protocol running on the network.

> > Instead, the patch set provides a way to offer a Pulse Per Second (PPS) event from the PHC to the Linux PPS subsystem. A user space application can read the PPS events and tune the system clock, just like when using other external time sources like radio clocks or GPS.
> User space is subject to various latencies created by the OS etc. I would think that in order to have fine grained (read: microsecond) accuracy we would have to run the portions that are relevant to obtaining the desired accuracy in the kernel.

All of the necessary features for microsecond or better accuracy are done via the hardware. You can get accuracy to within 10 microseconds while only sending sync packets and such once per second. The reason is that the hardware timestamps are very accurate. But if we can't properly adjust the clock's time and frequency, we cannot maintain the accuracy of the timestamps.

-- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
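For readers following the thread, the offset calculation mentioned in this message can be sketched as below. This is only an illustration, assuming the usual two-way PTP exchange with four hardware timestamps and a symmetric network path; the names `Exchange` and `offset_and_delay` are made up here and do not come from the patch set.

```python
from dataclasses import dataclass


@dataclass
class Exchange:
    """Four timestamps of one two-way exchange, in nanoseconds (hypothetical container)."""
    t1: int  # master sends Sync            (read from the master clock)
    t2: int  # slave receives Sync          (read from the slave clock)
    t3: int  # slave sends Delay_Req        (read from the slave clock)
    t4: int  # master receives Delay_Req    (read from the master clock)


def offset_and_delay(x: Exchange) -> tuple[float, float]:
    """Return (slave offset from master, one-way path delay).

    Assumes the path delay is the same in both directions, so that
      t2 - t1 = delay + offset
      t4 - t3 = delay - offset
    """
    master_to_slave = x.t2 - x.t1
    slave_to_master = x.t4 - x.t3
    offset = (master_to_slave - slave_to_master) / 2
    delay = (master_to_slave + slave_to_master) / 2
    return offset, delay


# Example: slave clock is 1500 ns ahead of the master, path delay is 400 ns.
example = Exchange(t1=1_000_000, t2=1_001_900, t3=1_050_000, t4=1_048_900)
print(offset_and_delay(example))
```

Note that none of the four timestamps depends on when user space gets around to reading them, which is the point made above about the accuracy coming from the hardware timestamps.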
Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support
> I don't see how this is a problem, as it exposes the multiple hardware clocks via different posix clock ids. So in the boundary clock case, you can configure which side is the client and which side is the master in a config file and the PTPd will appropriately steer them individually.

They may all be slaves - that means you can't treat them as part of system time.

> on module unload, but I don't think we need a use-count to prevent the module from being unloaded. My question would be: How do we handle a USB network device ($14.99 now with PTP!) being unplugged? We can't say "Sorry! That's in use!". So we note the hardware is gone, and return the proper error code. Or am I missing something else?

  Open list
  Oh number 31 appears to be the device I want
  Close list
  USB unplugged
  Random other device plugged
  clock_op(31, )
  Oh bugger I've just reprogrammed the wrong time source.

We don't have to stop the device being removed; instead of a disaster you get

  clock_op(fd, blah)
  -ENODEV

which btw is how just about everything else USB works when you pull the hardware.

> > And about 40 already supported by char driver interface clocks and rtcs in the kernel...
> And those char driver interfaces are all subtly different. I actually recently submitted an RFC to expose the RTC devices via the posix clock/timer interface, because working with the RTC hardware device directly is terrible for managing alarm interrupts.

Given that driver interfaces are sane and posix clock/timer interfaces have totally broken enumeration maybe you have it backwards. But if you follow through to my proposal maybe there is a saner answer still.

> For instance, you easily run into the case where your TV recording application programs an alarm to record your favorite show at 8pm. Then your backup script programs an alarm to wake up at 2am to do your nightly backups. Your box suspends and the next morning, you're missing your favorite show!

Poor resource management, and yes I'd agree you want a sensible interface.

> > Drop the clockid_t and swap it for a file handle like a proper Unix or Linux interface. The rest is much the same:
> >   fd = open /sys/class/timesource/[whatever]
> >   various queries you may want to do to check the name etc
> >   fclock_adjtime(fd, ...)
> > The posix interface is fundamentally flawed. It only works for statically enumerable objects. Unix avoided that forty years ago by making the identifier a handle which immediately cures all your object lifetime problems in one swoop.
> So, I don't really see how that's so different from what is being proposed. The clock_id is dynamically assigned per registered clock, and exposed via the sysfs interface from the ptp hardware entry. The only difference is the open/close reference counting, which I don't think is necessary here (since we can't always keep the hardware from going away).

It is absolutely necessary in order that you can be sure that two calls actually relate to the *same* device. It's as fundamental as the difference between chmod and fchmod, although with the added ugliness of some random numeric identifier stuck in the middle.

It also btw makes it much easier to fix up the existing random collection of /dev/rtc devices - because you can open them and issue fclock_adjtime if we are careful how we do it and it makes sense.

Alan
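The unplug/replug scenario in this message can be modelled with a toy registry. This is a sketch only, under stated assumptions: `ClockRegistry` and its methods are hypothetical and are not a kernel API, numeric ids are assumed to be reused by the next device that registers, and a "handle" is modelled as an id plus a generation counter to stand in for an open file descriptor.

```python
import errno


class ClockRegistry:
    """Toy model: ids are reusable slots, handles pin one specific registration."""

    def __init__(self):
        self.slots = {}        # id -> device name currently registered
        self.generation = {}   # id -> how many times this id has been handed out

    def register(self, name):
        # Hand out the lowest free id, as a simple allocator might.
        cid = 0
        while cid in self.slots:
            cid += 1
        self.slots[cid] = name
        self.generation[cid] = self.generation.get(cid, 0) + 1
        return cid

    def unregister(self, cid):
        del self.slots[cid]

    def adjust_by_id(self, cid):
        # Id-based call: acts on whatever device owns the id *right now*.
        if cid not in self.slots:
            raise OSError(errno.ENODEV, "no such clock")
        return self.slots[cid]

    def open(self, cid):
        # A handle remembers which registration it was opened against.
        return (cid, self.generation[cid])

    def adjust_by_handle(self, handle):
        cid, gen = handle
        if cid not in self.slots or self.generation[cid] != gen:
            raise OSError(errno.ENODEV, "clock went away")
        return self.slots[cid]


reg = ClockRegistry()
cid = reg.register("usb-ptp-nic")   # "number N appears to be the device I want"
handle = reg.open(cid)
reg.unregister(cid)                 # USB unplugged
reg.register("some-other-clock")    # random other device plugged, reuses the id

print(reg.adjust_by_id(cid))        # silently adjusts the wrong device
try:
    reg.adjust_by_handle(handle)
except OSError as e:
    print("handle-based call failed with", errno.errorcode[e.errno])
```

The design difference is only where the identity check lives: with a bare id nothing ties the second call to the device that was looked up by the first, while a handle carries that tie with it.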
Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support
> There is no other scalable time source available for fast timer access than the time stamp counter in the cpu. Other time sources require memory accesses, which are inherently slower.

On what hardware?

> An accurate other time source is used to adjust this clock. NTP does that via the clock interfaces from user space, which has its problems with accuracy. PTP can provide the network synced time access that would allow a more accurate calibration of the time.

Accuracy does not require speed of access. Accuracy requires predictable latency of access.

> Userland code would introduce latencies that would make sub microsecond time sync very difficult.

You can take a multiple micro-second I/O stall or SMI trap on a PC so you already lost the battle on the platform you seem to be discussing.

> You don't need the user APIs if you directly use the PTP time source to steer the system clock. In fact I think you have to do it in kernel space since user space latencies will degrade accuracy otherwise.

PTP is not a 'time source', it is one or more sources of time. The distinction is rather important.

> It implies clock tuning in userspace for a potential sub microsecond accurate clock. The clock accuracy will be limited by user space latencies and noise. You won't be able to discipline the system clock accurately.

Noise matters, latency doesn't. And the kernel is getting more and more real time support all the time.
Re: [PATCH 6/8] ptp: Added a clock that uses the eTSEC found on the MPC85xx.
Alan Cox wrote:
> > Please do not introduce useless additional layers for clock sync. Load these ptp clocks like the other regular clock modules and make them sync system time like any other clock.
> I don't think you understand PTP. PTP has masters, a system can need to be honouring multiple conflicting masters at once.

AFAIK the masters should not be conflicting. The Best Master Clock algorithm (BMC) defined in IEEE 1588 selects the best master clock. This clock distributes its notion of time on the network while the other masters, that is the other clocks/nodes that are configured to potentially become a master, keep quiet. So usually we will only have one source of time (the master clock selected by the BMC) and we will steer our single PHC (PTP hardware clock) to follow this master. (Of course there may be use-cases that require more than one PTP clock, e.g., for research purposes.)

However, if the clock selected by the BMC is switched off, loses its network connection, etc., the second best clock is selected by the BMC and becomes master. This clock may be less accurate and thus our slave clock has to switch from one notion of time to another. Is that the conflict you mentioned?

Christian
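The failover behaviour described above can be illustrated with a deliberately simplified sketch. It is not the IEEE 1588 data set comparison: each candidate is reduced to a single hypothetical `quality` number (lower is better), and `Candidate` and `best_master` are names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    quality: int        # hypothetical single ranking value, lower is better
    online: bool = True


def best_master(candidates):
    """Pick the best reachable candidate, or None if nobody is reachable."""
    reachable = [c for c in candidates if c.online]
    if not reachable:
        return None
    # Tie-break on name so the choice is deterministic.
    return min(reachable, key=lambda c: (c.quality, c.name))


clocks = [Candidate("gps-grandmaster", 1), Candidate("backup-oscillator", 5)]
print(best_master(clocks).name)

# The selected master is switched off or loses its network connection:
clocks = [Candidate("gps-grandmaster", 1, online=False), Candidate("backup-oscillator", 5)]
print(best_master(clocks).name)
```

At any moment only one candidate is selected, so a slave follows one notion of time; the switch to a possibly less accurate clock happens only when the current master disappears.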
Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support
Alan Cox wrote:
> > It implies clock tuning in userspace for a potential sub microsecond accurate clock. The clock accuracy will be limited by user space latencies and noise. You won't be able to discipline the system clock accurately.
> Noise matters, latency doesn't.

Well put! That's why we need hardware support for PTP timestamping to reduce the noise, but get along well with the clock servo that is steering the PHC in user space.

Christian
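As an illustration of what a user space clock servo does with the measured offsets, here is a minimal sketch. It assumes a proportional-integral (PI) controller, one noise-free offset measurement per second, and a constant frequency error; the gains, the function name `run_servo` and the simulation model are assumptions of this example and are not taken from the patch set or from any particular PTP daemon.

```python
def run_servo(initial_offset, drift, kp=0.7, ki=0.3, steps=60):
    """Simulate a PI servo steering a clock once per second.

    initial_offset: slave minus master, in seconds
    drift:          uncorrected frequency error, in seconds per second
    Returns the offset after each one-second interval.
    """
    offset = initial_offset
    integral = 0.0
    history = []
    for _ in range(steps):
        # The integral term learns the constant frequency error over time.
        integral += ki * offset
        adjustment = kp * offset + integral   # frequency correction, s/s
        # Over the next second the clock gains `drift` and loses `adjustment`.
        offset += drift - adjustment
        history.append(offset)
    return history


# Start 1 ms off with a 50 ppm frequency error.
trace = run_servo(initial_offset=1e-3, drift=50e-6)
print(["%.3e" % x for x in trace[:5]], "...", "%.3e" % trace[-1])
```

In this model the servo only needs each offset measurement to be accurate; how long user space takes to read it does not enter the loop as long as the correction is applied once per interval.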
Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support
On Thu, 2010-09-23 at 15:49 -0500, Christoph Lameter wrote: On Thu, 23 Sep 2010, john stultz wrote: The HPET or pit timesource are also quite slow these days. You only need access periodically to essentially tune the TSC ratio. If we're using the TSC, then we're not using the PTP clock as you suggest. Further the HPET and PIT aren't used to steer the system time when we are using the TSC as a clocksource. Its only used to calibrate the initial constant freq used by the timekeeping code (and if its non-constant, we throw it out). There is no other scalable time source available for fast timer access than the time stamp counter in the cpu. Other time source require memory accesses which is inherently slower. Right, but no one likes the HPET or ACPI PM for a clocksource, its just the TSC isn't usable in some cases, so they have to be used. We don't want to force folks to decide between closely sycned time and fast time reads. So that is part of the reason why PTP as a clocksource isn't a good idea. An accurate other time source is used to adjust this clock. NTP does that via the clock interfaces from user space which has its problems with accuracy. PTP can provide the network synced time access that would a more accurate calibration of the time. Calibration isn't whats needed here (it is an issue, but a separate one - and I've got some patches if you're interested!) as its a one-time source of error and can be corrected by ntp today without trouble. Adjustments to the system time is something that has to be done continuously to handle for variable thermal drift over time. 2) The way PTP clocks are steered to sync with network time causes their hardware freq to actually change. Since these adjustments are done on the hardware clock level, and not on the system time level, the adjustments to sync the system time/freq would then be made incorrect by PTP hardware adjustments. Right. So use these as a way to fine tune the TSC clock (and thereby the system time). 
So you're then not suggesting to use the PTP as a clocksource. Using the PTP hardware to adjust the system time freq is exactly what's being proposed. 3) Further, the PTP hardware counter can be simply set to a new offset to put it in line with the network time. This could cause trouble with timekeeping much like unsynced TSCs do. You can do the same for system time. Settimeofday does allow CLOCK_REALTIME to jump, but the CLOCK_MONOTONIC time cannot jump around. Having a clocksource that is non-monotonic would break this. Now, what you seem to be suggesting is to use the TSC (or whatever clocksource the system time is using) but to steer the system time using the PTP clock. This is actually what is being proposed; however, the steering is done in userland. This is due to the fact that there are two components to the steering: 1) adjusting the PTP clock hardware to network time and 2) adjusting the system time to the PTP hardware. By exposing the PTP clock to userland via the posix clocks interface, we allow this to easily be done. Userland code would introduce latencies that would make sub microsecond time sync very difficult. The design actually avoids most userland induced latency. 1) On the PTP hardware syncing point, the reference packet gets timestamped with the PTP hardware time on arrival. This allows the offset calculation to be done in userland without introducing latency. 2) On the system syncing side, the proposal for the PPS interrupt allows the PTP hardware to trigger an interrupt on the second boundary that would take a timestamp of the system time. Then the pps interface allows for the timestamp to be read from userland, allowing the offset to be calculated without introducing additional latency. We can switch underlying clocks for system time already. We can adapt to a different hw frequency. Actually no. The timekeeping code requires a fixed freq counter.
Dealing with hardware freq changes is difficult, because error is introduced by the latency between when the freq changes and when the timekeeping code is notified of it. So the system treats the hardware counters as fixed freq. Now, hardware does vary freq ever so slightly as thermal conditions change, but this is addressed in userland and corrected via adjtimex. Academic hair splitting? I have repeatedly switched between different clocks on various systems. So it's difficult but we do it? Sure, we handle the fairly-rare case of switching clocksources. And that introduces a bit of error each time. But one doesn't expect to be switching clock-sources every second and still keep synced time. Unnecessary layers? Where? This approach has fewer in-kernel layers, as it exposes the PTP clock to userland, instead of trying to layer things on top of it and stretching the system time abstraction to cover it. You don't need the user APIs if you directly use the PTP time source to
Re: [PATCH] powerpc: Fix invalid page flags in create TLB CAM path for PTE_64BIT
On Thu, 2010-09-23 at 15:33 -0500, Scott Wood wrote: I don't see a generic accessor that can test PTE flags for user access -- in the absence of one, I guess we need an ifdef here. Or at least put in a comment so anyone who adds a userspace use knows they need to fix it. We could make one up in the powerpc arch at least: #define pte_user(val) ((val & _PAGE_USER) == _PAGE_USER) would do. Cheers, Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
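For anyone following along, here is a minimal, self-contained sketch of the accessor Ben suggests. The flag values are placeholders picked for illustration, not the real powerpc definitions, and `_PAGE_USER` is assumed here to be made of more than one bit, which is the case where the `==` comparison matters. The helper function name is hypothetical.

```c
#include <stdbool.h>

/* Placeholder bit values for illustration only; the real definitions
 * live in the powerpc pte headers and differ per MMU family. */
#define _PAGE_PRESENT 0x001UL
#define _PAGE_USER    0x006UL   /* assumed to be a two-bit mask */

/* True only when every bit of _PAGE_USER is set in the raw PTE value. */
#define pte_user(val) (((val) & _PAGE_USER) == _PAGE_USER)

/* Hypothetical caller: a kernel-only mapping path that wants to reject
 * flags granting user access. */
static bool flags_ok_for_kernel_only_mapping(unsigned long flags)
{
    return !pte_user(flags);
}
```

With a multi-bit mask, a value that has only one of the user bits set is not treated as a user mapping, which a plain non-zero test would get wrong.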
Re: ppc44x - how do i optimize driver for tlb hits
On Thu, 2010-09-23 at 10:12 -0500, Ayman El-Khashab wrote: I've implemented a working driver on my 460EX. It allocates a couple of buffers of 4MB each. I have a custom memcmp algorithm in asm that is extremely fast in user space, but 1/2 as fast when run on these buffers. My tests are showing that the algorithm seems to be memory bandwidth bound. My guess is that I am having TLB or cache misses (my algo uses dcbt) that are slowing performance. Curiously, when in user space, I can affect the performance by small changes in the size of the buffer, i.e. 4MB + 32B is fast, 4MB + 4K is much worse. Can I adjust my driver code that is using kmalloc to make sure that the ppc44x has 4MB TLB entries for these and that they stay put? Anything you allocate with kmalloc() is going to be mapped by bolted 256M TLB entries, so there should be no TLB misses happening in the kernel case. Cheers, Ben.
Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support
On Thu, 2010-09-23 at 22:30 +0100, Alan Cox wrote: I don't see how this is a problem, as it exposes the multiple hardware clocks via different posix clock ids. So in the boundary clock case, you can configure which side is the client and which side is the master in a config file and the PTPd will appropriately steer them individually. They may all be slaves - that means you can't treat them as part of system time. Sure, and that's something one would configure. So I'm not sure I see how exposing the different hardware bits via a clock_id is problematic. They're just clocks that are being exposed. The steering of system time to PTP or PTP to system time (or just PTP to other PTP clocks). on module unload, but I don't think we need a use-count to prevent the module from being unloaded. My question would be: How do we handle a USB network device ($14.99 now with PTP!) being unplugged? We can't say Sorry! That's in use!. So we note the hardware is gone, and return the proper error code. Or am I missing something else? Open list Oh number 31 appears to be the device I want Close list USB unplugged Random other device plugged clock_op(31, ) Oh bugger I've just reprogrammed the wrong time source. Ok. So it's just the issue of clock_id reuse. I was confusing it with some sort of module use counting issue. And yea, I can see how it might be easier to re-use the file descriptor than re-implementing the reuse logic in the posix-clock registration. We don't have to stop the device being removed, instead of a disaster you get clock_op(fd, blah) -ENODEV which btw is how just about everything else USB works when you pull the hardware. Right, which was what I was thinking as well, but assuming we didn't re-use clockids quickly. So, I don't really see how that's so different from what is being proposed. The clock_id is dynamically assigned per registered clock, and exposed via the sysfs interface from ptp hardware entry.
The only difference is the open/close reference counting, which I don't think is necessary here (since we can't always keep the hardware from going away). It is absolutely necessary in order that you can be sure that two calls actually relate to the *same* device. It's as fundamental as the difference between chmod and fchmod although with the added ugliness of some random numeric identifier stuck in the middle. It also btw makes it much easier to fix up the existing random collection of /dev/rtc devices - because you can open them and issue fclock_adjtime if we are careful how we do it and it makes sense. Wait, you're suggesting we add new fclock_* calls that duplicate the posix interface? That doesn't sound great to me. What did you think of Kyle Moffett's suggestion of utilizing the fd to map to the clock_id which could then be used by the posix clocks interface? Although I'm still not sure if it wouldn't be so hard to just simply increment the id on each registration and index to a clock through a reasonably small hash table. I suspect that would solve the enumeration/reuse issue without much trouble (but again, I'm open to being corrected if I'm missing something larger). But yes, in summary, this is an issue to be addressed one way or another. thanks -john
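To make John's last idea concrete, here is a rough user-space sketch of "increment the id on each registration and index through a small hash table". Nothing here is taken from the patch set: the table size, struct layout, and function names are all hypothetical, and locking and integer wraparound are ignored.

```c
#include <stddef.h>

#define CLOCK_HASH_SIZE 16          /* hypothetical, "reasonably small" */

struct clock_entry {
    int id;                         /* 0 means the slot is empty */
    const char *name;
};

static struct clock_entry clock_table[CLOCK_HASH_SIZE];
static int next_clock_id = 1;       /* only ever grows; ids are not recycled */

/* Hand out a fresh id. Ids whose slot is still occupied are skipped, so a
 * lookup never needs to probe. Returns -1 if no free slot was found. */
static int clock_register(const char *name)
{
    for (int tries = 0; tries < CLOCK_HASH_SIZE; tries++) {
        int id = next_clock_id++;
        struct clock_entry *e = &clock_table[id % CLOCK_HASH_SIZE];

        if (e->id == 0) {
            e->id = id;
            e->name = name;
            return id;
        }
    }
    return -1;
}

/* Returns NULL for an id that was never issued or has been unregistered;
 * a caller would map that to -ENODEV. */
static const struct clock_entry *clock_lookup(int id)
{
    if (id <= 0)
        return NULL;
    const struct clock_entry *e = &clock_table[id % CLOCK_HASH_SIZE];
    return e->id == id ? e : NULL;
}

static void clock_unregister(int id)
{
    if (id <= 0)
        return;
    struct clock_entry *e = &clock_table[id % CLOCK_HASH_SIZE];
    if (e->id == id) {
        e->id = 0;
        e->name = NULL;
    }
}
```

In Alan's unplug scenario, the stale id held by the application stops resolving once the device is unregistered, and the replacement device gets a different id, so the wrong clock is not reprogrammed. It does not address his other point, that two calls on the same id are only as trustworthy as the id itself.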
Re: [PATCH 1/8] posix clocks: introduce a syscall for clock tuning.
On Thu, 2010-09-23 at 19:31 +0200, Richard Cochran wrote: A new syscall is introduced that allows tuning of a POSIX clock. The syscall is implemented for four architectures: arm, blackfin, powerpc, and x86. The new syscall, clock_adjtime, takes two parameters, the clock ID, and a pointer to a struct timex. The semantics of the timex struct have been expanded by one additional mode flag, which allows an absolute offset correction. When specificied, the clock offset is immediately corrected by adding the given time value to the current time value. Any reason why you CC'ed device-tree discuss ? This list is getting way too much unrelated stuff, which I find annoying, it would be nice if we were all a bit more careful here with our CC lists. Cheers, Ben. Signed-off-by: Richard Cochran richard.coch...@omicron.at --- arch/arm/include/asm/unistd.h |1 + arch/arm/kernel/calls.S|1 + arch/blackfin/include/asm/unistd.h |3 +- arch/blackfin/mach-common/entry.S |1 + arch/powerpc/include/asm/systbl.h |1 + arch/powerpc/include/asm/unistd.h |3 +- arch/x86/ia32/ia32entry.S |1 + arch/x86/include/asm/unistd_32.h |3 +- arch/x86/include/asm/unistd_64.h |2 + arch/x86/kernel/syscall_table_32.S |1 + include/linux/posix-timers.h |3 + include/linux/syscalls.h |2 + include/linux/timex.h |3 +- kernel/compat.c| 136 +++- kernel/posix-cpu-timers.c |4 + kernel/posix-timers.c | 17 + 16 files changed, 130 insertions(+), 52 deletions(-) diff --git a/arch/arm/include/asm/unistd.h b/arch/arm/include/asm/unistd.h index c891eb7..f58d881 100644 --- a/arch/arm/include/asm/unistd.h +++ b/arch/arm/include/asm/unistd.h @@ -396,6 +396,7 @@ #define __NR_fanotify_init (__NR_SYSCALL_BASE+367) #define __NR_fanotify_mark (__NR_SYSCALL_BASE+368) #define __NR_prlimit64 (__NR_SYSCALL_BASE+369) +#define __NR_clock_adjtime (__NR_SYSCALL_BASE+370) /* * The following SWIs are ARM private. 
diff --git a/arch/arm/kernel/calls.S b/arch/arm/kernel/calls.S index 5c26ecc..430de4c 100644 --- a/arch/arm/kernel/calls.S +++ b/arch/arm/kernel/calls.S @@ -379,6 +379,7 @@ CALL(sys_fanotify_init) CALL(sys_fanotify_mark) CALL(sys_prlimit64) +/* 370 */CALL(sys_clock_adjtime) #ifndef syscalls_counted .equ syscalls_padding, ((NR_syscalls + 3) ~3) - NR_syscalls #define syscalls_counted diff --git a/arch/blackfin/include/asm/unistd.h b/arch/blackfin/include/asm/unistd.h index 14fcd25..79ad99b 100644 --- a/arch/blackfin/include/asm/unistd.h +++ b/arch/blackfin/include/asm/unistd.h @@ -392,8 +392,9 @@ #define __NR_fanotify_init 371 #define __NR_fanotify_mark 372 #define __NR_prlimit64 373 +#define __NR_clock_adjtime 374 -#define __NR_syscall 374 +#define __NR_syscall 375 #define NR_syscalls __NR_syscall /* Old optional stuff no one actually uses */ diff --git a/arch/blackfin/mach-common/entry.S b/arch/blackfin/mach-common/entry.S index af1bffa..ee68730 100644 --- a/arch/blackfin/mach-common/entry.S +++ b/arch/blackfin/mach-common/entry.S @@ -1631,6 +1631,7 @@ ENTRY(_sys_call_table) .long _sys_fanotify_init .long _sys_fanotify_mark .long _sys_prlimit64 + .long _sys_clock_adjtime .rept NR_syscalls-(.-_sys_call_table)/4 .long _sys_ni_syscall diff --git a/arch/powerpc/include/asm/systbl.h b/arch/powerpc/include/asm/systbl.h index 3d21266..2485d8f 100644 --- a/arch/powerpc/include/asm/systbl.h +++ b/arch/powerpc/include/asm/systbl.h @@ -329,3 +329,4 @@ COMPAT_SYS(rt_tgsigqueueinfo) SYSCALL(fanotify_init) COMPAT_SYS(fanotify_mark) SYSCALL_SPU(prlimit64) +COMPAT_SYS_SPU(clock_adjtime) diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h index 597e6f9..85d5067 100644 --- a/arch/powerpc/include/asm/unistd.h +++ b/arch/powerpc/include/asm/unistd.h @@ -348,10 +348,11 @@ #define __NR_fanotify_init 323 #define __NR_fanotify_mark 324 #define __NR_prlimit64 325 +#define __NR_clock_adjtime 326 #ifdef __KERNEL__ -#define __NR_syscalls326 +#define 
__NR_syscalls327 #define __NR__exit __NR_exit #define NR_syscalls __NR_syscalls diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S index 518bb99..0ed7896 100644 --- a/arch/x86/ia32/ia32entry.S +++ b/arch/x86/ia32/ia32entry.S @@ -851,4 +851,5 @@ ia32_sys_call_table: .quad sys_fanotify_init .quad sys32_fanotify_mark .quad sys_prlimit64 /* 340 */ + .quad compat_sys_clock_adjtime ia32_syscall_end: diff --git a/arch/x86/include/asm/unistd_32.h b/arch/x86/include/asm/unistd_32.h
Re: [BUG 2.6.36-rc5] of_i2c.ko - i2c-core.ko dependency loop
On Thu, 23 Sep 2010 22:16:32 +0200 Mikael Pettersson wrote: Randy Dunlap writes: On Thu, 23 Sep 2010 13:53:18 +0200 Mikael Pettersson wrote: Running modules_install from a newly built 2.6.36-rc5 kernel on my 32-bit PowerMac results in: WARNING: Module /lib/modules/2.6.36-rc5/kernel/drivers/i2c/busses/i2c-powermac.ko ignored, due to loop WARNING: Loop detected: /lib/modules/2.6.36-rc5/kernel/drivers/i2c/i2c-core.ko needs of_i2c.ko which needs i2c-core.ko again! WARNING: Module /lib/modules/2.6.36-rc5/kernel/drivers/i2c/i2c-core.ko ignored, due to loop WARNING: Module /lib/modules/2.6.36-rc5/kernel/drivers/i2c/i2c-dev.ko ignored, due to loop WARNING: Module /lib/modules/2.6.36-rc5/kernel/drivers/of/of_i2c.ko ignored, due to loop WARNING: Module /lib/modules/2.6.36-rc5/kernel/sound/ppc/snd-powermac.ko ignored, due to loop grep '.*I2C.*=' .config CONFIG_OF_I2C=m CONFIG_I2C=m CONFIG_I2C_BOARDINFO=y CONFIG_I2C_CHARDEV=m CONFIG_I2C_POWERMAC=m I can't say exactly when this started, haven't built kernels on this box in a while. No kconfig warnings? Not that I recall. I can check tomorrow if necessary. No kconfig warnings. I checked with your .config file. Please post your full .config file. It is just a matter of module i2c-core calling of_ functions and module of_i2c calling i2c_ functions. Hmph. Something for Grant, Jean, and Ben to work out. --- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code ***
Re: [PATCH] fsldma: add support to 36-bit physical address
On Tue, Sep 21, 2010 at 10:41 PM, Kumar Gala ga...@kernel.crashing.org wrote: On Sep 21, 2010, at 5:34 PM, Timur Tabi wrote: On Tue, Sep 21, 2010 at 5:17 PM, Scott Wood scottw...@freescale.com wrote: It needs to be the actual device that is performing the DMA -- the platform may need to do things such as IOMMU manipulation where knowing the device matters. Ok, this all makes sense. So it appears that the patch is valid, at least in theory. I would like to see some testing of it, but I realize that may be too difficult. There's no easy way to force an allocation above 4GB. I think the patch is pretty safe w/o testing. However I agree we need a better solution to testing 36-bit addressing. I'll take that as an acked-by, but I'll wait for the next version of the patch with the completed changelog before acting on it. -- Dan
[PATCH v1 3/4] PPC4xx: New file with SoC specific functions
From: Tirumala Marri tma...@apm.com This patch creates new file with SoC dependent functions. Signed-off-by: Tirumala R Marri tma...@apm.com --- V1: * Remove all 440SPe specific references. * Move some of the code from header file to c file. --- drivers/dma/ppc4xx/ppc4xx-adma.c | 1658 ++ 1 files changed, 1658 insertions(+), 0 deletions(-) create mode 100644 drivers/dma/ppc4xx/ppc4xx-adma.c diff --git a/drivers/dma/ppc4xx/ppc4xx-adma.c b/drivers/dma/ppc4xx/ppc4xx-adma.c new file mode 100644 index 000..5a5da23 --- /dev/null +++ b/drivers/dma/ppc4xx/ppc4xx-adma.c @@ -0,0 +1,1658 @@ +/* + * Copyright (C) 2006-2009 DENX Software Engineering. + * + * Author: Yuri Tikhonov y...@emcraft.com + * + * Further porting to arch/powerpc by + * Anatolij Gustschin ag...@denx.de + * Tirumala R Marri tma...@apm.com + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the Free + * Software Foundation; either version 2 of the License, or (at your option) + * any later version. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 + * Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + * The full GNU General Public License is included in this distribution in the + * file called COPYING. + */ + +/* + * This driver supports the asynchrounous DMA copy and RAID engines available + * on the AMCC PPC440SPe Processors. + * Based on the Intel Xscale(R) family of I/O Processors (IOP 32x, 33x, 134x) + * ADMA driver written by D.Williams. 
+ */ + +#include <linux/of.h> +#include <linux/of_platform.h> +#include <asm/dcr.h> +#include <asm/dcr-regs.h> +#include <linux/async_tx.h> +#include <linux/dma-mapping.h> +#include <linux/slab.h> +#include "adma.h" +#if defined(CONFIG_440SPe) || defined(CONFIG_440SP) +#include "ppc440spe-dma.h" +#endif +#include "ppc4xx-adma.h" + +/* This array is used in data-check operations for storing a pattern */ +static char ppc4xx_qword[16]; +static atomic_t ppc4xx_adma_err_irq_ref; +static unsigned int ppc4xx_mq_dcr_len; + +/* These are used in enable check routines + */ +static u32 ppc4xx_r6_enabled; +static struct completion ppc4xx_r6_test_comp; + +static struct page *ppc4xx_rxor_srcs[32]; + +static dcr_host_t ppc4xx_mq_dcr_host; +/* Pointer to DMA0, DMA1 CP/CS FIFO */ +static void *ppc4xx_dma_fifo_buf; + +static char *ppc_adma_errors[] = { + [PPC_ADMA_INIT_OK] = "ok", + [PPC_ADMA_INIT_MEMRES] = "failed to get memory resource", + [PPC_ADMA_INIT_MEMREG] = "failed to request memory region", + [PPC_ADMA_INIT_ALLOC] = "failed to allocate memory for adev " + "structure", + [PPC_ADMA_INIT_COHERENT] = "failed to allocate coherent memory for " + "hardware descriptors", + [PPC_ADMA_INIT_CHANNEL] = "failed to allocate memory for channel", + [PPC_ADMA_INIT_IRQ1] = "failed to request first irq", + [PPC_ADMA_INIT_IRQ2] = "failed to request second irq", + [PPC_ADMA_INIT_REGISTER] = "failed to register dma async device", +}; + +static void ppc4xx_adma_dma2rxor_set_mult(struct ppc4xx_adma_desc_slot *desc, + int index, u8 mult); +static void print_cb_list(struct ppc4xx_adma_chan *chan, + struct ppc4xx_adma_desc_slot *iter); +/** + * ppc4xx_can_rxor - check if the operands may be processed with RXOR + */ +static int ppc4xx_can_rxor(struct page **srcs, int src_cnt, size_t len) +{ + int i, order = 0, state = 0; + int idx = 0; + + if (unlikely(!(src_cnt > 1))) + return 0; + + BUG_ON(src_cnt > ARRAY_SIZE(ppc4xx_rxor_srcs)); + + /* Skip holes in the source list before checking */ + for (i = 0; i < src_cnt; i++) { + if (!srcs[i]) + continue; +
ppc4xx_rxor_srcs[idx++] = srcs[i]; + } + src_cnt = idx; + + for (i = 1; i < src_cnt; i++) { + char *cur_addr = page_address(ppc4xx_rxor_srcs[i]); + char *old_addr = page_address(ppc4xx_rxor_srcs[i - 1]); + + switch (state) { + case 0: + if (cur_addr == old_addr + len) { + /* direct RXOR */ + order = 1; + state = 1; + } else if (old_addr == cur_addr + len) { + /* reverse RXOR */ + order = -1; + state = 1; + } else + goto out; +
[PATCH v1 4/4] PPC4xx: Merge files to create single 440spe header
From: Tirumala Marri tma...@apm.com This patch merges dma.h and xor.h to create ppc440spe-dma.h Signed-off-by: Tirumala R Marri tma...@apm.com --- V1: * No change. --- drivers/dma/ppc4xx/dma.h | 223 - drivers/dma/ppc4xx/ppc440spe-dma.h | 318 drivers/dma/ppc4xx/xor.h | 110 - 3 files changed, 318 insertions(+), 333 deletions(-) delete mode 100644 drivers/dma/ppc4xx/dma.h create mode 100644 drivers/dma/ppc4xx/ppc440spe-dma.h delete mode 100644 drivers/dma/ppc4xx/xor.h diff --git a/drivers/dma/ppc4xx/dma.h b/drivers/dma/ppc4xx/dma.h deleted file mode 100644 index bcde2df..000 --- a/drivers/dma/ppc4xx/dma.h +++ /dev/null @@ -1,223 +0,0 @@ -/* - * 440SPe's DMA engines support header file - * - * 2006-2009 (C) DENX Software Engineering. - * - * Author: Yuri Tikhonov y...@emcraft.com - * - * This file is licensed under the term of the GNU General Public License - * version 2. The program licensed as is without any warranty of any - * kind, whether express or implied. - */ - -#ifndef_PPC440SPE_DMA_H -#define _PPC440SPE_DMA_H - -#include linux/types.h - -/* Number of elements in the array with statical CDBs */ -#defineMAX_STAT_DMA_CDBS 16 -/* Number of DMA engines available on the contoller */ -#define DMA_ENGINES_NUM2 - -/* Maximum h/w supported number of destinations */ -#define DMA_DEST_MAX_NUM 2 - -/* FIFO's params */ -#define DMA0_FIFO_SIZE 0x1000 -#define DMA1_FIFO_SIZE 0x1000 -#define DMA_FIFO_ENABLE(112) - -/* DMA Configuration Register. Data Transfer Engine PLB Priority: */ -#define DMA_CFG_DXEPR_LP (026) -#define DMA_CFG_DXEPR_HP (326) -#define DMA_CFG_DXEPR_HHP (226) -#define DMA_CFG_DXEPR_HHHP (126) - -/* DMA Configuration Register. DMA FIFO Manager PLB Priority: */ -#define DMA_CFG_DFMPP_LP (023) -#define DMA_CFG_DFMPP_HP (323) -#define DMA_CFG_DFMPP_HHP (223) -#define DMA_CFG_DFMPP_HHHP (123) - -/* DMA Configuration Register. 
Force 64-byte Alignment */ -#define DMA_CFG_FALGN (1 19) - -/*UIC0:*/ -#define D0CPF_INT (112) -#define D0CSF_INT (111) -#define D1CPF_INT (110) -#define D1CSF_INT (19) -/*UIC1:*/ -#define DMAE_INT (19) - -/* I2O IOP Interrupt Mask Register */ -#define I2O_IOPIM_P0SNE(13) -#define I2O_IOPIM_P0EM (15) -#define I2O_IOPIM_P1SNE(16) -#define I2O_IOPIM_P1EM (18) - -/* DMA CDB fields */ -#define DMA_CDB_MSK(0xF) -#define DMA_CDB_64B_ADDR (12) -#define DMA_CDB_NO_INT (13) -#define DMA_CDB_STATUS_MSK (0x3) -#define DMA_CDB_ADDR_MSK (0xFFF0) - -/* DMA CDB OpCodes */ -#define DMA_CDB_OPC_NO_OP (0x00) -#define DMA_CDB_OPC_MV_SG1_SG2 (0x01) -#define DMA_CDB_OPC_MULTICAST (0x05) -#define DMA_CDB_OPC_DFILL128 (0x24) -#define DMA_CDB_OPC_DCHECK128 (0x23) - -#define DMA_CUED_XOR_BASE (0x1000) -#define DMA_CUED_XOR_HB(0x0008) - -#ifdef CONFIG_440SP -#define DMA_CUED_MULT1_OFF 0 -#define DMA_CUED_MULT2_OFF 8 -#define DMA_CUED_MULT3_OFF 16 -#define DMA_CUED_REGION_OFF24 -#define DMA_CUED_XOR_WIN_MSK (0xFC00) -#else -#define DMA_CUED_MULT1_OFF 2 -#define DMA_CUED_MULT2_OFF 10 -#define DMA_CUED_MULT3_OFF 18 -#define DMA_CUED_REGION_OFF26 -#define DMA_CUED_XOR_WIN_MSK (0xF000) -#endif - -#define DMA_CUED_REGION_MSK0x3 -#define DMA_RXOR1230x0 -#define DMA_RXOR1240x1 -#define DMA_RXOR1250x2 -#define DMA_RXOR12 0x3 - -/* S/G addresses */ -#define DMA_CDB_SG_SRC 1 -#define DMA_CDB_SG_DST12 -#define DMA_CDB_SG_DST23 - -/* - * DMAx engines Command Descriptor Block Type - */ -struct dma_cdb { - /* -* Basic CDB structure (Table 20-17, p.499, 440spe_um_1_22.pdf) -*/ - u8 pad0[2];/* reserved */ - u8 attr; /* attributes */ - u8 opc;/* opcode */ - u32 sg1u; /* upper SG1 address */ - u32 sg1l; /* lower SG1 address */ - u32 cnt;/* SG count, 3B used */ - u32 sg2u; /* upper SG2 address */ - u32 sg2l; /* lower SG2 address */ - u32 sg3u; /* upper SG3 address */ - u32 sg3l; /* lower SG3 address */ -}; - -/* - * DMAx hardware registers (p.515 in 440SPe UM 1.22) - */ -struct dma_regs { - u32 cpfpl; - u32 
cpfph; - u32 csfpl; - u32 csfph; - u32 dsts; - u32 cfg; - u8 pad0[0x8]; - u16 cpfhp; - u16 cpftp; - u16 csfhp; - u16 csftp; - u8 pad1[0x8]; - u32 acpl; - u32 acph; - u32 s1bpl; - u32
Re: [PATCH 1/8] posix clocks: introduce a syscall for clock tuning.
On Fri, 24 Sep 2010, Benjamin Herrenschmidt wrote: On Thu, 2010-09-23 at 19:31 +0200, Richard Cochran wrote: The new syscall, clock_adjtime, takes two parameters, the clock ID, and a pointer to a struct timex. The semantics of the timex struct have been expanded by one additional mode flag, which allows an absolute offset correction. When specified, the clock offset is immediately corrected by adding the given time value to the current time value. Any reason why you CC'ed device-tree discuss ? This list is getting way too much unrelated stuff, which I find annoying, it would be nice if we were all a bit more careful here with our CC lists. Says the guy who failed to trim the useless context of the original mail, which made me scroll down all the way just to find out that there is nothing to see. Thanks, tglx
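As a side note on the semantics quoted above ("adding the given time value to the current time value"), the arithmetic can be sketched as follows. This is only an illustration of the addition with carry; the struct and function are hypothetical stand-ins, not the kernel's struct timex or the code from the patch. The sketch assumes negative offsets are expressed with a negative seconds field and a non-negative sub-second field, so that -0.25 s is written as { -1, 750000000 }.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical stand-in for a clock value; not the kernel's struct timex. */
struct simple_time {
    int64_t sec;
    long    nsec;                   /* always kept in [0, NSEC_PER_SEC) */
};

/* Apply an offset to the current time: new = now + off.
 * Both inputs are assumed to be normalised as described above. */
static struct simple_time apply_offset(struct simple_time now,
                                       struct simple_time off)
{
    struct simple_time out;

    out.sec  = now.sec + off.sec;
    out.nsec = now.nsec + off.nsec;
    if (out.nsec >= NSEC_PER_SEC) { /* carry into the seconds field */
        out.nsec -= NSEC_PER_SEC;
        out.sec  += 1;
    }
    return out;
}
```

For example, stepping 10.1 s back by a quarter of a second means adding { -1, 750000000 }, which lands on 9.85 s.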
Re: ppc44x - how do i optimize driver for tlb hits
On Fri, Sep 24, 2010 at 08:01:04AM +1000, Benjamin Herrenschmidt wrote: On Thu, 2010-09-23 at 10:12 -0500, Ayman El-Khashab wrote: I've implemented a working driver on my 460EX. It allocates a couple of buffers of 4MB each. I have a custom memcmp algorithm in asm that is extremely fast in user space, but 1/2 as fast when run on these buffers. My tests are showing that the algorithm seems to be memory bandwidth bound. My guess is that I am having TLB or cache misses (my algo uses dcbt) that are slowing performance. Curiously, when in user space, I can affect the performance by small changes in the size of the buffer, i.e. 4MB + 32B is fast, 4MB + 4K is much worse. Can I adjust my driver code that is using kmalloc to make sure that the ppc44x has 4MB TLB entries for these and that they stay put? Anything you allocate with kmalloc() is going to be mapped by bolted 256M TLB entries, so there should be no TLB misses happening in the kernel case. Hi Ben, can you or somebody elaborate? I saw the pinned tlb in 44x_mmu.c. Perhaps I don't understand the code fully, but it appears to map 256MB of lowmem into a pinned tlb. I am not sure what phys address lowmem means, but I assumed (possibly incorrectly) that it is 0-256MB. When I get the physical addresses for my buffers after kmalloc, they all have addresses that are within my DRAM but start at about the 440MB mark. I end up passing those phys addresses to my DMA engine. When my compare runs it takes a huge amount of time in the assembly code doing memory fetches, which makes me think that there are either tons of cache misses (despite the prefetching) or the entries have been purged from the TLB and must be obtained again. As an experiment, I disabled my cache prefetch code and the algo took forever. Next I altered the asm to do the same amount of data but a smaller amount over and over so that less is fetched from main memory. That executed very quickly.
From that I drew the conclusion that the algorithm is memory bandwidth limited. In a standalone configuration (i.e. algorithm just using user memory, everything else identical), the speedup is 2-3x. So the limitation is not a hardware limit, it must be something that is happening when I execute the loads. (it is a compare algorithm, so it only does loads). Thanks Ayman
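For readers who want to reproduce the kind of experiment Ayman describes without his asm, here is a rough C sketch of a compare loop with explicit software prefetch. It is not his algorithm. It uses the GCC/Clang builtin `__builtin_prefetch` as a stand-in for a hand-placed dcbt, and the 32-byte line size and the prefetch distance are assumptions that would need tuning on real hardware.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES  32u             /* assumed cache line size */
#define LINES_AHEAD 4u              /* assumed prefetch distance, tune by measurement */

/* Compare two word buffers; returns the index of the first differing word,
 * or n if the buffers are identical. */
static size_t first_mismatch(const uint32_t *a, const uint32_t *b, size_t n)
{
    const size_t per_line = LINE_BYTES / sizeof(uint32_t);
    const size_t ahead = LINES_AHEAD * per_line;

    for (size_t i = 0; i < n; i++) {
        /* Once per cache line, hint a read of a line further ahead in both
         * buffers. The bounds check keeps the address inside the arrays. */
        if (i % per_line == 0 && i + ahead < n) {
            __builtin_prefetch(&a[i + ahead], 0, 0);
            __builtin_prefetch(&b[i + ahead], 0, 0);
        }
        if (a[i] != b[i])
            return i;
    }
    return n;
}
```

Running the same function over a buffer that fits in cache and over one that does not, with and without the prefetch lines, is one way to separate a bandwidth limit from a translation or cache-miss problem.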
RE: [PATCH 1/2] PPC4xx: Generelizing drivers/dma/ppc4xx/adma.c
Will both versions of this driver exist in the same kernel build? For example the iop-adma driver supports iop13xx and iop3xx, but we select the architecture at build time? Or, as I assume in this case, will the two (maybe more?) ppc4xx adma drivers all be built in the same image, more like ioatdma? [Marri] We select the architecture at build time. In the latter case I would recommend a file structure like: drivers/dma/ppc4xx/adma.c drivers/dma/ppc4xx/adma_440spe.c drivers/dma/ppc4xx/adma_460ex.c With patches to move the chipset specific pieces to their own file. Minimizing the code churn in adma.c, or at least showing a progression of what is unique and needs to be moved. This would be similar to how ioatdma is structured and compiles a single driver to cover the three major hardware revisions. [Marri] Looks like this driver is similar to the iop-adma driver.
RE: [PATCH v1 1/4] PPC4xx: Generalizing ADMA driver modifications
Did you look at this changelog before sending? It just deletes 4000 lines of code?? [Marri] The reason I have to send it in a different file is the size of the patch. There seems to be an issue with patch sizes of 200k or more.
Re: [PATCH v1 1/4] PPC4xx: Generalizing ADMA driver modifications
On 9/23/2010 3:10 PM, tma...@apm.com wrote: From: Tirumala Marri tma...@apm.com This patch generalizes the existing drivers/dma/ppc4xx/adma.c, so that common code can be shared between different similar DMA engine drivers in other SoCs. Also Makefile and Kconfig changed to accommodate PPC4XX. Signed-off-by: Tirumala R Marri tma...@apm.com --- V1: * No change. --- arch/powerpc/include/asm/async_tx.h | 4 +- drivers/dma/Kconfig | 6 +- drivers/dma/Makefile | 2 +- drivers/dma/ppc4xx/Makefile | 2 +- drivers/dma/ppc4xx/adma.c | 4437 +++ drivers/dma/ppc4xx/adma.h | 92 +- 6 files changed, 354 insertions(+), 4189 deletions(-) Did you look at this changelog before sending? It just deletes 4000 lines of code?? Moving and renaming code in one patch makes it very difficult to verify the result. When generalizing code the first thing I want to see with a very quick glance at the patch(es) is that the existing implementation is not harmed. One way to go about this is to first identify the portions of existing code that you want to reuse in your driver and the pieces that are truly ppc440spe specific. Move the ppc440spe pieces to their own file (get this reviewed and approved by the ppc440spe authors). The remaining code in adma.c will be assumed generic. You can then have another patch to do a simple s/ppc440spe/ppc4xx/ in adma.c (no other logic changes or code movement). Then you can introduce your ppc460ex unique implementation that calls into adma.c. I don't want to see patches along the lines of rename drivers/dma/ppc4xx/adma.c to drivers/dma/ppc4xx/ppc4xx-adma.c because that is just redundant. Assume that the existing generic file names are where the common code will lie and then add hw-implementation specific files to call into that base. Another rule is that the conversion should be bisectable at every step: I should be able to apply each patch in the series and still have a functional/runnable result.
-- Dan
Re: [PATCH v1 3/4] PPC4xx: New file with SoC specific functions
On 9/23/2010 3:11 PM, tma...@apm.com wrote: From: Tirumala Marri tma...@apm.com This patch creates new file with SoC dependent functions. Signed-off-by: Tirumala R Marri tma...@apm.com --- V1: * Remove all 440SPe specific references. Maybe it renames ppc440spe to ppc4xx but it adds things like... +#if defined(CONFIG_440SPe) || defined(CONFIG_440SP) + np = of_find_compatible_node(NULL, NULL, "ibm,i2o-440spe"); +#endif ...in the code. Which is 1) not generic and 2) I suspect causes a compile warning for using an uninitialized variable. + if (!np) { + pr_err("%s: can't find I2O device tree node\n", + __func__); + ret = -ENODEV; + goto err_req2; + } It looks to me like the common code will need to have a few build dependent helper routines as it appears one instance of the driver cannot simultaneously support 440sp, 440spe, and 460ex. -- Dan
Re: [PATCH v1 1/4] PPC4xx: Generalizing ADMA driver modifications
On 9/23/2010 3:44 PM, Tirumala Marri wrote: Did you look at this changelog before sending? It just deletes 4000 lines of code?? [Marri] The reason I have to send it in a different file is the size of the patch. There seems to be an issue with patch sizes of 200k or more. Read the rest of what I wrote: Moving and renaming code in one patch makes it very difficult to verify the result. When generalizing code the first thing I want to see with a very quick glance at the patch(es) is that the existing implementation is not harmed. One way to go about this is to first identify the portions of existing code that you want to reuse in your driver and the pieces that are truly ppc440spe specific. Move the ppc440spe pieces to their own file (get this reviewed and approved by the ppc440spe authors). The remaining code in adma.c will be assumed generic. You can then have another patch to do a simple s/ppc440spe/ppc4xx/ in adma.c (no other logic changes or code movement). Then you can introduce your ppc460ex unique implementation that calls into adma.c. The patch would not be so large if you leave the existing code where it is and move the implementation specific pieces to their own file. -- Dan
Re: ppc44x - how do i optimize driver for tlb hits
On Thu, 2010-09-23 at 17:35 -0500, Ayman El-Khashab wrote:

Anything you allocate with kmalloc() is going to be mapped by bolted 256M TLB entries, so there should be no TLB misses happening in the kernel case.

Hi Ben, can you or somebody elaborate? I saw the pinned tlb in 44x_mmu.c. Perhaps I don't understand the code fully, but it appears to map 256MB of lowmem into a pinned tlb. I am not sure what phys address lowmem means, but I assumed (possibly incorrectly) that it is 0-256MB.

No. The first pinned entry (0...256M) is inserted by the asm code in head_44x.S. The code in 44x_mmu.c will later map the rest of lowmem (typically up to 768M but various settings can change that) using more 256M entries.

Basically, all of lowmem is permanently mapped with such entries.

When I get the physical addresses for my buffers after kmalloc, they all have addresses that are within my DRAM but start at about the 440MB mark. I end up passing those phys addresses to my DMA engine.

Anything you get from kmalloc is going to come from lowmem, and thus be covered by those bolted TLB entries.

When my compare runs it takes a huge amount of time in the assembly code doing memory fetches which makes me think that there are either tons of cache misses (despite the prefetching) or the entries have been purged

What prefetching ? IE. The DMA operation -will- flush things out of the cache due to the DMA being not cache coherent on 44x. The 440 also doesn't have a working HW prefetch engine afaik (it should be disabled in FW or early asm on 440 cores and fused out in HW on 460 cores afaik).

So only explicit SW prefetching will help.

from the TLB and must be obtained again. As an experiment, I disabled my cache prefetch code and the algo took forever. Next I altered the asm to do the same amount of data but a smaller amount over and over so that less is fetched from main memory. That executed very quickly. From that I drew the conclusion that the algorithm is memory bandwidth limited.

I don't know what exactly is going on, maybe your prefetch stride isn't right for the HW setup, or something like that. You can use xmon 'u' command to look at the TLB content. Check that we have the 256M entries mapping your data, they should be there.

In a standalone configuration (i.e. algorithm just using user memory, everything else identical), the speedup is 2-3x. So the limitation is not a hardware limit, it must be something that is happening when I execute the loads. (it is a compare algorithm, so it only does loads).

Cheers,
Ben.
Re: [PATCH 1/8] posix clocks: introduce a syscall for clock tuning.
On Fri, 2010-09-24 at 00:12 +0200, Thomas Gleixner wrote:

This list is getting way too much unrelated stuff, which I find annoying, it would be nice if we were all a bit more careful here with our CC lists.

Says the guy who missed to trim the useless context of the original mail, which made me scroll down all the way just to find out that there is nothing to see.

Heh, you can usually ignore what's after my signature :-) At least I didn't put my reply all the way down the bottom !

Cheers,
Ben.
Re: ppc44x - how do i optimize driver for tlb hits
On Fri, Sep 24, 2010 at 11:07:24AM +1000, Benjamin Herrenschmidt wrote:

On Thu, 2010-09-23 at 17:35 -0500, Ayman El-Khashab wrote:

Anything you allocate with kmalloc() is going to be mapped by bolted 256M TLB entries, so there should be no TLB misses happening in the kernel case.

Hi Ben, can you or somebody elaborate? I saw the pinned tlb in 44x_mmu.c. Perhaps I don't understand the code fully, but it appears to map 256MB of lowmem into a pinned tlb. I am not sure what phys address lowmem means, but I assumed (possibly incorrectly) that it is 0-256MB.

No. The first pinned entry (0...256M) is inserted by the asm code in head_44x.S. The code in 44x_mmu.c will later map the rest of lowmem (typically up to 768M but various settings can change that) using more 256M entries.

Thanks Ben, appreciate all your wisdom and insight. Ok, so my 460ex board has 512MB total, so how does that figure into the 768M? Is there some other heuristic that determines how these are mapped?

Basically, all of lowmem is permanently mapped with such entries.

When I get the physical addresses for my buffers after kmalloc, they all have addresses that are within my DRAM but start at about the 440MB mark. I end up passing those phys addresses to my DMA engine.

Anything you get from kmalloc is going to come from lowmem, and thus be covered by those bolted TLB entries.

So is it reasonable to assume that everything on my system will come from pinned TLB entries?

When my compare runs it takes a huge amount of time in the assembly code doing memory fetches which makes me think that there are either tons of cache misses (despite the prefetching) or the entries have been purged

What prefetching ? IE. The DMA operation -will- flush things out of the cache due to the DMA being not cache coherent on 44x. The 440 also doesn't have a working HW prefetch engine afaik (it should be disabled in FW or early asm on 440 cores and fused out in HW on 460 cores afaik).

So only explicit SW prefetching will help.

The DMA is what I use in the real world case to get data into and out of these buffers. However, I can disable the DMA completely and do only the kmalloc. In this case I still see the same poor performance. My prefetching is part of my algo using the dcbt instructions. I know the instructions are effective b/c without them the algo is much less performant. So yes, my prefetches are explicit.

from the TLB and must be obtained again. As an experiment, I disabled my cache prefetch code and the algo took forever. Next I altered the asm to do the same amount of data but a smaller amount over and over so that less is fetched from main memory. That executed very quickly. From that I drew the conclusion that the algorithm is memory bandwidth limited.

I don't know what exactly is going on, maybe your prefetch stride isn't right for the HW setup, or something like that. You can use xmon 'u' command to look at the TLB content. Check that we have the 256M entries mapping your data, they should be there.

Ok, I will give that a try ... in addition, is there an easy way to use any sort of gprof like tool to see the system performance? What about looking at the 44x performance counters in some meaningful way? All the experiments point to the fetching being slower in the full program as opposed to the algo in a testbench, so I want to determine what it is that could cause that.

thanks
ayman
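For readers following the prefetch discussion, here is a minimal sketch of a load-only compare loop with explicit software prefetching. It is not the poster's code. It assumes GCC or Clang, whose `__builtin_prefetch` is typically lowered to `dcbt` on PowerPC. The names `compare_words`, `CACHE_LINE_BYTES` and `PREFETCH_AHEAD_LINES` are hypothetical, and the 32-byte line size and the prefetch distance are assumptions to be tuned on real hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed L1 data cache line size; verify for the actual core. */
#define CACHE_LINE_BYTES 32u
/* Hypothetical tuning knob: how many lines ahead to prefetch (the "stride"). */
#define PREFETCH_AHEAD_LINES 4u

/*
 * Compare two word buffers using loads only.
 * Returns the index of the first differing word, or n if they match.
 */
size_t compare_words(const uint32_t *a, const uint32_t *b, size_t n)
{
    const size_t words_per_line = CACHE_LINE_BYTES / sizeof(uint32_t);
    const size_t ahead = PREFETCH_AHEAD_LINES * words_per_line;

    for (size_t i = 0; i < n; i++) {
        /* Issue one read prefetch per buffer at the start of each line,
         * and only while the target is still inside the buffers. */
        if (i % words_per_line == 0 && i + ahead < n) {
            __builtin_prefetch(&a[i + ahead], 0, 0);
            __builtin_prefetch(&b[i + ahead], 0, 0);
        }
        if (a[i] != b[i])
            return i;
    }
    return n;
}
```

Varying `PREFETCH_AHEAD_LINES` is one way to experiment with the prefetch stride mentioned in the thread; whether any particular value helps depends on the memory system and is not something this sketch can predict.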
Re: ppc44x - how do i optimize driver for tlb hits
No. The first pinned entry (0...256M) is inserted by the asm code in head_44x.S. The code in 44x_mmu.c will later map the rest of lowmem (typically up to 768M but various settings can change that) using more 256M entries.

Thanks Ben, appreciate all your wisdom and insight. Ok, so my 460ex board has 512MB total, so how does that figure into the 768M? Is there some other heuristic that determines how these are mapped?

Not really, it all fits in lowmem so it will be mapped with two pinned 256M entries.

Basically, we try to map all memory with those entries in the linear mapping. But since we only have 1G of address space available when PAGE_OFFSET is c000, and we need some of that for vmalloc, ioremap, etc... we thus limit that mapping to 768M currently.

If you have more memory, you will see only 768M unless you use CONFIG_HIGHMEM, which allows the kernel to exploit more physical memory. In this case, only the first 768M are permanently mapped (and accessible), but you can allocate pages in highmem which can still be mapped into user space and need kmap/kunmap calls to be accessed by the kernel.

However, in your case you don't need highmem, everything fits in lowmem, so the kernel will just use 2x256M of bolted TLB entries to map that permanently.

Note also that kmalloc() always returns lowmem.

So is it reasonable to assume that everything on my system will come from pinned TLB entries?

Yes.

The DMA is what I use in the real world case to get data into and out of these buffers. However, I can disable the DMA completely and do only the kmalloc. In this case I still see the same poor performance. My prefetching is part of my algo using the dcbt instructions. I know the instructions are effective b/c without them the algo is much less performant. So yes, my prefetches are explicit.

Could be some effect of the cache structure, L2 cache, cache geometry (number of ways etc...). You might be able to alleviate that by changing the stride of your prefetch.

Unfortunately, I'm not familiar enough with the 440 micro architecture and its caches to be able to help you much here.

Ok, I will give that a try ... in addition, is there an easy way to use any sort of gprof like tool to see the system performance? What about looking at the 44x performance counters in some meaningful way? All the experiments point to the fetching being slower in the full program as opposed to the algo in a testbench, so I want to determine what it is that could cause that.

Does it have any useful performance counters ? I didn't think it did but I may be mistaken.

Cheers,
Ben.
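The arithmetic behind "two pinned 256M entries" can be sketched as follows. This is a simplified model of what the message describes, not the kernel's code: `lowmem_bytes` and `pinned_entries` are hypothetical helpers, and the 768M cap is the typical figure quoted above, which the thread notes can change with configuration.

```c
#include <stdint.h>

#define PINNED_ENTRY_BYTES (256u << 20)  /* one bolted TLB entry covers 256M */
#define LOWMEM_LIMIT_BYTES (768u << 20)  /* typical lowmem cap cited in the thread */

/* Bytes of RAM that fall inside the permanently mapped linear region. */
uint32_t lowmem_bytes(uint64_t ram_bytes)
{
    return ram_bytes < LOWMEM_LIMIT_BYTES ? (uint32_t)ram_bytes
                                          : LOWMEM_LIMIT_BYTES;
}

/* Number of 256M entries needed to cover lowmem, rounded up. */
unsigned pinned_entries(uint64_t ram_bytes)
{
    uint32_t low = lowmem_bytes(ram_bytes);
    return (low + PINNED_ENTRY_BYTES - 1) / PINNED_ENTRY_BYTES;
}
```

Under these assumptions a 512MB board needs two entries, and a 1GB board needs three, with the remaining 256MB reachable only as highmem when CONFIG_HIGHMEM is enabled.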
RE: [PATCH] powerpc: Fix invalid page flags in create TLB CAM pathfor PTE_64BIT
-Original Message-
From: linuxppc-dev-bounces+tiejun.chen=windriver@lists.ozlabs.org [mailto:linuxppc-dev-bounces+tiejun.chen=windriver@lists.ozlabs.org] On Behalf Of Scott Wood
Sent: Friday, September 24, 2010 4:34 AM
To: Gortmaker, Paul
Cc: linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH] powerpc: Fix invalid page flags in create TLB CAM pathfor PTE_64BIT

On Thu, 23 Sep 2010 16:10:15 -0400 Paul Gortmaker paul.gortma...@windriver.com wrote:

So the possibility exists to wrongly assign the user MAS3_URWX bits to kernel (PAGE_KERNEL_X) address space via the following code fragment:

	if (flags & _PAGE_USER) {
		TLBCAM[index].MAS3 |= MAS3_UX | MAS3_UR;
		TLBCAM[index].MAS3 |= ((flags & _PAGE_RW) ? MAS3_UW : 0);
	}

Here is a dump of the TLB info from Simics with the above code present:

-- L2 TLB1
                              GT                   SSS UUU V I
Row  Logical    Physical      SS  TLPID  TID WIMGE XWR XWR F P V
-    -          ---           --  -      -   -     --- --- - - -
0    c000-cfff  0-00fff       00  0      0   M     XWR XWR 0 1 1
1    d000-dfff  01000-01fff   00  0      0   M     XWR XWR 0 1 1
2    e000-efff  02000-02fff   00  0      0   M     XWR XWR 0 1 1

Actually this conditional code was only used for two legacy functions:

1: support KGDB to set break point. KGDB already dropped this; now uses its core write to set break point.

2: io_block_mapping() to create TLB in segmentation size (not PAGE_SIZE) for device IO space. This use case is also removed from the latest PowerPC kernel.

io_block_mapping() went away, but the feature itself is still useful and might come back with something like this:

http://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg33851.html

...though I'm not sure why such mappings would ever have user access.

This could end up being used for large user pages by something like hugetlbfs or KVM, though. I don't think we want to make large user pages fail, especially if it just

Understood. Actually the following is my original modification.

==
+#if defined(CONFIG_FSL_BOOKE) && defined(CONFIG_PTE_64BIT)
+	/* On there _PAGE_BAP_UR is always integrated into flag, _PAGE_KERNEL_RWX
+	 * and _PAGE_USER here. So we have to only check _PAGE_BAP_UR as the condition.
+	 */
+	if (flags & _PAGE_BAP_UR) {
+#else
 	if (flags & _PAGE_USER) {
+#endif

But I find there is no usage for this, except for the above #1 KGDB and #2 io_block_mapping(). So I think it's possible to remove this completely :)

happens with the 32-bit page table format (which may not be what the person adding such a feature tests with).

I don't see a generic accessor that can test PTE flags for user access -- in the absence of one, I guess we need an ifdef here. Or at least put in a comment so anyone who adds a userspace use knows they need to fix it.

I already noticed Ben's advice and it looks fine to us.

Tiejun

-Scott
Re: [PATCH] irqbalance, powerpc: add IRQs without settable SMP affinity to banned list
On Thu, 2010-09-23 at 09:13 -0400, Neil Horman wrote:

On Thu, Sep 23, 2010 at 08:57:20PM +1000, Michael Neuling wrote:

+ if (fwrite(line, strlen(line) - 1, 1, file) == 0)

if (fputs(line, file) == EOF)

Good point thanks... new patch below

Mikey

irqbalance, powerpc: add IRQs without settable SMP affinity to banned list

On pseries powerpc, IPIs are registered with an IRQ number so /proc/interrupts looks like this on a 2 core/2 thread machine:

     CPU0 CPU1 CPU2 CPU3
 16: 316428232905141138794 983121 XICS Level IPI
 18: 2605674 0 304994 0 XICS Level lan0
 30: 400057 0 169209 0 XICS Level ibmvscsi
LOC: 133734 77250 106425 91951 Local timer interrupts
SPU: 0 0 0 0 Spurious interrupts
CNT: 0 0 0 0 Performance monitoring interrupts
MCE: 0 0 0 0 Machine check exceptions

Unfortunately this means irqbalance attempts to set the affinity of IPIs which is not possible. So in the above case, when irqbalance is in performance mode due to heavy IPI, lan0 and ibmvscsi activity, it sometimes attempts to put the IPIs on one core (CPU0&1) and lan0 and ibmvscsi on the other core (CPU2&3). This is suboptimal as we want lan0 and ibmvscsi to be on separate cores and IPIs to be ignored.

When irqbalance attempts writes to the IPI smp_affinity (ie. /proc/irq/16/smp_affinity in the above example) it fails but irqbalance currently ignores this.

This patch catches these write fails and in this case adds that IRQ number to the banned IRQ list. This will catch the above IPI case and any other IRQ where the SMP affinity can't be set.

Tested on POWER6, POWER7 and x86.

Signed-off-by: Michael Neuling mi...@neuling.org

Index: irqbalance/irqlist.c
===
--- irqbalance.orig/irqlist.c
+++ irqbalance/irqlist.c
@@ -67,7 +67,7 @@
 	DIR *dir;
 	struct dirent *entry;
 	char *c, *c2;
-	int nr , count = 0;
+	int nr , count = 0, can_set = 1;
 	char buf[PATH_MAX];
 	sprintf(buf, "/proc/irq/%i", number);
 	dir = opendir(buf);
@@ -80,7 +80,7 @@
 	size_t size = 0;
 	FILE *file;
 	sprintf(buf, "/proc/irq/%i/smp_affinity", number);
-	file = fopen(buf, "r");
+	file = fopen(buf, "r+");
 	if (!file)
 		continue;
 	if (getline(&line, &size, file)==0) {
@@ -89,7 +89,14 @@
 		continue;
 	}
 	cpumask_parse_user(line, strlen(line), irq->mask);
-	fclose(file);
+	/*
+	 * Check that we can write the affinity, if
+	 * not take it out of the list.
+	 */
+	if (fputs(line, file) == EOF)
+		can_set = 0;

This is maybe a nit, but writing to the affinity file can fail for a few different reasons, some of them permanent, some transient. For instance, if we're in a memory constrained condition temporarily irq_affinity_proc_write might return -ENOMEM.

Yeah true, usually followed shortly by your kernel going so far into swap you never get it back, or OOMing, but I guess it's possible.

Might it be better to modify this code so that, instead of using fputs to merge the various errors into an EOF, we use some other write method that lets us better determine the error and selectively ban the interrupt only for those errors which we consider permanent?

Yep. It seems fputs() gives you no way to get the actual error from write(), so it looks like we'll need to switch to open/write, but that's probably not so terrible.

cheers
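The open/write approach suggested at the end of this message could look roughly like the sketch below. It is not irqbalance code: `try_write_affinity`, `classify_errno` and the `affinity_result` enum are hypothetical names, and the choice of which errno values count as transient (ENOMEM, EAGAIN, EINTR) is an assumed policy for illustration, not something the thread settled.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

enum affinity_result {
    AFFINITY_OK,     /* write succeeded */
    AFFINITY_RETRY,  /* transient failure: leave the IRQ alone, try later */
    AFFINITY_BAN     /* treated as permanent: add the IRQ to the banned list */
};

/* Assumed policy: only these errors are considered transient. */
static enum affinity_result classify_errno(int err)
{
    switch (err) {
    case ENOMEM:
    case EAGAIN:
    case EINTR:
        return AFFINITY_RETRY;
    default:
        return AFFINITY_BAN;
    }
}

/* Write a mask string to an smp_affinity-style file using raw write(),
 * so the real errno is available instead of a bare EOF. */
enum affinity_result try_write_affinity(const char *path, const char *mask)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return classify_errno(errno);

    ssize_t n = write(fd, mask, strlen(mask));
    int err = errno;          /* save before close() can overwrite it */
    close(fd);

    return n < 0 ? classify_errno(err) : AFFINITY_OK;
}
```

A caller would ban the IRQ only on `AFFINITY_BAN` and simply skip it for this balancing pass on `AFFINITY_RETRY`, which addresses the concern that a temporary -ENOMEM should not permanently exclude an interrupt.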
RE: [PATCH] powerpc: Fix invalid page flags in create TLB CAM pathfor PTE_64BIT
-Original Message-
From: linuxppc-dev-bounces+tiejun.chen=windriver@lists.ozlabs.org [mailto:linuxppc-dev-bounces+tiejun.chen=windriver@lists.ozlabs.org] On Behalf Of Benjamin Herrenschmidt
Sent: Friday, September 24, 2010 5:59 AM
To: Scott Wood
Cc: Gortmaker, Paul; linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH] powerpc: Fix invalid page flags in create TLB CAM pathfor PTE_64BIT

On Thu, 2010-09-23 at 15:33 -0500, Scott Wood wrote:

I don't see a generic accessor that can test PTE flags for user access -- in the absence of one, I guess we need an ifdef here. Or at least put in a comment so anyone who adds a userspace use knows they need to fix it.

We could make up one in powerpc arch at least

#define pte_user(val) ((val & _PAGE_USER) == _PAGE_USER)

Looks good.

Ben and Scott,

But for the patched issue we're discussing, we have to use an #ifdef as in my original modification. Right? Or do you have another suggestion? Then I can improve that as v2.

Thanks
Tiejun

would do

Cheers,
Ben.
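To illustrate why the proposed `pte_user()` compares against the full mask rather than testing for any set bit, here is a small sketch. It assumes, as the thread implies for the PTE_64BIT layout, that `_PAGE_USER` is a multi-bit mask sharing a bit with the kernel page flags. The bit values and the exact composition of the masks below are made up for illustration and are not the kernel's real definitions; `naive_is_user` is a hypothetical helper standing in for the original `if (flags & _PAGE_USER)` test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; NOT the kernel's real definitions. */
#define _PAGE_BAP_SR 0x1u  /* supervisor read */
#define _PAGE_BAP_UR 0x2u  /* user read */
#define _PAGE_BAP_SW 0x4u  /* supervisor write */

/* Assumed composition: the user mask also carries a supervisor bit. */
#define _PAGE_USER      (_PAGE_BAP_UR | _PAGE_BAP_SR)
#define _PAGE_KERNEL_RW (_PAGE_BAP_SW | _PAGE_BAP_SR)

/* Accessor proposed in the thread: true only if every user bit is set. */
#define pte_user(val) (((val) & _PAGE_USER) == _PAGE_USER)

/* The problematic form: true if any bit of the mask is set. */
bool naive_is_user(uint32_t flags)
{
    return (flags & _PAGE_USER) != 0;
}
```

With these assumed masks, `naive_is_user(_PAGE_KERNEL_RW)` is true because the shared supervisor-read bit matches, while `pte_user(_PAGE_KERNEL_RW)` is false, which is the distinction the TLB CAM path needs before setting the MAS3 user permission bits.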
RE: MPC8641D PEX: programming OWBAR in Endpoint mode?
-Original Message-
From: david.hag...@gmail.com [mailto:david.hag...@gmail.com]
Sent: Thursday, September 23, 2010 10:44 PM
To: Chen, Tiejun
Cc: David Hagood; linuxppc-...@ozlabs.org
Subject: RE: MPC8641D PEX: programming OWBAR in Endpoint mode?

-Original Message-

via the BARs.

I read your email again and something struck me. I notice you clarified that you already configured InBound successfully.

I am programming BOTH the inbound ATMUs to make PPC memory available to the root complex, AND programming outbound ATMUs to enable the PPC to bus master to the root complex's memory space on PCIe.

Right, but this should be done for RC mode, not for the EP mode we're discussing.

Tiejun

I am NOT attempting to program the IWBARs - as you noted, they get programmed by the root complex via PCI config operations.

And as in my above comment, I'm afraid you are mixing up InBound and OutBound in EP mode?

No, I am NOT confusing the two - that is why I am being VERY EXPLICIT about accessing the OUTBOUND ATMUs. The only reason I mention the inbound ATMUs is to demonstrate that the physical layer is working.