Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks
On Thursday 31 July 2008 03:34, Andrew Morton wrote: On Wed, 30 Jul 2008 18:23:18 +0100 Mel Gorman [EMAIL PROTECTED] wrote: On (30/07/08 01:43), Andrew Morton didst pronounce: On Mon, 28 Jul 2008 12:17:10 -0700 Eric Munson [EMAIL PROTECTED] wrote: Certain workloads benefit if their data or text segments are backed by huge pages. oh. As this is a performance patch, it would be much better if its description contained some performance measurement results! Please. I ran these patches through STREAM (http://www.cs.virginia.edu/stream/). STREAM itself was patched to allocate data from the stack instead of statically for the test. They completed without any problem on x86, x86_64 and PPC64 and each test showed a performance gain from using hugepages. I can post the raw figures but they are not currently in an eye-friendly format. Here are some plots of the data though; x86: http://www.csn.ul.ie/~mel/postings/stack-backing-20080730/x86-stream-stac k.ps x86_64: http://www.csn.ul.ie/~mel/postings/stack-backing-20080730/x86_64-stream-s tack.ps ppc64-small: http://www.csn.ul.ie/~mel/postings/stack-backing-20080730/ppc64-small-str eam-stack.ps ppc64-large: http://www.csn.ul.ie/~mel/postings/stack-backing-20080730/ppc64-large-str eam-stack.ps The test was to run STREAM with different array sizes (plotted on X-axis) and measure the average throughput (y-axis). In each case, backing the stack with large pages with a performance gain. So about a 10% speedup on x86 for most STREAM configurations. Handy - that's somewhat larger than most hugepage-conversions, iirc. Although it might be a bit unusual to have codes doing huge streaming memory operations on stack memory... We can see why IBM is so keen on their hugepages though :) Do we expect that this change will be replicated in other memory-intensive apps? (I do). Such as what? It would be nice to see some numbers with some HPC or java or DBMS workload using this. Not that I dispute it will help some cases, but 10% (or 20% for ppc) I guess is getting toward the best case, short of a specifically written TLB thrasher. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks
On Thu, 31 Jul 2008 16:04:14 +1000 Nick Piggin [EMAIL PROTECTED] wrote: Do we expect that this change will be replicated in other memory-intensive apps? (I do). Such as what? It would be nice to see some numbers with some HPC or java or DBMS workload using this. Not that I dispute it will help some cases, but 10% (or 20% for ppc) I guess is getting toward the best case, short of a specifically written TLB thrasher. I didn't realise the STREAM is using vast amounts of automatic memory. I'd assumed that it was using sane amounts of stack, but the stack TLB slots were getting zapped by all the heap-memory activity. Oh well. I guess that effect is still there, but smaller. I agree that few real-world apps are likely to see gains of this order. More benchmarks, please :) ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [Bugme-new] [Bug 11185] New: Device/host RESET in SCSI
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Wed, 30 Jul 2008 23:18:04 -0700 (PDT) [EMAIL PROTECTED] wrote: http://bugzilla.kernel.org/show_bug.cgi?id=11185 Summary: Device/host RESET in SCSI Product: Platform Specific/Hardware Version: 2.5 KernelVersion: 2.6.26 Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: blocking Priority: P1 Component: PPC-64 AssignedTo: [EMAIL PROTECTED] ReportedBy: [EMAIL PROTECTED] Latest working kernel version: 2.6.25, 2.6.18??? both tested are Debian distribution kernels Earliest failing kernel version: unknown Distribution: Debian stable Hardware Environment: IBM H70, PPC64 kernel Software Environment: Debian stable, 2.6.26 self compiled Why do you describe this regression as a powerpc problem rather than a scsi one? (It could be either or both, I'm just wondering...) Problem Description: [3.881326] sym53c8xx :00:0c.0: enabling device (0140 - 0143) [3.959117] sym0: 875 rev 0x4 at pci :00:0c.0 irq 17 [4.029503] sym0: No NVRAM, ID 7, Fast-20, SE, parity checking [4.108967] sym0: SCSI BUS has been reset. [4.160753] scsi0 : sym-2.2.3 [4.200066] sym53c8xx :00:11.0: enabling device (0140 - 0143) [4.278375] sym1: 895 rev 0x1 at pci :00:11.0 irq 19 [4.349340] sym1: No NVRAM, ID 7, Fast-40, SE, parity checking [4.429359] sym1: SCSI BUS has been reset. [4.481660] scsi1 : sym-2.2.3 [4.521351] sym53c8xx 0001:40:0c.0: enabling device (0140 - 0143) [4.600250] sym2: 875 rev 0x3 at pci 0001:40:0c.0 irq 29 [4.756252] sym2: No NVRAM, ID 7, Fast-20, SE, parity checking [4.836739] sym2: SCSI BUS has been reset. [4.889450] scsi2 : sym-2.2.3 [4.929845] st: Version 20080224, fixed bufsize 32768, s/g segs 256 [5.008868] Driver 'st' needs updating - please use bus_type methods [5.089184] Driver 'sd' needs updating - please use bus_type methods [5.169000] Driver 'sr' needs updating - please use bus_type methods [5.248969] SCSI Media Changer driver v0.25 [5.303686] Driver 'ch' needs updating - please use bus_type methods [5.385159] mice: PS/2 mouse device common for all mice [5.454519] TCP cubic registered [5.496517] NET: Registered protocol family 17 [5.553945] registered taskstats version 1 [5.606960] scsi: waiting for bus probes to complete ... [ 12.689883] scsi 0:0:0:0: ABORT operation started [ 13.057829] scsi 1:0:0:0: ABORT operation started [ 13.401828] scsi 2:0:0:0: ABORT operation started [ 17.745837] scsi 0:0:0:0: ABORT operation timed-out. [ 17.808370] scsi 0:0:0:0: DEVICE RESET operation started [ 18.113823] scsi 1:0:0:0: ABORT operation timed-out. [ 18.176365] scsi 1:0:0:0: DEVICE RESET operation started [ 18.457824] scsi 2:0:0:0: ABORT operation timed-out. [ 18.520280] scsi 2:0:0:0: DEVICE RESET operation started [ 22.873822] scsi 0:0:0:0: DEVICE RESET operation timed-out. [ 22.943527] scsi 0:0:0:0: BUS RESET operation started [ 23.241823] scsi 1:0:0:0: DEVICE RESET operation timed-out. [ 23.311387] scsi 1:0:0:0: BUS RESET operation started [ 23.585824] scsi 2:0:0:0: DEVICE RESET operation timed-out. [ 23.655371] scsi 2:0:0:0: BUS RESET operation started [ 28.005857] scsi 0:0:0:0: BUS RESET operation timed-out. [ 28.072268] scsi 0:0:0:0: HOST RESET operation started [ 28.143373] sym0: SCSI BUS has been reset. [ 28.373822] scsi 1:0:0:0: BUS RESET operation timed-out. [ 28.440183] scsi 1:0:0:0: HOST RESET operation started [ 28.511103] sym1: SCSI BUS has been reset. [ 28.717826] scsi 2:0:0:0: BUS RESET operation timed-out. [ 28.784138] scsi 2:0:0:0: HOST RESET operation started [ 28.854961] sym2: SCSI BUS has been reset. [ 33.193826] scsi 0:0:0:0: HOST RESET operation timed-out. [ 33.261164] scsi 0:0:0:0: Device offlined - not ready after error recovery [ 33.561823] scsi 1:0:0:0: HOST RESET operation timed-out. [ 33.629409] scsi 1:0:0:0: Device offlined - not ready after error recovery [ 33.905841] scsi 2:0:0:0: HOST RESET operation timed-out. [ 33.973557] scsi 2:0:0:0: Device offlined - not ready after error recovery [ 38.845823] scsi 0:0:1:0: ABORT operation started [ 39.213828] scsi 1:0:1:0: ABORT operation started [ 39.469825] scsi 2:0:1:0: ABORT operation started [ 43.901857] scsi 0:0:1:0: ABORT operation timed-out. [ 43.964323] scsi 0:0:1:0: DEVICE RESET operation started [ 44.269822] scsi 1:0:1:0: ABORT operation timed-out. [ 44.332252] scsi 1:0:1:0: DEVICE RESET operation started [ 44.525821] scsi 2:0:1:0: ABORT operation timed-out. [ 44.588275] scsi 2:0:1:0: DEVICE RESET operation started [ 49.029823] scsi 0:0:1:0: DEVICE RESET operation timed-out. [ 49.099525] scsi 0:0:1:0: BUS
Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks
On Thursday 31 July 2008 16:14, Andrew Morton wrote: On Thu, 31 Jul 2008 16:04:14 +1000 Nick Piggin [EMAIL PROTECTED] wrote: Do we expect that this change will be replicated in other memory-intensive apps? (I do). Such as what? It would be nice to see some numbers with some HPC or java or DBMS workload using this. Not that I dispute it will help some cases, but 10% (or 20% for ppc) I guess is getting toward the best case, short of a specifically written TLB thrasher. I didn't realise the STREAM is using vast amounts of automatic memory. I'd assumed that it was using sane amounts of stack, but the stack TLB slots were getting zapped by all the heap-memory activity. Oh well. An easy mistake to make because that's probabably how STREAM would normally work. I think what Mel had done is to modify the stream kernel so as to have it operate on arrays of stack memory. I guess that effect is still there, but smaller. I imagine it should be, unless you're using a CPU with seperate TLBs for small and huge pages, and your large data set is mapped with huge pages, in which case you might now introduce *new* TLB contention between the stack and the dataset :) Also, interestingly I have actually seen some CPUs whos memory operations get significantly slower when operating on large pages than small (in the case when there is full TLB coverage for both sizes). This would make sense if the CPU only implements a fast L1 TLB for small pages. So for the vast majority of workloads, where stacks are relatively small (or slowly changing), and relatively hot, I suspect this could easily have no benefit at best and slowdowns at worst. But I'm not saying that as a reason not to merge it -- this is no different from any other hugepage allocations and as usual they have to be used selectively where they help I just wonder exactly where huge stacks will help. I agree that few real-world apps are likely to see gains of this order. More benchmarks, please :) Would be nice, if just out of morbid curiosity :) ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [Bugme-new] [Bug 11185] New: Device/host RESET in SCSI
On Wed, 2008-07-30 at 23:24 -0700, Andrew Morton wrote: (switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Wed, 30 Jul 2008 23:18:04 -0700 (PDT) [EMAIL PROTECTED] wrote: http://bugzilla.kernel.org/show_bug.cgi?id=11185 Summary: Device/host RESET in SCSI Product: Platform Specific/Hardware Version: 2.5 KernelVersion: 2.6.26 Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: blocking Priority: P1 Component: PPC-64 AssignedTo: [EMAIL PROTECTED] ReportedBy: [EMAIL PROTECTED] Latest working kernel version: 2.6.25, 2.6.18??? both tested are Debian distribution kernels Earliest failing kernel version: unknown Distribution: Debian stable Hardware Environment: IBM H70, PPC64 kernel Software Environment: Debian stable, 2.6.26 self compiled Why do you describe this regression as a powerpc problem rather than a scsi one? (It could be either or both, I'm just wondering...) Problem Description: [3.881326] sym53c8xx :00:0c.0: enabling device (0140 - 0143) [3.959117] sym0: 875 rev 0x4 at pci :00:0c.0 irq 17 [4.029503] sym0: No NVRAM, ID 7, Fast-20, SE, parity checking [4.108967] sym0: SCSI BUS has been reset. [4.160753] scsi0 : sym-2.2.3 [4.200066] sym53c8xx :00:11.0: enabling device (0140 - 0143) [4.278375] sym1: 895 rev 0x1 at pci :00:11.0 irq 19 [4.349340] sym1: No NVRAM, ID 7, Fast-40, SE, parity checking [4.429359] sym1: SCSI BUS has been reset. [4.481660] scsi1 : sym-2.2.3 [4.521351] sym53c8xx 0001:40:0c.0: enabling device (0140 - 0143) [4.600250] sym2: 875 rev 0x3 at pci 0001:40:0c.0 irq 29 [4.756252] sym2: No NVRAM, ID 7, Fast-20, SE, parity checking [4.836739] sym2: SCSI BUS has been reset. [4.889450] scsi2 : sym-2.2.3 I don't know much about scsi, but I have a 44P (POWER3) which boots fine: Linux version 2.6.27-rc1 ([EMAIL PROTECTED]) (gcc version 4.2.3 (Ubuntu 4.2.3-2ubuntu8 ... sym53c8xx :00:0c.0: enabling device (0140 - 0143) sym0: 896 rev 0x7 at pci :00:0c.0 irq 17 sym0: No NVRAM, ID 7, Fast-40, SE, parity checking sym0: SCSI BUS has been reset. scsi0 : sym-2.2.3 scsi 0:0:1:0: CD-ROMIBM CDRM002031_05 PQ: 0 ANSI: 2 target0:0:1: Beginning Domain Validation target0:0:1: asynchronous target0:0:1: wide asynchronous target0:0:1: FAST-20 WIDE SCSI 40.0 MB/s ST (50 ns, offset 15) target0:0:1: Domain Validation skipping write tests target0:0:1: Ending Domain Validation target0:0:4: FAST-20 WIDE SCSI 40.0 MB/s ST (50 ns, offset 31) scsi 0:0:4:0: Direct-Access IBM DDYS-T09170N S96F PQ: 0 ANSI: 3 target0:0:4: tagged command queuing enabled, command queue depth 16. target0:0:4: Beginning Domain Validation target0:0:4: asynchronous target0:0:4: wide asynchronous target0:0:4: FAST-20 WIDE SCSI 40.0 MB/s ST (50 ns, offset 31) target0:0:4: Domain Validation skipping write tests target0:0:4: Ending Domain Validation target0:0:5: FAST-20 WIDE SCSI 40.0 MB/s ST (50 ns, offset 31) scsi 0:0:5:0: Direct-Access IBM DDYS-T09170N S96F PQ: 0 ANSI: 3 target0:0:5: tagged command queuing enabled, command queue depth 16. target0:0:5: Beginning Domain Validation target0:0:5: asynchronous target0:0:5: wide asynchronous target0:0:5: FAST-20 WIDE SCSI 40.0 MB/s ST (50 ns, offset 31) target0:0:5: Domain Validation skipping write tests target0:0:5: Ending Domain Validation sym53c8xx :00:0c.1: enabling device (0140 - 0143) sym1: 896 rev 0x7 at pci :00:0c.1 irq 18 sym1: No NVRAM, ID 7, Fast-40, LVD, parity checking sym1: SCSI BUS has been reset. scsi1 : sym-2.2.3 ipr: IBM Power RAID SCSI Device Driver version: 2.4.1 (April 24, 2007) st: Version 20080504, fixed bufsize 32768, s/g segs 256 Driver 'st' needs updating - please use bus_type methods Driver 'sd' needs updating - please use bus_type methods sd 0:0:4:0: [sda] 17774160 512-byte hardware sectors (9100 MB) sd 0:0:4:0: [sda] Write Protect is off sd 0:0:4:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DA sd 0:0:4:0: [sda] 17774160 512-byte hardware sectors (9100 MB) sd 0:0:4:0: [sda] Write Protect is off sd 0:0:4:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DA sda: sda1 sd 0:0:4:0: [sda] Attached SCSI disk sd 0:0:5:0: [sdb] Spinning up disk..ready sd 0:0:5:0: [sdb] 17774160 512-byte hardware sectors (9100 MB) sd 0:0:5:0: [sdb] Write Protect is off sd 0:0:5:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DA sd 0:0:5:0: [sdb] 17774160 512-byte hardware sectors (9100 MB) sd 0:0:5:0: [sdb] Write Protect is off sd 0:0:5:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DA sdb: sdb1 sdb2 sdb3 sdb5 sdb6 sd 0:0:5:0: [sdb] Attached SCSI disk
[PATCH] powerpc/kdump: Fix /dev/oldmem interface
This patch fixes the /dev/oldmem interface for kdump on ppc64. The patch originally came from Michael Ellerman hence have retained the signed-off line. I just rediffed/tested against latest git. Michael has ack'ed this patch. Ben this is not a must for 2.6.27 but would be good if it's included. Thanks -Sachin Signed-off-by : Michael Ellerman [EMAIL PROTECTED] Acked-by : Michael Ellerman [EMAIL PROTECTED] --- Fix /dev/oldmem for kdump A change to __ioremap() broke reading /dev/oldmem because we're no longer able to ioremap pfn 0 (d177c207ba16b1db31283e2d1fee7ad4a863584b). We actually don't need to ioremap for anything that's part of the linear mapping, so just read it directly. Also make sure we're only reading one page or less at a time. Signed-off-by : Michael Ellerman [EMAIL PROTECTED] --- diff -Naurp 1/arch/powerpc/kernel/crash_dump.c 2/arch/powerpc/kernel/crash_dump.c --- 1/arch/powerpc/kernel/crash_dump.c 2008-07-31 11:55:39.0 +0530 +++ 2/arch/powerpc/kernel/crash_dump.c 2008-07-31 12:09:49.0 +0530 @@ -86,6 +86,19 @@ static int __init parse_savemaxmem(char } __setup(savemaxmem=, parse_savemaxmem); + +static size_t copy_oldmem_vaddr(void *vaddr, char *buf, size_t csize, + unsigned long offset, int userbuf) +{ + if (userbuf) { + if (copy_to_user((char __user *)buf, (vaddr + offset), csize)) + return -EFAULT; + } else + memcpy(buf, (vaddr + offset), csize); + + return csize; +} + /** * copy_oldmem_page - copy one page from oldmem * @pfn: page frame number to be copied @@ -107,16 +120,16 @@ ssize_t copy_oldmem_page(unsigned long p if (!csize) return 0; - vaddr = __ioremap(pfn PAGE_SHIFT, PAGE_SIZE, 0); + csize = min(csize, PAGE_SIZE); - if (userbuf) { - if (copy_to_user((char __user *)buf, (vaddr + offset), csize)) { - iounmap(vaddr); - return -EFAULT; - } - } else - memcpy(buf, (vaddr + offset), csize); + if (pfn max_pfn) { + vaddr = __va(pfn PAGE_SHIFT); + csize = copy_oldmem_vaddr(vaddr, buf, csize, offset, userbuf); + } else { + vaddr = __ioremap(pfn PAGE_SHIFT, PAGE_SIZE, 0); + csize = copy_oldmem_vaddr(vaddr, buf, csize, offset, userbuf); + iounmap(vaddr); + } - iounmap(vaddr); return csize; } ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH 5/8] Silence warning in arch/powerpc/mm/ppc_mmu_32.c
On Thu Jul 31 at 15:21:25 EST in 2008, Stephen Rothwell wrote: On Thu, 31 Jul 2008 13:51:43 +1000 (EST) Tony Breeds tony at bakeyournoodle.com wrote: total_memory is a 'phys_addr_t', cast to unsigned long to silence warning. diff --git a/arch/powerpc/mm/ppc_mmu_32.c b/arch/powerpc/mm/ppc_mmu_32.c index c53145f..9c19655 100644 --- a/arch/powerpc/mm/ppc_mmu_32.c +++ b/arch/powerpc/mm/ppc_mmu_32.c @@ -237,7 +237,7 @@ void __init MMU_init_hw(void) Hash_end = (struct hash_pte *) ((unsigned long)Hash + Hash_size); printk(Total memory = %ldMB; using %ldkB for hash table (at %p)\n, -total_memory 20, Hash_size 10, Hash); +(unsigned long)total_memory 20, Hash_size 10, Hash); Will this ever be built with CONFIG_PHYS_64BIT? I think that is how warning originates. But please, cast the result of the shift. Otherwise it will print 0MB instead of 4096MB. The patches for 4G ram are in on one platform and in progress on a second. milton ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
[PATCH v2] Guard print_device_node_tree() if #if 0.
Currently print_device_node_tree() isn't called but it can be usful for debuging. Leave the function there but hide it behind '#if 0' to save it being rewritten. If you want to call it you're already editing this file anyway ;P Signed-off-by: Tony Breeds [EMAIL PROTECTED] --- Changes since v1: - No longer breaks CONFIG_PPC_PSERIES_DEBUG. arch/powerpc/platforms/pseries/eeh_driver.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/powerpc/platforms/pseries/eeh_driver.c b/arch/powerpc/platforms/pseries/eeh_driver.c index 8c1ca47..0ad56ff 100644 --- a/arch/powerpc/platforms/pseries/eeh_driver.c +++ b/arch/powerpc/platforms/pseries/eeh_driver.c @@ -41,7 +41,7 @@ static inline const char * pcid_name (struct pci_dev *pdev) return ; } -#ifdef DEBUG +#if 0 static void print_device_node_tree(struct pci_dn *pdn, int dent) { int i; -- 1.5.6.3 ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
[PATCH v2] Force printing of 'total_memory' to unsigned long long in ppc_mmu_32.c
total_memory is a 'phys_addr_t', Which can be either 64 or 32 bits. Force printing as unsigned long long to silence the warning. Signed-off-by: Tony Breeds [EMAIL PROTECTED] --- Changes since v1: - correctly use 64bit type as phys_addr_t wont always be 32bits. Thanks to sfr for showing me the error of my ways ;P arch/powerpc/mm/ppc_mmu_32.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/mm/ppc_mmu_32.c b/arch/powerpc/mm/ppc_mmu_32.c index c53145f..07473e0 100644 --- a/arch/powerpc/mm/ppc_mmu_32.c +++ b/arch/powerpc/mm/ppc_mmu_32.c @@ -236,8 +236,8 @@ void __init MMU_init_hw(void) Hash_end = (struct hash_pte *) ((unsigned long)Hash + Hash_size); - printk(Total memory = %ldMB; using %ldkB for hash table (at %p)\n, - total_memory 20, Hash_size 10, Hash); + printk(Total memory = %lldMB; using %ldkB for hash table (at %p)\n, + (unsigned long long)total_memory 20, Hash_size 10, Hash); /* -- 1.5.6.3 ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH 5/8] Silence warning in arch/powerpc/mm/ppc_mmu_32.c
On Thu, Jul 31, 2008 at 03:21:25PM +1000, Stephen Rothwell wrote: Hi Tony, Rusty has forever ruined 'Hi $name' ;P snip Will this ever be built with CONFIG_PHYS_64BIT? Updated patch follows. Yours Tony linux.conf.auhttp://www.marchsouth.org/ Jan 19 - 24 2009 The Australian Linux Technical Conference! ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc/kdump: Fix /dev/oldmem interface
On Thu, 2008-07-31 at 12:24 +0530, Sachin P. Sant wrote: This patch fixes the /dev/oldmem interface for kdump on ppc64. The patch originally came from Michael Ellerman hence have retained the signed-off line. I just rediffed/tested against latest git. Michael has ack'ed this patch. Ben this is not a must for 2.6.27 but would be good if it's included. It's been broken long enough, I think it's a must for 27. Signed-off-by : Michael Ellerman [EMAIL PROTECTED] Acked-by : Michael Ellerman [EMAIL PROTECTED] Heh, do I get a prize for being neurotic? ;) .. Or just stupid, ahem. I think you should also sign off on it Sachin, see clause (c) of the developers cert (http://lwn.net/Articles/139918/) cheers -- Michael Ellerman OzLabs, IBM Australia Development Lab wwweb: http://michael.ellerman.id.au phone: +61 2 6212 1183 (tie line 70 21183) We do not inherit the earth from our ancestors, we borrow it from our children. - S.M.A.R.T Person signature.asc Description: This is a digitally signed message part ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [Bugme-new] [Bug 11185] New: Device/host RESET in SCSI
On Wed, Jul 30, 2008 at 11:24:09PM -0700, Andrew Morton wrote: Why do you describe this regression as a powerpc problem rather than a scsi one? (It could be either or both, I'm just wondering...) This seems quite astute of the reporter. The error messages from sym2 are consistent with an interrupt routing problem. I have an idea for reporting this more effectively (because this comes up every 3-6 months or so) but testing that patch will have to wait until I'm back home. -- Intel are signing my paycheques ... these opinions are still mine Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [Bugme-new] [Bug 11185] New: Device/host RESET in SCSI
On Thu, 2008-07-31 at 01:21 -0600, Matthew Wilcox wrote: On Wed, Jul 30, 2008 at 11:24:09PM -0700, Andrew Morton wrote: Why do you describe this regression as a powerpc problem rather than a scsi one? (It could be either or both, I'm just wondering...) This seems quite astute of the reporter. The error messages from sym2 are consistent with an interrupt routing problem. Hmm I suppose. In that case can we see the full dmesg and a tarball of /proc/device-tree from a working kernel, Cijoml? Which begs the question what was the latest working kernel version? cheers -- Michael Ellerman OzLabs, IBM Australia Development Lab wwweb: http://michael.ellerman.id.au phone: +61 2 6212 1183 (tie line 70 21183) We do not inherit the earth from our ancestors, we borrow it from our children. - S.M.A.R.T Person signature.asc Description: This is a digitally signed message part ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: ide pmac breakage
There seems to be some confusion between warm-plugging of IDE devices and hot-plugging of IDE devices. not a single piece of HW to exercise those code path ? I don't ask you to get a powermac with a media-bay, but ide_cs seems to be a pretty important one that's part of what the ide maintainer should take care of... And I suspect it's going to exercise the same code path as mediabay. That confuses people sometimes - ide_cs is a controller hotplug not a device hotplug ... ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: ide pmac breakage
On Thu, 2008-07-31 at 09:49 +0100, Alan Cox wrote: There seems to be some confusion between warm-plugging of IDE devices and hot-plugging of IDE devices. not a single piece of HW to exercise those code path ? I don't ask you to get a powermac with a media-bay, but ide_cs seems to be a pretty important one that's part of what the ide maintainer should take care of... And I suspect it's going to exercise the same code path as mediabay. That confuses people sometimes - ide_cs is a controller hotplug not a device hotplug ... I could make the media-bay look like a controller hotplug if it was going to make things easier... Cheers, Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc/kdump: Fix /dev/oldmem interface
Michael Ellerman wrote: Heh, do I get a prize for being neurotic? ;) .. Or just stupid, ahem. I think you should also sign off on it Sachin, see clause (c) of the developers cert (http://lwn.net/Articles/139918/) Here is a signed-off from me. Let me know if i need to resubmit the patch with all signed-off-by lines. Signed-off-by : Sachin Sant [EMAIL PROTECTED] -- Thanks -Sachin - Sachin Sant IBM Linux Technology Center India Systems and Technology Labs Bangalore, India - ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: ide pmac breakage
I could make the media-bay look like a controller hotplug if it was going to make things easier... I'm not sure it will. It may do nowdays, but the older IDE code historically was fairly broken for both cases except in 2.4. Also faking it as controller hotplug is the wrong path for libata which does real drive hot plug. Alan ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
[PATCH] powerpc - Initialize the irq radix tree earlier
The radix tree used for fast irq reverse mapping by the XICS is initialized late in the boot process, after the first interrupt (IPI) gets registered and after the first IPI is received. This patch moves the initialization of the XICS radix tree earlier into the boot process in smp_xics_probe() (the mm is already up but no interrupts have been registered at that point) to avoid having to insert a mapping into the tree in interrupt context. This will help in simplifying the locking constraints and move to a lockless radix tree in subsequent patches. As a nice side effect, there is no need any longer to check for (host-revmap_data.tree.gfp_mask != 0) to know if the tree have been initialized. Signed-off-by: Sebastien Dugue [EMAIL PROTECTED] Cc: Benjamin Herrenschmidt [EMAIL PROTECTED] Cc: Paul Mackerras [EMAIL PROTECTED] --- arch/powerpc/kernel/irq.c| 44 + arch/powerpc/platforms/pseries/smp.c |1 + include/asm-powerpc/irq.h|5 3 files changed, 18 insertions(+), 32 deletions(-) diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c index 6ac8612..0a1445c 100644 --- a/arch/powerpc/kernel/irq.c +++ b/arch/powerpc/kernel/irq.c @@ -840,9 +840,6 @@ void irq_dispose_mapping(unsigned int virq) host-revmap_data.linear.revmap[hwirq] = NO_IRQ; break; case IRQ_HOST_MAP_TREE: - /* Check if radix tree allocated yet */ - if (host-revmap_data.tree.gfp_mask == 0) - break; irq_radix_wrlock(flags); radix_tree_delete(host-revmap_data.tree, hwirq); irq_radix_wrunlock(flags); @@ -893,28 +890,28 @@ unsigned int irq_find_mapping(struct irq_host *host, } EXPORT_SYMBOL_GPL(irq_find_mapping); +void __init irq_radix_revmap_init(void) +{ + struct irq_host *h; + + list_for_each_entry(h, irq_hosts, link) { + if (h-revmap_type == IRQ_HOST_MAP_TREE) + INIT_RADIX_TREE(h-revmap_data.tree, GFP_ATOMIC); + } +} unsigned int irq_radix_revmap(struct irq_host *host, irq_hw_number_t hwirq) { - struct radix_tree_root *tree; struct irq_map_entry *ptr; unsigned int virq; unsigned long flags; WARN_ON(host-revmap_type != IRQ_HOST_MAP_TREE); - /* Check if the radix tree exist yet. We test the value of -* the gfp_mask for that. Sneaky but saves another int in the -* structure. If not, we fallback to slow mode -*/ - tree = host-revmap_data.tree; - if (tree-gfp_mask == 0) - return irq_find_mapping(host, hwirq); - - /* Now try to resolve */ + /* Try to resolve */ irq_radix_rdlock(flags); - ptr = radix_tree_lookup(tree, hwirq); + ptr = radix_tree_lookup(host-revmap_data.tree, hwirq); irq_radix_rdunlock(flags); /* Found it, return */ @@ -927,7 +924,7 @@ unsigned int irq_radix_revmap(struct irq_host *host, virq = irq_find_mapping(host, hwirq); if (virq != NO_IRQ) { irq_radix_wrlock(flags); - radix_tree_insert(tree, hwirq, irq_map[virq]); + radix_tree_insert(host-revmap_data.tree, hwirq, irq_map[virq]); irq_radix_wrunlock(flags); } return virq; @@ -1035,23 +1032,6 @@ void irq_early_init(void) get_irq_desc(i)-status |= IRQ_NOREQUEST; } -/* We need to create the radix trees late */ -static int irq_late_init(void) -{ - struct irq_host *h; - unsigned long flags; - - irq_radix_wrlock(flags); - list_for_each_entry(h, irq_hosts, link) { - if (h-revmap_type == IRQ_HOST_MAP_TREE) - INIT_RADIX_TREE(h-revmap_data.tree, GFP_ATOMIC); - } - irq_radix_wrunlock(flags); - - return 0; -} -arch_initcall(irq_late_init); - #ifdef CONFIG_VIRQ_DEBUG static int virq_debug_show(struct seq_file *m, void *private) { diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c index 9d8f8c8..b143fe7 100644 --- a/arch/powerpc/platforms/pseries/smp.c +++ b/arch/powerpc/platforms/pseries/smp.c @@ -130,6 +130,7 @@ static void smp_xics_message_pass(int target, int msg) static int __init smp_xics_probe(void) { + irq_radix_revmap_init(); xics_request_IPIs(); return cpus_weight(cpu_possible_map); diff --git a/include/asm-powerpc/irq.h b/include/asm-powerpc/irq.h index 1ef8e30..47b8119 100644 --- a/include/asm-powerpc/irq.h +++ b/include/asm-powerpc/irq.h @@ -237,6 +237,11 @@ extern unsigned int irq_find_mapping(struct irq_host *host, */ extern unsigned int irq_create_direct_mapping(struct irq_host *host); +/* + * Initialize the radix tree used by some irq controllers + */ +extern void __init irq_radix_revmap_init(void); + /** * irq_radix_revmap - Find a linux virq from a
[PATCH] powerpc - Separate the irq radix tree insertion and lookup
irq_radix_revmap() currently serves 2 purposes, irq mapping lookup and insertion which happen in interrupt and process context respectively. Separate the function into its 2 components, one for lookup only and one for insertion only. Signed-off-by: Sebastien Dugue [EMAIL PROTECTED] Cc: Benjamin Herrenschmidt [EMAIL PROTECTED] Cc: Paul Mackerras [EMAIL PROTECTED] --- arch/powerpc/kernel/irq.c | 25 ++--- arch/powerpc/platforms/pseries/xics.c | 11 --- include/asm-powerpc/irq.h | 18 +++--- 3 files changed, 33 insertions(+), 21 deletions(-) diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c index 0a1445c..083b181 100644 --- a/arch/powerpc/kernel/irq.c +++ b/arch/powerpc/kernel/irq.c @@ -900,34 +900,37 @@ void __init irq_radix_revmap_init(void) } } -unsigned int irq_radix_revmap(struct irq_host *host, - irq_hw_number_t hwirq) +unsigned int irq_radix_revmap_lookup(struct irq_host *host, +irq_hw_number_t hwirq) { struct irq_map_entry *ptr; - unsigned int virq; + unsigned int virq = NO_IRQ; unsigned long flags; WARN_ON(host-revmap_type != IRQ_HOST_MAP_TREE); - /* Try to resolve */ irq_radix_rdlock(flags); ptr = radix_tree_lookup(host-revmap_data.tree, hwirq); irq_radix_rdunlock(flags); - /* Found it, return */ - if (ptr) { + if (ptr) virq = ptr - irq_map; - return virq; - } - /* If not there, try to insert it */ - virq = irq_find_mapping(host, hwirq); + return virq; +} + +void irq_radix_revmap_insert(struct irq_host *host, unsigned int virq, +irq_hw_number_t hwirq) +{ + unsigned long flags; + + WARN_ON(host-revmap_type != IRQ_HOST_MAP_TREE); + if (virq != NO_IRQ) { irq_radix_wrlock(flags); radix_tree_insert(host-revmap_data.tree, hwirq, irq_map[virq]); irq_radix_wrunlock(flags); } - return virq; } unsigned int irq_linear_revmap(struct irq_host *host, diff --git a/arch/powerpc/platforms/pseries/xics.c b/arch/powerpc/platforms/pseries/xics.c index 0fc830f..6b1a005 100644 --- a/arch/powerpc/platforms/pseries/xics.c +++ b/arch/powerpc/platforms/pseries/xics.c @@ -310,12 +310,6 @@ static void xics_mask_irq(unsigned int virq) static unsigned int xics_startup(unsigned int virq) { - unsigned int irq; - - /* force a reverse mapping of the interrupt so it gets in the cache */ - irq = (unsigned int)irq_map[virq].hwirq; - irq_radix_revmap(xics_host, irq); - /* unmask it */ xics_unmask_irq(virq); return 0; @@ -346,7 +340,7 @@ static inline unsigned int xics_remap_irq(unsigned int vec) if (vec == XICS_IRQ_SPURIOUS) return NO_IRQ; - irq = irq_radix_revmap(xics_host, vec); + irq = irq_radix_revmap_lookup(xics_host, vec); if (likely(irq != NO_IRQ)) return irq; @@ -530,6 +524,9 @@ static int xics_host_map(struct irq_host *h, unsigned int virq, { pr_debug(xics: map virq %d, hwirq 0x%lx\n, virq, hw); + /* Insert the interrupt mapping into the radix tree for fast lookup */ + irq_radix_revmap_insert(xics_host, virq, hw); + get_irq_desc(virq)-status |= IRQ_LEVEL; set_irq_chip_and_handler(virq, xics_irq_chip, handle_fasteoi_irq); return 0; diff --git a/include/asm-powerpc/irq.h b/include/asm-powerpc/irq.h index 47b8119..5c88acf 100644 --- a/include/asm-powerpc/irq.h +++ b/include/asm-powerpc/irq.h @@ -243,15 +243,27 @@ extern unsigned int irq_create_direct_mapping(struct irq_host *host); extern void __init irq_radix_revmap_init(void); /** - * irq_radix_revmap - Find a linux virq from a hw irq number. + * irq_radix_revmap_insert - Insert a hw irq to linux virq number mapping. + * @host: host owning this hardware interrupt + * @virq: linux irq number + * @hwirq: hardware irq number in that host space + * + * This is for use by irq controllers that use a radix tree reverse + * mapping for fast lookup. + */ +extern void irq_radix_revmap_insert(struct irq_host *host, unsigned int virq, + irq_hw_number_t hwirq); + +/** + * irq_radix_revmap_lookup - Find a linux virq from a hw irq number. * @host: host owning this hardware interrupt * @hwirq: hardware irq number in that host space * * This is a fast path, for use by irq controller code that uses radix tree * revmaps */ -extern unsigned int irq_radix_revmap(struct irq_host *host, -irq_hw_number_t hwirq); +extern unsigned int irq_radix_revmap_lookup(struct irq_host *host, + irq_hw_number_t hwirq); /** * irq_linear_revmap - Find a linux virq from a hw irq number. -- 1.5.5.1
[PATCH 0/3] powerpc - Make the irq reverse mapping tree lockless
Hi , here is a respin of the patches I posted last week for the RT kernel now targeted for mainline (http://lkml.org/lkml/2008/7/24/98). Thomas, steven, a note for you at the end. The goal of this patchset is to simplify the locking constraints on the radix tree used for IRQ reverse mapping on the pSeries machines and provide lockless access to this tree. This also solves the following BUG under preempt-rt: BUG: sleeping function called from invalid context swapper(1) at kernel/rtmutex.c:739 in_atomic():1 [0002], irqs_disabled():1 Call Trace: [c001e20f3340] [c0010370] .show_stack+0x70/0x1bc (unreliable) [c001e20f33f0] [c0049380] .__might_sleep+0x11c/0x138 [c001e20f3470] [c02a2f64] .__rt_spin_lock+0x3c/0x98 [c001e20f34f0] [c00c3f20] .kmem_cache_alloc+0x68/0x184 [c001e20f3590] [c0193f3c] .radix_tree_node_alloc+0xf0/0x144 [c001e20f3630] [c0195190] .radix_tree_insert+0x18c/0x2fc [c001e20f36f0] [c000c710] .irq_radix_revmap+0x1a4/0x1e4 [c001e20f37b0] [c003b3f0] .xics_startup+0x30/0x54 [c001e20f3840] [c008b864] .setup_irq+0x26c/0x370 [c001e20f38f0] [c008ba68] .request_irq+0x100/0x158 [c001e20f39a0] [c01ee9c0] .hvc_open+0xb4/0x148 [c001e20f3a40] [c01d72ec] .tty_open+0x200/0x368 [c001e20f3af0] [c00ce928] .chrdev_open+0x1f4/0x25c [c001e20f3ba0] [c00c8bf0] .__dentry_open+0x188/0x2c8 [c001e20f3c50] [c00c8dec] .do_filp_open+0x50/0x70 [c001e20f3d70] [c00c8e8c] .do_sys_open+0x80/0x148 [c001e20f3e20] [c000928c] .init_post+0x4c/0x100 [c001e20f3ea0] [c03c0e0c] .kernel_init+0x428/0x478 [c001e20f3f90] [c0027448] .kernel_thread+0x4c/0x68 The root cause of this bug lies in the fact that the XICS interrupt controller uses a radix tree for its reverse irq mapping and that we cannot allocate the tree nodes (even GFP_ATOMIC) with preemption disabled. In fact, we have 2 nested preemption disabling when we want to allocate a new node: - setup_irq() does a spin_lock_irqsave() before calling xics_startup() which then calls irq_radix_revmap() to insert a new node in the tree - irq_radix_revmap() also does a spin_lock_irqsave() (in irq_radix_wrlock()) before the radix_tree_insert() Also, if an IRQ gets registered before the tree is initialized (namely the IPI), it will be inserted into the tree in interrupt context once the tree have been initialized, hence the need for a spin_lock_irqsave() in the insertion path. This serie is split into 3 patches: - The first patch moves the initialization of the radix tree earlier in the boot process before any IRQ gets registered, but after the mm is up. - The second patch splits irq_radix_revmap() into its 2 components: one for lookup and one for insertion into the radix tree. - And finally, the third patch makes the radix tree fully lockless on the lookup side. Here is the diffstat for the whole patchset: arch/powerpc/kernel/irq.c | 134 - arch/powerpc/platforms/pseries/smp.c |1 + arch/powerpc/platforms/pseries/xics.c | 11 +-- include/asm-powerpc/irq.h | 24 +- 4 files changed, 58 insertions(+), 112 deletions(-) Thomas, Steven, the first 2 patches can be applied seamlessly to 2.6.26-rt1 with offsets, the third patch has a trivial to fix reject in arch/powerpc/kernel/irq.c because the irq_big_lock is changed to a raw spinlock in preempt-rt. If you want those patches for RT, just flag me, I have those sitting on my test box. Thanks, Sebastien. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
[PATCH] powerpc - Make the irq reverse mapping radix tree lockless
The radix trees used by interrupt controllers for their irq reverse mapping (currently only the XICS found on pSeries) have a complex locking scheme dating back to before the advent of the lockless radix tree. Take advantage of this and of the fact that the items of the tree are pointers to a static array (irq_map) elements which can never go under us to simplify the locking. Concurrency between readers and writers is handled by the intrinsic properties of the lockless radix tree. Concurrency between writers is handled with a spinlock added to the irq_host structure. Signed-off-by: Sebastien Dugue [EMAIL PROTECTED] Cc: Benjamin Herrenschmidt [EMAIL PROTECTED] Cc: Paul Mackerras [EMAIL PROTECTED] --- arch/powerpc/kernel/irq.c | 75 ++-- include/asm-powerpc/irq.h |1 + 2 files changed, 12 insertions(+), 64 deletions(-) diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c index 083b181..3aa683b 100644 --- a/arch/powerpc/kernel/irq.c +++ b/arch/powerpc/kernel/irq.c @@ -458,8 +458,6 @@ void do_softirq(void) static LIST_HEAD(irq_hosts); static DEFINE_SPINLOCK(irq_big_lock); -static DEFINE_PER_CPU(unsigned int, irq_radix_reader); -static unsigned int irq_radix_writer; struct irq_map_entry irq_map[NR_IRQS]; static unsigned int irq_virq_count = NR_IRQS; static struct irq_host *irq_default_host; @@ -602,57 +600,6 @@ void irq_set_virq_count(unsigned int count) irq_virq_count = count; } -/* radix tree not lockless safe ! we use a brlock-type mecanism - * for now, until we can use a lockless radix tree - */ -static void irq_radix_wrlock(unsigned long *flags) -{ - unsigned int cpu, ok; - - spin_lock_irqsave(irq_big_lock, *flags); - irq_radix_writer = 1; - smp_mb(); - do { - barrier(); - ok = 1; - for_each_possible_cpu(cpu) { - if (per_cpu(irq_radix_reader, cpu)) { - ok = 0; - break; - } - } - if (!ok) - cpu_relax(); - } while(!ok); -} - -static void irq_radix_wrunlock(unsigned long flags) -{ - smp_wmb(); - irq_radix_writer = 0; - spin_unlock_irqrestore(irq_big_lock, flags); -} - -static void irq_radix_rdlock(unsigned long *flags) -{ - local_irq_save(*flags); - __get_cpu_var(irq_radix_reader) = 1; - smp_mb(); - if (likely(irq_radix_writer == 0)) - return; - __get_cpu_var(irq_radix_reader) = 0; - smp_wmb(); - spin_lock(irq_big_lock); - __get_cpu_var(irq_radix_reader) = 1; - spin_unlock(irq_big_lock); -} - -static void irq_radix_rdunlock(unsigned long flags) -{ - __get_cpu_var(irq_radix_reader) = 0; - local_irq_restore(flags); -} - static int irq_setup_virq(struct irq_host *host, unsigned int virq, irq_hw_number_t hwirq) { @@ -807,7 +754,6 @@ void irq_dispose_mapping(unsigned int virq) { struct irq_host *host; irq_hw_number_t hwirq; - unsigned long flags; if (virq == NO_IRQ) return; @@ -840,9 +786,9 @@ void irq_dispose_mapping(unsigned int virq) host-revmap_data.linear.revmap[hwirq] = NO_IRQ; break; case IRQ_HOST_MAP_TREE: - irq_radix_wrlock(flags); + spin_lock(host-tree_lock); radix_tree_delete(host-revmap_data.tree, hwirq); - irq_radix_wrunlock(flags); + spin_unlock(host-tree_lock); break; } @@ -895,8 +841,10 @@ void __init irq_radix_revmap_init(void) struct irq_host *h; list_for_each_entry(h, irq_hosts, link) { - if (h-revmap_type == IRQ_HOST_MAP_TREE) + if (h-revmap_type == IRQ_HOST_MAP_TREE) { INIT_RADIX_TREE(h-revmap_data.tree, GFP_ATOMIC); + spin_lock_init(h-tree_lock); + } } } @@ -905,13 +853,14 @@ unsigned int irq_radix_revmap_lookup(struct irq_host *host, { struct irq_map_entry *ptr; unsigned int virq = NO_IRQ; - unsigned long flags; WARN_ON(host-revmap_type != IRQ_HOST_MAP_TREE); - irq_radix_rdlock(flags); + /* +* No rcu_read_lock(ing) needed, the ptr returned can't go under us +* as it's referencing an entry in the static irq_map table. +*/ ptr = radix_tree_lookup(host-revmap_data.tree, hwirq); - irq_radix_rdunlock(flags); if (ptr) virq = ptr - irq_map; @@ -922,14 +871,12 @@ unsigned int irq_radix_revmap_lookup(struct irq_host *host, void irq_radix_revmap_insert(struct irq_host *host, unsigned int virq, irq_hw_number_t hwirq) { - unsigned long flags; - WARN_ON(host-revmap_type != IRQ_HOST_MAP_TREE);
Re: ide pmac breakage
On Thu, 2008-07-31 at 10:13 +0100, Alan Cox wrote: I could make the media-bay look like a controller hotplug if it was going to make things easier... I'm not sure it will. It may do nowdays, but the older IDE code historically was fairly broken for both cases except in 2.4. Also faking it as controller hotplug is the wrong path for libata which does real drive hot plug. Yeah, that was my line of thinking initially, also the fact that it has the nice side effect of keeping the minor number stable. Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH 0/3] powerpc - Make the irq reverse mapping tree lockless
OK, I goofed up with git-format-patch, forgot the --numbered option. The patches subjects should read: [PATCH 1/3] powerpc - Initialize the irq radix tree earlier [PATCH 2/3] powerpc - Separate the irq radix tree insertion and lookup [PATCH 3/3] powerpc - Make the irq reverse mapping radix tree lockless Sorry for that. Sebastien. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks
On (30/07/08 13:07), Andrew Morton didst pronounce: On Wed, 30 Jul 2008 20:30:10 +0100 Mel Gorman [EMAIL PROTECTED] wrote: With Erics patch and libhugetlbfs, we can automatically back text/data[1], malloc[2] and stacks without source modification. Fairly soon, libhugetlbfs will also be able to override shmget() to add SHM_HUGETLB. That should cover a lot of the memory-intensive apps without source modification. The weak link in all of this still might be the need to reserve hugepages and the unreliability of dynamically allocating them. The dynamic allocation should be better nowadays, but I've lost track of how reliable it really is. What's our status there? We are a lot more reliable than we were although exact quantification is difficult because it's workload dependent. For a long time, I've been able to test bits and pieces with hugepages by allocating the pool at the time I needed it even after days of uptime. Previously this required a reboot. I've also been able to use the dynamic hugepage pool resizing effectively and we track how much it is succeeding and failing in /proc/vmstat (see the htlb fields) to watch for problems. Between that and /proc/pagetypeinfo, I am expecting to be able to identify availablilty problems. As an administrator can now set a minimum pool size and the maximum size of the pool (nr_hugepages and nr_overcommit_hugepages), the configuration difficulties should be relaxed. If it is found that anti-fragmentation can be broken down and pool resizing starts failing after X amount of time on Y workloads, there is still the option of using movablecore=BiggestPoolSizeIWillEverNeed and writing 1 to /proc/sys/vm/hugepages_treat_as_movable so the hugepage pool can grow/shrink reliably there. Overall, it's in pretty good shape. To be fair, one snag is that that swap is almost required for pool resizing to work as I never pushed to complete memory compaction (http://lwn.net/Articles/238837/). Hence, we depend on the workload to have lots of filesystem-backed data for lumpy-reclaim to do its job, for pool resizing to take place between batch jobs or for swap to be configured even if it's just for the duration of a pool resize. -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: ide pmac breakage
On Thursday 31 July 2008, Benjamin Herrenschmidt wrote: Is it actually caused by additional reference counting on drive-gendev? IOW if you reverse the patch below instead of applying the previous fix do things work OK again? Note that there shouldn't be anything fundamentally different from ide-pmac here vs. something like pcmcia IDE cards... do you have one of these to test with ? Nope and I really don't intend to have one. I count on other people to take some care of support for host drivers that they maintain/use. ;) Reverting the patch below does the job. Thanks. Thanks, this narrows down the problem pretty nicely. [ Unfortunately we cannot revert the whole patch as it would break unloading of IDE host drivers modules so I still need you help on fixing this one. ] Lets get back to the oops: Vector: 300 (Data Access) at [c59dfdc0]   pc: c0211f78: ide_device_put+0x18/0x58   lr: c0223c34: ide_cd_put+0x40/0x5c   sp: c59dfe70   msr: 9032   dar: 10  dsisr: 4000  current = 0xc58a9880   pid  = 843, comm = media-bay enter ? for help [c59dfe80] c0223c34 ide_cd_put+0x40/0x5c [c59dfea0] c02114d4 generic_ide_remove+0x28/0x3c [c59dfeb0] c01ea108 __device_release_driver+0x78/0xb4 [c59dfec0] c01ea218 device_release_driver+0x28/0x44 [c59dfee0] c01e9350 bus_remove_device+0xac/0xd8 [c59dff00] c01e77f8 device_del+0x104/0x198 [c59dff20] c01e78a4 device_unregister+0x18/0x30 [c59dff40] c0212598 __ide_port_unregister_devices+0x6c/0x88 [c59dff60] c021276c ide_port_unregister_devices+0x38/0x80 [c59dff80] c0209078 media_bay_step+0x1cc/0x5c0 [c59dffb0] c02094f8 media_bay_task+0x8c/0xcc [c59dffd0] c0048640 kthread+0x48/0x84 [c59dfff0] c0011b38 kernel_thread+0x44/0x60 On a fresh look at ide_device_put(), ide_host_alloc() and pmac.c it may be that the above oops is actually media-bay specific. ide_device_put(): ... struct device *host_dev = drive-hwif-host-dev[0]; struct module *module = host_dev ? host_dev-driver-owner : NULL; ... ide_host_alloc(): ... if (hws[0]) host-dev[0] = hws[0]-dev; ... pmac.c: ... pmac_ide_macio_attach(struct macio_dev *mdev, const struct of_device_id *match) ... dev_set_drvdata(mdev-ofdev.dev, pmif); memset(hw, 0, sizeof(hw)); pmac_ide_init_ports(hw, pmif-regbase); hw.irq = irq; hw.dev = mdev-bus-pdev-dev; hw.parent = mdev-ofdev.dev; ... pmac macio is unique in using different devices for hwif-dev / host-dev (hw.dev) and hwif-gendev.parent / dev_set_drvdata() (hw.parent) [ I has been actually wondering why is it so for some time...? ] Thus we may be hitting oops in ide_device_put() on host_dev-driver because hw.dev is used as host-dev for pmac macio in ide_device_put() while we really want to use hw.parent. Fix should be as simple as: -host-dev[0] = hws[0]-dev; +host-dev[0] = hws[0]-parent ? hws[0]-parent : hws[0]-dev; Could you please try it together with my previous patch for ide_device_{get,put}()? Bart ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks
On (31/07/08 16:26), Nick Piggin didst pronounce: On Thursday 31 July 2008 16:14, Andrew Morton wrote: On Thu, 31 Jul 2008 16:04:14 +1000 Nick Piggin [EMAIL PROTECTED] wrote: Do we expect that this change will be replicated in other memory-intensive apps? (I do). Such as what? It would be nice to see some numbers with some HPC or java or DBMS workload using this. Not that I dispute it will help some cases, but 10% (or 20% for ppc) I guess is getting toward the best case, short of a specifically written TLB thrasher. I didn't realise the STREAM is using vast amounts of automatic memory. I'd assumed that it was using sane amounts of stack, but the stack TLB slots were getting zapped by all the heap-memory activity. Oh well. An easy mistake to make because that's probabably how STREAM would normally work. I think what Mel had done is to modify the stream kernel so as to have it operate on arrays of stack memory. Yes, I mentioned in the mail that STREAM was patched to use stack for its data. It was as much to show the patches were working as advertised even though it is an extreme case obviously. I had seen stack-hugepage-backing as something that would improve performance in addition to something else as opposed to having to stand entirely on its own. For example, I would expect many memory-intensive applications to gain by just having malloc and stack backed more than backing either in isolation. I guess that effect is still there, but smaller. I imagine it should be, unless you're using a CPU with seperate TLBs for small and huge pages, and your large data set is mapped with huge pages, in which case you might now introduce *new* TLB contention between the stack and the dataset :) Yes, this can happen particularly on older CPUs. For example, on my crash-test laptop the Pentium III there reports TLB and cache info: 01: Instruction TLB: 4KB pages, 4-way set assoc, 32 entries 02: Instruction TLB: 4MB pages, 4-way set assoc, 2 entries so a workload that sparsely addressed memory (i.e. = 4MB strides on each reference) might suffer more TLB misses with large pages than with small. It's hardly new that there are is uncertainity around when and if hugepages are of benefit and where. Also, interestingly I have actually seen some CPUs whos memory operations get significantly slower when operating on large pages than small (in the case when there is full TLB coverage for both sizes). This would make sense if the CPU only implements a fast L1 TLB for small pages. It's also possible there is a micro-TLB involved that only support small pages. It's been the case for a while that what wins on one machine type may lose on another. So for the vast majority of workloads, where stacks are relatively small (or slowly changing), and relatively hot, I suspect this could easily have no benefit at best and slowdowns at worst. I wouldn't expect an application with small stacks to request its stack to be backed by hugepages either. Ideally, it would be enabled because a large enough number of DTLB misses were found to be in the stack although catching this sort of data is tricky. But I'm not saying that as a reason not to merge it -- this is no different from any other hugepage allocations and as usual they have to be used selectively where they help I just wonder exactly where huge stacks will help. Benchmark wise, SPECcpu and SPEComp have stack-dependent benchmarks. Computations that partition problems with recursion I would expect to benefit as well as some JVMs that heavily use the stack (see how many docs suggest setting ulimit -s unlimited). Bit out there, but stack-based languages would stand to gain by this. The potential gap is for threaded apps as there will be stacks that are not the main stack. Backing those with hugepages depends on how they are allocated (malloc, it's easy, MAP_ANONYMOUS not so much). I agree that few real-world apps are likely to see gains of this order. More benchmarks, please :) Would be nice, if just out of morbid curiosity :) Benchmarks will happen, they just take time, you know the way. The STREAM one in the meantime is a this works and has an effect. I'm hoping Andrew Hastings will have figures at hand and I cc'd him elsewhere in the thread for comment. Thanks -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc - Initialize the irq radix tree earlier
On Thu, 2008-07-31 at 11:40 +0200, Sebastien Dugue wrote: The radix tree used for fast irq reverse mapping by the XICS is initialized late in the boot process, after the first interrupt (IPI) gets registered and after the first IPI is received. This patch moves the initialization of the XICS radix tree earlier into the boot process in smp_xics_probe() (the mm is already up but no interrupts have been registered at that point) to avoid having to insert a mapping into the tree in interrupt context. This will help in simplifying the locking constraints and move to a lockless radix tree in subsequent patches. As a nice side effect, there is no need any longer to check for (host-revmap_data.tree.gfp_mask != 0) to know if the tree have been initialized. Hi Sebastien, This is a nice cleanup, I think :) diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c index 6ac8612..0a1445c 100644 --- a/arch/powerpc/kernel/irq.c +++ b/arch/powerpc/kernel/irq.c @@ -893,28 +890,28 @@ unsigned int irq_find_mapping(struct irq_host *host, } EXPORT_SYMBOL_GPL(irq_find_mapping); +void __init irq_radix_revmap_init(void) +{ + struct irq_host *h; + + list_for_each_entry(h, irq_hosts, link) { + if (h-revmap_type == IRQ_HOST_MAP_TREE) + INIT_RADIX_TREE(h-revmap_data.tree, GFP_ATOMIC); + } +} Note irq_radix_revmap_init() loops over all irq_hosts ... diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c index 9d8f8c8..b143fe7 100644 --- a/arch/powerpc/platforms/pseries/smp.c +++ b/arch/powerpc/platforms/pseries/smp.c @@ -130,6 +130,7 @@ static void smp_xics_message_pass(int target, int msg) static int __init smp_xics_probe(void) { + irq_radix_revmap_init(); xics_request_IPIs(); But now it's only called from the xics setup code. Which seems a bit ugly. In practice it doesn't matter because at the moment xics is the only user of the radix revmap. But if we're going to switch to this sort of initialisation I think xics should only be init'ing the revmap for itself. This boot ordering stuff is pretty hairy, so I might have missed something, but this is how the code is ordered AFAICT:  start_kernel() init_IRQ() ... local_irq_enable() ... rest_init() kernel_thread() kernel_init() smp_prepare_cpus() smp_xics_probe()(via smp_ops-probe()) What's stopping us from taking an irq between local_irq_enable() and smp_xics_probe() ? Is it just that no one's request_irq()'ed them yet? cheers -- Michael Ellerman OzLabs, IBM Australia Development Lab wwweb: http://michael.ellerman.id.au phone: +61 2 6212 1183 (tie line 70 21183) We do not inherit the earth from our ancestors, we borrow it from our children. - S.M.A.R.T Person signature.asc Description: This is a digitally signed message part ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks
On Thursday 31 July 2008 21:27, Mel Gorman wrote: On (31/07/08 16:26), Nick Piggin didst pronounce: I imagine it should be, unless you're using a CPU with seperate TLBs for small and huge pages, and your large data set is mapped with huge pages, in which case you might now introduce *new* TLB contention between the stack and the dataset :) Yes, this can happen particularly on older CPUs. For example, on my crash-test laptop the Pentium III there reports TLB and cache info: 01: Instruction TLB: 4KB pages, 4-way set assoc, 32 entries 02: Instruction TLB: 4MB pages, 4-way set assoc, 2 entries Oh? Newer CPUs tend to have unified TLBs? Also, interestingly I have actually seen some CPUs whos memory operations get significantly slower when operating on large pages than small (in the case when there is full TLB coverage for both sizes). This would make sense if the CPU only implements a fast L1 TLB for small pages. It's also possible there is a micro-TLB involved that only support small pages. That is the case on a couple of contemporary CPUs I've tested with (although granted they are engineering samples, but I don't expect that to be the cause) So for the vast majority of workloads, where stacks are relatively small (or slowly changing), and relatively hot, I suspect this could easily have no benefit at best and slowdowns at worst. I wouldn't expect an application with small stacks to request its stack to be backed by hugepages either. Ideally, it would be enabled because a large enough number of DTLB misses were found to be in the stack although catching this sort of data is tricky. Sure, as I said, I have nothing against this functionality just because it has the possibility to cause a regression. I was just pointing out there are a few possibilities there, so it will take a particular type of app to take advantage of it. Ie. it is not something you would ever just enable just in case the stack starts thrashing the TLB. But I'm not saying that as a reason not to merge it -- this is no different from any other hugepage allocations and as usual they have to be used selectively where they help I just wonder exactly where huge stacks will help. Benchmark wise, SPECcpu and SPEComp have stack-dependent benchmarks. Computations that partition problems with recursion I would expect to benefit as well as some JVMs that heavily use the stack (see how many docs suggest setting ulimit -s unlimited). Bit out there, but stack-based languages would stand to gain by this. The potential gap is for threaded apps as there will be stacks that are not the main stack. Backing those with hugepages depends on how they are allocated (malloc, it's easy, MAP_ANONYMOUS not so much). Oh good, then there should be lots of possibilities to demonstrate it. Thanks, Nick ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc - Initialize the irq radix tree earlier
Hi Michael, On Thu, 31 Jul 2008 21:40:56 +1000 Michael Ellerman [EMAIL PROTECTED] wrote: On Thu, 2008-07-31 at 11:40 +0200, Sebastien Dugue wrote: The radix tree used for fast irq reverse mapping by the XICS is initialized late in the boot process, after the first interrupt (IPI) gets registered and after the first IPI is received. This patch moves the initialization of the XICS radix tree earlier into the boot process in smp_xics_probe() (the mm is already up but no interrupts have been registered at that point) to avoid having to insert a mapping into the tree in interrupt context. This will help in simplifying the locking constraints and move to a lockless radix tree in subsequent patches. As a nice side effect, there is no need any longer to check for (host-revmap_data.tree.gfp_mask != 0) to know if the tree have been initialized. Hi Sebastien, This is a nice cleanup, I think :) Thanks. diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c index 6ac8612..0a1445c 100644 --- a/arch/powerpc/kernel/irq.c +++ b/arch/powerpc/kernel/irq.c @@ -893,28 +890,28 @@ unsigned int irq_find_mapping(struct irq_host *host, } EXPORT_SYMBOL_GPL(irq_find_mapping); +void __init irq_radix_revmap_init(void) +{ + struct irq_host *h; + + list_for_each_entry(h, irq_hosts, link) { + if (h-revmap_type == IRQ_HOST_MAP_TREE) + INIT_RADIX_TREE(h-revmap_data.tree, GFP_ATOMIC); + } +} Note irq_radix_revmap_init() loops over all irq_hosts ... Yep, but there's only one host (xics) diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c index 9d8f8c8..b143fe7 100644 --- a/arch/powerpc/platforms/pseries/smp.c +++ b/arch/powerpc/platforms/pseries/smp.c @@ -130,6 +130,7 @@ static void smp_xics_message_pass(int target, int msg) static int __init smp_xics_probe(void) { + irq_radix_revmap_init(); xics_request_IPIs(); But now it's only called from the xics setup code. Which seems a bit ugly. In practice it doesn't matter because at the moment xics is the only user of the radix revmap. But if we're going to switch to this sort of initialisation I think xics should only be init'ing the revmap for itself. You're right, that's what I intended to do from the beginning but stumbled upon ... hmm, can't remember what, that made me change my mind. But I agree, I'm not particularly proud of that. Will look again into that. This boot ordering stuff is pretty hairy, so I might have missed something, but this is how the code is ordered AFAICT:  start_kernel() init_IRQ() ... local_irq_enable() ... rest_init() kernel_thread() kernel_init() smp_prepare_cpus() smp_xics_probe()(via smp_ops-probe()) What's stopping us from taking an irq between local_irq_enable() and smp_xics_probe() ? Is it just that no one's request_irq()'ed them yet? It's hairy, I agree, but as you've mentioned no one has done a request_irq() at that point. The first one to do it is smp_xics_probe() for the IPI. Thanks for your comments. Sebastien. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc - Initialize the irq radix tree earlier
On Thu, 31 Jul 2008 14:00:02 +0200 Sebastien Dugue [EMAIL PROTECTED] wrote: Hi Michael, On Thu, 31 Jul 2008 21:40:56 +1000 Michael Ellerman [EMAIL PROTECTED] wrote: On Thu, 2008-07-31 at 11:40 +0200, Sebastien Dugue wrote: The radix tree used for fast irq reverse mapping by the XICS is initialized late in the boot process, after the first interrupt (IPI) gets registered and after the first IPI is received. This patch moves the initialization of the XICS radix tree earlier into the boot process in smp_xics_probe() (the mm is already up but no interrupts have been registered at that point) to avoid having to insert a mapping into the tree in interrupt context. This will help in simplifying the locking constraints and move to a lockless radix tree in subsequent patches. As a nice side effect, there is no need any longer to check for (host-revmap_data.tree.gfp_mask != 0) to know if the tree have been initialized. Hi Sebastien, This is a nice cleanup, I think :) Thanks. diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c index 6ac8612..0a1445c 100644 --- a/arch/powerpc/kernel/irq.c +++ b/arch/powerpc/kernel/irq.c @@ -893,28 +890,28 @@ unsigned int irq_find_mapping(struct irq_host *host, } EXPORT_SYMBOL_GPL(irq_find_mapping); +void __init irq_radix_revmap_init(void) +{ + struct irq_host *h; + + list_for_each_entry(h, irq_hosts, link) { + if (h-revmap_type == IRQ_HOST_MAP_TREE) + INIT_RADIX_TREE(h-revmap_data.tree, GFP_ATOMIC); + } +} Note irq_radix_revmap_init() loops over all irq_hosts ... Yep, but there's only one host (xics) diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c index 9d8f8c8..b143fe7 100644 --- a/arch/powerpc/platforms/pseries/smp.c +++ b/arch/powerpc/platforms/pseries/smp.c @@ -130,6 +130,7 @@ static void smp_xics_message_pass(int target, int msg) static int __init smp_xics_probe(void) { + irq_radix_revmap_init(); xics_request_IPIs(); But now it's only called from the xics setup code. Which seems a bit ugly. In practice it doesn't matter because at the moment xics is the only user of the radix revmap. But if we're going to switch to this sort of initialisation I think xics should only be init'ing the revmap for itself. You're right, that's what I intended to do from the beginning but stumbled upon ... hmm, can't remember what, that made me change my mind. Ah yes, I wanted to do it from xics_init_host() but backed off because at that point the mm is not up. But it does not make a difference as the first request_irq() happens after the mm is up. A bit shaky I concede. But I agree, I'm not particularly proud of that. Will look again into that. This boot ordering stuff is pretty hairy, so I might have missed something, but this is how the code is ordered AFAICT:  start_kernel() init_IRQ() ... local_irq_enable() ... rest_init() kernel_thread() kernel_init() smp_prepare_cpus() smp_xics_probe()(via smp_ops-probe()) What's stopping us from taking an irq between local_irq_enable() and smp_xics_probe() ? Is it just that no one's request_irq()'ed them yet? It's hairy, I agree, but as you've mentioned no one has done a request_irq() at that point. The first one to do it is smp_xics_probe() for the IPI. Thanks for your comments. Sebastien. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc - Initialize the irq radix tree earlier
On Thu, 2008-07-31 at 14:00 +0200, Sebastien Dugue wrote: On Thu, 31 Jul 2008 21:40:56 +1000 Michael Ellerman [EMAIL PROTECTED] wrote: This boot ordering stuff is pretty hairy, so I might have missed something, but this is how the code is ordered AFAICT:  start_kernel() init_IRQ() ... local_irq_enable() ... rest_init() kernel_thread() kernel_init() smp_prepare_cpus() smp_xics_probe()(via smp_ops-probe()) What's stopping us from taking an irq between local_irq_enable() and smp_xics_probe() ? Is it just that no one's request_irq()'ed them yet? It's hairy, I agree, but as you've mentioned no one has done a request_irq() at that point. The first one to do it is smp_xics_probe() for the IPI. Hmm, I don't think that's strong enough. I can trivially cause irqs to fire during a kexec reboot just by mashing the keyboard. And during a kdump boot all sorts of stuff could be firing. Even during a clean boot, from firmware, I don't think we can guarantee that nothing's going to fire. .. after a bit of testing .. It seems it actually works (sort of). xics_remap_irq() calls irq_radix_revmap_lookup(), which calls: ptr = radix_tree_lookup(host-revmap_data.tree, hwirq); And because host-revmap_data.tree was zalloc'ed we trip on the first check here: cheers -- Michael Ellerman OzLabs, IBM Australia Development Lab wwweb: http://michael.ellerman.id.au phone: +61 2 6212 1183 (tie line 70 21183) We do not inherit the earth from our ancestors, we borrow it from our children. - S.M.A.R.T Person signature.asc Description: This is a digitally signed message part ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
ePAPR 1.0 is published
The ePAPR 1.0 spec is officially published and available on the power.org website. http://www.power.org/resources/downloads/Power_ePAPR_APPROVED_v1.0.pdf One purpose of the spec is to more formally specify in one place the flat device tree concept defined booting_without_of.txt plus the relevant parts of 1275 and misc bindings created by the OF working group. It also contains a specification for how multicore boot works in an ePAPR-compliant system. Stuart Yoder Freescale ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc - Initialize the irq radix tree earlier
On Thu, 2008-07-31 at 22:58 +1000, Michael Ellerman wrote: On Thu, 2008-07-31 at 14:00 +0200, Sebastien Dugue wrote: On Thu, 31 Jul 2008 21:40:56 +1000 Michael Ellerman [EMAIL PROTECTED] wrote: This boot ordering stuff is pretty hairy, so I might have missed something, but this is how the code is ordered AFAICT:  start_kernel() init_IRQ() ... local_irq_enable() ... rest_init() kernel_thread() kernel_init() smp_prepare_cpus() smp_xics_probe()(via smp_ops-probe()) What's stopping us from taking an irq between local_irq_enable() and smp_xics_probe() ? Is it just that no one's request_irq()'ed them yet? It's hairy, I agree, but as you've mentioned no one has done a request_irq() at that point. The first one to do it is smp_xics_probe() for the IPI. Hmm, I don't think that's strong enough. I can trivially cause irqs to fire during a kexec reboot just by mashing the keyboard. And during a kdump boot all sorts of stuff could be firing. Even during a clean boot, from firmware, I don't think we can guarantee that nothing's going to fire. .. after a bit of testing .. It seems it actually works (sort of). xics_remap_irq() calls irq_radix_revmap_lookup(), which calls: ptr = radix_tree_lookup(host-revmap_data.tree, hwirq); And because host-revmap_data.tree was zalloc'ed we trip on the first check here: @#$% ctrl-enter == send! Continuing ... void *radix_tree_lookup(struct radix_tree_root *root, unsigned long index) { unsigned int height, shift; struct radix_tree_node *node, **slot; node = rcu_dereference(root-rnode); if (node == NULL) return NULL; Which means irq_radix_revmap_lookup() will return NO_IRQ, which is cool. So I think it can fly, as long as we're happy that we can't reverse map anything until smp_xics_probe() - and I think that's true, as any irq we take will be invalid. cheers -- Michael Ellerman OzLabs, IBM Australia Development Lab wwweb: http://michael.ellerman.id.au phone: +61 2 6212 1183 (tie line 70 21183) We do not inherit the earth from our ancestors, we borrow it from our children. - S.M.A.R.T Person signature.asc Description: This is a digitally signed message part ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Compact Flash on 8349mITX
Hello All, I'm using the latest 2.2.26 kernel image from linux/kernel/git/benh/powerpc.git on a mpc8349mITX, and trying to get Compact Flash to work. The kernel booted fine, but I was surprised to see Compact Flash IRQ was not being handled. Does irq polling need to be used for compact flash to work with this version of the kernel on this board? Here's snip of the unhandled IRQ, for your edification. irq 23: nobody cared (try booting with the irqpoll option) Call Trace: [cf829a30] [c00087d4] 0xc00087d4 (unreliable) [cf829a60] [c0052844] 0xc0052844 [cf829a80] [c0052ac4] 0xc0052ac4 [cf829ac0] [c0053520] 0xc0053520 [cf829ad0] [c000581c] 0xc000581c [cf829ae0] [c0012adc] 0xc0012adc --- Exception: 501 at 0xc002f1e0 LR = 0xc00058c8 [cf829ba0] [c02d4ef8] 0xc02d4ef8 (unreliable) [cf829bd0] [c00058c8] 0xc00058c8 [cf829be0] [c002f394] 0xc002f394 [cf829bf0] [c0005820] 0xc0005820 [cf829c00] [c0012adc] 0xc0012adc --- Exception: 501 at 0xc0052450 LR = 0xc0052574 [cf829cf0] [c00527e0] 0xc00527e0 [cf829d20] [c0053a04] 0xc0053a04 [cf829d40] [c01a7d88] 0xc01a7d88 [cf829d70] [c029a5bc] 0xc029a5bc [cf829db0] [c029a8b4] 0xc029a8b4 [cf829e40] [c0202d10] 0xc0202d10 [cf829e60] [c017ab60] 0xc017ab60 [cf829e80] [c017ad6c] 0xc017ad6c [cf829ea0] [c0179758] 0xc0179758 [cf829ed0] [c017adc4] 0xc017adc4 [cf829ee0] [c017a180] 0xc017a180 [cf829f10] [c017b39c] 0xc017b39c [cf829f30] [c0202f80] 0xc0202f80 [cf829f40] [c0320750] 0xc0320750 [cf829f50] [c03007ac] 0xc03007ac [cf829fd0] [c0300994] 0xc0300994 [cf829ff0] [c0012268] 0xc0012268 handlers: [c01b35c0] Disabling IRQ #23 Thanks for the help, Sam ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc - Initialize the irq radix tree earlier
On Thu, 31 Jul 2008 23:01:39 +1000 Michael Ellerman [EMAIL PROTECTED] wrote: On Thu, 2008-07-31 at 22:58 +1000, Michael Ellerman wrote: On Thu, 2008-07-31 at 14:00 +0200, Sebastien Dugue wrote: On Thu, 31 Jul 2008 21:40:56 +1000 Michael Ellerman [EMAIL PROTECTED] wrote: This boot ordering stuff is pretty hairy, so I might have missed something, but this is how the code is ordered AFAICT:  start_kernel() init_IRQ() ... local_irq_enable() ... rest_init() kernel_thread() kernel_init() smp_prepare_cpus() smp_xics_probe()(via smp_ops-probe()) What's stopping us from taking an irq between local_irq_enable() and smp_xics_probe() ? Is it just that no one's request_irq()'ed them yet? It's hairy, I agree, but as you've mentioned no one has done a request_irq() at that point. The first one to do it is smp_xics_probe() for the IPI. Hmm, I don't think that's strong enough. I can trivially cause irqs to fire during a kexec reboot just by mashing the keyboard. And during a kdump boot all sorts of stuff could be firing. Even during a clean boot, from firmware, I don't think we can guarantee that nothing's going to fire. .. after a bit of testing .. It seems it actually works (sort of). xics_remap_irq() calls irq_radix_revmap_lookup(), which calls: ptr = radix_tree_lookup(host-revmap_data.tree, hwirq); And because host-revmap_data.tree was zalloc'ed we trip on the first check here: @#$% ctrl-enter == send! Continuing ... void *radix_tree_lookup(struct radix_tree_root *root, unsigned long index) { unsigned int height, shift; struct radix_tree_node *node, **slot; node = rcu_dereference(root-rnode); if (node == NULL) return NULL; Which means irq_radix_revmap_lookup() will return NO_IRQ, which is cool. Which is what I intended so that as long as no IRQ is registered we return NO_IRQ. So I think it can fly, as long as we're happy that we can't reverse map anything until smp_xics_probe() - and I think that's true, as any irq we take will be invalid. That's true as no IRQs are registered before smp_xics_probe() and for any interrupt we might get before that, irq_radix_revmap_lookup() will return NO_IRQ. Thanks, Sebastien. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
Grant Likely wrote: On Fri, Jul 25, 2008 at 11:19:41AM -0500, Timur Tabi wrote: Wolfgang Grandegger wrote: I know but we still need an algorithm for MPC52xx and MPC82xx as well. That's true, but I still think hard-coding values of DFSR and FDR in the device tree is not a good way to do this. I agree, it should encode real frequencies, not raw register values. Digging deeper I'm frightened by plenty of platform specific code. We would need: - one table of divider,fdr,dfsr values for the MPC82/3/5/6xx processors (already available from Timur's U-Boot implementation) - one table of divider,fdr values for the MPC5200 rev A. - one table of divider,fdr values for the MPC5200 rev B. (the Rev. B has two more pre-scaler bits). - furthermore, there are various mpc-specific I2C clock sources: MPC82xx : fsl_get_sys_freq() MPC5200 : IPB MPC83xx : fsl_get_sys_freq() MPC8540/41/60/55,MPC8610: fsl_get_sys_freq() MPC8543/45/47/48/68, MPC8641: fsl_get_sys_freq()/2 MPC8544 : fsl_get_sys_freq()/2 or /3 It would make sense to hand-over the I2C frequency from U-Boot to Linux. Wolfgang. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks
On (31/07/08 21:51), Nick Piggin didst pronounce: On Thursday 31 July 2008 21:27, Mel Gorman wrote: On (31/07/08 16:26), Nick Piggin didst pronounce: I imagine it should be, unless you're using a CPU with seperate TLBs for small and huge pages, and your large data set is mapped with huge pages, in which case you might now introduce *new* TLB contention between the stack and the dataset :) Yes, this can happen particularly on older CPUs. For example, on my crash-test laptop the Pentium III there reports TLB and cache info: 01: Instruction TLB: 4KB pages, 4-way set assoc, 32 entries 02: Instruction TLB: 4MB pages, 4-way set assoc, 2 entries Oh? Newer CPUs tend to have unified TLBs? I've seen more unified DTLBs (ITLB tends to be split) than not but it could just be where I'm looking. For example, on the machine I'm writing this (Core Duo), it's TLB and cache info: 51: Instruction TLB: 4KB and 2MB or 4MB pages, 128 entries 5b: Data TLB: 4KB and 4MB pages, 64 entries DTLB is unified there but on my T60p laptop where I guess they want the CPU to be using less power and be cheaper, it's TLB info Instruction TLB: 4K pages, 4-way associative, 128 entries. Instruction TLB: 4MB pages, fully associative, 2 entries Data TLB: 4K pages, 4-way associative, 128 entries. Data TLB: 4MB pages, 4-way associative, 8 entries So I would expect huge pages to be slower there than in other cases. On one Xeon, I see 32 entries for huge pages and 256 for small pages so it's not straight-forward to predict. On another Xeon, I see the DLB is 64 entries unified. To make all this more complex, huge pages can be a win because less L2 cache is consumed on page table information. The gains are due to fewer access to main memory and less to do with TLB misses. So lets say we do have a TLB that is set-associative with very few large page entries, it could still end up winning because the increased usage of L2 offset the increased TLB misses. Predicting when huge pages are a win and when they are a loss is just not particularly straight-forward. Also, interestingly I have actually seen some CPUs whos memory operations get significantly slower when operating on large pages than small (in the case when there is full TLB coverage for both sizes). This would make sense if the CPU only implements a fast L1 TLB for small pages. It's also possible there is a micro-TLB involved that only support small pages. That is the case on a couple of contemporary CPUs I've tested with (although granted they are engineering samples, but I don't expect that to be the cause) I found it hard to determine if the CPU I was using at a uTLB or not. The manuals didn't cover the subject but it was a theory as to why large pages might be slower on a particular CPU. Whatever the reason, I'm ok admitting that large pages can be slower on smaller data sets and in other situations for whatever reason. It's not a major surprise. So for the vast majority of workloads, where stacks are relatively small (or slowly changing), and relatively hot, I suspect this could easily have no benefit at best and slowdowns at worst. I wouldn't expect an application with small stacks to request its stack to be backed by hugepages either. Ideally, it would be enabled because a large enough number of DTLB misses were found to be in the stack although catching this sort of data is tricky. Sure, as I said, I have nothing against this functionality just because it has the possibility to cause a regression. I was just pointing out there are a few possibilities there, so it will take a particular type of app to take advantage of it. Ie. it is not something you would ever just enable just in case the stack starts thrashing the TLB. No, it's something you'd enable because you know your app is using a lot of stack. If you are lazy, you might do a test run of the app with it enabled for the sake of curiousity and take the option that's faster :) But I'm not saying that as a reason not to merge it -- this is no different from any other hugepage allocations and as usual they have to be used selectively where they help I just wonder exactly where huge stacks will help. Benchmark wise, SPECcpu and SPEComp have stack-dependent benchmarks. Computations that partition problems with recursion I would expect to benefit as well as some JVMs that heavily use the stack (see how many docs suggest setting ulimit -s unlimited). Bit out there, but stack-based languages would stand to gain by this. The potential gap is for threaded apps as there will be stacks that are not the main stack. Backing those with hugepages depends on how they are allocated (malloc, it's easy, MAP_ANONYMOUS not so much). Oh good, then there should be lots of possibilities to demonstrate it. There should :) -- Mel Gorman Part-time Phd Student Linux Technology Center
Re: [PATCH] powerpc - Initialize the irq radix tree earlier
On Thu, 31 Jul 2008 23:39:26 +1000 Michael Ellerman [EMAIL PROTECTED] wrote: On Thu, 2008-07-31 at 15:26 +0200, Sebastien Dugue wrote: On Thu, 31 Jul 2008 23:01:39 +1000 Michael Ellerman [EMAIL PROTECTED] wrote: On Thu, 2008-07-31 at 22:58 +1000, Michael Ellerman wrote: On Thu, 2008-07-31 at 14:00 +0200, Sebastien Dugue wrote: On Thu, 31 Jul 2008 21:40:56 +1000 Michael Ellerman [EMAIL PROTECTED] wrote: This boot ordering stuff is pretty hairy, so I might have missed something, but this is how the code is ordered AFAICT:  start_kernel() init_IRQ() ... local_irq_enable() ... rest_init() kernel_thread() kernel_init() smp_prepare_cpus() smp_xics_probe()(via smp_ops-probe()) What's stopping us from taking an irq between local_irq_enable() and smp_xics_probe() ? Is it just that no one's request_irq()'ed them yet? It's hairy, I agree, but as you've mentioned no one has done a request_irq() at that point. The first one to do it is smp_xics_probe() for the IPI. Hmm, I don't think that's strong enough. I can trivially cause irqs to fire during a kexec reboot just by mashing the keyboard. And during a kdump boot all sorts of stuff could be firing. Even during a clean boot, from firmware, I don't think we can guarantee that nothing's going to fire. .. after a bit of testing .. It seems it actually works (sort of). xics_remap_irq() calls irq_radix_revmap_lookup(), which calls: ptr = radix_tree_lookup(host-revmap_data.tree, hwirq); And because host-revmap_data.tree was zalloc'ed we trip on the first check here: @#$% ctrl-enter == send! Continuing ... void *radix_tree_lookup(struct radix_tree_root *root, unsigned long index) { unsigned int height, shift; struct radix_tree_node *node, **slot; node = rcu_dereference(root-rnode); if (node == NULL) return NULL; Which means irq_radix_revmap_lookup() will return NO_IRQ, which is cool. Which is what I intended so that as long as no IRQ is registered we return NO_IRQ. So I think it can fly, as long as we're happy that we can't reverse map anything until smp_xics_probe() - and I think that's true, as any irq we take will be invalid. That's true as no IRQs are registered before smp_xics_probe() and for any interrupt we might get before that, irq_radix_revmap_lookup() will return NO_IRQ. Cool, we agree :) My only worry is that we might be relying on on the particular radix tree implementation a bit too much. Well maybe we could revert back to testing a flag just like we do for host-revmap_data.tree.gfp_mask != 0. Dunno. Is it documented somewhere that the /very/ first check is for root-rnode != NULL, and the rest of the root may be unintialised? Not in anything I could read except in looking at the code. And I think it needs a big fat comment in the irq code saying that it's safe because revmap_data is zalloc'ed, and that means the radix lookup will fail (safely). Yep, right. Will advertise this properly for the next round if this remains the prefered solution. Thanks, Sebastien. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
RE: Compact Flash on 8349mITX
From: Sparks, Sam Sent: Thursday, July 31, 2008 8:15 AM Does irq polling need to be used for compact flash to work with this version of the kernel on this board? When I remove the interrupt definition from the DTS (to cause irqpolling), the ata driver gets stuck in a loop displaying the following. Does anyone have any insight? ata1.00: status: { DRDY } ata1.00: qc timeout (cmd 0xec) ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4) ata1.00: revalidation failed (errno=-5) ata1: hard resetting link ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) ata1.00: configured for UDMA/100 ata1: EH complete ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 ata1.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 dma 4096 in res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x20 (host bus error) ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks
On Thu, 2008-07-31 at 14:50 +0100, Mel Gorman wrote: On (31/07/08 21:51), Nick Piggin didst pronounce: On Thursday 31 July 2008 21:27, Mel Gorman wrote: On (31/07/08 16:26), Nick Piggin didst pronounce: I imagine it should be, unless you're using a CPU with seperate TLBs for small and huge pages, and your large data set is mapped with huge pages, in which case you might now introduce *new* TLB contention between the stack and the dataset :) Yes, this can happen particularly on older CPUs. For example, on my crash-test laptop the Pentium III there reports TLB and cache info: 01: Instruction TLB: 4KB pages, 4-way set assoc, 32 entries 02: Instruction TLB: 4MB pages, 4-way set assoc, 2 entries Oh? Newer CPUs tend to have unified TLBs? I've seen more unified DTLBs (ITLB tends to be split) than not but it could just be where I'm looking. For example, on the machine I'm writing this (Core Duo), it's TLB and cache info: 51: Instruction TLB: 4KB and 2MB or 4MB pages, 128 entries 5b: Data TLB: 4KB and 4MB pages, 64 entries DTLB is unified there but on my T60p laptop where I guess they want the CPU to be using less power and be cheaper, it's TLB info Instruction TLB: 4K pages, 4-way associative, 128 entries. Instruction TLB: 4MB pages, fully associative, 2 entries Data TLB: 4K pages, 4-way associative, 128 entries. Data TLB: 4MB pages, 4-way associative, 8 entries Clearly I've been living under a rock, but I didn't know one could get such nicely formatted info. In case I'm not the only one, a bit of googling turned up x86info, courtesy of davej - apt-get'able and presumably yum'able too. cheers -- Michael Ellerman OzLabs, IBM Australia Development Lab wwweb: http://michael.ellerman.id.au phone: +61 2 6212 1183 (tie line 70 21183) We do not inherit the earth from our ancestors, we borrow it from our children. - S.M.A.R.T Person signature.asc Description: This is a digitally signed message part ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH 2/5] x86: Define elfcorehdr_addr in arch dependent section
* Vivek Goyal [EMAIL PROTECTED] wrote: o Move elfcorehdr_addr definition in arch dependent crash dump file. This is equivalent to defining elfcorehdr_addr under CONFIG_CRASH_DUMP instead of CONFIG_PROC_VMCORE. This is needed by is_kdump_kernel() which can be used irrespective of the fact whether CONFIG_PROC_VMCORE is enabled or not. Signed-off-by: Vivek Goyal [EMAIL PROTECTED] --- arch/x86/kernel/crash_dump_32.c |3 +++ arch/x86/kernel/crash_dump_64.c |3 +++ arch/x86/kernel/setup.c |8 +++- 3 files changed, 13 insertions(+), 1 deletion(-) the x86 bits look fine to me. Acked-by: Ingo Molnar [EMAIL PROTECTED] Ingo ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: libfdt: Forgot one function when cleaning the namespace
In commit b6d80a20fc293f3b995c3ce1a6744a5574192125, we renamed all libfdt functions to be prefixed with fdt_ or _fdt_ to minimise the chance of collisions with things from whatever package libfdt is embedded in, pulled into the libfdt build via that environment's libfdt_env.h. Except... I missed one. This patch applies the same treatment to _stringlist_contains(). While we're at it, also make it static since it's only used in the same file. Signed-off-by: David Gibson [EMAIL PROTECTED] Applied. jdl ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: dtc: Remove unused lexer function
dtc does not use the input() function in flex. Apparently on some gcc versions the unused function will cause warnings. Therefore, this patch removes the function by using the 'noinput' option to flex. Signed-off-by: David Gibson [EMAIL PROTECTED] Applied. jdl ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] dtc: give advance warning that -S is going away.
The -S option allowed the specification of a minimum size for the blob, however the main reason for caring about the size is so there is enough padding to add a chosen node by u-boot or whoever. In which case, folks don't really care about the absolute size, but rather the size of the padding added for this -- which is what the -p option does. Having the -S just confuses people. Signed-off-by: Paul Gortmaker [EMAIL PROTECTED] Applied. jdl ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
On 7/31/08, Wolfgang Grandegger [EMAIL PROTECTED] wrote: Grant Likely wrote: On Fri, Jul 25, 2008 at 11:19:41AM -0500, Timur Tabi wrote: Wolfgang Grandegger wrote: I know but we still need an algorithm for MPC52xx and MPC82xx as well. That's true, but I still think hard-coding values of DFSR and FDR in the device tree is not a good way to do this. I agree, it should encode real frequencies, not raw register values. Digging deeper I'm frightened by plenty of platform specific code. We would need: - one table of divider,fdr,dfsr values for the MPC82/3/5/6xx processors (already available from Timur's U-Boot implementation) - one table of divider,fdr values for the MPC5200 rev A. - one table of divider,fdr values for the MPC5200 rev B. (the Rev. B has two more pre-scaler bits). Aren't the tables in the manual there just to make it easy for a human to pick out the line they want? For a computer you'd program the formula that was used to create the tables. I agree that it took me half an hour to figure out the formula that was needed to compute the i2s clocks, but once you figure out the formula it solves all of the cases and no one needs to read the manual any more. The i2c formula may even need a small loop which compares different solutions looking for the smallest error term. But it's a small space to search. These device tree flags should be removed, the driver can ask the platform code what CPU it is running on. if (of_get_property(op-node, dfsrr, NULL)) i2c-flags |= FSL_I2C_DEV_SEPARATE_DFSRR; if (of_device_is_compatible(op-node, fsl,mpc5200-i2c) || of_device_is_compatible(op-node, mpc5200-i2c)) i2c-flags |= FSL_I2C_DEV_CLOCK_5200; static void mpc_i2c_setclock(struct mpc_i2c *i2c) { /* Set clock and filters */ if (i2c-flags FSL_I2C_DEV_SEPARATE_DFSRR) { writeb(0x31, i2c-base + MPC_I2C_FDR); writeb(0x10, i2c-base + MPC_I2C_DFSRR); } else if (i2c-flags FSL_I2C_DEV_CLOCK_5200) writeb(0x3f, i2c-base + MPC_I2C_FDR); else writel(0x1031, i2c-base + MPC_I2C_FDR); } These defines shouldn't be here, they should get the offset from the right header file for the CPU. But it appears that structures for the i2c memory map haven't been done for the various CPUs. #define MPC_I2C_FDR 0x04 #define MPC_I2C_CR 0x08 #define MPC_I2C_SR 0x0c #define MPC_I2C_DR 0x10 #define MPC_I2C_DFSRR 0x14 There appears to be one for i2x8xxx but not the other CPUs. /* I2C */ typedef struct i2c { u_char i2c_i2mod; charres1[3]; u_char i2c_i2add; charres2[3]; u_char i2c_i2brg; charres3[3]; u_char i2c_i2com; charres4[3]; u_char i2c_i2cer; charres5[3]; u_char i2c_i2cmr; charres6[0x8b]; } i2c8xx_t; - furthermore, there are various mpc-specific I2C clock sources: MPC82xx : fsl_get_sys_freq() MPC5200 : IPB MPC83xx : fsl_get_sys_freq() MPC8540/41/60/55,MPC8610: fsl_get_sys_freq() MPC8543/45/47/48/68, MPC8641: fsl_get_sys_freq()/2 MPC8544 : fsl_get_sys_freq()/2 or /3 It would make sense to hand-over the I2C frequency from U-Boot to Linux. Wolfgang. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev -- Jon Smirl [EMAIL PROTECTED] ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
Jon Smirl wrote: Aren't the tables in the manual there just to make it easy for a human to pick out the line they want? For a computer you'd program the formula that was used to create the tables. Actually, the tables in the manuals are just an example of what can be programmed. They don't cover all cases. The tables assume a specific value of DFSR. For 8xxx, you want to use AN2919 to figure out how to really program the device. The table I generated for U-Boot is based on the calculations outlined in AN2919. I debated trying to implement the actual algorithm, but decided that pre-calculating the values was better. The algorithm is very convoluted. I agree that it took me half an hour to figure out the formula that was needed to compute the i2s clocks, but once you figure out the formula it solves all of the cases and no one needs to read the manual any more. The i2c formula may even need a small loop which compares different solutions looking for the smallest error term. But it's a small space to search. My understanding is that the algorithm itself is different on 8xxx vs. 52xx. It might be possible to combine 5200A and 5200B into one table/algorithm, but I don't think you can combine it with the 8xxx table. -- Timur Tabi Linux kernel developer at Freescale ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
[2.6 patch] Add cuImage.mpc866ads to the bootwrapper as a cuboot-8xx target
From: Scott Wood [EMAIL PROTECTED] This patch fixes the following build error with mpc866_ads_defconfig: -- snip -- ... WRAParch/powerpc/boot/cuImage.mpc866ads powerpc64-linux-ld: arch/powerpc/boot/cuboot-mpc866ads.o: No such file: No such file or directory make[2]: *** [arch/powerpc/boot/cuImage.mpc866ads] Error 1 -- snip -- Reported-by: Adrian Bunk [EMAIL PROTECTED] Signed-off-by: Scott Wood [EMAIL PROTECTED] Signed-off-by: Adrian Bunk [EMAIL PROTECTED] --- This patch was sent by Scott Wood on: - 21 May 2008 arch/powerpc/boot/wrapper |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/powerpc/boot/wrapper b/arch/powerpc/boot/wrapper index d6c96d9..3b59e33 100755 --- a/arch/powerpc/boot/wrapper +++ b/arch/powerpc/boot/wrapper @@ -159,7 +159,7 @@ cuboot*) binary=y gzip= case $platform in -*-mpc885ads|*-adder875*|*-ep88xc) +*-mpc866ads|*-mpc885ads|*-adder875*|*-ep88xc) platformo=$object/cuboot-8xx.o ;; *5200*|*-motionpro) ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
On Thu, Jul 31, 2008 at 5:51 AM, Wolfgang Grandegger [EMAIL PROTECTED] wrote: Grant Likely wrote: On Fri, Jul 25, 2008 at 11:19:41AM -0500, Timur Tabi wrote: Wolfgang Grandegger wrote: I know but we still need an algorithm for MPC52xx and MPC82xx as well. That's true, but I still think hard-coding values of DFSR and FDR in the device tree is not a good way to do this. I agree, it should encode real frequencies, not raw register values. Digging deeper I'm frightened by plenty of platform specific code. We would need: - one table of divider,fdr,dfsr values for the MPC82/3/5/6xx processors (already available from Timur's U-Boot implementation) - one table of divider,fdr values for the MPC5200 rev A. - one table of divider,fdr values for the MPC5200 rev B. (the Rev. B has two more pre-scaler bits). - furthermore, there are various mpc-specific I2C clock sources: MPC82xx : fsl_get_sys_freq() MPC5200 : IPB MPC83xx : fsl_get_sys_freq() MPC8540/41/60/55,MPC8610: fsl_get_sys_freq() MPC8543/45/47/48/68, MPC8641: fsl_get_sys_freq()/2 MPC8544 : fsl_get_sys_freq()/2 or /3 It would make sense to hand-over the I2C frequency from U-Boot to Linux. U-Boot isn't always available and there are plenty of 5200 and 8xxx boards out there which will never have U-Boot reflashed to provide this data. Also, there are boards that don't even use U-Boot. I don't want to go down this path. It is the drivers *job* to understand how to set these registers. If you're careful, the table doesn't need to be huge. It can be marked as initdata and conditionally compiled depending on which architectures are compiled in. You should use .data in the driver's of_device_id table to provide machine specific ops for setting clocking to avoid a maze of if/else statements. g. -- Grant Likely, B.Sc., P.Eng. Secret Lab Technologies Ltd. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
On 7/31/08, Grant Likely [EMAIL PROTECTED] wrote: If you're careful, the table doesn't need to be huge. It can be marked as initdata and conditionally compiled depending on which architectures are compiled in. You should use .data in the driver's of_device_id table to provide machine specific ops for setting clocking to avoid a maze of if/else statements. Does this look ok for the mpc5200 i2c struct? /* I2C Registers */ struct mpc52xx_i2c { u8 madr;/* I2C + 0x00 */ u8 reserved1[3];/* I2C + 0x01 */ u8 mfdr;/* I2C + 0x04 */ u8 reserved2[3];/* I2C + 0x05 */ u8 mcr; /* I2C + 0x08 */ u8 reserved3[3];/* I2C + 0x09 */ u8 msr; /* I2C + 0x0c */ u8 reserved4[3];/* I2C + 0x0d */ u8 mdr; /* I2C + 0x10 */ u8 reserved5[15]; /* I2C + 0x11 */ u8 interrupt; /* I2C + 0x20 */ u8 reserved6[3];/* I2C + 0x21 */ u8 mifr;/* I2C + 0x24 */ }; -- Jon Smirl [EMAIL PROTECTED] ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
Jon Smirl wrote: On 7/31/08, Wolfgang Grandegger [EMAIL PROTECTED] wrote: Grant Likely wrote: On Fri, Jul 25, 2008 at 11:19:41AM -0500, Timur Tabi wrote: Wolfgang Grandegger wrote: I know but we still need an algorithm for MPC52xx and MPC82xx as well. That's true, but I still think hard-coding values of DFSR and FDR in the device tree is not a good way to do this. I agree, it should encode real frequencies, not raw register values. Digging deeper I'm frightened by plenty of platform specific code. We would need: - one table of divider,fdr,dfsr values for the MPC82/3/5/6xx processors (already available from Timur's U-Boot implementation) - one table of divider,fdr values for the MPC5200 rev A. - one table of divider,fdr values for the MPC5200 rev B. (the Rev. B has two more pre-scaler bits). Aren't the tables in the manual there just to make it easy for a human to pick out the line they want? For a computer you'd program the formula that was used to create the tables. I have the formulas to create the tables, also for the MPC5200 Rev. A and B. That was not my point. I'm worried about arch specific code in i2c-mpc.c. It should go somewhere to arch/powerpc. I agree that it took me half an hour to figure out the formula that was needed to compute the i2s clocks, but once you figure out the formula it solves all of the cases and no one needs to read the manual any more. The i2c formula may even need a small loop which compares different solutions looking for the smallest error term. But it's a small space to search. These device tree flags should be removed, the driver can ask the platform code what CPU it is running on. if (of_get_property(op-node, dfsrr, NULL)) i2c-flags |= FSL_I2C_DEV_SEPARATE_DFSRR; if (of_device_is_compatible(op-node, fsl,mpc5200-i2c) || of_device_is_compatible(op-node, mpc5200-i2c)) i2c-flags |= FSL_I2C_DEV_CLOCK_5200; static void mpc_i2c_setclock(struct mpc_i2c *i2c) { /* Set clock and filters */ if (i2c-flags FSL_I2C_DEV_SEPARATE_DFSRR) { writeb(0x31, i2c-base + MPC_I2C_FDR); writeb(0x10, i2c-base + MPC_I2C_DFSRR); } else if (i2c-flags FSL_I2C_DEV_CLOCK_5200) writeb(0x3f, i2c-base + MPC_I2C_FDR); else writel(0x1031, i2c-base + MPC_I2C_FDR); } These defines shouldn't be here, they should get the offset from the right header file for the CPU. But it appears that structures for the i2c memory map haven't been done for the various CPUs. #define MPC_I2C_FDR 0x04 #define MPC_I2C_CR 0x08 #define MPC_I2C_SR 0x0c #define MPC_I2C_DR 0x10 #define MPC_I2C_DFSRR 0x14 There appears to be one for i2x8xxx but not the other CPUs. /* I2C */ typedef struct i2c { u_char i2c_i2mod; charres1[3]; u_char i2c_i2add; charres2[3]; u_char i2c_i2brg; charres3[3]; u_char i2c_i2com; charres4[3]; u_char i2c_i2cer; charres5[3]; u_char i2c_i2cmr; charres6[0x8b]; } i2c8xx_t; The I2C interface for the MPC5200 is not compatible with the one for the MPC83/4/5/6xx, AFAIK. Wolfgang. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
Grant Likely wrote: On Thu, Jul 31, 2008 at 5:51 AM, Wolfgang Grandegger [EMAIL PROTECTED] wrote: Grant Likely wrote: On Fri, Jul 25, 2008 at 11:19:41AM -0500, Timur Tabi wrote: Wolfgang Grandegger wrote: I know but we still need an algorithm for MPC52xx and MPC82xx as well. That's true, but I still think hard-coding values of DFSR and FDR in the device tree is not a good way to do this. I agree, it should encode real frequencies, not raw register values. Digging deeper I'm frightened by plenty of platform specific code. We would need: - one table of divider,fdr,dfsr values for the MPC82/3/5/6xx processors (already available from Timur's U-Boot implementation) - one table of divider,fdr values for the MPC5200 rev A. - one table of divider,fdr values for the MPC5200 rev B. (the Rev. B has two more pre-scaler bits). - furthermore, there are various mpc-specific I2C clock sources: MPC82xx : fsl_get_sys_freq() MPC5200 : IPB MPC83xx : fsl_get_sys_freq() MPC8540/41/60/55,MPC8610: fsl_get_sys_freq() MPC8543/45/47/48/68, MPC8641: fsl_get_sys_freq()/2 MPC8544 : fsl_get_sys_freq()/2 or /3 It would make sense to hand-over the I2C frequency from U-Boot to Linux. U-Boot isn't always available and there are plenty of 5200 and 8xxx boards out there which will never have U-Boot reflashed to provide this data. Also, there are boards that don't even use U-Boot. I don't want to go down this path. It is the drivers *job* to understand how to set these registers. If you're careful, the table doesn't need to be huge. It can be marked as initdata and conditionally compiled depending on which architectures are compiled in. You should use .data in the driver's of_device_id table to provide machine specific ops for setting clocking to avoid a maze of if/else statements. Yep, that makes sense. Wolfgang. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
On Thu, Jul 31, 2008 at 11:22 AM, Wolfgang Grandegger [EMAIL PROTECTED] wrote: Jon Smirl wrote: On 7/31/08, Wolfgang Grandegger [EMAIL PROTECTED] wrote: Grant Likely wrote: On Fri, Jul 25, 2008 at 11:19:41AM -0500, Timur Tabi wrote: Wolfgang Grandegger wrote: I know but we still need an algorithm for MPC52xx and MPC82xx as well. That's true, but I still think hard-coding values of DFSR and FDR in the device tree is not a good way to do this. I agree, it should encode real frequencies, not raw register values. Digging deeper I'm frightened by plenty of platform specific code. We would need: - one table of divider,fdr,dfsr values for the MPC82/3/5/6xx processors (already available from Timur's U-Boot implementation) - one table of divider,fdr values for the MPC5200 rev A. - one table of divider,fdr values for the MPC5200 rev B. (the Rev. B has two more pre-scaler bits). Aren't the tables in the manual there just to make it easy for a human to pick out the line they want? For a computer you'd program the formula that was used to create the tables. I have the formulas to create the tables, also for the MPC5200 Rev. A and B. Oh, hey, even better. That was not my point. I'm worried about arch specific code in i2c-mpc.c. It should go somewhere to arch/powerpc. i2c-mpc *is* arch specific. I really don't think you need to worry about adding a block of code for each supported SoC family. Just change the of_match table to look something like this: static const struct of_device_id mpc_i2c_of_match[] = { {.compatible = fsl,mpc5200b-i2c, .data = fsl_i2c_mpc5200b_set_freq, }, {.compatible = fsl,mpc5200-i2c, .data = fsl_i2c_mpc5200_set_freq, }, {.compatible = fsl,mpc8260-i2c, .data = fsl_i2c_mpc8xxx_set_freq, }, {.compatible = fsl,mpc8349-i2c, .data = fsl_i2c_mpc8xxx_set_freq, }, {.compatible = fsl,mpc8540-i2c, .data = fsl_i2c_mpc8xxx_set_freq, }, {.compatible = fsl,mpc8543-i2c, .data = fsl_i2c_mpc8xxx_div2_set_freq, }, {.compatible = fsl,mpc8544-i2c, .data = fsl_i2c_mpc8xxx_div3_set_freq, }, /* keep this only for older device trees with some support code to figure out what .data should have pointed to. */ {.compatible = fsl-i2c, }, {}, }; MODULE_DEVICE_TABLE(of, mpc_i2c_of_match); g. -- Grant Likely, B.Sc., P.Eng. Secret Lab Technologies Ltd. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
On 7/31/08, Wolfgang Grandegger [EMAIL PROTECTED] wrote: Jon Smirl wrote: There appears to be one for i2x8xxx but not the other CPUs. /* I2C */ typedef struct i2c { u_char i2c_i2mod; charres1[3]; u_char i2c_i2add; charres2[3]; u_char i2c_i2brg; charres3[3]; u_char i2c_i2com; charres4[3]; u_char i2c_i2cer; charres5[3]; u_char i2c_i2cmr; charres6[0x8b]; } i2c8xx_t; The I2C interface for the MPC5200 is not compatible with the one for the MPC83/4/5/6xx, AFAIK. Ignore that table, I mistook MPC8xx for MPC8xxx. That is a MPC8xx table. Wolfgang. -- Jon Smirl [EMAIL PROTECTED] ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
On 7/31/08, Grant Likely [EMAIL PROTECTED] wrote: On Thu, Jul 31, 2008 at 11:06 AM, Jon Smirl [EMAIL PROTECTED] wrote: On 7/31/08, Grant Likely [EMAIL PROTECTED] wrote: If you're careful, the table doesn't need to be huge. It can be marked as initdata and conditionally compiled depending on which architectures are compiled in. You should use .data in the driver's of_device_id table to provide machine specific ops for setting clocking to avoid a maze of if/else statements. Does this look ok for the mpc5200 i2c struct? /* I2C Registers */ struct mpc52xx_i2c { u8 madr;/* I2C + 0x00 */ u8 reserved1[3];/* I2C + 0x01 */ u8 mfdr;/* I2C + 0x04 */ u8 reserved2[3];/* I2C + 0x05 */ u8 mcr; /* I2C + 0x08 */ u8 reserved3[3];/* I2C + 0x09 */ u8 msr; /* I2C + 0x0c */ u8 reserved4[3];/* I2C + 0x0d */ u8 mdr; /* I2C + 0x10 */ u8 reserved5[15]; /* I2C + 0x11 */ u8 interrupt; /* I2C + 0x20 */ u8 reserved6[3];/* I2C + 0x21 */ u8 mifr;/* I2C + 0x24 */ }; Ugh. I hate all the registers defined in structures thing done for 5200, but I guess it is better to stick with established convention than do it differently. Isn't it better than adding random numbers to a pointer and having to worry about what the pointer is pointing at so that it will multiply by the right word size? That's a mess when the registers are different lengths. A common i2c struct might be a better idea that adds in the dfsrr register might be better. You can set the flags for dfsrr use in these set_freq routines too since they select on CPU type. {.compatible = fsl,mpc5200b-i2c, .data = fsl_i2c_mpc5200b_set_freq, }, {.compatible = fsl,mpc5200-i2c, .data = fsl_i2c_mpc5200_set_freq, }, {.compatible = fsl,mpc8260-i2c, .data = fsl_i2c_mpc8xxx_set_freq, }, {.compatible = fsl,mpc8349-i2c, .data = fsl_i2c_mpc8xxx_set_freq, }, {.compatible = fsl,mpc8540-i2c, .data = fsl_i2c_mpc8xxx_set_freq, }, {.compatible = fsl,mpc8543-i2c, .data = fsl_i2c_mpc8xxx_div2_set_freq, }, {.compatible = fsl,mpc8544-i2c, .data = fsl_i2c_mpc8xxx_div3_set_freq, }, Yes, I think this is okay (but I haven't double checked the values; I trust you). g. -- Grant Likely, B.Sc., P.Eng. Secret Lab Technologies Ltd. -- Jon Smirl [EMAIL PROTECTED] ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
Grant Likely wrote: On Thu, Jul 31, 2008 at 11:22 AM, Wolfgang Grandegger [EMAIL PROTECTED] wrote: Jon Smirl wrote: On 7/31/08, Wolfgang Grandegger [EMAIL PROTECTED] wrote: Grant Likely wrote: On Fri, Jul 25, 2008 at 11:19:41AM -0500, Timur Tabi wrote: Wolfgang Grandegger wrote: I know but we still need an algorithm for MPC52xx and MPC82xx as well. That's true, but I still think hard-coding values of DFSR and FDR in the device tree is not a good way to do this. I agree, it should encode real frequencies, not raw register values. Digging deeper I'm frightened by plenty of platform specific code. We would need: - one table of divider,fdr,dfsr values for the MPC82/3/5/6xx processors (already available from Timur's U-Boot implementation) - one table of divider,fdr values for the MPC5200 rev A. - one table of divider,fdr values for the MPC5200 rev B. (the Rev. B has two more pre-scaler bits). Aren't the tables in the manual there just to make it easy for a human to pick out the line they want? For a computer you'd program the formula that was used to create the tables. I have the formulas to create the tables, also for the MPC5200 Rev. A and B. Oh, hey, even better. That was not my point. I'm worried about arch specific code in i2c-mpc.c. It should go somewhere to arch/powerpc. i2c-mpc *is* arch specific. I really don't think you need to worry about adding a block of code for each supported SoC family. Just change the of_match table to look something like this: static const struct of_device_id mpc_i2c_of_match[] = { {.compatible = fsl,mpc5200b-i2c, .data = fsl_i2c_mpc5200b_set_freq, }, {.compatible = fsl,mpc5200-i2c, .data = fsl_i2c_mpc5200_set_freq, }, {.compatible = fsl,mpc8260-i2c, .data = fsl_i2c_mpc8xxx_set_freq, }, {.compatible = fsl,mpc8349-i2c, .data = fsl_i2c_mpc8xxx_set_freq, }, {.compatible = fsl,mpc8540-i2c, .data = fsl_i2c_mpc8xxx_set_freq, }, {.compatible = fsl,mpc8543-i2c, .data = fsl_i2c_mpc8xxx_div2_set_freq, }, {.compatible = fsl,mpc8544-i2c, .data = fsl_i2c_mpc8xxx_div3_set_freq, }, /* keep this only for older device trees with some support code to figure out what .data should have pointed to. */ {.compatible = fsl-i2c, }, {}, }; MODULE_DEVICE_TABLE(of, mpc_i2c_of_match); Cool, this would also make the dfsrr property obsolete. Just the MPC8544 needs more attention because the I2C clock can be programmed to be freq/2 or freq/3. Wolfgang. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
Grant Likely wrote: static const struct of_device_id mpc_i2c_of_match[] = { {.compatible = fsl,mpc5200b-i2c, .data = fsl_i2c_mpc5200b_set_freq, }, {.compatible = fsl,mpc5200-i2c, .data = fsl_i2c_mpc5200_set_freq, }, {.compatible = fsl,mpc8260-i2c, .data = fsl_i2c_mpc8xxx_set_freq, }, {.compatible = fsl,mpc8349-i2c, .data = fsl_i2c_mpc8xxx_set_freq, }, {.compatible = fsl,mpc8540-i2c, .data = fsl_i2c_mpc8xxx_set_freq, }, {.compatible = fsl,mpc8543-i2c, .data = fsl_i2c_mpc8xxx_div2_set_freq, }, {.compatible = fsl,mpc8544-i2c, .data = fsl_i2c_mpc8xxx_div3_set_freq, }, So we need to update this table every time there's a new SOC? All 83xx, 85xx, and 86xx SOCs use the same table. I'd prefer an implementation that does need a specific entry for each variant of 8[356]xx. -- Timur Tabi Linux kernel developer at Freescale ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
Timur Tabi wrote: Grant Likely wrote: static const struct of_device_id mpc_i2c_of_match[] = { {.compatible = fsl,mpc5200b-i2c, .data = fsl_i2c_mpc5200b_set_freq, }, {.compatible = fsl,mpc5200-i2c, .data = fsl_i2c_mpc5200_set_freq, }, {.compatible = fsl,mpc8260-i2c, .data = fsl_i2c_mpc8xxx_set_freq, }, {.compatible = fsl,mpc8349-i2c, .data = fsl_i2c_mpc8xxx_set_freq, }, {.compatible = fsl,mpc8540-i2c, .data = fsl_i2c_mpc8xxx_set_freq, }, {.compatible = fsl,mpc8543-i2c, .data = fsl_i2c_mpc8xxx_div2_set_freq, }, {.compatible = fsl,mpc8544-i2c, .data = fsl_i2c_mpc8xxx_div3_set_freq, }, So we need to update this table every time there's a new SOC? All 83xx, 85xx, and 86xx SOCs use the same table. I'd prefer an implementation that does need a specific entry for each variant of 8[356]xx. We could add a property source-clock-divider = 2/3 if it's needed!? Wolfgang. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
On Thu, Jul 31, 2008 at 12:07 PM, Wolfgang Grandegger [EMAIL PROTECTED] wrote: We could add a property source-clock-divider = 2/3 if it's needed!? fsl,source-clock-divider But, yes, this is a good solution (assuming that it is a board or SoC characteristic, and not just a choice that the driver has the option of making on it's own). g. -- Grant Likely, B.Sc., P.Eng. Secret Lab Technologies Ltd. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
Wolfgang Grandegger wrote: We could add a property source-clock-divider = 2/3 if it's needed!? How about we just get U-Boot to put the core frequency in the device tree? -- Timur Tabi Linux kernel developer at Freescale ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
Grant Likely wrote: On Thu, Jul 31, 2008 at 12:07 PM, Wolfgang Grandegger [EMAIL PROTECTED] wrote: We could add a property source-clock-divider = 2/3 if it's needed!? fsl,source-clock-divider But, yes, this is a good solution (assuming that it is a board or SoC characteristic, and not just a choice that the driver has the option of making on it's own). I was going to suggest the actual clock frequency of the I2C device. It would be value of gd-i2c1_clk or gd-i2c2_clk from U-Boot. The actual divider value is meaningless. -- Timur Tabi Linux kernel developer at Freescale ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
Grant Likely wrote: This is a solved problem. The device tree simple claims compatibility with an older part that has the identical register-level interface. That would assume that the clock frequency is the only thing that decides compatibility, which may technically be true now, but I don't think it's a good idea. I don't understand what's wrong with simply specifying the actual clock frequency that the device uses? It varies from SOC to SOC, but U-Boot calculates today already: #if defined(CONFIG_MPC8540) || defined(CONFIG_MPC8541) || \ defined(CONFIG_MPC8560) || defined(CONFIG_MPC8555) gd-i2c1_clk = sys_info.freqSystemBus; #elif defined(CONFIG_MPC8544) /* * On the 8544, the I2C clock is the same as the SEC clock. This can be * either CCB/2 or CCB/3, depending on the value of cfg_sec_freq. See * 4.4.3.3 of the 8544 RM. Note that this might actually work for all * 85xx, but only the 8544 has cfg_sec_freq, so it's unknown if the * PORDEVSR2_SEC_CFG bit is 0 on all 85xx boards that are not an 8544. */ if (gur-pordevsr2 MPC85xx_PORDEVSR2_SEC_CFG) gd-i2c1_clk = sys_info.freqSystemBus / 3; else gd-i2c1_clk = sys_info.freqSystemBus / 2; #else /* Most 85xx SOCs use CCB/2, so this is the default behavior. */ gd-i2c1_clk = sys_info.freqSystemBus / 2; #endif gd-i2c2_clk = gd-i2c1_clk; We need this ugliness in U-Boot. Let's take advantage of this and do something clean and elegant in the device tree. -- Timur Tabi Linux kernel developer at Freescale ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
On Thu, Jul 31, 2008 at 01:10:30PM -0500, Timur Tabi wrote: Grant Likely wrote: On Thu, Jul 31, 2008 at 12:07 PM, Wolfgang Grandegger [EMAIL PROTECTED] wrote: We could add a property source-clock-divider = 2/3 if it's needed!? fsl,source-clock-divider But, yes, this is a good solution (assuming that it is a board or SoC characteristic, and not just a choice that the driver has the option of making on it's own). I was going to suggest the actual clock frequency of the I2C device. It would be value of gd-i2c1_clk or gd-i2c2_clk from U-Boot. The actual divider value is meaningless. I'm cool with that too, as long as it gets documented g. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
On Thu, Jul 31, 2008 at 01:13:10PM -0500, Timur Tabi wrote: Grant Likely wrote: This is a solved problem. The device tree simple claims compatibility with an older part that has the identical register-level interface. That would assume that the clock frequency is the only thing that decides compatibility, which may technically be true now, but I don't think it's a good idea. No it doesn't, it depends on the register interface to decide compatibility. Clock interface is part of that. I suggested encoding the clock divider directly in compatible (implicit in the SoC version), but it doesn't have to be that way. If clock freq is obtained from another property or some other method then that is okay too. What is important is that if compatibility is claimed, then it must really be compatible! If some new part appears in the future that breaks our assumptions, then we'll need to rework the driver anyway and the new part will *not* claim compatibility with the old. I don't understand what's wrong with simply specifying the actual clock frequency that the device uses? It varies from SOC to SOC, but U-Boot calculates today already: There is nothing wrong with it (as long as we agree and it gets documented). I certainly don't have a problem with doing it this way. g. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
[PATCH] powerpc/mm: Implement _PAGE_SPECIAL pte_special() for 32-bit
Implement _PAGE_SPECIAL and pte_special() for 32-bit powerpc. This bit will be used by the fast get_user_pages() to differenciate PTEs that correspond to a valid struct page from special mappings that don't such as IO mappings obtained via io_remap_pfn_ranges(). We currently only implement this on sub-arch that support SMP or will so in the future (6xx, 44x, FSL-BookE) and not (8xx, 40x). Signed-off-by: Kumar Gala [EMAIL PROTECTED] --- Nick, do you forsee using _PAGE_SPECIAL for other applications that would be of interested to non-SMP hw? We can look at adding it into 8xx and 40x, but was being lazy as I assume there is no point. - k include/asm-powerpc/pgtable-ppc32.h | 15 +-- 1 files changed, 13 insertions(+), 2 deletions(-) diff --git a/include/asm-powerpc/pgtable-ppc32.h b/include/asm-powerpc/pgtable-ppc32.h index 6fe39e3..58afaee 100644 --- a/include/asm-powerpc/pgtable-ppc32.h +++ b/include/asm-powerpc/pgtable-ppc32.h @@ -261,6 +261,7 @@ extern int icache_44x_need_flush; #define _PAGE_HWEXEC 0x0004 /* H: Execute permission */ #define _PAGE_ACCESSED 0x0008 /* S: Page referenced */ #define _PAGE_DIRTY0x0010 /* S: Page dirty */ +#define _PAGE_SPECIAL 0x0020 /* S: Special page */ #define _PAGE_USER 0x0040 /* S: User page */ #define _PAGE_ENDIAN 0x0080 /* H: E bit */ #define _PAGE_GUARDED 0x0100 /* H: G bit */ @@ -276,6 +277,7 @@ extern int icache_44x_need_flush; /* ERPN in a PTE never gets cleared, ignore it */ #define _PTE_NONE_MASK 0xULL +#define __HAVE_ARCH_PTE_SPECIAL #elif defined(CONFIG_FSL_BOOKE) /* @@ -305,6 +307,7 @@ extern int icache_44x_need_flush; #define _PAGE_COHERENT 0x00100 /* H: M bit */ #define _PAGE_NO_CACHE 0x00200 /* H: I bit */ #define _PAGE_WRITETHRU0x00400 /* H: W bit */ +#define _PAGE_SPECIAL 0x00800 /* S: Special page */ #ifdef CONFIG_PTE_64BIT /* ERPN in a PTE never gets cleared, ignore it */ @@ -315,6 +318,8 @@ extern int icache_44x_need_flush; #define _PMD_PRESENT_MASK (PAGE_MASK) #define _PMD_BAD (~PAGE_MASK) +#define __HAVE_ARCH_PTE_SPECIAL + #elif defined(CONFIG_8xx) /* Definitions for 8xx embedded chips. */ #define _PAGE_PRESENT 0x0001 /* Page is valid */ @@ -362,6 +367,7 @@ extern int icache_44x_need_flush; #define _PAGE_ACCESSED 0x100 /* R: page referenced */ #define _PAGE_EXEC 0x200 /* software: i-cache coherency required */ #define _PAGE_RW 0x400 /* software: user write access allowed */ +#define _PAGE_SPECIAL 0x800 /* software: Special page */ #define _PTE_NONE_MASK _PAGE_HASHPTE @@ -372,6 +378,8 @@ extern int icache_44x_need_flush; /* Hash table based platforms need atomic updates of the linux PTE */ #define PTE_ATOMIC_UPDATES 1 +#define __HAVE_ARCH_PTE_SPECIAL + #endif /* @@ -404,6 +412,9 @@ extern int icache_44x_need_flush; #ifndef _PAGE_WRITETHRU #define _PAGE_WRITETHRU0 #endif +#ifndef _PAGE_SPECIAL +#define _PAGE_SPECIAL 0 +#endif #ifndef _PMD_PRESENT_MASK #define _PMD_PRESENT_MASK _PMD_PRESENT #endif @@ -533,7 +544,7 @@ static inline int pte_write(pte_t pte) { return pte_val(pte) _PAGE_RW; } static inline int pte_dirty(pte_t pte) { return pte_val(pte) _PAGE_DIRTY; } static inline int pte_young(pte_t pte) { return pte_val(pte) _PAGE_ACCESSED; } static inline int pte_file(pte_t pte) { return pte_val(pte) _PAGE_FILE; } -static inline int pte_special(pte_t pte) { return 0; } +static inline int pte_special(pte_t pte) { return pte_val(pte) _PAGE_SPECIAL; } static inline void pte_uncache(pte_t pte) { pte_val(pte) |= _PAGE_NO_CACHE; } static inline void pte_cache(pte_t pte) { pte_val(pte) = ~_PAGE_NO_CACHE; } @@ -552,7 +563,7 @@ static inline pte_t pte_mkdirty(pte_t pte) { static inline pte_t pte_mkyoung(pte_t pte) { pte_val(pte) |= _PAGE_ACCESSED; return pte; } static inline pte_t pte_mkspecial(pte_t pte) { - return pte; } + pte_val(pte) |= _PAGE_SPECIAL; return pte; } static inline unsigned long pte_pgprot(pte_t pte) { return __pgprot(pte_val(pte)) PAGE_PROT_BITS; -- 1.5.5.1 ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
libata badness
I'm getting the following badness with top of tree on a embedded PowerPC w/a ULI 1575 bridge (M5229 IDE): 02:1f.0 IDE interface: Acer Laboratories Inc. [ALi] M5229 IDE (rev c8) If you need more info let me know. - k ahci :02:1f.1: AHCI 0001. 32 slots 4 ports 3 Gbps 0xf impl SATA mode ahci :02:1f.1: flags: ncq sntf ilck pm led clo pmp pio slum part [ cut here ] Badness at c021e700 [verbose debug info unavailable] NIP: c021e700 LR: c021e6e8 CTR: c022ce90 REGS: ef82bca0 TRAP: 0700 Not tainted (2.6.27-rc1-00495-g33bd9fe- dirty) MSR: 00029032 EE,ME,IR,DR CR: 22044022 XER: 2000 TASK = ef83[1] 'swapper' THREAD: ef82a000 CPU: 1 GPR00: 0001 ef82bd50 ef83 9032 ef0b64fc GPR08: ef937c10 c022f93f efbfc8e0 ef900b60 22044028 fffdd3ff 0ffe8700 c044c17c GPR16: c04d52ec ef82f458 ef82f4f4 ef82f4f4 c044c234 c044c5d8 c044c5d8 c044c220 GPR24: c044c218 c04d52f0 0080 c022f940 c044c244 efbec490 NIP [c021e700] ata_host_activate+0x40/0x10c LR [c021e6e8] ata_host_activate+0x28/0x10c Call Trace: [ef82bd50] [c021e6e8] ata_host_activate+0x28/0x10c (unreliable) [ef82bd80] [c022f48c] ahci_init_one+0x8b4/0xd68 [ef82be30] [c01b69c0] pci_device_probe+0x84/0xa8 [ef82be50] [c01e5438] driver_probe_device+0xb4/0x1e8 [ef82be70] [c01e55f0] __driver_attach+0x84/0x88 [ef82be90] [c01e4ba0] bus_for_each_dev+0x5c/0xa4 [ef82bec0] [c01e5254] driver_attach+0x24/0x34 [ef82bed0] [c01e44b8] bus_add_driver+0x1d8/0x24c [ef82bef0] [c01e5810] driver_register+0x70/0x160 [ef82bf10] [c01b6c54] __pci_register_driver+0x64/0xc4 [ef82bf30] [c04aaa60] ahci_init+0x28/0x38 [ef82bf40] [c048717c] do_one_initcall+0x38/0x1ac [ef82bfb0] [c04874e0] kernel_init+0x1f0/0x1fc [ef82bff0] [c0013b04] kernel_thread+0x44/0x60 Instruction dump: 7cbb2b78 90010034 7cda3378 7cf93b78 7c7e1b78 4bff9dbd 7c7f1b79 408200c8 2f9c 409e002c 313b 7c09d910 0f00 80010034 7fc3f378 7f24cb78 scsi0 : ahci scsi1 : ahci scsi2 : ahci scsi3 : ahci ata1: SATA max UDMA/133 abar [EMAIL PROTECTED] port 0x80006100 ata2: SATA max UDMA/133 abar [EMAIL PROTECTED] port 0x80006180 ata3: SATA max UDMA/133 abar [EMAIL PROTECTED] port 0x80006200 ata4: SATA max UDMA/133 abar [EMAIL PROTECTED] port 0x80006280 ata1: SATA link down (SStatus 0 SControl 300) ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata2.00: qc timeout (cmd 0xec) ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4) ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata2.00: qc timeout (cmd 0xec) ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4) ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata2.00: qc timeout (cmd 0xec) ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4) ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata3: SATA link down (SStatus 0 SControl 300) ata4: SATA link down (SStatus 0 SControl 300) scsi4 : pata_ali scsi5 : pata_ali ata5: PATA max UDMA/133 cmd 0x1200 ctl 0x1208 bmdma 0x1220 ata6: PATA max UDMA/133 cmd 0x1210 ctl 0x1218 bmdma 0x1228 ata5.00: ATAPI: TSSTcorp CDW/DVD SH-M522C, TS06, max UDMA/33 ata5.00: WARNING: ATAPI DMA disabled for reliablity issues. It can be enabled ata5.00: WARNING: via pata_ali.atapi_dma modparam or corresponding sysfs node. ata5.00: configured for UDMA/33 ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen ata5.00: cmd a0/00:00:00:24:00/00:00:00:00:00/a0 tag 0 pio 36 in cdb 12 00 00 00 24 00 00 00 00 00 00 00 00 00 00 00 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) ata5.00: status: { DRDY } ata5: soft resetting link ata5.00: configured for UDMA/33 ata5: EH complete ata5.00: limiting speed to UDMA/25:PIO4 ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen ata5.00: cmd a0/00:00:00:24:00/00:00:00:00:00/a0 tag 0 pio 36 in cdb 12 00 00 00 24 00 00 00 00 00 00 00 00 00 00 00 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) ata5.00: status: { DRDY } ata5: soft resetting link ata5.00: configured for UDMA/25 ata5: EH complete ata5.00: limiting speed to PIO4 ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen ata5.00: cmd a0/00:00:00:24:00/00:00:00:00:00/a0 tag 0 pio 36 in cdb 12 00 00 00 24 00 00 00 00 00 00 00 00 00 00 00 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) ata5.00: status: { DRDY } ata5: soft resetting link ata5.00: configured for PIO4 ata5: EH complete ata5.00: limiting speed to PIO3 ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen ata5.00: cmd a0/00:00:00:24:00/00:00:00:00:00/a0 tag 0 pio 36 in cdb 12 00 00 00 24 00 00 00 00 00 00 00 00 00 00 00 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) ata5.00: status: { DRDY } ata5: soft resetting link ata5.00: configured for PIO3 ata5: EH complete ata5.00: limiting speed to PIO0 ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen ata5.00: cmd a0/00:00:00:24:00/00:00:00:00:00/a0 tag 0
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
On 7/31/08, Timur Tabi [EMAIL PROTECTED] wrote: Grant Likely wrote: No it doesn't, it depends on the register interface to decide compatibility. Clock interface is part of that. I don't think so. The interface for programming the clock registers is identical on all 8[356]xx parts. The only thing that matters is what specific values to put in the FDR and DFSR registers to get a desired I2C bus speed. That answer is dependent on the actual clock input to the device, which is external to the device. I wouldn't call the input frequency a property of the I2C device. I suggested encoding the clock divider directly in compatible (implicit in the SoC version), but it doesn't have to be that way. If clock freq is obtained from another property or some other method then that is okay too. I think we agree on that. There is nothing wrong with it (as long as we agree and it gets documented). I certainly don't have a problem with doing it this way. I propose the property clock-frequency, like this: [EMAIL PROTECTED] { #address-cells = 1; #size-cells = 0; cell-index = 0; compatible = fsl-i2c; reg = 0x3000 0x100; interrupts = 14 0x8; interrupt-parent = ipic; dfsrr; The existence of the dfsrr and fsl,mpc5200-i2c attributes imply to me that the compatible strings were not done correctly. If these devices really were compatible we wouldn't need extra attributes to tell them apart. So I'm with Grant on adding compatible strings sufficient to remove the need for dfsrr and fsl,mpc5200-i2c. Something like... static const struct of_device_id mpc_i2c_of_match[] = { {.compatible = fsl,mpc5200b-i2c, .data = fsl_i2c_mpc5200, }, {.compatible = fsl,mpc5200-i2c, .data = fsl_i2c_mpc5200, }, {.compatible = fsl,mpc8xxx-i2c, .data = fsl_i2c_dfsrr, }, As for the source clock, how about creating a new global like ppc_proc_freq called ppc_ipb_freq. The platform code can then set the right clock value into the variable. For mpc8 get it from uboot. mpc5200 can easily compute it from ppc_proc_freq and checking how the ipb divider is set. That will move the clock problem out of the i2c driver. I'd like to move towards a more generic uboot that gets the processor minimally running and then use the device tree to customize as many things as possible. But that's just my opinion. -- Jon Smirl [EMAIL PROTECTED] ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
Timur Tabi wrote: Grant Likely wrote: No it doesn't, it depends on the register interface to decide compatibility. Clock interface is part of that. I don't think so. The interface for programming the clock registers is identical on all 8[356]xx parts. The only thing that matters is what specific values to put in the FDR and DFSR registers to get a desired I2C bus speed. If it affects the values you need to write to the registers to achieve a given result, how is it not a difference in the register interface? I propose the property clock-frequency, like this: [EMAIL PROTECTED] { #address-cells = 1; #size-cells = 0; cell-index = 0; compatible = fsl-i2c; reg = 0x3000 0x100; interrupts = 14 0x8; interrupt-parent = ipic; dfsrr; clock-frequency = 0xblablabla; -- added by U-Boot }; A clock-frequency property is OK, and is in line with what we do in other types of nodes. However, in the long run it might be nice to introduce some sort of clock binding where, for example, the i2c node can point to a clock elsewhere in the device tree as an input clock. That way, less knowledge is required by the firmware to poke values all over the place, and it also allows one to describe situations where the frequency of the input clock can change (such as in low-power modes). -Scott ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
Jon Smirl wrote: The existence of the dfsrr and fsl,mpc5200-i2c attributes imply to me that the compatible strings were not done correctly. If these devices really were compatible we wouldn't need extra attributes to tell them apart. I agree with that. So I'm with Grant on adding compatible strings sufficient to remove the need for dfsrr and fsl,mpc5200-i2c. Let's just make sure we don't break backwards compatibility. Something like... static const struct of_device_id mpc_i2c_of_match[] = { {.compatible = fsl,mpc5200b-i2c, .data = fsl_i2c_mpc5200, }, {.compatible = fsl,mpc5200-i2c, .data = fsl_i2c_mpc5200, }, {.compatible = fsl,mpc8xxx-i2c, .data = fsl_i2c_dfsrr, }, That seems ok. As for the source clock, how about creating a new global like ppc_proc_freq called ppc_ipb_freq. The platform code can then set the right clock value into the variable. For mpc8 get it from uboot. mpc5200 can easily compute it from ppc_proc_freq and checking how the ipb divider is set. That will move the clock problem out of the i2c driver. I don't want to differentiate between 52xx and 8xxx any more than we have to. If 8xxx has the clock frequency in the device tree, then 52xx should have it there, too. For backwards compatibility purposes, we can have the driver provide a hard-coded value of some kind if the property does not exist. I'd like to move towards a more generic uboot that gets the processor minimally running and then use the device tree to customize as many things as possible. But that's just my opinion. U-Boot needs to configure the I2C bus speed because U-Boot uses the I2C. We should capitalize on that by taking the information that U-Boot has calculated and re-use it. -- Timur Tabi Linux kernel developer at Freescale ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
On 7/31/08, Scott Wood [EMAIL PROTECTED] wrote: Timur Tabi wrote: Grant Likely wrote: No it doesn't, it depends on the register interface to decide compatibility. Clock interface is part of that. I don't think so. The interface for programming the clock registers is identical on all 8[356]xx parts. The only thing that matters is what specific values to put in the FDR and DFSR registers to get a desired I2C bus speed. If it affects the values you need to write to the registers to achieve a given result, how is it not a difference in the register interface? I propose the property clock-frequency, like this: [EMAIL PROTECTED] { #address-cells = 1; #size-cells = 0; cell-index = 0; compatible = fsl-i2c; reg = 0x3000 0x100; interrupts = 14 0x8; interrupt-parent = ipic; dfsrr; clock-frequency = 0xblablabla; -- added by U-Boot }; A clock-frequency property is OK, and is in line with what we do in other types of nodes. However, in the long run it might be nice to introduce some sort of clock binding where, for example, the i2c node can point to a clock elsewhere in the device tree as an input clock. That way, less knowledge is required by the firmware to poke values all over the place, and it also allows one to describe situations where the frequency of the input clock can change (such as in low-power modes). PowerPC,[EMAIL PROTECTED] { timebase-frequency = 0; /* From Bootloader */ bus-frequency = 0;/* From Bootloader */ clock-frequency = 0; /* From Bootloader */ }; The mpc5200 code already has mpc52xx_find_ipb_freq() to get it. I should grep before suggesting things. -- Jon Smirl [EMAIL PROTECTED] ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
On Thu, Jul 31, 2008 at 01:35:51PM -0500, Timur Tabi wrote: Grant Likely wrote: No it doesn't, it depends on the register interface to decide compatibility. Clock interface is part of that. I don't think so. The interface for programming the clock registers is identical on all 8[356]xx parts. The only thing that matters is what specific values to put in the FDR and DFSR registers to get a desired I2C bus speed. That answer is dependent on the actual clock input to the device, which is external to the device. I wouldn't call the input frequency a property of the I2C device. Okay, I accept that argument. Make input frequency a property and be done with it. I suggested encoding the clock divider directly in compatible (implicit in the SoC version), but it doesn't have to be that way. If clock freq is obtained from another property or some other method then that is okay too. I think we agree on that. indeed. There is nothing wrong with it (as long as we agree and it gets documented). I certainly don't have a problem with doing it this way. I propose the property clock-frequency, like this: [EMAIL PROTECTED] { #address-cells = 1; #size-cells = 0; cell-index = 0; compatible = fsl-i2c; reg = 0x3000 0x100; interrupts = 14 0x8; interrupt-parent = ipic; dfsrr; clock-frequency = 0xblablabla; -- added by U-Boot }; I'm okay with this. Note that the dfsrr property already differentiates between 8xxx and 52xx, so maybe we don't need any other device tree changes. The whole dfsrr property is a really bad idea, and it just replaces the equally bad idea of an mpc5200-clocking property which is currently in use. This bit should definitely be determined via compatible. g. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
Scott Wood wrote: Timur Tabi wrote: Grant Likely wrote: No it doesn't, it depends on the register interface to decide compatibility. Clock interface is part of that. I don't think so. The interface for programming the clock registers is identical on all 8[356]xx parts. The only thing that matters is what specific values to put in the FDR and DFSR registers to get a desired I2C bus speed. If it affects the values you need to write to the registers to achieve a given result, how is it not a difference in the register interface? I think we're splitting hairs, here. The code for actually programming the registers is the same, regardless of the input frequency. It's just that the input frequency is a value needed to calculate the right value. But like I said, I don't think this distinction is that relevant. A clock-frequency property is OK, and is in line with what we do in other types of nodes. However, in the long run it might be nice to introduce some sort of clock binding where, for example, the i2c node can point to a clock elsewhere in the device tree as an input clock. The only problem with that is that the actual input clock to the I2C device is not the same as any other device. It's a unique clock. Look at the code I had to write to figure out this clock just on 85xx: /* * The base clock for I2C depends on the actual SOC. Unfortunately, * there is no pattern that can be used to determine the frequency, so * the only choice is to look up the actual SOC number and use the value * for that SOC. This information is taken from application note * AN2919. */ #if defined(CONFIG_MPC8540) || defined(CONFIG_MPC8541) || \ defined(CONFIG_MPC8560) || defined(CONFIG_MPC8555) gd-i2c1_clk = sys_info.freqSystemBus; #elif defined(CONFIG_MPC8544) /* * On the 8544, the I2C clock is the same as the SEC clock. This can be * either CCB/2 or CCB/3, depending on the value of cfg_sec_freq. See * 4.4.3.3 of the 8544 RM. Note that this might actually work for all * 85xx, but only the 8544 has cfg_sec_freq, so it's unknown if the * PORDEVSR2_SEC_CFG bit is 0 on all 85xx boards that are not an 8544. */ if (gur-pordevsr2 MPC85xx_PORDEVSR2_SEC_CFG) gd-i2c1_clk = sys_info.freqSystemBus / 3; else gd-i2c1_clk = sys_info.freqSystemBus / 2; #else /* Most 85xx SOCs use CCB/2, so this is the default behavior. */ gd-i2c1_clk = sys_info.freqSystemBus / 2; #endif This is just for 85xx, and it only applies to the I2C devices. That way, less knowledge is required by the firmware to poke values all over the place, and it also allows one to describe situations where the frequency of the input clock can change (such as in low-power modes). I don't think it's possible to do what you want to do. I2C clocking is completely screwed up, and that's just the way the hardware was designed. -- Timur Tabi Linux kernel developer at Freescale ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
Timur Tabi wrote: Scott Wood wrote: A clock-frequency property is OK, and is in line with what we do in other types of nodes. However, in the long run it might be nice to introduce some sort of clock binding where, for example, the i2c node can point to a clock elsewhere in the device tree as an input clock. The only problem with that is that the actual input clock to the I2C device is not the same as any other device. It's a unique clock. Look at the code I had to write to figure out this clock just on 85xx: IIRC, only the divider is unique, and the divider that is applied to the input clock can be specified in the i2c node (either implicitly in compatible, or explicitly via a property). -Scott ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
On Thu, Jul 31, 2008 at 02:57:25PM -0400, Jon Smirl wrote: On 7/31/08, Timur Tabi [EMAIL PROTECTED] wrote: Grant Likely wrote: I propose the property clock-frequency, like this: [EMAIL PROTECTED] { #address-cells = 1; #size-cells = 0; cell-index = 0; compatible = fsl-i2c; reg = 0x3000 0x100; interrupts = 14 0x8; interrupt-parent = ipic; dfsrr; The existence of the dfsrr and fsl,mpc5200-i2c attributes imply to me that the compatible strings were not done correctly. If these devices really were compatible we wouldn't need extra attributes to tell them apart. Yes, exactly right. So I'm with Grant on adding compatible strings sufficient to remove the need for dfsrr and fsl,mpc5200-i2c. Something like... static const struct of_device_id mpc_i2c_of_match[] = { {.compatible = fsl,mpc5200b-i2c, .data = fsl_i2c_mpc5200, }, {.compatible = fsl,mpc5200-i2c, .data = fsl_i2c_mpc5200, }, {.compatible = fsl,mpc8xxx-i2c, .data = fsl_i2c_dfsrr, }, Yes, but I don't agree with the last entry in this list. fsl,mpc8xxx-i2c is not a good value. Use an actual part number instead and have newer devices claim compatibility with the old. The problem with 8xxx is that you don't know if/when another 8xxx will arrive on the scene which does not conform to already made assumptions. If you specify an exact SoC part, then the problem goes away without any downside. As for the source clock, how about creating a new global like ppc_proc_freq called ppc_ipb_freq. The platform code can then set the right clock value into the variable. For mpc8 get it from uboot. mpc5200 can easily compute it from ppc_proc_freq and checking how the ipb divider is set. That will move the clock problem out of the i2c driver. In general, I don't like adding additional global or machine vars to Linux. I just don't think it in necessary and it can quickly get out of control. Instead, For 5200 I've exported a mpc52xx specific function that returns the clock frequency. The function can be adapted over time to handle new cases on the 52xx platform and it get compiled out if 5200 is not in use. As an alternative, clock-frequency could be made an optional property. If it isn't specified, then get the bus-frequency property from the parent node instead. g. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
Wolfgang Grandegger wrote: No, the source clock is not identical for all 8[356]xx. Some use half or even a third of the SOC clock frequency. The platform clock divided by 2 or 3 *is* the source clock to the I2C. That's what I'm talking about. Linux must determine the real source clock frequency somehow. We may introduce the SOC property i2c-clock-frequency, which could be fixed up by U-Boot or a pre-loader (in case U-Boot is not used). Like for other frequency properties as well. That's what I'm proposing, but drop the i2c-. -- Timur Tabi Linux kernel developer at Freescale ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: libata badness
On Thu, Jul 31, 2008 at 01:53:45PM -0500, Kumar Gala wrote: I'm getting the following badness with top of tree on a embedded PowerPC w/a ULI 1575 bridge (M5229 IDE): 02:1f.0 IDE interface: Acer Laboratories Inc. [ALi] M5229 IDE (rev c8) If you need more info let me know. - k ahci :02:1f.1: AHCI 0001. 32 slots 4 ports 3 Gbps 0xf impl SATA mode ahci :02:1f.1: flags: ncq sntf ilck pm led clo pmp pio slum part [ cut here ] Badness at c021e700 [verbose debug info unavailable] it would be helpful to compile a kernel with verbose fault information turned on. NIP: c021e700 LR: c021e6e8 CTR: c022ce90 REGS: ef82bca0 TRAP: 0700 Not tainted (2.6.27-rc1-00495-g33bd9fe- dirty) MSR: 00029032 EE,ME,IR,DR CR: 22044022 XER: 2000 TASK = ef83[1] 'swapper' THREAD: ef82a000 CPU: 1 GPR00: 0001 ef82bd50 ef83 9032 ef0b64fc GPR08: ef937c10 c022f93f efbfc8e0 ef900b60 22044028 fffdd3ff 0ffe8700 c044c17c GPR16: c04d52ec ef82f458 ef82f4f4 ef82f4f4 c044c234 c044c5d8 c044c5d8 c044c220 GPR24: c044c218 c04d52f0 0080 c022f940 c044c244 efbec490 NIP [c021e700] ata_host_activate+0x40/0x10c LR [c021e6e8] ata_host_activate+0x28/0x10c Call Trace: [ef82bd50] [c021e6e8] ata_host_activate+0x28/0x10c (unreliable) [ef82bd80] [c022f48c] ahci_init_one+0x8b4/0xd68 [ef82be30] [c01b69c0] pci_device_probe+0x84/0xa8 [ef82be50] [c01e5438] driver_probe_device+0xb4/0x1e8 [ef82be70] [c01e55f0] __driver_attach+0x84/0x88 [ef82be90] [c01e4ba0] bus_for_each_dev+0x5c/0xa4 [ef82bec0] [c01e5254] driver_attach+0x24/0x34 [ef82bed0] [c01e44b8] bus_add_driver+0x1d8/0x24c [ef82bef0] [c01e5810] driver_register+0x70/0x160 [ef82bf10] [c01b6c54] __pci_register_driver+0x64/0xc4 [ef82bf30] [c04aaa60] ahci_init+0x28/0x38 [ef82bf40] [c048717c] do_one_initcall+0x38/0x1ac [ef82bfb0] [c04874e0] kernel_init+0x1f0/0x1fc [ef82bff0] [c0013b04] kernel_thread+0x44/0x60 Instruction dump: 7cbb2b78 90010034 7cda3378 7cf93b78 7c7e1b78 4bff9dbd 7c7f1b79 408200c8 2f9c 409e002c 313b 7c09d910 0f00 80010034 7fc3f378 7f24cb78 scsi0 : ahci scsi1 : ahci scsi2 : ahci scsi3 : ahci ata1: SATA max UDMA/133 abar [EMAIL PROTECTED] port 0x80006100 ata2: SATA max UDMA/133 abar [EMAIL PROTECTED] port 0x80006180 ata3: SATA max UDMA/133 abar [EMAIL PROTECTED] port 0x80006200 ata4: SATA max UDMA/133 abar [EMAIL PROTECTED] port 0x80006280 ata1: SATA link down (SStatus 0 SControl 300) ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata2.00: qc timeout (cmd 0xec) ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4) ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata2.00: qc timeout (cmd 0xec) ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4) ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata2.00: qc timeout (cmd 0xec) ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4) ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata3: SATA link down (SStatus 0 SControl 300) ata4: SATA link down (SStatus 0 SControl 300) scsi4 : pata_ali scsi5 : pata_ali ata5: PATA max UDMA/133 cmd 0x1200 ctl 0x1208 bmdma 0x1220 ata6: PATA max UDMA/133 cmd 0x1210 ctl 0x1218 bmdma 0x1228 Looks like you're running into the same problem as I did, with the fact that the M5529 is in native mode, but doesn't have any IRQ available. Do you know if the bridge chip in it is in routes the IRQs internally? ata5.00: ATAPI: TSSTcorp CDW/DVD SH-M522C, TS06, max UDMA/33 ata5.00: WARNING: ATAPI DMA disabled for reliablity issues. It can be enabled ata5.00: WARNING: via pata_ali.atapi_dma modparam or corresponding sysfs node. ata5.00: configured for UDMA/33 ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen ata5.00: cmd a0/00:00:00:24:00/00:00:00:00:00/a0 tag 0 pio 36 in cdb 12 00 00 00 24 00 00 00 00 00 00 00 00 00 00 00 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) ata5.00: status: { DRDY } ata5: soft resetting link ata5.00: configured for UDMA/33 ata5: EH complete ata5.00: limiting speed to UDMA/25:PIO4 ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen ata5.00: cmd a0/00:00:00:24:00/00:00:00:00:00/a0 tag 0 pio 36 in cdb 12 00 00 00 24 00 00 00 00 00 00 00 00 00 00 00 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) ata5.00: status: { DRDY } ata5: soft resetting link ata5.00: configured for UDMA/25 ata5: EH complete ata5.00: limiting speed to PIO4 ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen ata5.00: cmd a0/00:00:00:24:00/00:00:00:00:00/a0 tag 0 pio 36 in cdb 12 00 00 00 24 00 00 00 00 00 00 00 00 00 00 00 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) ata5.00: status: { DRDY } ata5: soft resetting link ata5.00: configured for PIO4 ata5: EH complete ata5.00: limiting speed to PIO3 ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
On Thu, Jul 31, 2008 at 09:54:48PM +0200, Wolfgang Grandegger wrote: Thinking more about it, it would be best to add the property i2c-clock-divider to the soc node and implement fsl_get_i2c_freq() in a similar way: [EMAIL PROTECTED] { #address-cells = 1; #size-cells = 1; device_type = soc; ranges = 0x0 0xe000 0x10; reg = 0xe000 0x1000; // CCSRBAR 1M bus-frequency = 0; i2c-clock-divider = 2; U-Boot could then fixup that value like bus-frequency() and the i2c-mpc driver simply calls fsl_get_i2c_freq(). Ugh. This is specifically related to the i2c device, so please place the property in the i2c device. Remember, device tree design is not about what will make the implementation simplest, but rather about what describes the hardware in the best way. Now, if you can argue that i2c-clock-frequency is actually a separate clock domain defined at the SoC level, not the i2c device level, then I will change my opinion. g. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
Wolfgang Grandegger wrote: But clock-frequency, aka bus-frequency, is already used by fsl_get_sys_freq(): http://lxr.linux.no/linux+v2.6.26/arch/powerpc/sysdev/fsl_soc.c#L80 So? clock-frequency is a per-node property. I use it in the codec node on the 8610 (mpc8610_hpcd.dts). It does not mean platform clock frequency. U-Boot could then fixup that value like bus-frequency() and the i2c-mpc driver simply calls fsl_get_i2c_freq(). This is just more complicated than it needs to be. Why should the I2C driver fetch the platform clock and the divider from the parent node, and then do additional math, when it could just get the value it needs right from the node it's probing? Besides, U-Boot does not currently store the divider value. Look at the code I've posted twice already - it stores the frequency in i2c1_clk. So now I would need to create another variable in the gd_t to store the divider? No thanks. -- Timur Tabi Linux kernel developer at Freescale ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
Timur Tabi wrote: Wolfgang Grandegger wrote: But clock-frequency, aka bus-frequency, is already used by fsl_get_sys_freq(): http://lxr.linux.no/linux+v2.6.26/arch/powerpc/sysdev/fsl_soc.c#L80 So? clock-frequency is a per-node property. I use it in the codec node on the 8610 (mpc8610_hpcd.dts). It does not mean platform clock frequency. U-Boot could then fixup that value like bus-frequency() and the i2c-mpc driver simply calls fsl_get_i2c_freq(). This is just more complicated than it needs to be. Why should the I2C driver fetch the platform clock and the divider from the parent node, and then do additional math, when it could just get the value it needs right from the node it's probing? I'm a bit confused. The frequency of the I2C source clock and the real I2C clock frequency are two different things. The first one is common for all I2C devices, the second can be different. What properties would you like to use for defining both? Besides, U-Boot does not currently store the divider value. Look at the code I've posted twice already - it stores the frequency in i2c1_clk. So now I would need to create another variable in the gd_t to store the divider? No thanks. OK, that's an argument but it's biased by U-Boot. Wolfgang. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
Grant Likely wrote: On Thu, Jul 31, 2008 at 09:54:48PM +0200, Wolfgang Grandegger wrote: Thinking more about it, it would be best to add the property i2c-clock-divider to the soc node and implement fsl_get_i2c_freq() in a similar way: [EMAIL PROTECTED] { #address-cells = 1; #size-cells = 1; device_type = soc; ranges = 0x0 0xe000 0x10; reg = 0xe000 0x1000; // CCSRBAR 1M bus-frequency = 0; i2c-clock-divider = 2; U-Boot could then fixup that value like bus-frequency() and the i2c-mpc driver simply calls fsl_get_i2c_freq(). Ugh. This is specifically related to the i2c device, so please place the property in the i2c device. Remember, device tree design is not about what will make the implementation simplest, but rather about what describes the hardware in the best way. Now, if you can argue that i2c-clock-frequency is actually a separate clock domain defined at the SoC level, not the i2c device level, then I will change my opinion. Is it not exactly that? For me it's not a per I2C device property, at least. Wolfgang. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Board level compatibility matching
This topic keeps coming up, so it is probably time to address it once and for all. When it comes to machine level support in arch/powerpc, there seems to me that there are two levels or machine support. Level 1 is the board specific stuff. Board X has a, b, and c things that need to be done for Linux to work correctly, but the fixups are entirely board specific and will only ever be used on a single board port. The lite5200 support in arch/powerpc/platforms/52xx is an example of this kind of board support. It binds on a value in the top level compatible property. Level 2 is kind of the generic catch-all machine support for systems that are unremarkable and don't require any special code to be run. In most cases, new boards can be supported by this generic code without any changes to the Linux kernel. arch/powerpc/platforms/52xx/mpc5200_simple.c is an example here. mpc5200_simple maintains a list of boards that are known to work with it. At the moment, every new board port forces a linux kernel source tree change, even if it is just adding a single string to the match table. I'm willing to wager that 99 times out of 100, boards based on the mpc5200 SoC will want to use the common board support code and that maintaining an explicit list of supported boards is completely unnecessary. I expect that the exact same is true for 8xxx and 4xx SoCs. So, rather than continuing to need to maintain explicit lists, I propose the following: - Add a property to the device tree that explicitly specifies the SoC that the board is based on. Something like 'soc-model = fsl,mpc5200b' would be appropriate. This in and of itself does not change the usage conventions, it just provides more information about the hardware. (Another idea is to add a string to the top level compatible property, but there are still arguments about what compatible really means in the root node.) - Prioritize board ports in the arch/powerpc/platforms directory to identify level-1 machines support from the level-2 ones. Make sure that level-1 stuff always gets probed before level-2 stuff within each SoC family. In all likelyhood, this would probably just involve making sure that board specific machines get linked in before the catchall machine. - Change level-2 machine support to bind on soc-model instead of an explicit compatible list. Doing so should simplify adding new board ports. In many cases it would just involve dropping in a new .dts file. However, it retains the flexability of overriding generic code with platform specific fixups as the need arises. I know we've been cautious about adding catch-all bindings to the device tree, and it is a big deal to avoid adding wildcards to compatible values. However, this solution should be workable because it doesn't involve stating something that is not true in the device tree and it maintains the ability to override cleanly when new bugs are discovered. It also doesn't try to define wildcard values in compatible or other device tree properties. Thoughts? g. -- Grant Likely, B.Sc., P.Eng. Secret Lab Technologies Ltd. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: libata badness
On Jul 31, 2008, at 2:02 PM, Ben Dooks wrote: On Thu, Jul 31, 2008 at 01:53:45PM -0500, Kumar Gala wrote: I'm getting the following badness with top of tree on a embedded PowerPC w/a ULI 1575 bridge (M5229 IDE): 02:1f.0 IDE interface: Acer Laboratories Inc. [ALi] M5229 IDE (rev c8) If you need more info let me know. - k ahci :02:1f.1: AHCI 0001. 32 slots 4 ports 3 Gbps 0xf impl SATA mode ahci :02:1f.1: flags: ncq sntf ilck pm led clo pmp pio slum part [ cut here ] Badness at c021e700 [verbose debug info unavailable] it would be helpful to compile a kernel with verbose fault information turned on. Doesn't seem to provide too much more info on ppc. just says the file/ line number of the badness. NIP: c021e700 LR: c021e6e8 CTR: c022ce90 REGS: ef82bca0 TRAP: 0700 Not tainted (2.6.27-rc1-00495-g33bd9fe- dirty) MSR: 00029032 EE,ME,IR,DR CR: 22044022 XER: 2000 TASK = ef83[1] 'swapper' THREAD: ef82a000 CPU: 1 GPR00: 0001 ef82bd50 ef83 9032 ef0b64fc GPR08: ef937c10 c022f93f efbfc8e0 ef900b60 22044028 fffdd3ff 0ffe8700 c044c17c GPR16: c04d52ec ef82f458 ef82f4f4 ef82f4f4 c044c234 c044c5d8 c044c5d8 c044c220 GPR24: c044c218 c04d52f0 0080 c022f940 c044c244 efbec490 NIP [c021e700] ata_host_activate+0x40/0x10c LR [c021e6e8] ata_host_activate+0x28/0x10c Call Trace: [ef82bd50] [c021e6e8] ata_host_activate+0x28/0x10c (unreliable) [ef82bd80] [c022f48c] ahci_init_one+0x8b4/0xd68 [ef82be30] [c01b69c0] pci_device_probe+0x84/0xa8 [ef82be50] [c01e5438] driver_probe_device+0xb4/0x1e8 [ef82be70] [c01e55f0] __driver_attach+0x84/0x88 [ef82be90] [c01e4ba0] bus_for_each_dev+0x5c/0xa4 [ef82bec0] [c01e5254] driver_attach+0x24/0x34 [ef82bed0] [c01e44b8] bus_add_driver+0x1d8/0x24c [ef82bef0] [c01e5810] driver_register+0x70/0x160 [ef82bf10] [c01b6c54] __pci_register_driver+0x64/0xc4 [ef82bf30] [c04aaa60] ahci_init+0x28/0x38 [ef82bf40] [c048717c] do_one_initcall+0x38/0x1ac [ef82bfb0] [c04874e0] kernel_init+0x1f0/0x1fc [ef82bff0] [c0013b04] kernel_thread+0x44/0x60 Instruction dump: 7cbb2b78 90010034 7cda3378 7cf93b78 7c7e1b78 4bff9dbd 7c7f1b79 408200c8 2f9c 409e002c 313b 7c09d910 0f00 80010034 7fc3f378 7f24cb78 scsi0 : ahci scsi1 : ahci scsi2 : ahci scsi3 : ahci ata1: SATA max UDMA/133 abar [EMAIL PROTECTED] port 0x80006100 ata2: SATA max UDMA/133 abar [EMAIL PROTECTED] port 0x80006180 ata3: SATA max UDMA/133 abar [EMAIL PROTECTED] port 0x80006200 ata4: SATA max UDMA/133 abar [EMAIL PROTECTED] port 0x80006280 ata1: SATA link down (SStatus 0 SControl 300) ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata2.00: qc timeout (cmd 0xec) ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4) ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata2.00: qc timeout (cmd 0xec) ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4) ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata2.00: qc timeout (cmd 0xec) ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4) ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata3: SATA link down (SStatus 0 SControl 300) ata4: SATA link down (SStatus 0 SControl 300) scsi4 : pata_ali scsi5 : pata_ali ata5: PATA max UDMA/133 cmd 0x1200 ctl 0x1208 bmdma 0x1220 ata6: PATA max UDMA/133 cmd 0x1210 ctl 0x1218 bmdma 0x1228 Looks like you're running into the same problem as I did, with the fact that the M5529 is in native mode, but doesn't have any IRQ available. Do you know if the bridge chip in it is in routes the IRQs internally? They are routed via an i8259 in the bridge. - k ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
Timur Tabi wrote: Wolfgang Grandegger wrote: I'm a bit confused. The frequency of the I2C source clock and the real I2C clock frequency are two different things. There are two frequencies: 1) The frequency of the input clock to the I2C device, after it has gone through a divider. This is what I call the I2C clock frequency and what I think belongs in the clock-frequency property. This is usually the platform clock divided by 1, 2, or 3. OK. 2) The speed of the I2C bus, as seen by devices on that bus. This is usually 400KHz. Which should be defined with the property current-speed, right? Wolfgang. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
Wolfgang Grandegger wrote: I'm a bit confused. The frequency of the I2C source clock and the real I2C clock frequency are two different things. There are two frequencies: 1) The frequency of the input clock to the I2C device, after it has gone through a divider. This is what I call the I2C clock frequency and what I think belongs in the clock-frequency property. This is usually the platform clock divided by 1, 2, or 3. 2) The speed of the I2C bus, as seen by devices on that bus. This is usually 400KHz. The first one is common for all I2C devices, the second can be different. What properties would you like to use for defining both? The platform clock has no value to the I2C hardware, so I don't care anything about it. Besides, U-Boot does not currently store the divider value. Look at the code I've posted twice already - it stores the frequency in i2c1_clk. So now I would need to create another variable in the gd_t to store the divider? No thanks. OK, that's an argument but it's biased by U-Boot. As long as a method that is favorable to U-Boot does not put any undo hardship on non-U-Boot methods, I would say that it is the preferred method. -- Timur Tabi Linux kernel developer at Freescale ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
Wolfgang Grandegger wrote: Is it not exactly that? For me it's not a per I2C device property, at least. Of course it's a per-I2C device property. The input frequency to the I2C device is only seen by the I2C device, and no other device. -- Timur Tabi Linux kernel developer at Freescale ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
On Thu, Jul 31, 2008 at 2:19 PM, Timur Tabi [EMAIL PROTECTED] wrote: Wolfgang Grandegger wrote: I'm a bit confused. The frequency of the I2C source clock and the real I2C clock frequency are two different things. There are two frequencies: 1) The frequency of the input clock to the I2C device, after it has gone through a divider. This is what I call the I2C clock frequency and what I think belongs in the clock-frequency property. This is usually the platform clock divided by 1, 2, or 3. 2) The speed of the I2C bus, as seen by devices on that bus. This is usually 400KHz. analogous example: serial nodes use two properties; clock-frequency and current-speed, one for clock frequency routed into the device and one for the current baud rate. It is completely relevant to put the effective clock frequency into the i2c device node. g. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
On 7/31/08, Wolfgang Grandegger [EMAIL PROTECTED] wrote: Timur Tabi wrote: Wolfgang Grandegger wrote: But clock-frequency, aka bus-frequency, is already used by fsl_get_sys_freq(): http://lxr.linux.no/linux+v2.6.26/arch/powerpc/sysdev/fsl_soc.c#L80 So? clock-frequency is a per-node property. I use it in the codec node on the 8610 (mpc8610_hpcd.dts). It does not mean platform clock frequency. U-Boot could then fixup that value like bus-frequency() and the i2c-mpc driver simply calls fsl_get_i2c_freq(). This is just more complicated than it needs to be. Why should the I2C driver fetch the platform clock and the divider from the parent node, and then do additional math, when it could just get the value it needs right from the node it's probing? I'm a bit confused. The frequency of the I2C source clock and the real I2C clock frequency are two different things. The first one is common for all I2C devices, the second can be different. What properties would you like to use for defining both? For mpc5200 it is easy, mpc52xx_find_ipb_freq() already exists to get the source clock for the i2c devices. Each i2c node in the device tree can then specify clock-frequency = 40; or let it default. You have the input clock and the desired clock, do some math and you can set FDR. For the 8xxx chips there is no simple solution for obtain the input clock for the i2c section. Maybe create something like i2c-frequency and set it from uboot. The make another accessor function like mpc8xxx_find_i2c_freq(). PowerPC,[EMAIL PROTECTED] { timebase-frequency = 0; /* From Bootloader */ bus-frequency = 0;/* From Bootloader */ clock-frequency = 0; /* From Bootloader */ i2c-frequency = 0;/* From Bootloader */ }; Instead of creating i2c-frequency in the device tree, you could put this piece of code in the the mpc8xxx platform driver and use it to implement mpc8xxx_find_i2c_freq(). Read the PORDEVSR2_SEC_CFG bit back from whatever uboot set it too. #if defined(CONFIG_MPC8540) || defined(CONFIG_MPC8541) || \ defined(CONFIG_MPC8560) || defined(CONFIG_MPC8555) gd-i2c1_clk = sys_info.freqSystemBus; #elif defined(CONFIG_MPC8544) /* * On the 8544, the I2C clock is the same as the SEC clock. This can be * either CCB/2 or CCB/3, depending on the value of cfg_sec_freq. See * 4.4.3.3 of the 8544 RM. Note that this might actually work for all * 85xx, but only the 8544 has cfg_sec_freq, so it's unknown if the * PORDEVSR2_SEC_CFG bit is 0 on all 85xx boards that are not an 8544. */ if (gur-pordevsr2 MPC85xx_PORDEVSR2_SEC_CFG) gd-i2c1_clk = sys_info.freqSystemBus / 3; else gd-i2c1_clk = sys_info.freqSystemBus / 2; #else /* Most 85xx SOCs use CCB/2, so this is the default behavior. */ gd-i2c1_clk = sys_info.freqSystemBus / 2; #endif gd-i2c2_clk = gd-i2c1_clk; -- Jon Smirl [EMAIL PROTECTED] ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
On Thu, Jul 31, 2008 at 2:32 PM, Jon Smirl [EMAIL PROTECTED] wrote: On 7/31/08, Wolfgang Grandegger [EMAIL PROTECTED] wrote: Timur Tabi wrote: Wolfgang Grandegger wrote: But clock-frequency, aka bus-frequency, is already used by fsl_get_sys_freq(): http://lxr.linux.no/linux+v2.6.26/arch/powerpc/sysdev/fsl_soc.c#L80 So? clock-frequency is a per-node property. I use it in the codec node on the 8610 (mpc8610_hpcd.dts). It does not mean platform clock frequency. U-Boot could then fixup that value like bus-frequency() and the i2c-mpc driver simply calls fsl_get_i2c_freq(). This is just more complicated than it needs to be. Why should the I2C driver fetch the platform clock and the divider from the parent node, and then do additional math, when it could just get the value it needs right from the node it's probing? I'm a bit confused. The frequency of the I2C source clock and the real I2C clock frequency are two different things. The first one is common for all I2C devices, the second can be different. What properties would you like to use for defining both? How is the divider controlled? Is it a fixed property of the SoC? a shared register setting? or a register setting within the i2c device? (My answer depends on the layout) g. -- Grant Likely, B.Sc., P.Eng. Secret Lab Technologies Ltd. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
Wolfgang Grandegger wrote: 2) The speed of the I2C bus, as seen by devices on that bus. This is usually 400KHz. Which should be defined with the property current-speed, right? I would use something like bus-speed, but yes. The word current shouldn't be in a property string. -- Timur Tabi Linux kernel developer at Freescale ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: libata badness
On Thu, Jul 31, 2008 at 03:24:31PM -0500, Kumar Gala wrote: it would be helpful to compile a kernel with verbose fault information turned on. Doesn't seem to provide too much more info on ppc. I'm pretty sure it does the same thing on ppc as on all the other architectures. just says the file/ line number of the badness. ...which is a lot more useful than an address in someone else's kernel image. -Scott ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: Board level compatibility matching
Grant Likely wrote: Doing so should simplify adding new board ports. In many cases it would just involve dropping in a new .dts file. However, it retains the flexability of overriding generic code with platform specific fixups as the need arises. If it makes it easier for board vendors to do a clean port, you get a vote from me. I've seen a large amount of vendor code from that needed a lot of massaging before it would play nice with support for similar boards from other vendors--not to mention how ugly it looked with all the board-specific ifdefs. Chris ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
Jon Smirl wrote: For mpc5200 it is easy, mpc52xx_find_ipb_freq() already exists to get the source clock for the i2c devices. Each i2c node in the device tree can then specify clock-frequency = 40; or let it default. You 400KHz is the *speed* of the I2C bus, so let's be sure to use speed in the property name. Reserve the word bus for the input clock to the I2C device. #if defined(CONFIG_MPC8540) || defined(CONFIG_MPC8541) || \ defined(CONFIG_MPC8560) || defined(CONFIG_MPC8555) gd-i2c1_clk = sys_info.freqSystemBus; #elif defined(CONFIG_MPC8544) /* * On the 8544, the I2C clock is the same as the SEC clock. This can be * either CCB/2 or CCB/3, depending on the value of cfg_sec_freq. See * 4.4.3.3 of the 8544 RM. Note that this might actually work for all * 85xx, but only the 8544 has cfg_sec_freq, so it's unknown if the * PORDEVSR2_SEC_CFG bit is 0 on all 85xx boards that are not an 8544. */ if (gur-pordevsr2 MPC85xx_PORDEVSR2_SEC_CFG) gd-i2c1_clk = sys_info.freqSystemBus / 3; else gd-i2c1_clk = sys_info.freqSystemBus / 2; #else /* Most 85xx SOCs use CCB/2, so this is the default behavior. */ gd-i2c1_clk = sys_info.freqSystemBus / 2; #endif gd-i2c2_clk = gd-i2c1_clk; I think the whole point is to eliminate duplicating this code in the Linux driver. It's hideously ugly, so it should be limited as much as possible. -- Timur Tabi Linux kernel developer at Freescale ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
On 7/31/08, Timur Tabi [EMAIL PROTECTED] wrote: Jon Smirl wrote: For mpc5200 it is easy, mpc52xx_find_ipb_freq() already exists to get the source clock for the i2c devices. Each i2c node in the device tree can then specify clock-frequency = 40; or let it default. You 400KHz is the *speed* of the I2C bus, so let's be sure to use speed in the property name. Reserve the word bus for the input clock to the I2C device. #if defined(CONFIG_MPC8540) || defined(CONFIG_MPC8541) || \ defined(CONFIG_MPC8560) || defined(CONFIG_MPC8555) gd-i2c1_clk = sys_info.freqSystemBus; #elif defined(CONFIG_MPC8544) /* * On the 8544, the I2C clock is the same as the SEC clock. This can be * either CCB/2 or CCB/3, depending on the value of cfg_sec_freq. See * 4.4.3.3 of the 8544 RM. Note that this might actually work for all * 85xx, but only the 8544 has cfg_sec_freq, so it's unknown if the * PORDEVSR2_SEC_CFG bit is 0 on all 85xx boards that are not an 8544. */ if (gur-pordevsr2 MPC85xx_PORDEVSR2_SEC_CFG) gd-i2c1_clk = sys_info.freqSystemBus / 3; else gd-i2c1_clk = sys_info.freqSystemBus / 2; #else /* Most 85xx SOCs use CCB/2, so this is the default behavior. */ gd-i2c1_clk = sys_info.freqSystemBus / 2; #endif gd-i2c2_clk = gd-i2c1_clk; I think the whole point is to eliminate duplicating this code in the Linux driver. It's hideously ugly, so it should be limited as much as possible. It wouldn't go into the i2c driver, it would go into the mpc8xxx platform driver. Why is it bad to put it into the mpc8xxx platform driver? It is an accurate description of the mpc8xxx platform isn't it? -- Jon Smirl [EMAIL PROTECTED] ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
Grant Likely wrote: How is the divider controlled? Is it a fixed property of the SoC? Yes. The divider is either 1, 2, or 3, and the only way to know which one it is is to look up the specific SOC model number. And depending on the SOC model, there may also be a register that needs to be looked up. a shared register setting? or a register setting within the i2c device? The I2C device itself has no idea what the divider is. It only sees the result of the divider. -- Timur Tabi Linux kernel developer at Freescale ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
On Thu, Jul 31, 2008 at 03:37:07PM -0500, Timur Tabi wrote: Grant Likely wrote: How is the divider controlled? Is it a fixed property of the SoC? Yes. The divider is either 1, 2, or 3, and the only way to know which one it is is to look up the specific SOC model number. And depending on the SOC model, there may also be a register that needs to be looked up. a shared register setting? or a register setting within the i2c device? The I2C device itself has no idea what the divider is. It only sees the result of the divider. Then that absolutely suggests to me that either the final clock or the divider should be encoded in the i2c node; not in the soc node. g. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: Board level compatibility matching
On 7/31/08, Grant Likely [EMAIL PROTECTED] wrote: This topic keeps coming up, so it is probably time to address it once and for all. When it comes to machine level support in arch/powerpc, there seems to me that there are two levels or machine support. .. Thoughts? g. As part of this, how can we going to solve the problem with triggering the load of a board specific machine/fabric driver in a generic way? -- Jon Smirl [EMAIL PROTECTED] ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
Jon Smirl wrote: It wouldn't go into the i2c driver, it would go into the mpc8xxx platform driver. Why is it bad to put it into the mpc8xxx platform driver? It is an accurate description of the mpc8xxx platform isn't it? There's no need to put that code in the platform driver because U-Boot will put the final result in the device tree! That way, we won't need to update the platform driver *and* U-Boot for any new SOC. -- Timur Tabi Linux kernel developer at Freescale ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: Board level compatibility matching
On Thu, Jul 31, 2008 at 04:49:49PM -0400, Jon Smirl wrote: On 7/31/08, Grant Likely [EMAIL PROTECTED] wrote: This topic keeps coming up, so it is probably time to address it once and for all. When it comes to machine level support in arch/powerpc, there seems to me that there are two levels or machine support. .. Thoughts? g. As part of this, how can we going to solve the problem with triggering the load of a board specific machine/fabric driver in a generic way? That really is a separate problem. We *could* do this with a board specific powerpc machine driver, but I don't think it is the best solution. I'm still thinking that the drivers module_init() function could check the top level board model property and decide whether or not to load based on that. g. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] powerpc: i2c-mpc: make speed registers configurable via FDT
On 7/31/08, Grant Likely [EMAIL PROTECTED] wrote: On Thu, Jul 31, 2008 at 03:37:07PM -0500, Timur Tabi wrote: Grant Likely wrote: How is the divider controlled? Is it a fixed property of the SoC? Yes. The divider is either 1, 2, or 3, and the only way to know which one it is is to look up the specific SOC model number. And depending on the SOC model, there may also be a register that needs to be looked up. a shared register setting? or a register setting within the i2c device? The I2C device itself has no idea what the divider is. It only sees the result of the divider. Then that absolutely suggests to me that either the final clock or the divider should be encoded in the i2c node; not in the soc node. Isn't there a single global divider that generates all the i2c source clocks? You don't want to copy a global value into each i2c node. Aren't we talking about the /2 or /3 or /1 divider that appears to be randomly implemented on various members of the mpc8xxx family? g. -- Jon Smirl [EMAIL PROTECTED] ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: Board level compatibility matching
On 7/31/08, Grant Likely [EMAIL PROTECTED] wrote: On Thu, Jul 31, 2008 at 04:49:49PM -0400, Jon Smirl wrote: On 7/31/08, Grant Likely [EMAIL PROTECTED] wrote: This topic keeps coming up, so it is probably time to address it once and for all. When it comes to machine level support in arch/powerpc, there seems to me that there are two levels or machine support. .. Thoughts? g. As part of this, how can we going to solve the problem with triggering the load of a board specific machine/fabric driver in a generic way? That really is a separate problem. We *could* do this with a board specific powerpc machine driver, but I don't think it is the best solution. I'm still thinking that the drivers module_init() function could check the top level board model property and decide whether or not to load based on that. You're assuming the driver is compiled in. If the drivers are on initrd selection has to happen via the normal device/driver matching process. Search for a device in the alias table of the drive file. -- Jon Smirl [EMAIL PROTECTED] ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: Board level compatibility matching
On Thu, Jul 31, 2008 at 02:19:57PM -0600, Grant Likely wrote: - Add a property to the device tree that explicitly specifies the SoC that the board is based on. Something like 'soc-model = fsl,mpc5200b' would be appropriate. Shouldn't that go in the compatible property of the soc node? - Prioritize board ports in the arch/powerpc/platforms directory to identify level-1 machines support from the level-2 ones. Make sure that level-1 stuff always gets probed before level-2 stuff within each SoC family. In all likelyhood, this would probably just involve making sure that board specific machines get linked in before the catchall machine. I don't think we're too far away from being able to have a catch-all that isn't even soc-type-specific -- the main things that come to mind are instantiating PCI controllers (should be easily fixed) and finding the root IRQ controller (I suppose we could pick a random interrupt controller and follow the interrupt tree to its root, though it'd be more robust if it were explicitly identified in the device tree). -Scott ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev