RE: [PATCH] [v2] power/fsl: add MDIO dt binding for FMan

2014-12-18 Thread Shaohui Xie
> -Original Message-
> From: Wood Scott-B07421
> Sent: Friday, December 19, 2014 6:01 AM
> To: Xie Shaohui-B21989
> Cc: linuxppc-dev@lists.ozlabs.org; devicet...@vger.kernel.org; Medve Emilian-
> EMMEDVE1; Liberman Igal-B31950
> Subject: Re: [PATCH] [v2] power/fsl: add MDIO dt binding for FMan
> 
> On Thu, 2014-12-18 at 06:53 -0600, Xie Shaohui-B21989 wrote:
> > Ping.
> >
> > Best Regards,
> > Shaohui Xie
> 
> I can't put patches in my -next until the merge window closes.
> 
> > > > > +EXAMPLE
> > > > > +
> > > > > +Example for FMan v2 external MDIO:
> > > > > +
> > > > > +mdio@f1000 {
> > > > > + compatible = "fsl,fman-xmdio";
> > > > > + reg = <0xf1000 0x1000>;
> > > > > + bus-frequency = <2>;
> > > > > +};
> > > >
> > > > So the bus frequency is only 20 KHz?  Or is the unit supposed to
> > > > be something other than Hz?
> > > [S.H] it's only an example, it could be different on real SoCs, but
> > > they always lower than the standard one, The standard one is 2.5MHz, I 
> > > have
> to use Hz for it.
> 
> Is there any SoC for which 20 kHz is the right frequency?  I just want to make
> sure the example is realistic.
[S.H] the clock divider has a limitation that the MAX value it can get on Fman 
v2 is 255 (0xff, 8 bits),
On Fman v3 is 511(0x1ff, 9 bits).

So the lowest frequency on Fman v2 is: Fman_clock / (2 * 255),
On Fman v3 is: Fman_clock / ((2 * 511) + 1).

Take default Fman frequency setting from SDK1.7 as example, the lowest clock 
used for Fman v2 is 581MHz,
The lowest clock for Fman v3 is 600MHz.

Then the lowest bus frequency can get is:
Fman v2: ~1140KHz
Fman v3: ~587KHz

20KHz is not practice, we don't have a suggested value in errata document.
For this example, should I post a new version with a value like 1200KHz?

Thanks!
Shaohui
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH V3] powerpc, powernv: Add OPAL platform event driver

2014-12-18 Thread Jeremy Kerr
Hi Anshuman,

> This patch creates a new OPAL platform event character driver
> which will give userspace clients the access to these events
> and process them effectively. Following platforms events are
> currently supported with this platform driver.
> 
>   (1) Environmental and Power Warning (EPOW)
>   (2) Delayed Power Off (DPO)
> 
> The user interface for this driver is /dev/opal_event character
> device file where the user space clients can poll and read for
> new opal platform events. The expected sequence of events driven
> from user space should be like the following.
> 
>   (1) Open the character device file
>   (2) Poll on the file for POLLIN event
>   (3) When unblocked, must attempt to read PLAT_EVENT_MAX_SIZE size
>   (4) Kernel driver will pass at most one opal_plat_event structure
>   (5) Poll again for more new events

Why is the poll() required? Can't read() just block if there are no
events (and !O_NONBLOCK, of course).

> @@ -281,6 +282,7 @@ enum OpalMessageType {
>   OPAL_MSG_EPOW,
>   OPAL_MSG_SHUTDOWN,
>   OPAL_MSG_HMI_EVT,
> + OPAL_MSG_DPO,
>   OPAL_MSG_TYPE_MAX,
>  };

I'd suggest adding the DPO interface definitions as a separate change,
but that's no huge deal.

>  
> @@ -452,6 +454,46 @@ struct opal_msg {
>   __be64 params[8];
>  };
>  
> +/*
> + * EPOW status sharing (OPAL and the host)
> + *
> + * The host will pass on OPAL, a buffer of length OPAL_SYSEPOW_MAX
> + * with individual elements being 16 bits wide to fetch the system
> + * wide EPOW status. Each element in the buffer will contain the
> + * EPOW status in it's bit representation for a particular EPOW sub
> + * class as defiend here. So multiple detailed EPOW status bits
> + * specific for any sub class can be represented in a single buffer
> + * element as it's bit representation.
> + */

Why do we have this in separate u16s? This just results in a really
sparse bitmask.

I guess the FW interface is already defined though, right?

[Minor grammar nit: "it's" is short for "it is", which isn't what you
want here :)]

> -int64_t opal_get_epow_status(__be64 *status);
> +int64_t opal_get_epow_status(uint16_t *status, uint16_t *length);

Please keep the endian annotations there.

> +int64_t opal_get_dpo_status(int64_t *dpo_timeout);

and __be64 * here.

> +
> +/* EPOW classification */
> +enum epow_condition {
> + EPOW_SYSPOWER_CHNG  = 0,/* Power */
> + EPOW_SYSPOWER_FAIL  = 1,
> + EPOW_SYSPOWER_INCL  = 2,
> + EPOW_SYSPOWER_UPS   = 3,
> + EPOW_SYSTEMP_AMB= 4,/* Temperature */
> + EPOW_SYSTEMP_INT= 5,
> + EPOW_SYSTEMP_HMD= 6,
> + EPOW_SYSCOOL_INSF   = 7,/* Cooling */
> + EPOW_MAX = 8,
> +};
> +
> +/* OPAL EPOW event */
> +struct epow_event {
> + __u64   epow[EPOW_MAX]; /* Detailed system EPOW status */

Why an array of u64s when you're only using one bit of each? Do we
expect other status details to be set in each of these?

If not, you're converting from one sparse bitmask (the OPAL
interface) to another different sparse bitmask (the kernel interface).
Either keep them the same, or use a more sensible format.

Also, are you expecting a single epow_event to have multiple conditons
set? If not, no need for a bitmask at all.

> + __u64   timeout;/* Timeout to shutdown in secs */
> +};
> +
> +/* OPAL DPO event */
> +struct dpo_event {
> + __u64   orig_timeout;   /* Platform provided timeout in secs */
> + __u64   remain_timeout; /* Timeout to shutdown in secs */
> +};

Will all of these events have a timeout? If so, just put a timeout in
struct opal_plat_event.

Why does the DPO timeout have orig/remain timeouts, but not the EPOW one?

See below, but perhaps we want a absolute timeout here, instead of
relative? (which is essentially invalid as soon as you've read it).

> +
> +/* OPAL event */
> +struct opal_plat_event {
> + __u32   type;   /* Type of OPAL platform event */
> +#define OPAL_PLAT_EVENT_TYPE_EPOW0
> +#define OPAL_PLAT_EVENT_TYPE_DPO 1
> +#define OPAL_PLAT_EVENT_TYPE_MAX 2
> + __u32   size;   /* Size of OPAL platform event */
> + union {
> + struct epow_event epow; /* EPOW platform event */
> + struct dpo_event  dpo;  /* DPO platform event */
> + };
> +};
> +
> +/*
> + * Suggested read size
> + *
> + * The user space client should attempt to read OPAL_PLAT_EVENT_READ_SIZE
> + * amount of data from the character device file '/dev/opal_event' at any
> + * point of time. The kernel driver will pass an entire opal_plat_event
> + * structure in every read. This ensures that minium data the user space
> + * client gets from the kernel is one opal_plat_event structure.
> + */
> +#define  PLAT_EVENT_MAX_SIZE 4096

-> OPAL_PLAT_EVENT_MAX_SIZE - this it a global header.

> +/*
> + * opal_event_read
> + *
> + * User client needs to attempt

net: ucc: tbi phy detection broken by 058112c7efc9ef43bb511c137293dddbe6e42908

2014-12-18 Thread Lennart Sorensen
I have been trying to move an 8360 based system from a 3.0 kernel to a
3.12 (on the way to 3.14 with ipipe/xenomai) kernel and encountered an
oops in the ucc_geth driver when using RTBI mode on one of the ucc
ports.  I haven't managed to find any commits to of_mdio or ucc_geth or
fsl_pq_mdio that would appear to address this problem, so I believe it
is still present in the latest kernel, but have not confirmed that with
testing yet.

Commit 058112c7efc9ef43bb511c137293dddbe6e42908 appears to have broken
ucc support for tbi phy detection.

With the patch in place, I am unable to get the mdio bus to create phy
devices for the tbi phy in the ucc on an 8360e, and the ucc_geth driver
causes a kernel oops, while with the patch reverted, it does create them
and the driver comes up and works.

The tbi phy is needed when using a ucc in RTBI, TBI or SGMII mode.

I am not convinced that the tbi phy really behaves quite like a real phy,
which may be why get_phy_device does not work with it.  Perhaps there
is a better way to deal with the tbi phy on the ucc for this purpose.

Certainly as it is, this patch has caused a regression though, although
probably not very many systems with ucc ports actually use one of the
affected modes so the damage isn't that great.

-- 
Len Sorensen
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3] powerpc/book3s: Fix flush_tlb cpu_spec hook to take a generic argument.

2014-12-18 Thread Mahesh J Salgaonkar
From: Mahesh Salgaonkar 

The flush_tlb hook in cpu_spec was introduced as a generic function hook
to invalidate TLBs. But the current implementation of flush_tlb hook
takes IS (invalidation selector) as an argument which is architecture
dependent. Hence, It is not right to have a generic routine where caller
has to pass non-generic argument.

This patch fixes this and makes flush_tlb hook as high level API.

Reported-by: Benjamin Herrenschmidt 
Signed-off-by: Mahesh Salgaonkar 
Signed-off-by: Michael Ellerman 
---
Changes in V3:
- Changed TLB flush action #defines into enum.

Changes in V2:
- Moved TLB flush action #defines to cputable.h
- Added missing ptesyncs to before and after TLB flush.
- Moved switch case to flush_tlb_206() function as suggested by
  Michael Ellerman.
---
 arch/powerpc/include/asm/cputable.h   |8 -
 arch/powerpc/include/asm/mmu-hash64.h |1 +
 arch/powerpc/kernel/cpu_setup_power.S |   10 +-
 arch/powerpc/kernel/cputable.c|4 +-
 arch/powerpc/kernel/mce_power.c   |   53 -
 arch/powerpc/kvm/book3s_hv_ras.c  |4 +-
 6 files changed, 65 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index daa5af9..493af2d 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -100,7 +100,7 @@ struct cpu_spec {
/*
 * Processor specific routine to flush tlbs.
 */
-   void(*flush_tlb)(unsigned long inval_selector);
+   void(*flush_tlb)(unsigned int action);
 
 };
 
@@ -114,6 +114,12 @@ extern void do_feature_fixups(unsigned long value, void 
*fixup_start,
 
 extern const char *powerpc_base_platform;
 
+/* TLB flush actions. Used as argument to cpu_spec.flush_tlb() hook */
+enum {
+   TLB_INVAL_SCOPE_GLOBAL = 0, /* invalidate all TLBs */
+   TLB_INVAL_SCOPE_LPID = 1,   /* invalidate TLBs for current LPID */
+};
+
 #endif /* __ASSEMBLY__ */
 
 /* CPU kernel features */
diff --git a/arch/powerpc/include/asm/mmu-hash64.h 
b/arch/powerpc/include/asm/mmu-hash64.h
index aeebc94..4f50db7 100644
--- a/arch/powerpc/include/asm/mmu-hash64.h
+++ b/arch/powerpc/include/asm/mmu-hash64.h
@@ -112,6 +112,7 @@
 #define TLBIEL_INVAL_SET_SHIFT 12
 
 #define POWER7_TLB_SETS128 /* # sets in POWER7 TLB */
+#define POWER8_TLB_SETS512 /* # sets in POWER8 TLB */
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/powerpc/kernel/cpu_setup_power.S 
b/arch/powerpc/kernel/cpu_setup_power.S
index 4673353..9c9b741 100644
--- a/arch/powerpc/kernel/cpu_setup_power.S
+++ b/arch/powerpc/kernel/cpu_setup_power.S
@@ -137,15 +137,11 @@ __init_HFSCR:
 /*
  * Clear the TLB using the specified IS form of tlbiel instruction
  * (invalidate by congruence class). P7 has 128 CCs., P8 has 512.
- *
- * r3 = IS field
  */
 __init_tlb_power7:
-   li  r3,0xc00/* IS field = 0b11 */
-_GLOBAL(__flush_tlb_power7)
li  r6,128
mtctr   r6
-   mr  r7,r3   /* IS field */
+   li  r7,0xc00/* IS field = 0b11 */
ptesync
 2: tlbiel  r7
addir7,r7,0x1000
@@ -154,11 +150,9 @@ _GLOBAL(__flush_tlb_power7)
 1: blr
 
 __init_tlb_power8:
-   li  r3,0xc00/* IS field = 0b11 */
-_GLOBAL(__flush_tlb_power8)
li  r6,512
mtctr   r6
-   mr  r7,r3   /* IS field */
+   li  r7,0xc00/* IS field = 0b11 */
ptesync
 2: tlbiel  r7
addir7,r7,0x1000
diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index 8084059..81c1ba5 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -71,8 +71,8 @@ extern void __restore_cpu_power7(void);
 extern void __setup_cpu_power8(unsigned long offset, struct cpu_spec* spec);
 extern void __restore_cpu_power8(void);
 extern void __restore_cpu_a2(void);
-extern void __flush_tlb_power7(unsigned long inval_selector);
-extern void __flush_tlb_power8(unsigned long inval_selector);
+extern void __flush_tlb_power7(unsigned int action);
+extern void __flush_tlb_power8(unsigned int action);
 extern long __machine_check_early_realmode_p7(struct pt_regs *regs);
 extern long __machine_check_early_realmode_p8(struct pt_regs *regs);
 #endif /* CONFIG_PPC64 */
diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c
index b6f123a..2c647b1 100644
--- a/arch/powerpc/kernel/mce_power.c
+++ b/arch/powerpc/kernel/mce_power.c
@@ -28,6 +28,55 @@
 #include 
 #include 
 
+static void flush_tlb_206(unsigned int num_sets, unsigned int action)
+{
+   unsigned long rb;
+   unsigned int i;
+
+   switch (action) {
+   case TLB_INVAL_SCOPE_GLOBAL:
+   rb = TLBIEL_INVAL_SET;
+   break;
+   case TLB_INVAL_SCOPE_LPID:
+   rb = TLBIEL_INVAL_SET_LPID;
+   break;
+   default:
+  

RE: [RFC PATCH 0/2] powerpc: CR based local atomic operation implementation

2014-12-18 Thread Rusty Russell
David Laight  writes:
> From: Rusty Russell
>> David Laight  writes:
>> > From: Madhavan Srinivasan [mailto:ma...@linux.vnet.ibm.com]
>> > ...
>> >> >>> I also wonder if it is possible to inspect the interrupted
>> >> >>> code to determine the start/end of the RAS block.
>> >> >>> (Easiest if you assume that there is a single 'write' instruction
>> >> >>> as the last entry in the block.)
>> >> >>>
>> >> >> So each local_* function also have code in the __ex_table section. 
>> >> >> IIUC,
>> >> >> __ex_table contains two address. So if the return address found in the
>> >> >> first column of the _ex_table, use the corresponding address in the
>> >> >> second column to continue from.
>> >> >
>> >> > That really doesn't scale.
>> >> > I don't know how many 1000 address pairs you table will have (and the
>> >> > ones in each loadable module), but the search isn't going to be cheap.
>> >> >
>> >> > If these sequences are restartable then they can only have one write
>> >> > to memory.
> ...
>> >> 2) resulting code with lot of condition and branch (for opcode decode)
>> >> will be lot messy and may be an issue incase of maintenance,
>> >
>> > You don't need to decode the instructions.
>> > Just look for the two specific instructions used as markers.
>> > This is only really possible with fixed-size instructions.
>> >
>> > It might also be that the 'interrupt entry' path is easier to
>> > modify than the 'interrupt exit' one (fewer code paths) and
>> > you just need to modify the 'pc' in the stack frame.
>> > You are only interested in interrupts from kernel space.
>> 
>> It's an overoptimization for case that statistically never happens.
>> You won't even be able to measure the difference.
>> 
>> The question of bloat remains, but that's also easily measured.  In
>> practice, I'd guess less than 1k.
>
> IIRC they were 'static inline' so the table of addresses is generated
> for every use site.
> (copyin/out generates a similarly enormous table of addresses on amd64)

There are about 20 callers in the entire kernel.

Cheers,
Rusty.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] [v2] power/fsl: add MDIO dt binding for FMan

2014-12-18 Thread Scott Wood
On Thu, 2014-12-18 at 06:53 -0600, Xie Shaohui-B21989 wrote:
> Ping.
> 
> Best Regards, 
> Shaohui Xie

I can't put patches in my -next until the merge window closes.

> > > > +EXAMPLE
> > > > +
> > > > +Example for FMan v2 external MDIO:
> > > > +
> > > > +mdio@f1000 {
> > > > +   compatible = "fsl,fman-xmdio";
> > > > +   reg = <0xf1000 0x1000>;
> > > > +   bus-frequency = <2>;
> > > > +};
> > >
> > > So the bus frequency is only 20 KHz?  Or is the unit supposed to be
> > > something other than Hz?
> > [S.H] it's only an example, it could be different on real SoCs, but they 
> > always
> > lower than the standard one, The standard one is 2.5MHz, I have to use Hz 
> > for it.

Is there any SoC for which 20 kHz is the right frequency?  I just want
to make sure the example is realistic.

-Scott


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2] powerpc/kdump: Ignore failure in enabling big endian exception during crash

2014-12-18 Thread Hari Bathini
In LE kernel, we currently have a hack for kexec that resets the exception
endian before starting a new kernel as the kernel that is loaded could be a
big endian or a little endian kernel. In kdump case, resetting exception
endian fails when one or more cpus is disabled. But we can ignore the failure
and still go ahead, as in most cases crashkernel will be of same endianess
as primary kernel and reseting endianess is not even needed in those cases.
This patch adds a new inline function to say if this is kdump path. This
function is used at places where such a check is needed.

Changes from v1:
Instead of skipping, ignore failure in enabling big endian exception
during crash

Signed-off-by: Hari Bathini 
---
 arch/powerpc/include/asm/kexec.h   |   10 ++
 arch/powerpc/kernel/machine_kexec_64.c |2 +-
 arch/powerpc/platforms/pseries/lpar.c  |   10 +-
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 19c36cb..0d96d4d 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -86,6 +86,11 @@ extern int overlaps_crashkernel(unsigned long start, 
unsigned long size);
 extern void reserve_crashkernel(void);
 extern void machine_kexec_mask_interrupts(void);
 
+static inline int is_kdump_path(void)
+{
+   return (crashing_cpu >= 0) ? 1 : 0;
+}
+
 #else /* !CONFIG_KEXEC */
 static inline void crash_kexec_secondary(struct pt_regs *regs) { }
 
@@ -106,6 +111,11 @@ static inline int 
crash_shutdown_unregister(crash_shutdown_t handler)
return 0;
 }
 
+static inline int is_kdump_path(void)
+{
+   return 0;
+}
+
 #endif /* CONFIG_KEXEC */
 #endif /* ! __ASSEMBLY__ */
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/kernel/machine_kexec_64.c 
b/arch/powerpc/kernel/machine_kexec_64.c
index 879b3aa..b4fe804 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -330,7 +330,7 @@ void default_machine_kexec(struct kimage *image)
 * using debugger IPI.
 */
 
-   if (crashing_cpu == -1)
+   if (!is_kdump_path())
kexec_prepare_cpus();
 
pr_debug("kexec: Starting switchover sequence.\n");
diff --git a/arch/powerpc/platforms/pseries/lpar.c 
b/arch/powerpc/platforms/pseries/lpar.c
index 469751d..63214fa 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -43,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "pseries.h"
@@ -257,6 +258,7 @@ static void pSeries_lpar_hptab_clear(void)
 *
 * This is also called on boot when a fadump happens. In that case we
 * must not change the exception endian mode.
+*
 */
if (firmware_has_feature(FW_FEATURE_SET_MODE) && !is_fadump_active()) {
long rc;
@@ -267,8 +269,14 @@ static void pSeries_lpar_hptab_clear(void)
 * out to the user, but at least this will stop us from
 * continuing on further and creating an even more
 * difficult to debug situation.
+*
+* But if we reaching here after a crash, no point panicking.
+* Also, in kdump path, resetting endianess may not be needed
+* as the crashkernel most of the times is of same endianess
+* as primary kernel. So, let's ignore the failure and try
+* kdump'ing anyway.
 */
-   if (rc)
+   if (rc && !is_kdump_path())
panic("Could not enable big endian exceptions");
}
 #endif

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/3] powerpc: Don't use local named register variable in current_thread_info

2014-12-18 Thread Alexander Graf


On 18.12.14 07:25, Anton Blanchard wrote:
> On Thu, 18 Dec 2014 16:11:54 +1100
> Michael Ellerman  wrote:
> 
>> On Wed, 2014-12-17 at 02:16 +0100, Alexander Graf wrote:
>>> On 31.10.14 04:47, Anton Blanchard wrote:
 LLVM doesn't support local named register variables and is
 unlikely to. current_thread_info is using one, fix it by moving
 it out and calling it __current_r1().

 I gave it a bit of an obscure name because we don't want anyone
 else using it - they should use current_stack_pointer(). This
 specific case is performance critical and we can't afford to call
 a function to get it. Furthermore it isn't important to know
 exactly where in the stack we are since we mask the lower bits.

 Signed-off-by: Anton Blanchard 
>>>
>>> Git bisect managed to point me to this commit as the offender for
>>> OOPSes on e5500 and e6500 (and maybe the G4 as well, not sure).
>>>
>>> Doing a git revert of this commit on top of linus/master makes
>>> things work fine for me again.
>>>
>>>
>>> Alex
>>>
>>> Oops: Kernel access of bad area, sig: 11 [#2]
>>> SMP NR_CPUS=16 CoreNet Generic
>>> Modules linked in:
>>> CPU: 1 PID: 339 Comm: kworker/1:1 Tainted: G  D
>>> 3.18.0-09423-g988adfd #1
>>> Workqueue: rpciod .rpc_async_schedule
>>> task: c001f6397500 ti: c001f6638000 task.ti:
>>> c001f6638000 NIP: c04817a4 LR: c04817a4 CTR:
>>>  REGS: c001f663b0e0 TRAP: 0300   Tainted:
>>> G  D (3.18.0-09423-g988adfd)
>>> MSR: 80029000   CR: 24ad2e42  XER: 
>>> DEAR: 202031303438355f ESR:  SOFTE: 1
>>   = r9 + 40
>>
>>> GPR00: c04817a4 c001f663b360 c0988028
>>> 7f24333d GPR04: 5ff5738c1f2ebfb1 
>>>  08f8 GPR08: c0480ae8
>>> 2020313034383537 36204b4220617320 6469726563740a31 GPR12:
>>> 3937302d30312d30 cfff8780 c007f988 c001f64c1600
>>
>> GPRs 9-12 say: " 1048576 KB as direct\n1970-01-0"
>>
>> Which is rarely a good sign :)
>>
>> Looks like it might be part of your dmesg from setup_page_sizes().
>>
>>> GPR16:   
>>> 05dc GPR20: c09b8028 c0007e034200
>>> 0548 c000 GPR24: c001f663b4b0
>>> b225831e  0080 GPR28:
>>> 0548 08f8 0548 0094
>>> NIP [c04817a4] .__skb_checksum+0x194/0x378 LR
>>> [c04817a4] .__skb_checksum+0x194/0x378 Call Trace:
>>> [c001f663b360] [c04817a4] .__skb_checksum+0x194/0x378
>>> (unreliable)
>>> [c001f663b440] [c04819b4] .skb_checksum+0x2c/0x3c
>>> [c001f663b4c0] [c04fd0a8] .udp4_hwcsum+0xa8/0x16c
>>> [c001f663b560] [c04fd440] .udp_send_skb+0x2d4/0x370
>>> [c001f663b600]
>>> [c04fd51c] .udp_push_pending_frames+0x40/0x94
>>> [c001f663b680] [c04fec08] .udp_sendpage+0x150/0x1b4
>>> [c001f663b770] [c050ae54] .inet_sendpage+0xa0/0x120
>>> [c001f663b810] [c059c8cc] .xs_sendpages+0x2d0/0x30c
>>> [c001f663b8d0]
>>> [c059cae4] .xs_udp_send_request+0x58/0x120
>>> [c001f663b970] [c0598f04] .xprt_transmit+0x80/0x36c
>>> [c001f663ba20] [c05942d8] .call_transmit+0x19c/0x254
>>> [c001f663bab0] [c059ff64] .__rpc_execute+0xbc/0x3c0
>>> [c001f663bb90] [c00797f8] .process_one_work+0x1c0/0x474
>>> [c001f663bc40] [c007a518] .worker_thread+0x17c/0x54c
>>> [c001f663bd30] [c007fa8c] .kthread+0x104/0x124
>>> [c001f663be30]
>>> [c884] .ret_from_kernel_thread+0x58/0xd4 Instruction
>>> dump: 7d1f3a14 7c6a1850 e958 7fbd4050 786334e4 e90a
>>> 7c63ba14 f8490028 7c63ea14 7d0903a6 e84a0008 4e800421 
>>> 7c641b78 78270464 e9580008
>>
>> Which is:
>>
>> add  r8, r31, r7
>> subf r3, r10, r3
>> ld  r10, 0(r24)
>> subfr29, r29, r8
>> rldicr   r3, r3, 6, 51
>> ld   r8, 0(r10)
>> add  r3, r3, r23
>> std  r2, 40(r9)
>> add  r3, r3, r29
>> mtctrr8
>> ld   r2, 8(r10)
>> bctrl
>> ld   r2, 40(r9)  <---
>> mr   r4, r3
>> rldicr   r7, r1, 0, 49
>> ld  r10, 8(r24)
>>
>>
>> Which looks a bit odd. I'd expect us to be saving/restoring r2 to the
>> stack, though maybe r9 was pointing at the stack?
> 
> Nice catch! This looks like a compiler bug.
> 
>> Looking at your vmlinux.broken I don't see the same code gen.
> 
> For whatever reason we ended up with r10 this time:
> 
> 7c 2a 0b 78 mr  r10,r1
> ...
> f8 4a 00 28 std r2,40(r10)
> 7c 63 ba 14 add r3,r3,r23
> 7c e9 03 a6 mtctr   r7
> 7c 63 ea 14 add r3,r3,r29
> 38 a0 00 00 li  r5,0
> e8 48 00 08 ld  r2,8(r8)
> 4e 80 04 21 bctrl
> e8 4a 00 28 ld  r2,40(r10)
> 
> The indirect function call is allowed to clobber r10, gcc is doing
> something very wrong here.

Yeah, I couldn't see

Re: [PATCH 1/3] powerpc: Don't use local named register variable in current_thread_info

2014-12-18 Thread Alexander Graf


On 18.12.14 06:11, Michael Ellerman wrote:
> On Wed, 2014-12-17 at 02:16 +0100, Alexander Graf wrote:
>> On 31.10.14 04:47, Anton Blanchard wrote:
>>> LLVM doesn't support local named register variables and is unlikely
>>> to. current_thread_info is using one, fix it by moving it out and
>>> calling it __current_r1().
>>>
>>> I gave it a bit of an obscure name because we don't want anyone else
>>> using it - they should use current_stack_pointer(). This specific
>>> case is performance critical and we can't afford to call a function
>>> to get it. Furthermore it isn't important to know exactly where in
>>> the stack we are since we mask the lower bits.
>>>
>>> Signed-off-by: Anton Blanchard 
>>
>> Git bisect managed to point me to this commit as the offender for OOPSes
>> on e5500 and e6500 (and maybe the G4 as well, not sure).
>>
>> Doing a git revert of this commit on top of linus/master makes things
>> work fine for me again.
>>
>>
>> Alex
>>
>> Oops: Kernel access of bad area, sig: 11 [#2]
>> SMP NR_CPUS=16 CoreNet Generic
>> Modules linked in:
>> CPU: 1 PID: 339 Comm: kworker/1:1 Tainted: G  D
>> 3.18.0-09423-g988adfd #1
>> Workqueue: rpciod .rpc_async_schedule
>> task: c001f6397500 ti: c001f6638000 task.ti: c001f6638000
>> NIP: c04817a4 LR: c04817a4 CTR: 
>> REGS: c001f663b0e0 TRAP: 0300   Tainted: G  D
>> (3.18.0-09423-g988adfd)
>> MSR: 80029000   CR: 24ad2e42  XER: 
>> DEAR: 202031303438355f ESR:  SOFTE: 1
>   = r9 + 40
> 
>> GPR00: c04817a4 c001f663b360 c0988028 7f24333d
>> GPR04: 5ff5738c1f2ebfb1   08f8
>> GPR08: c0480ae8 2020313034383537 36204b4220617320 6469726563740a31
>> GPR12: 3937302d30312d30 cfff8780 c007f988 c001f64c1600
> 
> GPRs 9-12 say: " 1048576 KB as direct\n1970-01-0"
> 
> Which is rarely a good sign :)
> 
> Looks like it might be part of your dmesg from setup_page_sizes().
> 
>> GPR16:    05dc
>> GPR20: c09b8028 c0007e034200 0548 c000
>> GPR24: c001f663b4b0 b225831e  0080
>> GPR28: 0548 08f8 0548 0094
>> NIP [c04817a4] .__skb_checksum+0x194/0x378
>> LR [c04817a4] .__skb_checksum+0x194/0x378
>> Call Trace:
>> [c001f663b360] [c04817a4] .__skb_checksum+0x194/0x378
>> (unreliable)
>> [c001f663b440] [c04819b4] .skb_checksum+0x2c/0x3c
>> [c001f663b4c0] [c04fd0a8] .udp4_hwcsum+0xa8/0x16c
>> [c001f663b560] [c04fd440] .udp_send_skb+0x2d4/0x370
>> [c001f663b600] [c04fd51c] .udp_push_pending_frames+0x40/0x94
>> [c001f663b680] [c04fec08] .udp_sendpage+0x150/0x1b4
>> [c001f663b770] [c050ae54] .inet_sendpage+0xa0/0x120
>> [c001f663b810] [c059c8cc] .xs_sendpages+0x2d0/0x30c
>> [c001f663b8d0] [c059cae4] .xs_udp_send_request+0x58/0x120
>> [c001f663b970] [c0598f04] .xprt_transmit+0x80/0x36c
>> [c001f663ba20] [c05942d8] .call_transmit+0x19c/0x254
>> [c001f663bab0] [c059ff64] .__rpc_execute+0xbc/0x3c0
>> [c001f663bb90] [c00797f8] .process_one_work+0x1c0/0x474
>> [c001f663bc40] [c007a518] .worker_thread+0x17c/0x54c
>> [c001f663bd30] [c007fa8c] .kthread+0x104/0x124
>> [c001f663be30] [c884] .ret_from_kernel_thread+0x58/0xd4
>> Instruction dump:
>> 7d1f3a14 7c6a1850 e958 7fbd4050 786334e4 e90a 7c63ba14 f8490028
>> 7c63ea14 7d0903a6 e84a0008 4e800421  7c641b78 78270464 e9580008
> 
> Which is:
> 
> add  r8, r31, r7
> subf r3, r10, r3
> ld  r10, 0(r24)
> subfr29, r29, r8
> rldicr   r3, r3, 6, 51
> ld   r8, 0(r10)
> add  r3, r3, r23
> std  r2, 40(r9)
> add  r3, r3, r29
> mtctrr8
> ld   r2, 8(r10)
> bctrl
> ld   r2, 40(r9)   <---
> mr   r4, r3
> rldicr   r7, r1, 0, 49
> ld  r10, 8(r24)
> 
> 
> Which looks a bit odd. I'd expect us to be saving/restoring r2 to the stack,
> though maybe r9 was pointing at the stack?
> 
> Looking at your vmlinux.broken I don't see the same code gen.
> 
> Can you get an oops from a kernel and upload the exact binary? Or just post us
> the full code dump of __skb_checksum() (or wherever it oopses).

Ugh, sorry - I must've copied the wrong one. The serial output below is
from the uImage that (hopefully) is belongs to the vmlinux.broken:

  http://csgraf.de/agraf/current_thread_info/dmesg.txt


Alex
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

RE: [PATCH] [v2] power/fsl: add MDIO dt binding for FMan

2014-12-18 Thread Shaohui Xie
Ping.

Best Regards, 
Shaohui Xie
> -Original Message-
> From: Xie Shaohui-B21989
> Sent: Wednesday, November 26, 2014 10:11 AM
> To: Wood Scott-B07421
> Cc: linuxppc-dev@lists.ozlabs.org; devicet...@vger.kernel.org; Medve Emilian-
> EMMEDVE1; Liberman Igal-B31950
> Subject: RE: [PATCH] [v2] power/fsl: add MDIO dt binding for FMan
> 
> > -Original Message-
> > From: Wood Scott-B07421
> > Sent: Wednesday, November 26, 2014 9:50 AM
> > To: shh@gmail.com
> > Cc: linuxppc-dev@lists.ozlabs.org; devicet...@vger.kernel.org; Medve
> > Emilian-EMMEDVE1; Liberman Igal-B31950; Xie Shaohui-B21989
> > Subject: Re: [PATCH] [v2] power/fsl: add MDIO dt binding for FMan
> >
> > On Fri, 2014-11-14 at 17:53 +0800, shh@gmail.com wrote:
> > > From: Shaohui Xie 
> > >
> > > This binding is for FMan MDIO, it covers FMan v2 & FMan v3.
> > >
> > > Signed-off-by: Shaohui Xie 
> > > ---
> > > changes in V2:
> > > addressed comments from Scott in V1.
> > >
> > >  .../devicetree/bindings/powerpc/fsl/fman.txt   | 69
> > ++
> > >  1 file changed, 69 insertions(+)
> > >
> > > diff --git a/Documentation/devicetree/bindings/powerpc/fsl/fman.txt
> > > b/Documentation/devicetree/bindings/powerpc/fsl/fman.txt
> > > index edeea16..1523a87 100644
> > > --- a/Documentation/devicetree/bindings/powerpc/fsl/fman.txt
> > > +++ b/Documentation/devicetree/bindings/powerpc/fsl/fman.txt
> > > @@ -7,6 +7,7 @@ CONTENTS
> > >- FMan MURAM Node
> > >- FMan dTSEC/XGEC/mEMAC Node
> > >- FMan IEEE 1588 Node
> > > +  - FMan MDIO Node
> > >- Example
> > >
> > >
> > > 
> > > == === @@ -357,6 +358,68 @@ ptp-timer@fe000 {  };
> > >
> > >
> > > 
> > > ==
> > > ===
> > > +FMan MDIO Node
> > > +
> > > +DESCRIPTION
> > > +
> > > +The MDIO is a bus to which the PHY devices are connected.
> > > +
> > > +PROPERTIES
> > > +
> > > +- compatible
> > > + Usage: required
> > > + Value type: 
> > > + Definition: A standard property.
> > > + Must include "fsl,fman-mdio" for 1 Gb/s MDIO from FMan v2.
> > > + Must include "fsl,fman-xmdio" for 10 Gb/s MDIO from FMan v2.
> > > + Must include "fsl,fman-memac-mdio" for 1/10 Gb/s MDIO from
> > > + FMan v3.
> > > +
> > > +- reg
> > > + Usage: required
> > > + Value type: 
> > > + Definition: A standard property.
> > > +
> > > +- bus-frequency
> > > + Usage: optional
> > > + Value type: 
> > > + Definition: Specifies external MDIO bus clock speed which is
> > > + different from MDIO standard 2.5MHz. Should be defined for
> > SoCs
> > > + on which the standard one cannot work.
> > > +
> > > +- interrupts
> > > + Usage: optional
> > > + Value type: 
> > > + Definition: Event interrupt of external MDIO controller.
> > > + 1 Gb/s MDIO and 10 Gb/s MDIO has one interrupt respectively.
> > > +
> > > +- fsl,fman-internal-mdio
> > > + Usage: required for internal MDIO
> > > + Value type: boolean
> > > + Definition: Fman has internal MDIO for internal PCS(Physical
> > > + Coding Sublayer) PHYs and external MDIO for external PHYs.
> > > + The settings and programming routines for internal/external
> > > + MDIO are different. Must be included for internal MDIO.
> > > +
> > > +EXAMPLE
> > > +
> > > +Example for FMan v2 external MDIO:
> > > +
> > > +mdio@f1000 {
> > > + compatible = "fsl,fman-xmdio";
> > > + reg = <0xf1000 0x1000>;
> > > + bus-frequency = <2>;
> > > +};
> >
> > So the bus frequency is only 20 KHz?  Or is the unit supposed to be
> > something other than Hz?
> [S.H] it's only an example, it could be different on real SoCs, but they 
> always
> lower than the standard one, The standard one is 2.5MHz, I have to use Hz for 
> it.
> 
> Thanks!
> Shaohui
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [v2 PATCH 1/2] powerpc32: adds handling of _PAGE_RO

2014-12-18 Thread Scott Wood
On Thu, 2014-12-18 at 08:11 +0100, leroy christophe wrote:
> Le 18/12/2014 03:14, Scott Wood a écrit :
> > On Wed, 2014-12-17 at 10:14 +0100, Christophe Leroy wrote:
> >> Some powerpc like the 8xx don't have a RW bit in PTE bits but a RO (Read 
> >> Only) bit.
> >> This patch implements the handling of a _PAGE_RO flag to be used in place 
> >> of _PAGE_RW
> >>
> >> Signed-off-by: Christophe Leroy 
> >>
> >> ---
> >> v2 is a complete rework compared to v1
> >>
> >>   arch/powerpc/include/asm/pgtable-ppc32.h | 11 ++-
> >>   arch/powerpc/include/asm/pgtable.h   | 10 +++---
> >>   arch/powerpc/include/asm/pte-common.h| 27 ++-
> >>   arch/powerpc/mm/gup.c|  2 ++
> >>   arch/powerpc/mm/mem.c|  2 +-
> >>   arch/powerpc/mm/pgtable_32.c | 24 
> >>   6 files changed, 54 insertions(+), 22 deletions(-)
> >>
> >> diff --git a/arch/powerpc/include/asm/pgtable-ppc32.h 
> >> b/arch/powerpc/include/asm/pgtable-ppc32.h
> >> index 543bb8e..64ed9e1 100644
> >> --- a/arch/powerpc/include/asm/pgtable-ppc32.h
> >> +++ b/arch/powerpc/include/asm/pgtable-ppc32.h
> >> @@ -125,7 +125,7 @@ extern int icache_44x_need_flush;
> >>   #ifndef __ASSEMBLY__
> >>   
> >>   #define pte_clear(mm, addr, ptep) \
> >> -  do { pte_update(ptep, ~_PAGE_HASHPTE, 0); } while (0)
> >> +  do { pte_update(ptep, ~_PAGE_HASHPTE, _PAGE_RO); } while (0)
> > Is this really necessary?  It's already clearing the valid bit.
> >
> > Likewise in several other places that set or check for _PAGE_RO on pages
> > for which no access is permitted.
> >
> >> @@ -287,8 +287,9 @@ static inline void huge_ptep_set_wrprotect(struct 
> >> mm_struct *mm,
> >>   static inline void __ptep_set_access_flags(pte_t *ptep, pte_t entry)
> >>   {
> >>unsigned long bits = pte_val(entry) &
> >> -  (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW | _PAGE_EXEC);
> >> -  pte_update(ptep, 0, bits);
> >> +  (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW | _PAGE_RO |
> >> +   _PAGE_EXEC);
> >> +  pte_update(ptep, _PAGE_RO, bits);
> >>   }
> > You're unconditionally clearing _PAGE_RO, and apparently relying on the
> > undocumented behavior of pte_update() to clear "clr" before setting
> > "set".
> >
> > Instead I'd write this as:
> >
> > unsigned long set = pte_val(entry) &
> > (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW | _PAGE_EXEC);
> > unsigned long clr = pte_val(entry) & _PAGE_RO;
> Don't you mean ?
> 
>   unsigned long clr = ~pte_val(entry) & _PAGE_RO;
> 
> Because, we want to clear _PAGE_RO when _PAGE_RO is not set in entry.

Yes, sorry.

-Scott


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

RE: [RFC PATCH 0/2] powerpc: CR based local atomic operation implementation

2014-12-18 Thread David Laight
From: Rusty Russell
> David Laight  writes:
> > From: Madhavan Srinivasan [mailto:ma...@linux.vnet.ibm.com]
> > ...
> >> >>> I also wonder if it is possible to inspect the interrupted
> >> >>> code to determine the start/end of the RAS block.
> >> >>> (Easiest if you assume that there is a single 'write' instruction
> >> >>> as the last entry in the block.)
> >> >>>
> >> >> So each local_* function also have code in the __ex_table section. IIUC,
> >> >> __ex_table contains two address. So if the return address found in the
> >> >> first column of the _ex_table, use the corresponding address in the
> >> >> second column to continue from.
> >> >
> >> > That really doesn't scale.
> >> > I don't know how many 1000 address pairs you table will have (and the
> >> > ones in each loadable module), but the search isn't going to be cheap.
> >> >
> >> > If these sequences are restartable then they can only have one write
> >> > to memory.
...
> >> 2) resulting code with lot of condition and branch (for opcode decode)
> >> will be lot messy and may be an issue incase of maintenance,
> >
> > You don't need to decode the instructions.
> > Just look for the two specific instructions used as markers.
> > This is only really possible with fixed-size instructions.
> >
> > It might also be that the 'interrupt entry' path is easier to
> > modify than the 'interrupt exit' one (fewer code paths) and
> > you just need to modify the 'pc' in the stack frame.
> > You are only interested in interrupts from kernel space.
> 
> It's an overoptimization for case that statistically never happens.
> You won't even be able to measure the difference.
> 
> The question of bloat remains, but that's also easily measured.  In
> practice, I'd guess less than 1k.

IIRC they were 'static inline' so the table of addresses is generated
for every use site.
(copyin/out generates a similarly enormous table of addresses on amd64)


If they were real functions (so only appeared once) it wouldn't be as bad.
Indeed, in that case, by putting all such functions into a separate code
section a simple 'window test' can be done on the return address instead
of reserving one of the CR bits.

You also only need to save the start and end of each block, not the
restart address for every instruction within the block.

David

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev