Re: [PATCH] gianfar: Fix TX ring processing on SMP machines

2010-03-04 Thread David Miller
From: Anton Vorontsov avoront...@ru.mvista.com
Date: Wed, 3 Mar 2010 21:18:58 +0300

 Starting with commit a3bc1f11e9b867a4f49505 (gianfar: Revive SKB
 recycling) the gianfar driver sooner or later stops transmitting any
 packets on SMP machines.
 
 start_xmit() prepares a new skb for transmission. Generally it does
 three things:
 
 1. sets up all BDs (marks them ready to send), except the first one.
 2. stores the skb into tx_queue->tx_skbuff so that clean_tx_ring()
    can clean it up later.
 3. sets up the first BD, i.e. marks it ready.
 
 Here is what clean_tx_ring() does:
 
 1. reads skbs from tx_queue->tx_skbuff
 2. checks if the *last* BD is ready. If it's still ready [to send]
    then it hasn't been transmitted yet, so clean_tx_ring() returns.
    Otherwise it actually cleans up the BDs. All is OK.
 
 Now, if there is just one BD, the code flow is:
 
 - start_xmit(): stores skb into tx_skbuff. Note that the first BD
   (which is also the last one) isn't marked as ready, yet.
 - clean_tx_ring(): sees that skb is not null, *and* its lstatus
   says that it is NOT ready (as if the BD had been sent), so it cleans
   it up (bad!)
 - start_xmit(): marks BD as ready [to send], but it's too late.
 
 We can fix this simply by reordering lstatus/tx_skbuff writes.
 
 Reported-by: Martyn Welch martyn.we...@ge.com
 Bisected-by: Paul Gortmaker paul.gortma...@windriver.com
 Signed-off-by: Anton Vorontsov avoront...@ru.mvista.com
 Tested-by: Paul Gortmaker paul.gortma...@windriver.com
 Tested-by: Martyn Welch martyn.we...@ge.com

Applied.
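
The single-BD race described in the patch can be modelled in a few lines of plain C. This is only a sketch under stated assumptions, not the gianfar code: struct tx_slot, BD_READY, clean_slot() and the two xmit_*() helpers are hypothetical names, and the second CPU is simulated by calling the cleaner between the two stores. A real driver would also need suitable memory barriers between the stores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BD_READY 0x1u	/* hypothetical stand-in for the BD "ready to send" bit */

/* Hypothetical model of one TX ring slot: a BD status word plus its skb. */
struct tx_slot {
	unsigned int lstatus;	/* BD status; BD_READY set means hardware owns it */
	const void *skb;	/* tx_skbuff entry; NULL means nothing queued */
};

/* Model of the cleaner: reclaims the slot if an skb is stored and the BD
 * no longer looks ready. Returns true if it reclaimed the slot. */
static bool clean_slot(struct tx_slot *s)
{
	if (s->skb == NULL)
		return false;		/* nothing to clean */
	if (s->lstatus & BD_READY)
		return false;		/* not transmitted yet */
	s->skb = NULL;			/* treated as "already sent" */
	return true;
}

/* Problematic order: the skb is published before the BD is marked ready.
 * The cleaner is run between the two stores to simulate the other CPU. */
static bool xmit_skb_first(struct tx_slot *s, const void *skb)
{
	bool reclaimed_too_early;

	s->skb = skb;
	reclaimed_too_early = clean_slot(s);
	s->lstatus |= BD_READY;
	return reclaimed_too_early;
}

/* Reordered: the BD is marked ready first, the skb is stored afterwards.
 * A cleaner running in between finds no skb and leaves the slot alone. */
static bool xmit_status_first(struct tx_slot *s, const void *skb)
{
	bool reclaimed_too_early;

	s->lstatus |= BD_READY;
	reclaimed_too_early = clean_slot(s);
	s->skb = skb;
	return reclaimed_too_early;
}
```

In this model xmit_skb_first() leaves a ready BD with no skb attached, which is the stuck state the report describes, while xmit_status_first() keeps the skb in place.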
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [Patch v.2] mpc5200b/uart: improve baud rate calculation (reach high baud rates, better accuracy)

2010-03-04 Thread Albrecht Dreß
Hi Grant:

Thanks a lot for your input!

[snip]
 Save yourself some duplicated code here.  The above 14 lines can be
 shared between the 512x, 52xx and 5200b versions.  Create yourself an
 internal __mpc5xxx_psc_set_divisor() function that is passed the *psc,
 the divisor, and the clock select register setting (both the 5200 and
 the 5121 have the clock select register).

Hmm, yes, that's true.  Will look into that.

[snip]
  @@ -604,7 +676,6 @@ mpc52xx_uart_set_termios(struct uart_por
 
         baud = uart_get_baud_rate(port, new, old, 0, port->uartclk/16);
 
 I'm probably nitpicking, because I don't know if the io pin will
 handle this speed but uartclk/16 is no longer the maximum baudrate if
 a /4 prescaler is used.

Yes, you are right. Must of course be fixed.

[snip]
  @@ -635,8 +706,7 @@ mpc52xx_uart_set_termios(struct uart_por
         out_8(&psc->command, MPC52xx_PSC_SEL_MODE_REG_1);
         out_8(&psc->mode, mr1);
         out_8(&psc->mode, mr2);
  -       out_8(&psc->ctur, ctr >> 8);
  -       out_8(&psc->ctlr, ctr & 0xff);
  +       psc_ops->set_divisor(port, quot);
 
 Hmmm.  The divisor calculations have some tricky bits to them.  I
 would consider changing the set_divisor() function to accept a baud
 rate, and modify the set_divisor function to call uart_get_divisor().

That sounds like a good idea to me.  I will change the code that way.

 That way each set_divisor() can do whatever makes the most sense for
 the divisors available to it.  The 5121 for example has both a /10 and
 a /32 divisor, plus it can use an external clock.

Ouch.  I don't have a 512x, but isn't the current code plain wrong then?  It 
uses mpc5xxx_get_bus_frequency() as input for the baud rate calculation, and if 
the serial code assumes /16 instead of /10, the result must be terribly off.  
Or did I miss something here?
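
To put a number on "terribly off", here is a small sketch. It assumes the baud rate is simply clock / (prescaler * divisor) and that the divisor is computed for one prescaler while the hardware applies another; resulting_baud() and the 66 MHz clock are hypothetical and only serve as an illustration.

```c
#include <assert.h>

/* Baud rate actually produced when the divisor is computed for
 * assumed_prescaler but the hardware divides by real_prescaler.
 * Assumes baud = clk / (prescaler * divisor), with integer truncation. */
static unsigned int resulting_baud(unsigned int clk_hz,
				   unsigned int assumed_prescaler,
				   unsigned int real_prescaler,
				   unsigned int requested_baud)
{
	unsigned int divisor = clk_hz / (assumed_prescaler * requested_baud);

	return clk_hz / (real_prescaler * divisor);
}
```

With a 66 MHz input and 9600 baud requested, a matching /16 gives 9615 baud, while a divisor computed for /16 but applied with /10 gives 15384 baud, i.e. a factor of 16/10 too fast.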

Best, Albrecht.



Strange OOPS in 2.6.33

2010-03-04 Thread Joakim Tjernlund

Got this Oops a few times after cold-starting our
board a few times:

Unable to handle kernel paging request for unknown fault
Faulting instruction address: 0xc020e2b4
Oops: Kernel access of bad area, sig: 11 [#1]
TMCUTU
Modules linked in:
NIP: c020e2b4 LR: c020e274 CTR: 
REGS: c7a41b40 TRAP: 0600   Not tainted  (2.6.33)
MSR: 9032 EE,ME,IR,DR  CR: 28002424  XER: 
DAR: 09f52312, DSISR: 0120
TASK = c7889940[420] 'syslogd' THREAD: c7a4
GPR00: 09f52312 c7a41bf0 c7889940  0002 c7a41c40 c02734ac c78acc68
GPR08: c7a41c00 c78acc00  09f5214b c796e3d4 1001f444  bfe78700
GPR16: bfe77400 bfe77ee0 bfe773f8 0021 0ffef130  c7a41df0 
GPR24:  c7a41cf0 c7a41c70 0011 7f01 09f5214a c034a5cc c7a41c00
NIP [c020e2b4] ip_dev_find+0x90/0xf0
LR [c020e274] ip_dev_find+0x50/0xf0
Call Trace:
[c7a41bf0] [c020e274] ip_dev_find+0x50/0xf0 (unreliable)
[c7a41c60] [c01dd86c] __ip_route_output_key+0x8d4/0xb00
[c7a41d50] [c01ddab8] ip_route_output_flow+0x1c/0xa0
[c7a41d60] [c01ff8a0] ip4_datagram_connect+0x17c/0x2b8
[c7a41e30] [c020a75c] inet_dgram_connect+0x5c/0xa8
[c7a41e50] [c01a5030] sys_connect+0x7c/0xcc
[c7a41f00] [c01a6008] sys_socketcall+0x128/0x214
[c7a41f40] [c0011800] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xff6e004
LR = 0xfe2dac0
Instruction dump:
bb810060 38210070 7c0803a6 4e800020 88010052 2f82 409e0028 81210054
83a90068 2f9d 419e0018 381d01c8 7d200028 31290001 7d20012d 40a2fff4
---[ end trace 0824e85bac28e7e4 ]---

gdb says:
(gdb) list *0xc020e2b4
0xc020e2b4 is in ip_dev_find 
(/usr/local/src/BUILD/trunk/os2kernel/arch/powerpc/include/asm/atomic.h:106).
101
102 static __inline__ void atomic_inc(atomic_t *v)
103 {
104 int t;
105
106 __asm__ __volatile__(
107 1: lwarx   %0,0,%2 # atomic_inc\n\
108 addic   %0,%0,1\n
109 PPC405_ERR77(0,%2)
110stwcx.  %0,0,%2 \n\

(gdb) disass 0xc020e2b4 0xc020e2c4
Dump of assembler code from 0xc020e2b4 to 0xc020e2c4:
0xc020e2b4 <ip_dev_find+144>:   lwarx   r9,0,r0
0xc020e2b8 <ip_dev_find+148>:   addic   r9,r9,1
0xc020e2bc <ip_dev_find+152>:   stwcx.  r9,0,r0
0xc020e2c0 <ip_dev_find+156>:   bne-    0xc020e2b4 <ip_dev_find+144>

This is on a MPC8321 CPU
gcc 3.4.6

Any ideas?

  Jocke
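
One observation that can be checked directly from the dump: the faulting lwarx uses r0 as its address, GPR00 and DAR both read 09f52312, and that value is not a multiple of four. The sketch below only performs that arithmetic check; is_word_aligned() is a hypothetical helper, and reading TRAP 0600 as the alignment exception vector is an assumption based on classic PowerPC cores, not something stated in the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* lwarx/stwcx. operate on a 32-bit word, so the effective address is
 * expected to be 4-byte aligned. */
static bool is_word_aligned(uint32_t addr)
{
	return (addr & 0x3u) == 0;
}
```

For the address in the report, is_word_aligned(0x09f52312) is false, which would fit a bogus pointer being handed to atomic_inc() rather than a problem in atomic_inc() itself.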



Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.

2010-03-04 Thread Heiko Schocher
Hello Joakim,

Joakim Tjernlund wrote:
 Could you try reverting patch:
   8xx: Don't touch ACCESSED when no SWAP.
 and see if that makes a difference?
[...]
 Turning on pinned TLBs(you must turn on ADVANCED_OPTIONS first) could be an 
 improvement,
 regardless of my patches.

Here are the results:

run     version

1-4     2.6.33-rc6 without your patches
5-8     2.6.33-rc6 with all your patches
9-12    2.6.33-rc6 with patches 1, 2 and 4 (without "8xx: Don't touch
        ACCESSED when no SWAP")
13-16   2.6.33-rc6 with all your patches and CONFIG_PIN_TLB=y


make[1]: Entering directory `/home/hs/lmbench-3.0-a9/results'

 L M B E N C H  3 . 0   S U M M A R Y
 
 (Alpha software, do not distribute)

Basic system parameters
------------------------------------------------------------------------------
Host    OS             Description        Mhz  tlb    cache  mem     scal
                                               pages  line   par     load
                                                      bytes
------  -------------  -----------------  ---  -----  -----  ------  ----
tqm8xx  Linux 2.6.33-  powerpc-linux-gnu   66     32     16  1.0400     1
tqm8xx  Linux 2.6.33-  powerpc-linux-gnu   66      7     16  1.0400     1
tqm8xx  Linux 2.6.33-  powerpc-linux-gnu   66      7     16  1.0400     1
tqm8xx  Linux 2.6.33-  powerpc-linux-gnu   66     32     16  1.0400     1
tqm8xx  Linux 2.6.33-  powerpc-linux-gnu   66     32     16  1.0400     1
tqm8xx  Linux 2.6.33-  powerpc-linux-gnu   66      7     16  1.0400     1
tqm8xx  Linux 2.6.33-  powerpc-linux-gnu   66      7     16  1.0400     1
tqm8xx  Linux 2.6.33-  powerpc-linux-gnu   66     32     16  1.0400     1
tqm8xx  Linux 2.6.33-  powerpc-linux-gnu   66     32     16  1.0400     1
tqm8xx  Linux 2.6.33-  powerpc-linux-gnu   66     32     16  1.0400     1
tqm8xx  Linux 2.6.33-  powerpc-linux-gnu   66     32     16  1.0100     1
tqm8xx  Linux 2.6.33-  powerpc-linux-gnu   66     32     16  1.0100     1
tqm8xx  Linux 2.6.33-  powerpc-linux-gnu   66     28     16  1.1700     1
tqm8xx  Linux 2.6.33-  powerpc-linux-gnu   66      7     16  1.0100     1
tqm8xx  Linux 2.6.33-  powerpc-linux-gnu   66     28     16  1.0400     1
tqm8xx  Linux 2.6.33-  powerpc-linux-gnu   66      7     16  1.0400     1


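Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host OS              Mhz null null      open slct sig  sig  fork exec sh
                         call  I/O stat clos TCP  inst hndl proc proc proc
-------------------  --- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----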
tqm8xxLinux 2.6.33-   66 2.97 10.3 129. 1377 272. 21.8 91.3 6949 29.K 89.K
tqm8xxLinux 2.6.33-   66 3.06 10.5 124. 1375 273. 21.8 91.3 7136 30.K 89.K
tqm8xxLinux 2.6.33-   66 3.06 10.6 129. 1365 272. 21.2 96.6 6889 29.K 89.K
tqm8xxLinux 2.6.33-   66 3.06 10.5 124. 1309 272. 21.8 101. 6896 29.K 89.K
tqm8xxLinux 2.6.33-   66 2.97 8.86 126. 1336 273. 21.7 84.2 6785 29.K 88.K
tqm8xxLinux 2.6.33-   66 3.06 8.90 130. 1343 263. 21.3 84.7 7080 29.K 88.K
tqm8xxLinux 2.6.33-   66 3.52 8.97 129. 1339 270. 22.4 84.4 6823 29.K 88.K
tqm8xxLinux 2.6.33-   66 2.97 8.99 127. 1333 261. 22.4 87.0 7037 29.K 87.K
tqm8xxLinux 2.6.33-   66 3.06 8.83 128. 1355 269. 20.7 89.2 6927 29.K 87.K
tqm8xxLinux 2.6.33-   66 3.05 8.84 127. 1344 271. 21.6 90.5 6868 29.K 88.K
tqm8xxLinux 2.6.33-   66 3.06 8.84 131. 1376 260. 21.4 88.1 7119 29.K 87.K
tqm8xxLinux 2.6.33-   66 3.05 8.90 122. 1342 272. 21.4 88.6 6847 29.K 88.K
tqm8xxLinux 2.6.33-   66 3.19 9.10 122. 1205 265. 20.9 90.3 6358 27.K 83.K
tqm8xxLinux 2.6.33-   66 3.28 9.10 124. 1208 270. 20.9 95.2 6217 27.K 82.K
tqm8xxLinux 2.6.33-   66 3.19 8.98 125. 1210 270. 21.1 87.9 6364 27.K 83.K
tqm8xxLinux 2.6.33-   66 3.19 8.86 124. 1237 262. 21.3 90.7 6311 27.K 84.K

Basic integer operations - times in nanoseconds - smaller is better
-------------------------------------------------------------------
Host OS              intgr  intgr  intgr  intgr  intgr
                       bit    add    mul    div    mod
-------------------  -----  -----  -----  -----  -----
tqm8xxLinux 2.6.33-   15.7   18.0 1.5600  124.2  203.1
tqm8xxLinux 2.6.33-   15.7   17.4 1.5800  121.1  202.8
tqm8xxLinux 2.6.33-   15.2   17.9 1.6200  124.2  202.7
tqm8xxLinux 2.6.33-   15.2   17.9 1.6000  125.0  204.0
tqm8xxLinux 2.6.33-   15.7   18.1 1.5600  124.7  204.4
tqm8xxLinux 2.6.33-   15.7   18.1 1.5800  124.2  202.8
tqm8xxLinux 2.6.33-   15.7   17.9 1.5500  124.2  203.2
tqm8xxLinux 2.6.33-   15.7   18.1 1.5500  124.5  202.0
tqm8xxLinux 2.6.33-   15.7   18.1 1.5500  

Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.

2010-03-04 Thread Wolfgang Denk
Dear Heiko,

thanks for running the tests.

In message 4b8f8bb4.6070...@denx.de you wrote:
 
 here the results:
 
 run   version
 
 1-4   2.6.33-rc6 without your patches
 5-8   2.6.33-rc6 with all your patches
 9-12  2.6.33-rc6 with patches 1,2 and 4 (without 8xx: Don't touch ACCESSED 
 when no SWAP)
 13-16 2.6.33-rc6 with all your patches and CONFIG_PIN_TLB=y

So CONFIG_PIN_TLB improves the performance as expected, while the other
patches don't show any measurable improvement - or am I reading the
results incorrectly?


Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: w...@denx.de
And now remains  That we find out the cause of this effect, Or rather
say, the cause of this defect...   -- Hamlet, Act II, Scene 2


Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.

2010-03-04 Thread Joakim Tjernlund
Wolfgang Denk w...@denx.de wrote on 2010/03/04 13:16:56:

 From: Wolfgang Denk w...@denx.de
 To: h...@denx.de
 Cc: Joakim Tjernlund joakim.tjernl...@transmode.se, Klaus-Jürgen
 heyd...@kieback-peter.de, linuxppc-...@ozlabs.org, Scott Wood
 scottw...@freescale.com
 Date: 2010/03/04 13:17
 Subject: Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.

 Dear Heiko,

 thanks for running the tests.

 In message 4b8f8bb4.6070...@denx.de you wrote:
 
  here the results:
 
  run   version
 
  1-4   2.6.33-rc6 without your patches
  5-8   2.6.33-rc6 with all your patches
  9-12   2.6.33-rc6 with patches 1,2 and 4 (without 8xx: Don't touch ACCESSED
 when no SWAP)
  13-16   2.6.33-rc6 with all your patches and CONFIG_PIN_TLB=y

 So CONFIG_PIN_TLB improves the performance as expected, while the other
 patches don't show any measurable improvement - or am I reading the
 results incorrectly?

Close but not quite. What stands out most is:

Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
--------------------------------------------------------------------------
Host    OS             Mhz  L1 $   L2 $   Main mem  Rand mem  Guesses
------  -------------  ---  -----  -----  --------  --------  ------------
tqm8xx  Linux 2.6.33-   66   31.8  141.0     184.0    1165.7
tqm8xx  Linux 2.6.33-   66   31.8  141.2     184.2    1165.3
tqm8xx  Linux 2.6.33-   66   31.8  141.3     184.3    1165.6
tqm8xx  Linux 2.6.33-   66   31.8  141.3     184.2    1166.2

tqm8xx  Linux 2.6.33-   66   31.8  141.0     171.8    1100.5  No L2 cache?
tqm8xx  Linux 2.6.33-   66   31.8  141.0     171.8    1102.5  No L2 cache?
tqm8xx  Linux 2.6.33-   66   31.8  141.0     171.8    1101.7  No L2 cache?
tqm8xx  Linux 2.6.33-   66   31.8  141.0     171.8    1101.6  No L2 cache?

tqm8xx  Linux 2.6.33-   66   31.8  141.1     173.4    1149.1  No L2 cache?
tqm8xx  Linux 2.6.33-   66   31.8  141.1     173.4    1149.0  No L2 cache?
tqm8xx  Linux 2.6.33-   66   31.7  141.1     173.4    1148.7  No L2 cache?
tqm8xx  Linux 2.6.33-   66   31.7  141.1     173.4    1148.2  No L2 cache?

tqm8xx  Linux 2.6.33-   66   31.8  171.1     171.7    1099.8  No L2 cache?
tqm8xx  Linux 2.6.33-   66   31.8  171.1     171.6    1100.5  No L2 cache?
tqm8xx  Linux 2.6.33-   66   31.7  171.0     171.7    1101.0  No L2 cache?
tqm8xx  Linux 2.6.33-   66   31.8  171.0     171.6    1101.3  No L2 cache?


Besides the numbers, note how the first group doesn't have a Guesses entry.
Is there something odd with the results for the first group?
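
To make the size of the differences concrete, the relative change between two groups can be computed as below. This is only a sketch: pct_change() is a hypothetical helper, and it assumes the four blocks of the table follow the run order 1-4, 5-8, 9-12, 13-16, which the table itself does not state.

```c
#include <assert.h>

/* Relative change in percent going from `before` to `after`;
 * negative means the latency went down. */
static double pct_change(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

Using the first row of the first and second block, Rand mem goes from 1165.7 ns to 1100.5 ns (about -5.6%) and Main mem from 184.0 ns to 171.8 ns (about -6.6%).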

Also, since you are using MODULES, patch 2 is nullified.
Patch 1 is very minor and should not show, I think.
This leaves patches 3 & 4.
There appears to be something funny with patch 3, "Don't touch ACCESSED when
no SWAP", as it yields bad numbers for Prot Fault, so perhaps I am missing
something that needs ACCESSED even if NO_SWAP. Perhaps someone who knows MM
in Linux knows?
Are there any messages in the kernel log (dmesg)?

 Jocke



Re: [Patch v.2] mpc5200b/uart: improve baud rate calculation (reach high baud rates, better accuracy)

2010-03-04 Thread Grant Likely
On Thu, Mar 4, 2010 at 2:56 AM, Albrecht Dreß albrecht.dr...@arcor.de wrote:
 That way each set_divisor() can do whatever makes the most sense for
 the divisors available to it.  The 5121 for example has both a /10 and
 a /32 divisor, plus it can use an external clock.

 Ouch.  I don't have a 512x, but isn't the current code plain wrong then?  It 
 uses mpc5xxx_get_bus_frequency() as input for the baud rate calculation, and 
 if the serial code assumes /16 instead of /10, the result must be terribly 
 off.  Or did I miss something here?

If you are, then I'm missing the same thing.  Do your best to keep the
5121 calculation working out to the same value it uses now.  We'll ask
someone with a 5121 to test it out before I add the patch to my -next
branch.

g.


Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.

2010-03-04 Thread Heiko Schocher
Hello Joakim,

Joakim Tjernlund wrote:
 Wolfgang Denk w...@denx.de wrote on 2010/03/04 13:16:56:
 From: Wolfgang Denk w...@denx.de
 To: h...@denx.de
 Cc: Joakim Tjernlund joakim.tjernl...@transmode.se, Klaus-Jürgen
 heyd...@kieback-peter.de, linuxppc-...@ozlabs.org, Scott Wood
 scottw...@freescale.com
 Date: 2010/03/04 13:17
 Subject: Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.

 Dear Heiko,

 thanks for running the tests.

 In message 4b8f8bb4.6070...@denx.de you wrote:
 here the results:

 run   version

 1-4   2.6.33-rc6 without your patches
 5-8   2.6.33-rc6 with all your patches
 9-12   2.6.33-rc6 with patches 1,2 and 4 (without 8xx: Don't touch ACCESSED
 when no SWAP)
 13-16   2.6.33-rc6 with all your patches and CONFIG_PIN_TLB=y
 So CONFIG_PIN_TLB improves the performance as expected, while the other
 patches don't show any measurable improvement - or am I reading the
 results incorrectly?
 
 Close but not quite. What stands out most is:
 
 Memory latencies in nanoseconds - smaller is better
     (WARNING - may not be correct, check graphs)
 --------------------------------------------------------------------------
 Host    OS             Mhz  L1 $   L2 $   Main mem  Rand mem  Guesses
 ------  -------------  ---  -----  -----  --------  --------  ------------
 tqm8xx  Linux 2.6.33-   66   31.8  141.0     184.0    1165.7
 tqm8xx  Linux 2.6.33-   66   31.8  141.2     184.2    1165.3
 tqm8xx  Linux 2.6.33-   66   31.8  141.3     184.3    1165.6
 tqm8xx  Linux 2.6.33-   66   31.8  141.3     184.2    1166.2
 
 tqm8xx  Linux 2.6.33-   66   31.8  141.0     171.8    1100.5  No L2 cache?
 tqm8xx  Linux 2.6.33-   66   31.8  141.0     171.8    1102.5  No L2 cache?
 tqm8xx  Linux 2.6.33-   66   31.8  141.0     171.8    1101.7  No L2 cache?
 tqm8xx  Linux 2.6.33-   66   31.8  141.0     171.8    1101.6  No L2 cache?
 
 tqm8xx  Linux 2.6.33-   66   31.8  141.1     173.4    1149.1  No L2 cache?
 tqm8xx  Linux 2.6.33-   66   31.8  141.1     173.4    1149.0  No L2 cache?
 tqm8xx  Linux 2.6.33-   66   31.7  141.1     173.4    1148.7  No L2 cache?
 tqm8xx  Linux 2.6.33-   66   31.7  141.1     173.4    1148.2  No L2 cache?
 
 tqm8xx  Linux 2.6.33-   66   31.8  171.1     171.7    1099.8  No L2 cache?
 tqm8xx  Linux 2.6.33-   66   31.8  171.1     171.6    1100.5  No L2 cache?
 tqm8xx  Linux 2.6.33-   66   31.7  171.0     171.7    1101.0  No L2 cache?
 tqm8xx  Linux 2.6.33-   66   31.8  171.0     171.6    1101.3  No L2 cache?
 
 
 Besides the numbers, note how the first group doesn't have a Guesses entry.
 Is there something odd with the results for the first group?

Hmm.. just to be safe, I ran this test again, but it also shows no entry in
Guesses ... Hardware, Linux source, rootFS, lmbench sources, all the
same ...

 Also, since you are using MODULES, patch 2 is nullified.
 Patch 1 is very minor and should not show, I think.
 This leaves patches 3 & 4.
 There appears to be something funny with patch 3, "Don't touch ACCESSED when
 no SWAP", as it yields bad numbers for Prot Fault, so perhaps I am missing
 something that needs ACCESSED even if NO_SWAP. Perhaps someone who knows MM
 in Linux knows?
 Are there any messages in the kernel log (dmesg)?

I couldn't find anything in the dmesg output ... but if you
want this output, I can send it to you.

bye
Heiko
-- 
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany


Re: [PATCH] gianfar: Fix TX ring processing on SMP machines

2010-03-04 Thread Kumar Gala

On Mar 4, 2010, at 2:41 AM, David Miller wrote:

 From: Anton Vorontsov avoront...@ru.mvista.com
 Date: Wed, 3 Mar 2010 21:18:58 +0300
 
 Starting with commit a3bc1f11e9b867a4f49505 (gianfar: Revive SKB
 recycling) the gianfar driver sooner or later stops transmitting any
 packets on SMP machines.
 
 start_xmit() prepares a new skb for transmission. Generally it does
 three things:
 
 1. sets up all BDs (marks them ready to send), except the first one.
 2. stores the skb into tx_queue->tx_skbuff so that clean_tx_ring()
    can clean it up later.
 3. sets up the first BD, i.e. marks it ready.
 
 Here is what clean_tx_ring() does:
 
 1. reads skbs from tx_queue->tx_skbuff
 2. checks if the *last* BD is ready. If it's still ready [to send]
    then it hasn't been transmitted yet, so clean_tx_ring() returns.
    Otherwise it actually cleans up the BDs. All is OK.
 
 Now, if there is just one BD, the code flow is:
 
 - start_xmit(): stores skb into tx_skbuff. Note that the first BD
   (which is also the last one) isn't marked as ready, yet.
 - clean_tx_ring(): sees that skb is not null, *and* its lstatus
   says that it is NOT ready (as if the BD had been sent), so it cleans
   it up (bad!)
 - start_xmit(): marks BD as ready [to send], but it's too late.
 
 We can fix this simply by reordering lstatus/tx_skbuff writes.
 
 Reported-by: Martyn Welch martyn.we...@ge.com
 Bisected-by: Paul Gortmaker paul.gortma...@windriver.com
 Signed-off-by: Anton Vorontsov avoront...@ru.mvista.com
 Tested-by: Paul Gortmaker paul.gortma...@windriver.com
 Tested-by: Martyn Welch martyn.we...@ge.com
 
 Applied.

Anton,

Once this makes it into Linus's tree, can you make sure we get it added to
-stable?

- k


Re: [PATCH] powerpc: Renaming following split of GE Fanuc joint venture

2010-03-04 Thread Kumar Gala

On Mar 1, 2010, at 8:41 AM, Martyn Welch wrote:

 This patch renames GE Fanuc boards following the split-up of the GE Fanuc
 joint venture. These boards are now made by GE Intelligent Platforms.
 
 Signed-off-by: Martyn Welch martyn.we...@gefanuc.com
 ---
 
 arch/powerpc/boot/dts/gef_ppc9a.dts  |4 ++--
 arch/powerpc/boot/dts/gef_sbc310.dts |4 ++--
 arch/powerpc/boot/dts/gef_sbc610.dts |4 ++--
 arch/powerpc/platforms/86xx/Kconfig  |   12 ++--
 arch/powerpc/platforms/86xx/gef_gpio.c   |   10 +-
 arch/powerpc/platforms/86xx/gef_pic.c|6 +++---
 arch/powerpc/platforms/86xx/gef_ppc9a.c  |   12 ++--
 arch/powerpc/platforms/86xx/gef_sbc310.c |   12 ++--
 arch/powerpc/platforms/86xx/gef_sbc610.c |   12 ++--
 9 files changed, 38 insertions(+), 38 deletions(-)

applied to next

- k


Re: [PATCH 44/66] arch/powerpc/sysdev/cpm2_pic.h: Checkpatch cleanup

2010-03-04 Thread Kumar Gala

On Feb 27, 2010, at 10:51 AM, Andrea Gelmini wrote:

 arch/powerpc/sysdev/cpm2_pic.h:6: ERROR: (foo*) should be (foo *)
 
 Signed-off-by: Andrea Gelmini andrea.gelm...@gelma.net
 ---
 arch/powerpc/sysdev/cpm2_pic.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

applied to next

- k


Re: [PATCH 1/2] perf_event: Build callchain code regardless of hardware event support.

2010-03-04 Thread Kumar Gala

On Feb 25, 2010, at 6:09 PM, Paul Mackerras wrote:

 On Thu, Feb 25, 2010 at 06:04:33PM -0600, Scott Wood wrote:
 It's also useful for software events, as well as future support for
 other types of hardware counters.
 
 Signed-off-by: Scott Wood scottw...@freescale.com
 
 Acked-by: Paul Mackerras pau...@samba.org

applied to next

- k


Re: [PATCH 2/2] perf_event: e500 support

2010-03-04 Thread Kumar Gala

On Feb 25, 2010, at 6:09 PM, Scott Wood wrote:

 This implements perf_event support for the Freescale embedded performance
 monitor, based on the existing perf_event.c that supports server/classic
 chips.
 
 Some limitations:
 - Performance monitor interrupts are regular EE interrupts, and thus you
  can't profile places with interrupts disabled.  We may want to implement
  soft IRQ-disabling, with perfmon interrupts exempted and treated as NMIs.
 - When trying to schedule multiple event groups at once, and using
  restricted events, situations could arise where scheduling fails even
  though it would be possible.  Consider three groups, each with two events.
  One group has restricted events, the others don't.  The two non-restricted
  groups are scheduled, then one is removed, which happens to occupy the two
  counters that can't do restricted events.  The remaining non-restricted
  group will not be moved to the non-restricted-capable counters to make
  room if the restricted group tries to be scheduled.
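
The second limitation can be illustrated with a toy counter allocator. Everything here is hypothetical and chosen only to reproduce the scenario: a model with four counters of which only counters 0 and 1 can count restricted events, a greedy claim() that never moves groups already placed, and a release(). It does not reflect the real e500 counter layout or the perf_event scheduling code.

```c
#include <assert.h>

#define NUM_COUNTERS		4
/* Hypothetical layout: only counters 0 and 1 accept restricted events. */
#define RESTRICTED_OK_MASK	0x3u
#define ALL_COUNTERS_MASK	0xfu

struct pmu_model {
	unsigned int busy_mask;	/* bit i set means counter i is in use */
};

/* Greedy placement without migration: claim n free counters from
 * `allowed`. Returns the claimed mask, or 0 if the group does not fit. */
static unsigned int claim(struct pmu_model *pmu, int n, unsigned int allowed)
{
	unsigned int got = 0;
	int i;

	for (i = 0; i < NUM_COUNTERS && n > 0; i++) {
		unsigned int bit = 1u << i;

		if ((allowed & bit) && !(pmu->busy_mask & bit)) {
			got |= bit;
			n--;
		}
	}
	if (n > 0)
		return 0;	/* not enough suitable counters free */
	pmu->busy_mask |= got;
	return got;
}

/* Give the counters in `mask` back to the pool. */
static void release(struct pmu_model *pmu, unsigned int mask)
{
	pmu->busy_mask &= ~mask;
}
```

In this model, two unrestricted two-event groups take counters 0-1 and 2-3. Releasing the group on 2-3 leaves two counters free, yet a restricted two-event group still cannot be placed because the remaining group sits on the only restricted-capable counters. Moving that group first would make both fit.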
 
 Signed-off-by: Scott Wood scottw...@freescale.com
 ---
 Changes from previous version:
 - Factored out callchain makefile patch
 - Split up header files
 - Renamed pmu struct
 - Added threshold support
 
 arch/powerpc/include/asm/perf_event.h  |  133 +
 arch/powerpc/include/asm/perf_event_fsl_emb.h  |   50 ++
 .../asm/{perf_event.h => perf_event_server.h}  |4 +-
 arch/powerpc/include/asm/reg_fsl_emb.h |2 +-
 arch/powerpc/kernel/Makefile   |4 +
 arch/powerpc/kernel/cputable.c |2 +-
 arch/powerpc/kernel/e500-pmu.c |  129 
 arch/powerpc/kernel/perf_event_fsl_emb.c   |  654 
 arch/powerpc/platforms/Kconfig.cputype |   10 +
 9 files changed, 874 insertions(+), 114 deletions(-)
 rewrite arch/powerpc/include/asm/perf_event.h (92%)
 create mode 100644 arch/powerpc/include/asm/perf_event_fsl_emb.h
 rename arch/powerpc/include/asm/{perf_event.h => perf_event_server.h} (98%)
 create mode 100644 arch/powerpc/kernel/e500-pmu.c
 create mode 100644 arch/powerpc/kernel/perf_event_fsl_emb.c

Paul, do you intend to ack this, or don't you care?

- k



[Patch v.3] mpc5200b/uart: improve baud rate calculation (reach high baud rates, better accuracy)

2010-03-04 Thread Albrecht Dreß
On the MPC5200B, make very high baud rates (e.g. 3 MBaud) accessible and
achieve a higher precision for high baud rates in general.  This is done by
selecting the appropriate prescaler (/4 or /32).  To keep the code clean,
the getuartclk method has been dropped, and all calculations are done in a
new set_baudrate method.

Notes: only fsl,mpc5200b-psc-uart compatible devices benefit from these
improvements.
The 512x may or may not work; the patch keeps the current implementation
(using a /16 prescaler), but according to the data sheet, this is plain
wrong.  See the comment in mpc512x_psc_set_baudrate().  Any insight and
testing of the code would be appreciated.

Tested on a custom 5200B based board, from 110 baud up to 3 MBaud, and with
both fsl,mpc5200b-psc-uart and fsl,mpc5200-psc-uart devices.

Signed-off-by: Albrecht Dreß albrecht.dr...@arcor.de

---

Changes vs. v.2: Pick up Grant's comments by shifting the calculations to the
new set_baudrate method.
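
The prescaler selection for the 5200B can be tried out in isolation with the sketch below. It mirrors the arithmetic of the patch but is not part of it: struct psc_baud_cfg and mpc5200b_pick_divisor() are hypothetical names, the 132 MHz IPB clock in the example is an assumed value, and the 0xffff limit is an assumption derived from the divisor being split across the two 8-bit ctur/ctlr registers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: clock select value plus counter divisor. */
struct psc_baud_cfg {
	uint16_t prescaler_sel;	/* 0xff00 selects /4, 0xdd00 selects /32 */
	unsigned int divisor;	/* value split across ctur (high) and ctlr (low) */
};

/* Prefer the /4 prescaler for accuracy; fall back to /32 when the
 * rounded divisor would not fit into 16 bits. */
static struct psc_baud_cfg mpc5200b_pick_divisor(unsigned int ipb_hz,
						 unsigned int baud)
{
	struct psc_baud_cfg cfg;

	/* rounded ipb_hz / (4 * baud) */
	cfg.divisor = (ipb_hz + 2 * baud) / (4 * baud);
	if (cfg.divisor > 0xffff) {
		/* rescale from /4 to /32, again with rounding */
		cfg.divisor = (cfg.divisor + 4) / 8;
		cfg.prescaler_sel = 0xdd00;
	} else {
		cfg.prescaler_sel = 0xff00;
	}
	return cfg;
}
```

With the assumed 132 MHz clock, 3 MBaud yields the /4 prescaler with divisor 11 (132 MHz / 44 = 3 MBaud exactly), and 110 baud yields the /32 prescaler with divisor 37500.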


--- linux-2.6.33-orig/drivers/serial/mpc52xx_uart.c	2010-02-24 19:52:17.0 +0100
+++ linux-2.6.33/drivers/serial/mpc52xx_uart.c	2010-03-04 17:13:47.0 +0100
@@ -144,9 +144,21 @@ struct psc_ops {
 	unsigned char	(*read_char)(struct uart_port *port);
 	void		(*cw_disable_ints)(struct uart_port *port);
 	void		(*cw_restore_ints)(struct uart_port *port);
-	unsigned long	(*getuartclk)(void *p);
+	unsigned int	(*set_baudrate)(struct uart_port *port,
+					struct ktermios *new,
+					struct ktermios *old);
 };
 
+/* setting the prescaler and divisor reg is common for all chips */
+static inline void mpc52xx_set_divisor(struct mpc52xx_psc __iomem *psc,
+				       u16 prescaler, unsigned int divisor)
+{
+	/* select prescaler */
+	out_be16(&psc->mpc52xx_psc_clock_select, prescaler);
+	out_8(&psc->ctur, divisor >> 8);
+	out_8(&psc->ctlr, divisor & 0xff);
+}
+
 #ifdef CONFIG_PPC_MPC52xx
 #define FIFO_52xx(port) ((struct mpc52xx_psc_fifo __iomem *)(PSC(port)+1))
 static void mpc52xx_psc_fifo_init(struct uart_port *port)
@@ -154,9 +166,6 @@ static void mpc52xx_psc_fifo_init(struct
 	struct mpc52xx_psc __iomem *psc = PSC(port);
 	struct mpc52xx_psc_fifo __iomem *fifo = FIFO_52xx(port);
 
-	/* /32 prescaler */
-	out_be16(&psc->mpc52xx_psc_clock_select, 0xdd00);
-
 	out_8(&fifo->rfcntl, 0x00);
 	out_be16(&fifo->rfalarm, 0x1ff);
 	out_8(&fifo->tfcntl, 0x07);
@@ -245,15 +254,47 @@ static void mpc52xx_psc_cw_restore_ints(
 	out_be16(&PSC(port)->mpc52xx_psc_imr, port->read_status_mask);
 }
 
-/* Search for bus-frequency property in this node or a parent */
-static unsigned long mpc52xx_getuartclk(void *p)
-{
-	/*
-	 * 5200 UARTs have a / 32 prescaler
-	 * but the generic serial code assumes 16
-	 * so return ipb freq / 2
-	 */
-	return mpc5xxx_get_bus_frequency(p) / 2;
+static unsigned int mpc5200_psc_set_baudrate(struct uart_port *port,
+					     struct ktermios *new,
+					     struct ktermios *old)
+{
+	unsigned int baud;
+	unsigned int divisor;
+
+	/* The 5200 has a fixed /32 prescaler, uartclk contains the ipb freq */
+	baud = uart_get_baud_rate(port, new, old,
+				  port->uartclk / (32 * 0xffff) + 1,
+				  port->uartclk / 32);
+	divisor = (port->uartclk + 16 * baud) / (32 * baud);
+
+	/* enable the /32 prescaler and set the divisor */
+	mpc52xx_set_divisor(PSC(port), 0xdd00, divisor);
+	return baud;
+}
+
+static unsigned int mpc5200b_psc_set_baudrate(struct uart_port *port,
+					      struct ktermios *new,
+					      struct ktermios *old)
+{
+	unsigned int baud;
+	unsigned int divisor;
+	u16 prescaler;
+
+	/* The 5200B has a selectable /4 or /32 prescaler, uartclk contains the
+	 * ipb freq */
+	baud = uart_get_baud_rate(port, new, old,
+				  port->uartclk / (32 * 0xffff) + 1,
+				  port->uartclk / 4);
+	divisor = (port->uartclk + 2 * baud) / (4 * baud);
+
+	/* select the proper prescaler and set the divisor */
+	if (divisor > 0xffff) {
+		divisor = (divisor + 4) / 8;
+		prescaler = 0xdd00; /* /32 */
+	} else
+		prescaler = 0xff00; /* /4 */
+	mpc52xx_set_divisor(PSC(port), prescaler, divisor);
+	return baud;
 }
 
 static struct psc_ops mpc52xx_psc_ops = {
@@ -272,7 +313,26 @@ static struct psc_ops mpc52xx_psc_ops = 
 	.read_char = mpc52xx_psc_read_char,
 	.cw_disable_ints = mpc52xx_psc_cw_disable_ints,
 	.cw_restore_ints = mpc52xx_psc_cw_restore_ints,
-	.getuartclk = mpc52xx_getuartclk,
+ 

Re: [RFC: PATCH 08/13] powerpc/476: define specific cpu table entry for DD1 and DD1.1 cores

2010-03-04 Thread Hollis Blanchard
On Mon, Mar 1, 2010 at 11:13 AM, Dave Kleikamp sha...@linux.vnet.ibm.com wrote:

 powerpc/476: define specific cpu table entry for DD1 and DD1.1 cores

 From: Benjamin Herrenschmidt b...@kernel.crashing.org

 There are still some unstable bits on the DD1 and DD1.1 cores.  Don't use
 the FPU or the tlbivax operation.  Define CPU_FTR_476_DD1 and
 CPU_FTR_476_DD1_1 for additional workarounds in later patches.

 The DD1 core requires workarounds triggered by both CPU_FTR_476_DD1
 and CPU_FTR_476_DD1_1.  The DD1.1 core only needs CPU_FTR_476_DD1_1
 defined.

 Isn't the policy generally not to commit workarounds for early/errataful
hardware which will not be seen in the real world? Otherwise, every new
half-broken core could burn a bunch of feature bits...

-Hollis

Re: [PATCHv4 2/2] powerpc: implement arch_scale_smt_power for Power7

2010-03-04 Thread Michael Neuling
In message 1267541076.25158.60.ca...@laptop you wrote:
 On Sat, 2010-02-27 at 21:21 +1100, Michael Neuling wrote:
  In message 11927.1267010...@neuling.org you wrote:
 If there's less the group will normally be balanced and we fall out and
 end up in check_asym_packing().
 
 So what I tried doing with that loop is detect if there's a hole in the
 packing before busiest. Now that I think about it, what we need to check
 is if this_cpu (the removed cpu argument) is idle and less than busiest.
 
 So something like:
 
 static int check_asym_packing(struct sched_domain *sd,
                               struct sd_lb_stats *sds,
                               int this_cpu, unsigned long *imbalance)
 {
   int busiest_cpu;
 
   if (!(sd->flags & SD_ASYM_PACKING))
           return 0;
 
   if (!sds->busiest)
           return 0;
 
   busiest_cpu = group_first_cpu(sds->busiest);
   if (cpu_rq(this_cpu)->nr_running || this_cpu > busiest_cpu)
           return 0;
 
   *imbalance = (sds->max_load * sds->busiest->cpu_power) /
                   SCHED_LOAD_SCALE;
   return 1;
 }
 
 Does that make sense?

I think so.

I'm seeing check_asym_packing do the right thing with the simple SMT2
with 1 process case.  It marks cpu0 as imbalanced when cpu0 is idle and
cpu1 is busy.

Unfortunately the process doesn't seem to get migrated down though.
Do we need to give *imbalance a higher value? 
   
   So with ego's help, I traced this down a bit more.  
   
   In my simple test case (SMT2, t0 idle, t1 active) if f_b_g() hits our
   new case in check_asym_packing(), load_balance then runs f_b_q().
   f_b_q() has this:
   
         if (capacity && rq->nr_running == 1 && wl > imbalance)
                 continue;
   
   when check_asym_packing() hits, wl = 1783 and imbalance = 1024, so we
   continue and busiest remains NULL. 
   
   load_balance then does goto out_balanced and it doesn't attempt to
   move the task.
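
The effect of that check with the reported numbers can be reproduced with a stand-alone predicate. This is a sketch only: fbq_skips_queue() is a hypothetical helper that copies the condition quoted above, and the capacity value of 1 is an assumption, since the message only gives wl and imbalance.

```c
#include <assert.h>
#include <stdbool.h>

/* True if the runqueue is skipped as a "busiest" candidate: the group has
 * capacity, the queue runs a single task, and that task's weighted load
 * is larger than the imbalance to be moved. */
static bool fbq_skips_queue(unsigned long capacity, unsigned int nr_running,
			    unsigned long wl, unsigned long imbalance)
{
	return capacity && nr_running == 1 && wl > imbalance;
}
```

With wl = 1783 and imbalance = 1024 the queue is skipped, so busiest stays NULL. The queue would only be considered once the imbalance is at least as large as wl.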
   
   Based on this and on ego's suggestion I pulled in Suresh Siddha's patch
   from: http://lkml.org/lkml/2010/2/12/352.  This fixes the problem.  The
   process is moved down to t0.  
   
   I've only tested SMT2 so far.  
  
  I'm finding this SMT2 result to be unreliable. Sometimes it doesn't work
  for the simple 1 process case.  It seems to change boot to boot.
  Sometimes it works as expected with t0 busy and t1 idle, but other times
  it's the other way around.
  
  When it doesn't work, check_asym_packing() is still marking processes to
  be pulled down but only gets run about 1 in every 4 calls to
  load_balance().
  
  For 2 of the other calls to load_balance, idle is CPU_NEWLY_IDLE and
  hence check_asym_packing() doesn't get called.  This results in
  sd->nr_balance_failed being reset.  When load_balance is next called and
  check_asym_packing() hits, need_active_balance() returns 0 as
  sd->nr_balance_failed is too small.  This means the migration thread on
  t1 is not woken and the process remains there.  
  
  So why does thread0 change from NEWLY_IDLE to IDLE and vice versa, when
  there is nothing running on it?  Is this expected?
 
 Ah, yes, you should probably allow both those.
 
 NEWLY_IDLE is when we are about to schedule the idle thread, IDLE is
 when a tick hits the idle thread.
 
 I'm thinking that NEWLY_IDLE should also solve the NO_HZ case, since
 we'll have passed through that before we enter tickless state, just make
 sure SD_BALANCE_NEWIDLE is set on the relevant levels (should already be
 so).
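
The counter behaviour described in this exchange can be modelled in a few lines. This is a toy simulation under stated assumptions, not kernel code: the threshold of 2 is hypothetical, the struct and function names are invented, and a pass that does not hit check_asym_packing() is simply assumed to reset the failure counter, as the report describes.

```c
#include <stdbool.h>

/* Hypothetical threshold; the real limit depends on the sched domain. */
#define ACTIVE_BALANCE_THRESHOLD 2

struct toy_domain {
    unsigned int nr_balance_failed;
};

/*
 * One load_balance() pass in the toy model.
 * asym_hit: the asym packing check flagged an imbalance, nothing moved.
 * Returns true if active balancing would be requested on this pass.
 */
static bool toy_load_balance(struct toy_domain *sd, bool asym_hit)
{
    if (!asym_hit) {
        sd->nr_balance_failed = 0;  /* pass looked balanced: reset */
        return false;
    }
    sd->nr_balance_failed++;
    return sd->nr_balance_failed > ACTIVE_BALANCE_THRESHOLD;
}

/* Run `calls` passes where only every `period`-th pass is a hit. */
static unsigned int toy_count_active(unsigned int calls, unsigned int period)
{
    struct toy_domain sd = { 0 };
    unsigned int i, triggered = 0;

    for (i = 0; i < calls; i++)
        if (toy_load_balance(&sd, i % period == 0))
            triggered++;
    return triggered;
}
```

With a hit on only 1 pass in 4, the counter never gets past 1 and active balancing is never requested, which is the stuck state reported above. If every pass is allowed to hit, the counter climbs past the threshold.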

OK, thanks.

There seems to be a regression in Linus' latest tree (also -next) where
new processes usually end up on thread 1 rather than thread 0 (when in
SMT2 mode).

This only seems to happen with newly created processes.  If you pin a
process to t0 and then unpin it, it stays on t0.  Also if a process is
migrated to another core, it can end up on t0.

This happens with a vanilla linus or -next tree on ppc64
pseries_defconfig - NO_HZ.  I've not tried with NO_HZ.

Anyway, this regression seems to be causing problems when we apply our
patch.  We are trying to pull down to t0, which works, but we immediately
get pulled back up to t1 due to the above regression.  This happens over
and over, causing the process to ping-pong every few sched ticks.

We've not tried to bisect this problem but that's the next step unless
someone has some insight into the problem.

Also, we had to change the following to get the pull down to work
correctly in the original patch:

@@ -2618,8 +2618,8 @@ static int check_asym_packing(struct sch
         if (this_cpu > busiest_cpu)
                 return 0;
 
-        *imbalance = (sds->max_load * sds->busiest->cpu_power) /
-                        SCHED_LOAD_SCALE;
+        *imbalance = DIV_ROUND_CLOSEST(sds->max_load * sds->busiest->cpu_power,
+  
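
The hunk above is cut off in the archive, but the visible part swaps a truncating division for DIV_ROUND_CLOSEST(). The sketch below shows the difference in userspace. It assumes unsigned operands and SCHED_LOAD_SCALE = 1024; the helper names and the sample inputs are illustrative and not taken from the patch.

```c
/* Assumed value of the scheduler's load scale for this sketch. */
#define SCHED_LOAD_SCALE 1024UL

/* Userspace stand-in for DIV_ROUND_CLOSEST() with unsigned operands. */
static unsigned long div_round_closest(unsigned long x, unsigned long d)
{
    return (x + d / 2) / d;
}

/* Imbalance as in the removed lines: plain truncating division. */
static unsigned long imbalance_trunc(unsigned long max_load,
                                     unsigned long cpu_power)
{
    return max_load * cpu_power / SCHED_LOAD_SCALE;
}

/* Imbalance as in the added line: rounded to the nearest integer. */
static unsigned long imbalance_rounded(unsigned long max_load,
                                       unsigned long cpu_power)
{
    return div_round_closest(max_load * cpu_power, SCHED_LOAD_SCALE);
}
```

For example, max_load = 3 and cpu_power = 512 give 1536 / 1024, which truncates to 1 but rounds to 2. A result that comes out one unit low could plausibly matter for comparisons such as the wl vs. imbalance test quoted earlier in this message, though the mail does not spell out the exact failure.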

Re: [PATCH 2/2] perf_event: e500 support

2010-03-04 Thread Paul Mackerras
On Thu, Mar 04, 2010 at 10:48:03AM -0600, Kumar Gala wrote:

 Paul do you intend to Ack this or don't care?

Sorry, thought I had.

Acked-by: Paul Mackerras pau...@samba.org


Re: [PATCH 7/7] powerpc/85xx: Fix the RapidIO maintenance access functions

2010-03-04 Thread tmo

Quoting Micha Nelissen mi...@neli.hopto.org:


Bounine, Alexandre wrote:

Hi Micha,

I tested it on my setup - it works.
Maybe Thomas may give more details on this change.


Did you (for fun) try once to decrease the maintenance window to
say, 4 kB? Then you really need these high bits to work properly.


We have never changed the configuration of the window size in Linux,
but we have a test application that uses this 4 kB window setting.
With the current window configuration in Linux, I expect problems when
someone tries to read a register that is located at offset > 512KB. I
have currently no equipment to verify this behaviour.




Or did you try with some register at offset > 4MB? The Tundras have
registers going up to 0x14000 or so? So don't need 16MB addressing
for that.

We have devices that require access to registers that are located at
offset > 15MB.




Thanks, Micha



-Original Message-
From: Micha Nelissen [mailto:mi...@neli.hopto.org]
Sent: Wednesday, February 24, 2010 3:21 PM
To: Alexandre Bounine
Subject: Re: [PATCH 7/7] powerpc/85xx: Fix the RapidIO maintenance
access functions

Alexandre Bounine wrote:

         out_be32(&priv->maint_atmu_regs->rowtar,
-                 (destid << 22) | (hopcount << 12) | ((offset & ~0x3) >> 9));
+                 (destid << 22) | (hopcount << 12) | (offset >> 12));
+        out_be32(&priv->maint_atmu_regs->rowtear, (destid >> 10));

Did this actually work for you? The (offset >> 12) is due to the 4MB
window size right?

Micha
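
The bit packing in the added lines of the hunk can be checked without hardware. The sketch below only reproduces the two shift expressions from the patch; the function names are invented, and nothing here is claimed about how the controller interprets the fields.

```c
#include <stdint.h>

/*
 * Value written to rowtar by the '+' line of the quoted hunk:
 * destid above bit 22, hopcount above bit 12, and the offset with
 * its low 12 bits dropped. Function names are hypothetical.
 */
static uint32_t maint_rowtar(uint32_t destid, uint32_t hopcount,
                             uint32_t offset)
{
    return (destid << 22) | (hopcount << 12) | (offset >> 12);
}

/* Value written to rowtear: the destid bits that do not fit in rowtar. */
static uint32_t maint_rowtear(uint32_t destid)
{
    return destid >> 10;
}
```

For instance, destid 1, hopcount 2 and offset 0x00F00000 pack to 0x00402F00, so an offset around 15MB still ends up in the low bits of the word, and an offset of 0x14000 contributes only 0x14.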






