Re: [9fans] freedom (was Re: Compiling 9atom kernel)

2011-05-07 Thread Daniel Lyons

On May 6, 2011, at 11:01 PM, Lucio De Re wrote:

 has anybody
 conclusively established why something MS could not successfully market
 has found such a ready audience when supplied by Apple?  Is it the UI
 as one might conclude from an earlier post?


The stylus, maybe? http://www.tuaw.com/files/stevesings.mp3

My pet theory is that it's the same reason people don't buy paper tablets that 
weigh 5+ lbs and have built-in typewriters. Of course, you can't discount 
the possibility that there's a difference between a car and a horseless 
carriage.

— 
Daniel Lyons




[9fans] _xinc vs ainc

2011-05-07 Thread erik quanstrom
i'm confused by the recent change to the thread library.
the old code simply did a locked incl.  the new code
does a locked compare-and-exchange /within a loop/ until it sees that
nobody else has updated the value at the same time, thus
ensuring that the value has indeed been updated.

since the expensive operation is the MESI(F) negotiation
behind the scenes to get exclusive access to the cacheline,
i don't understand what the motivation is for replacing _xinc
with ainc, since ainc can loop on an expensive lock'd instruction.

that is, i think the old version was wait free, and the new version
is not.

can someone explain what i'm missing here?

thanks!

- erik



TEXT _xinc(SB), $0		/* void _xinc(long *); */
	MOVL	l+0(FP), AX
	LOCK
	INCL	0(AX)
	RET



TEXT ainc(SB), $0		/* long ainc(long *); */
	MOVL	addr+0(FP), BX
ainclp:
	MOVL	(BX), AX
	MOVL	AX, CX
	INCL	CX
	LOCK
	BYTE	$0x0F; BYTE $0xB1; BYTE $0x0B	/* CMPXCHGL CX, (BX) */
	JNZ	ainclp
	MOVL	CX, AX
	RET



Re: [9fans] _xinc vs ainc

2011-05-07 Thread Bakul Shah
On May 7, 2011, at 6:05 AM, erik quanstrom quans...@quanstro.net  
wrote:






Just guessing. Maybe the new code allows more concurrency? If the  
value is not in the processor cache, will the old code block other  
processors for much longer? The new code forces caching with the first  
read, so maybe there is a high likelihood cmpxchg will finish faster. I  
haven't studied x86 cache behavior, so this guess could be completely  
wrong. Suggest asking on comp.arch, where people like Andy Glew can  
give you a definitive answer.




Re: [9fans] Plan 9 GSoC projects selected

2011-05-07 Thread hiro
 The main idea is to avoid the duplication of xlib dependent code in
 inferno, p9p, 9vx and drawterm and write a wsys device to use the
 window manager of the host system through a file server similar to
 rio(4). Whether the new x11 code will be a library or some sort of p9p's
 devdraw(1), and whether the wsys device will be an inferno/drawterm/9vx
 kernel device or an external program serving 9P, are still open
 questions.

 Feel free to ask if you want to know more.


 --
 - yiyus || JGL .



Awesome, feels like Christmas to me. Good luck :)



Re: [9fans] _xinc vs ainc

2011-05-07 Thread erik quanstrom
 Just guessing. Maybe the new code allows more concurrency? If the  
 value is not in the processor cache, will the old code block other  
 processors for much longer? The new code forces caching with the first  
 read, so maybe there is a high likelihood cmpxchg will finish faster. I  
 haven't studied x86 cache behavior, so this guess could be completely  
 wrong. Suggest asking on comp.arch, where people like Andy Glew can  
 give you a definitive answer.

according to intel, this is a myth.  search for myth in this page.

http://software.intel.com/en-us/articles/implementing-scalable-atomic-locks-for-multi-core-intel-em64t-and-ia32-architectures/

and this stands to reason, since both techniques revolve around a
LOCK'd instruction, thus invoking the x86 architectural MESI(F)
protocol.

the difference, and my main point, is that the loop in ainc means
that it is not a wait-free algorithm.  this is not only suboptimal,
but could also lead to incorrect behavior.

- erik



Re: [9fans] _xinc vs ainc

2011-05-07 Thread Bakul Shah
On Sat, 07 May 2011 18:47:54 EDT erik quanstrom quans...@quanstro.net  wrote:
  Just guessing. Maybe the new code allows more concurrency? If the  
  value is not in the processor cache, will the old code block other  
  processors for much longer? The new code forces caching with the first  
  read, so maybe there is a high likelihood cmpxchg will finish faster. I  
  haven't studied x86 cache behavior, so this guess could be completely  
  wrong. Suggest asking on comp.arch, where people like Andy Glew can  
  give you a definitive answer.
 
 according to intel, this is a myth.  search for myth in this page.
 
 http://software.intel.com/en-us/articles/implementing-scalable-atomic-locks-for-multi-core-intel-em64t-and-ia32-architectures/
 
 and this stands to reason, since both techniques revolve around a
 LOCK'd instruction, thus invoking the x86 architectural MESI(f)
 protocol.
 
 the difference, and my main point, is that the loop in ainc means
 that it is not a wait-free algorithm.  this is not only suboptimal,
 but could also lead to incorrect behavior.

I think a more likely reason for the change is to get a
*copy* of what was incremented: lock incl 0(ax) won't tell you
what the value was when it was incremented.

But I don't see how the change will lead to an incorrect behavior.



Re: [9fans] _xinc vs ainc

2011-05-07 Thread erik quanstrom
  the difference, and my main point, is that the loop in ainc means
  that it is not a wait-free algorithm.  this is not only suboptimal,
  but could also lead to incorrect behavior.
 
 I think a more likely possibility for the change is to have a
 *copy* of what was incremented. lock incl 0(ax) won't tell you
 what the value was when it was incremented.

you can read the code.  that value is not used by the thread library.

 But I don't see how the change will lead to an incorrect behavior.

could.

imagine you have two threads entering ainc.  the loser will
loop.  imagine that before the loser completes his loop a
third thread enters ainc and becomes a two-time loser.  by
induction it's possible that the loser never completes in n
loops for any given n.

this of course is exactly what it means for the algorithm
not to be wait-free.

if your program depends on time-bounded behavior from
the thread library, you could have trouble with a non-wait-free
algorithm like this.

perhaps my concern is unfounded.  i'd like to hear the argument.

- erik



Re: [9fans] _xinc vs ainc

2011-05-07 Thread Bakul Shah
On Sat, 07 May 2011 20:25:25 EDT erik quanstrom quans...@quanstro.net  wrote:
   the difference, and my main point is that the loop in ainc means
   that it is not a wait-free algorithm.  this is not only sub optimal,
   but also could lead to incorrect behavior.
  
  I think a more likely possibility for the change is to have a
  *copy* of what was incremented. lock incl 0(ax) won't tell you
  what the value was when it was incremented.
 
 you can read the code.  that value is not used by the thread library.

If you want to use the value being atomically incremented,
there is no more efficient way on x86. It may not be used now,
but maybe it can be used to make some synchronization
primitive more efficient?

 if your program depends on time-bounded behavior from
 the thread library, you could have trouble with a non-wait-free
 algorithm like this.

Yes, but I think associating time-bounded behavior with any
shared memory access is iffy.  You always have this
possibility on processors that provide nothing stronger than
LL/SC (load-linked/store-conditional).