Re: [9fans] freedom (was Re: Compiling 9atom kernel)
On May 6, 2011, at 11:01 PM, Lucio De Re wrote:

> has anybody conclusively established why something MS could not
> successfully market has found such a ready audience when supplied by
> Apple? Is it the UI, as one might conclude from an earlier post? The
> stylus, maybe?

http://www.tuaw.com/files/stevesings.mp3

My pet theory is that it's the same reason people don't buy paper
tablets that weigh 5+ lbs and have built-in typewriters. Of course, you
can't discount the possibility that there's a difference between a car
and a horseless carriage.

-- Daniel Lyons
[9fans] _xinc vs ainc
i'm confused by the recent change to the thread library. the old code
simply did a locked incl. the new code does a locked compare-and-exchange
/within a loop/ until it sees that nobody else has updated the value at
the same time, thus ensuring that the value has indeed been updated.
since the expensive operation is the MESI(F) negotiation behind the
scenes to get exclusive access to the cache line, i don't understand
what the motivation is for replacing _xinc with ainc, since ainc can
loop on an expensive locked instruction. that is, i think the old
version was wait-free, and the new version is not. can someone explain
what i'm missing here? thanks!

- erik

TEXT _xinc(SB), $0		/* void _xinc(long *); */
	MOVL	l+0(FP), AX
	LOCK
	INCL	0(AX)
	RET

TEXT ainc(SB), $0		/* long ainc(long *); */
	MOVL	addr+0(FP), BX
ainclp:
	MOVL	(BX), AX
	MOVL	AX, CX
	INCL	CX
	LOCK
	BYTE	$0x0F; BYTE $0xB1; BYTE $0x0B	/* CMPXCHGL CX, (BX) */
	JNZ	ainclp
	MOVL	CX, AX
	RET
Re: [9fans] _xinc vs ainc
On May 7, 2011, at 6:05 AM, erik quanstrom <quans...@quanstro.net> wrote:

> i'm confused by the recent change to the thread library. the old code
> was simply to do a locked incl. the new code does a locked exchange
> /within a loop/ until it's seen that nobody else has updated the value
> at the same time [...] i think the old version was wait-free, and the
> new version is not. can someone explain what i'm missing here?

Just guessing. Maybe the new code allows more concurrency? If the value
is not in the processor cache, will the old code block other processors
for much longer? The new code forces caching with the first read, so
maybe there is a higher likelihood that cmpxchg will finish faster. I
haven't studied x86 cache behavior, so this guess could be completely
wrong. Suggest asking on comp.arch, where people like Andy Glew can
give you a definitive answer.
Re: [9fans] Plan 9 GSoC projects selected
The main idea is to avoid the duplication of xlib-dependent code in
inferno, p9p, 9vx and drawterm, and to write a wsys device that uses
the window manager of the host system through a file server similar to
rio(4). Whether the new x11 code will be a library or something like
p9p's devdraw(1), and whether the wsys device will be an
inferno/drawterm/9vx kernel device or an external program serving 9P,
are still open questions. Feel free to ask if you want to know more.

-- 
- yiyus || JGL

Awesome, feels like Christmas to me. Good luck :)
Re: [9fans] _xinc vs ainc
> Just guessing. Maybe the new code allows more concurrency? If the
> value is not in the processor cache, will the old code block other
> processors for much longer? The new code forces caching with the
> first read, so maybe there is a higher likelihood that cmpxchg will
> finish faster.

according to intel, this is a myth. search for "myth" on this page:

http://software.intel.com/en-us/articles/implementing-scalable-atomic-locks-for-multi-core-intel-em64t-and-ia32-architectures/

and this stands to reason, since both techniques revolve around a
LOCK'd instruction, thus invoking the x86 architectural MESI(F)
protocol. the difference, and my main point, is that the loop in ainc
means that it is not a wait-free algorithm. this is not only
suboptimal, but could also lead to incorrect behavior.

- erik
Re: [9fans] _xinc vs ainc
On Sat, 07 May 2011 18:47:54 EDT erik quanstrom <quans...@quanstro.net> wrote:

> according to intel, this is a myth. search for "myth" on this page:
>
> http://software.intel.com/en-us/articles/implementing-scalable-atomic-locks-for-multi-core-intel-em64t-and-ia32-architectures/
>
> and this stands to reason, since both techniques revolve around a
> LOCK'd instruction, thus invoking the x86 architectural MESI(F)
> protocol. the difference, and my main point, is that the loop in ainc
> means that it is not a wait-free algorithm. this is not only
> suboptimal, but could also lead to incorrect behavior.

I think a more likely reason for the change is to have a *copy* of what
was incremented. "lock incl 0(ax)" won't tell you what the value was
when it was incremented. But I don't see how the change will lead to
incorrect behavior.
Re: [9fans] _xinc vs ainc
> I think a more likely reason for the change is to have a *copy* of
> what was incremented. "lock incl 0(ax)" won't tell you what the value
> was when it was incremented.

you can read the code. that value is not used by the thread library.

> But I don't see how the change will lead to incorrect behavior.

could. imagine you have two threads entering ainc. the loser will loop.
imagine that, before the loser completes his loop, a third thread
enters ainc and makes him a two-time loser. by induction it's possible
that the loser never completes in n loops for any given n. this, of
course, is basically the definition of a waiting algorithm. if your
program depends on time-bounded behavior from the thread library, you
could have trouble with a non-wait-free algorithm like this. perhaps my
concern is unfounded. i'd like to hear the argument.

- erik
Re: [9fans] _xinc vs ainc
On Sat, 07 May 2011 20:25:25 EDT erik quanstrom <quans...@quanstro.net> wrote:

> > I think a more likely reason for the change is to have a *copy* of
> > what was incremented. "lock incl 0(ax)" won't tell you what the
> > value was when it was incremented.
>
> you can read the code. that value is not used by the thread library.

If you want to use the value being atomically incremented, this is an
efficient way to get it on x86. It may not be used now, but maybe it
can be used to make some synchronization primitive more efficient?

> if your program depends on time-bounded behavior from the thread
> library, you could have trouble with a non-wait-free algorithm like
> this.

Yes, but I think associating time-bounded behavior with any shared
memory access is iffy. You always have this possibility on processors
that provide nothing stronger than LL/SC (load-linked/store-conditional).