Re: [fpc-devel] LockFree Queue algorithm
DarekM schrieb: > Florian Klaempfl pisze: >> An if is unimportant, more important is the number of locked operations, >> especially on multi core systems they might eat hundreds of clock cycles. >> > There are atomic operations, the should not eat much more than ordinal > INC or ADD, If course they are. They require no more a bus lock as in 486 times but a synchronization of all caches for this particular memory location. ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] Fpc Target directory function
> Does a function like this exist: > > function FpcTargetDir: string; > begin > result:= {$I %FPCTARGETCPU%}+'-'+{$I %FPCTARGETOS%}; > end; > > Above works for my needs, just wonder if something already exists in RTL > or if it should be added for convenience. It doesn't exists and i don't see a reason why it should be added. The above line is simple. And there is no generic rule that says that you need to encode TargetDir like cpu-os for every program. Peter ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] LockFree Queue algorithm
Florian Klaempfl pisze: An if is unimportant, more important is the number of locked operations, especially on multi core systems they might eat hundreds of clock cycles. There are atomic operations, the should not eat much more than ordinal INC or ADD, and second CAS is invoked only one time per fSize iteriations. Darek __ ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
[fpc-devel] Fpc Target directory function
Does a function like this exist: function FpcTargetDir: string; begin result:= {$I %FPCTARGETCPU%}+'-'+{$I %FPCTARGETOS%}; end; Above works for my needs, just wonder if something already exists in RTL or if it should be added for convenience. ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
interlocked commands [Re: [fpc-devel] LockFree Queue algorithm]
While watching this thread, I started to ask myself 2 questions (which may be related): They just came to mind a multi-core systems where mentioned, possible even systems with several CPUs. (Admiringly they are more looking like they should be on an intel-mailing list, I just hope someone may know...) 1) procedure tFLQueue.setObject(lp : integer;const aobject : tNodeQueue); begin tab[lp and fMask]:=aObject; end; The index ("lp and fMask") has been derived via "interlockedIncrement", and the surrounding code makes sure, that only one thread will access this value at this time. But lets assume the value was read immediately before, by another core/cpu. It therefore is in that core/cpu's cache. Will this cache be invalidated/updated by a *simple* write to memory? Or will the other core/cpu see the old value from it's cache? I am no expert on this, but from the page referred below, "lock"ed cpu-instructions, take special care of this. I don't know about unlocked instructions? http://static.scribd.com/docs/59o7jahfstz7r.swf?INITIAL_VIEW=width chapter 7 Because frequently used memory locations are often cached in a processor's L1 or L2 caches, atomic operations can often be carried out inside a processor's caches without asserting the bus lock. Here the processor's cache coherency protocols insure that other processors that are caching the same memory locations are managed properly while atomic operations are performed on cached memory locations. "caching the same memory locations are managed properly while atomic operations are performed" What does the cache coherency do (if anything) while non atomic operations are performed? 2) I found various references that interlockedIncrement and co, work only on 32 bit bounded data? This may or may not vary on the CPU. The Intel doc only says, it will affect execution time, but looking at the MS doc http://msdn2.microsoft.com/en-us/library/ms683614.aspx it says it must be on a 32bit boundary. Does that affect FPC? ( as there may be none intel CPUs?) If so, then the Implementation of the Queue would have to ensure alignment (as I believe fpc, aligns integer on 16 bit?) Martin Florian Klaempfl wrote: DarekM schrieb: Martin Friebe pisze: You will need to test it, but the following may also work procedure tFlQueue.push(const tm : tNodeQueue); var newTemp, lastTail, newTail : integer; begin newTemp := temp; while newTemp >= fsize begin // if it is true, every thread is going to attempt to fix it, before doing the increment. // this means no thread will increase the value // => one thread will to succeed (since the only reason "temp NE newTemp" is that temp has been decreased) // newTemp is bigger than fsize, so the result can not become negative. interlockedCompareExchange(pointer(temp),pointer(newTemp-fsize),pointer(newTemp)); newTemp := temp; end; newTemp:=interlockedIncrement(temp) mod fsize; // make sure we are always in range lastTail:=newTemp-1; if lastTail < 0 then lastTail := fsize -1; setObject(lastTail,tm); // you can remove the "mod" in setobject repeat pointer(newTail):=interlockedCompareExchange(pointer(tail),pointer(newTemp),pointer(lastTail)); until (newTail=lastTail); end; It seems ok, but then we have 2 IF more. An if is unimportant, more important is the number of locked operations, especially on multi core systems they might eat hundreds of clock cycles. ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] LockFree Queue algorithm
DarekM schrieb: > Martin Friebe pisze: >> You will need to test it, but the following may also work >> >> procedure tFlQueue.push(const tm : tNodeQueue); >> var >> newTemp, >> lastTail, >> newTail : integer; >> begin >> newTemp := temp; >> while newTemp >= fsize begin >>// if it is true, every thread is going to attempt to fix it, >> before doing the increment. >>// this means no thread will increase the value >>// => one thread will to succeed (since the only reason "temp NE >> newTemp" is that temp has been decreased) >>// newTemp is bigger than fsize, so the result can not become >> negative. >> >> interlockedCompareExchange(pointer(temp),pointer(newTemp-fsize),pointer(newTemp)); >> >> newTemp := temp; >> end; > newTemp:=interlockedIncrement(temp) mod fsize; // make sure we are > always in range >> lastTail:=newTemp-1; >> if lastTail < 0 then lastTail := fsize -1; >> setObject(lastTail,tm); // you can remove the "mod" in setobject >> repeat >> >> pointer(newTail):=interlockedCompareExchange(pointer(tail),pointer(newTemp),pointer(lastTail)); >> >> until (newTail=lastTail); >> end; >> > It seems ok, but then we have 2 IF more. An if is unimportant, more important is the number of locked operations, especially on multi core systems they might eat hundreds of clock cycles. ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] LockFree Queue algorithm
Martin Friebe pisze: You will need to test it, but the following may also work procedure tFlQueue.push(const tm : tNodeQueue); var newTemp, lastTail, newTail : integer; begin newTemp := temp; while newTemp >= fsize begin // if it is true, every thread is going to attempt to fix it, before doing the increment. // this means no thread will increase the value // => one thread will to succeed (since the only reason "temp NE newTemp" is that temp has been decreased) // newTemp is bigger than fsize, so the result can not become negative. interlockedCompareExchange(pointer(temp),pointer(newTemp-fsize),pointer(newTemp)); newTemp := temp; end; newTemp:=interlockedIncrement(temp) mod fsize; // make sure we are always in range lastTail:=newTemp-1; if lastTail < 0 then lastTail := fsize -1; setObject(lastTail,tm); // you can remove the "mod" in setobject repeat pointer(newTail):=interlockedCompareExchange(pointer(tail),pointer(newTemp),pointer(lastTail)); until (newTail=lastTail); end; It seems ok, but then we have 2 IF more. And the same should repeat with pop The "repeat" at the end does still work. In order to set "tail" to "0" (newTemp=0) => "tail" must first get "9" (as lastTail will be 9). Only flaw is, if fsize is close enough to the upper data tpye boundary, and enough threads do InterLockedInvrement, (after all having simultaneously passed the repeat at the start), then you can still run over the boundary. However this can only happen if you have more threads, than (upperBoundary - fsize). So you can specify, that it is save up to this amount of parallel threads Because we cant change size of queue in run, we have to make it so big that always will be enough. For me it should be ten time bigger, than observe maximum (1000 its only 4 kB of ram). IMHO the unit should have a warning (in the comments) that there it is the users responsibility *not* to store more elements than fsize. There is no check, and you will loose data, if you try to do so. Ok, Ive write warning in source And i've upload new version of program. Darek ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] LockFree Queue algorithm
You will need to test it, but the following may also work procedure tFlQueue.push(const tm : tNodeQueue); var newTemp, lastTail, newTail : integer; begin newTemp := temp; while newTemp >= fsize begin // if it is true, every thread is going to attempt to fix it, before doing the increment. // this means no thread will increase the value // => one thread will to succeed (since the only reason "temp NE newTemp" is that temp has been decreased) // newTemp is bigger than fsize, so the result can not become negative. interlockedCompareExchange(pointer(temp),pointer(newTemp-fsize),pointer(newTemp)); newTemp := temp; end; newTemp:=interlockedIncrement(temp) mod fsize; // make sure we are always in range lastTail:=newTemp-1; if lastTail < 0 then lastTail := fsize -1; setObject(lastTail,tm); // you can remove the "mod" in setobject repeat pointer(newTail):=interlockedCompareExchange(pointer(tail),pointer(newTemp),pointer(lastTail)); until (newTail=lastTail); end; The "repeat" at the end does still work. In order to set "tail" to "0" (newTemp=0) => "tail" must first get "9" (as lastTail will be 9). Only flaw is, if fsize is close enough to the upper data tpye boundary, and enough threads do InterLockedInvrement, (after all having simultaneously passed the repeat at the start), then you can still run over the boundary. However this can only happen if you have more threads, than (upperBoundary - fsize). So you can specify, that it is save up to this amount of parallel threads IMHO the unit should have a warning (in the comments) that there it is the users responsibility *not* to store more elements than fsize. There is no check, and you will loose data, if you try to do so. But as I said, no warranty that this will work, I haven't done any testing... Martin DarekM wrote: Martin Friebe pisze: What about a long running (eg daemon) application? If temp/tail hits the upper boundary of Integer? (If I understand it correctly) I don't know if interlockedIncrement gives a boundary error, but if not, it still fails. - With currently integer, it gets a negative value, once crossing 0x7fff, and SetObject will attempt to read/write out-of-bounds memory. - Assuming temp/tail being unsigned: it will go from 0x to 0. "0x mod fsize" may return a value greater 0, "0x00 mod fsize" will be zero. You make an unexpected jump within the list. Thats right. But I can avoid this. 1. make length of tab equal power of two 2. use longword instead of integer 3.instead MOD use result:=tab[lp and fmask]; and fmask:=$F or $FF or $FFF Thanks to find bug. Darek ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] LockFree Queue algorithm
Martin Friebe pisze: What about a long running (eg daemon) application? If temp/tail hits the upper boundary of Integer? (If I understand it correctly) I don't know if interlockedIncrement gives a boundary error, but if not, it still fails. - With currently integer, it gets a negative value, once crossing 0x7fff, and SetObject will attempt to read/write out-of-bounds memory. - Assuming temp/tail being unsigned: it will go from 0x to 0. "0x mod fsize" may return a value greater 0, "0x00 mod fsize" will be zero. You make an unexpected jump within the list. Thats right. But I can avoid this. 1. make length of tab equal power of two 2. use longword instead of integer 3.instead MOD use result:=tab[lp and fmask]; and fmask:=$F or $FF or $FFF Thanks to find bug. Darek ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] LockFree Queue algorithm
What about a long running (eg daemon) application? If temp/tail hits the upper boundary of Integer? (If I understand it correctly) I don't know if interlockedIncrement gives a boundary error, but if not, it still fails. - With currently integer, it gets a negative value, once crossing 0x7fff, and SetObject will attempt to read/write out-of-bounds memory. - Assuming temp/tail being unsigned: it will go from 0x to 0. "0x mod fsize" may return a value greater 0, "0x00 mod fsize" will be zero. You make an unexpected jump within the list. Martin DarekM wrote: Hi This is my proposition of algorithm and its implementing multithreaded FIFO queue without lock. First use array of pointers to handle messages. I've use it in my program, it works. I think it may by useful. site: http://www.emadar.com/fpc/lockfree.htm source: http://www.emadar.com/fpc/flqueue.pas There is also generic implementation of the same algorithm. Its my first generic approach. Any help appreciated. it;s not so much text, but source is very short, its explain much better. Darek ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] LockFree Queue algorithm
Micha Nelissen pisze: DarekM wrote: Hi This is my proposition of algorithm and its implementing multithreaded FIFO queue without lock. Hmm 'Push' and 'Pop' sound like a stack, but the implementation seems to implement a FIFO indeed, with a head and tail. It should be enqueue and dequeue. I was left because first I try implement stack. You're sure about the situation where two threads are accessing head and tail at the same time, and there is only one item in your FIFO ? Of course (if CAS (interlockedCompareExchange) has proper implementation , that, I think, should be avoided Queue is empty when tail=head, when is not equal only one thread can receive object form queue, second make again iteration The same with ABA problem. Darek ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] LockFree Queue algorithm
DarekM wrote: > Hi > This is my proposition of algorithm and its implementing multithreaded > FIFO queue without lock. Hmm 'Push' and 'Pop' sound like a stack, but the implementation seems to implement a FIFO indeed, with a head and tail. You're sure about the situation where two threads are accessing head and tail at the same time, and there is only one item in your FIFO ? Micha ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel