Re: [Flashcoders] faster, longer, better ... for programming maniaks
Hm, yes, there's no difference between for and while. What do you think about a[ a.length ] versus a.push()? How did you get the decompiled functions? Thanks for pushing it.

L

Juan Pablo Califano wrote:

I'd take these results with a pinch of salt. I don't think they're conclusive, since one test seems to affect the performance of the others (and we are talking about really small differences anyway, which I think could be attributed to factors other than the tested code itself).

Let's take, for instance, the for / while tests. If I run all your tests, I get results similar to yours:

    takeLengthPlusOut : 1958
    takeForWithLengthPlus : 1788

However, if I run just those two tests, the results change. And if I alter the order in which the tests are run, the first one always seems to take less time.

    // running just these two, tracing the results in the same order the loops are executed
    takeLengthPlusOut : 1876
    takeForWithLengthPlus : 2003

    takeLengthPlusOut : 1876
    takeForWithLengthPlus : 1935

    takeForWithLengthPlus : 1874
    takeLengthPlusOut : 1946

    takeForWithLengthPlus : 1861
    takeLengthPlusOut : 1935

I think this particular for / while case is very illustrative of some external bias, because the execution order consistently affects the results and because, if you disassemble both loops, they're nearly identical. The only difference is the order in which one operation that precedes the loop is executed (it's not even the body of the loop or its conditional test).

FOR:

    function takeForWithLengthPlus():String /* disp_id 0*/
    {
      // local_count=4 max_scope=1 max_stack=3 code_len=61
      0   getlocal0
      1   pushscope
      2   pushbyte       0
      4   setlocal1
      5   pushnull
      6   coerce         Array
      8   setlocal2
      9   pushnan
      10  setlocal3
      11  findpropstrict Array
      13  constructprop  Array (0)
      16  coerce         Array
      18  setlocal2
      19  findpropstrict flash.utils::getTimer
      21  callproperty   flash.utils::getTimer (0)
      24  convert_d
      25  setlocal3
      26  pushbyte       0
      28  setlocal1
      29  jump           L1

      L2:
      33  label
      34  getlocal2
      35  getlocal2
      36  getproperty    length
      38  getlocal1
      39  setproperty    null
      41  inclocal_i     1

      L1:
      43  getlocal1
      44  pushint        1000 // 0x989680
      46  iflt           L2

      50  pushstring     takeForWithLengthPlus :
      52  findpropstrict flash.utils::getTimer
      54  callproperty   flash.utils::getTimer (0)
      57  getlocal3
      58  subtract
      59  add
      60  returnvalue
    }

WHILE:

    function takeLengthPlusOut():String /* disp_id 0*/
    {
      // local_count=4 max_scope=1 max_stack=3 code_len=61
      0   getlocal0
      1   pushscope
      2   pushbyte       0
      4   setlocal1
      5   pushnull
      6   coerce         Array
      8   setlocal2
      9   pushnan
      10  setlocal3
      11  pushbyte       0
      13  setlocal1
      14  findpropstrict Array
      16  constructprop  Array (0)
      19  coerce         Array
      21  setlocal2
      22  findpropstrict flash.utils::getTimer
      24  callproperty   flash.utils::getTimer (0)
      27  convert_d
      28  setlocal3
      29  jump           L1

      L2:
      33  label
      34  getlocal2
      35  getlocal2
      36  getproperty    length
      38  getlocal1
      39  setproperty    null
      41  inclocal_i     1

      L1:
      43  getlocal1
      44  pushint        1000 // 0x989680
      46  iflt           L2

      50  pushstring     takeLengthPlusOut :
      52  findpropstrict flash.utils::getTimer
      54  callproperty   flash.utils::getTimer (0)
      57  getlocal3
      58  subtract
      59  add
      60  returnvalue
    }

The only difference is that in the for loop, i = 0 is run immediately before testing i against 1000:

      26  pushbyte       0
      28  setlocal1

whereas in the while loop, that assignment is executed before constructing the array:

      11  pushbyte       0
      13  setlocal1

Cheers
Juan Pablo Califano

2008/7/29, laurent wrote:

Hi,

I often asked myself whether a[ a.length ] = xxx was faster or slower than a.push( xxx ), so I did some tests on waking up, fresh with coffee. So now I know the answer, and I learned a bit more about while and for, and about using them with decrementing or incrementing counters.

From the results (">" means faster):

    for > while        (hey yes... oO)
    increment > decrement
    length > push

Incrementing (or probably any numeric operation) in the same statement that stores the value in a variable is slower than doing the two separately. For example:

    a[ a.length ] = i++;

is slower than:

    i++;
    a[ a.length ] = i;

An incrementing while is faster than a decrementing for.

This is a totally useless
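The order effect Juan Pablo describes can be checked by timing the same pair of tests in both orders and recording which position each one ran in. What follows is only a rough sketch, written in TypeScript rather than ActionScript so it can be run outside the Flash Player. The names timeOnce and runInBothOrders and the iteration count are made up for illustration, and Date.now() stands in for getTimer().

```typescript
// Hypothetical harness: names and iteration count are assumptions, not from the thread.
type Test = { name: string; run: () => void };
type Timing = { name: string; position: number; ms: number };

// Time a single call in milliseconds; Date.now() stands in for getTimer().
function timeOnce(test: Test): number {
  const start = Date.now();
  test.run();
  return Date.now() - start;
}

// Run the tests once in the given order and once reversed, recording each
// timing together with the position the test had in that pass.
function runInBothOrders(tests: Test[]): Timing[] {
  const results: Timing[] = [];
  const passes: Test[][] = [tests, tests.slice().reverse()];
  passes.forEach((order) => {
    order.forEach((test, position) => {
      results.push({ name: test.name, position, ms: timeOnce(test) });
    });
  });
  return results;
}

const ITERATIONS = 100000; // assumption: small enough to finish quickly

const withFor: Test = {
  name: "takeForWithLengthPlus",
  run: () => {
    const a: number[] = [];
    for (let i = 0; i < ITERATIONS; i++) a[a.length] = i;
  },
};

const withWhile: Test = {
  name: "takeLengthPlusOut",
  run: () => {
    const a: number[] = [];
    let i = 0;
    while (i < ITERATIONS) {
      a[a.length] = i;
      i++;
    }
  },
};

// Four entries: each test timed once in first position and once in second.
const timings = runInBothOrders([withFor, withWhile]);
```

If the test in position 0 is consistently the faster one regardless of which loop it contains, the difference is coming from the harness and not from for versus while.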
Re: [Flashcoders] faster, longer, better ... for programming maniaks
My understanding is that push should be slower than accessing the index directly, but take that as common sense: a function call should involve a bit more processing, but I don't know the specifics.

I do know that the two compile to different bytecode. Off the top of my head, the indexed assignment was 4 ops, something like:

    push value
    push index
    push theobject
    setproperty        == theobject[index] = value

Making a push involves a callproperty operation, where you pass the object, the method and the arguments. Maybe it's the same number of operations, but I think the call operation is a bit more complex internally, as it creates a new frame on the stack, passes the arguments, executes the function, clears the stack frame and then returns to the caller.

For disassembling, I'm using abcdump, a tool that is included in the Tamarin project. You can find a compiled version for Windows and usage instructions here (though it's super easy to use):

http://iteratif.free.fr/blog/index.php?2006/11/15/61-un-premier-decompileur-as3

It's in French, but looking at the text added to your reply, I assume you'll have no problems with that ;)

Another post about the subject, in English:

http://www.5etdemi.com/blog/archives/2007/01/as3-decompiler/

Cheers
Juan Pablo Califano
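Whatever the speed difference turns out to be, the two forms under discussion leave the array in the same state: one is a property write, the other a method call. Here is a small sketch in TypeScript, not ActionScript, on the assumption that arrays behave the same way for this purpose. appendByIndex and appendByPush are illustrative names, not functions from the thread.

```typescript
// Illustrative helpers; the names are not from the thread.

// Property write: the array, the index (its current length) and the value.
function appendByIndex(count: number): number[] {
  const a: number[] = [];
  for (let i = 0; i < count; i++) {
    a[a.length] = i;
  }
  return a;
}

// Method call: the array as receiver plus one argument.
function appendByPush(count: number): number[] {
  const a: number[] = [];
  for (let i = 0; i < count; i++) {
    a.push(i);
  }
  return a;
}

const byIndex = appendByIndex(5);
const byPush = appendByPush(5);
```

Since both loops build the same array, any timing gap between them measures only the cost of the append mechanism itself.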
Re: [Flashcoders] faster, longer, better ... for programming maniaks
I will use while and length in future code, even if the differences seem very small. Thanks for the links and the details :]

L
Re: [Flashcoders] faster, longer, better ... for programming maniaks
Can you decompile the push method to see how it uses the stack? :)

Juan Pablo Califano wrote:

PS: To add to how the a[ a.length ] assignment works, this is the relevant bit:

      34  getlocal2
      35  getlocal2
      36  getproperty    length
      38  getlocal1
      39  setproperty    null

local2 is the array and local1 is the i variable. So, treating this snippet as a self-contained block as far as stack state goes, what happens is this:

    offset  stack state                              ActionScript pseudo-equivalent
    34      theArray
    35      theArray, theArray
    36      theArray, theArray.length
    38      theArray, theArray.length, variable_i
    39      [empty]                                  theArray[theArray.length] = variable_i

Cheers
Juan Pablo Califano
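The stack walkthrough for offsets 34 to 39 can be replayed with a toy stack machine. This is a hypothetical model in TypeScript, not the real AVM2: it knows only three operations, "getlength" stands in for "getproperty length", and the names Op, run and describe are invented for the sketch.

```typescript
// Toy stack machine: a hypothetical model of the five instructions, not the real AVM2.
type Value = number | number[];

type Op =
  | { op: "getlocal"; index: number }
  | { op: "getlength" }    // stands in for "getproperty length"
  | { op: "setproperty" }; // property name comes from the stack

function describe(v: Value): string {
  return Array.isArray(v) ? "theArray" : String(v);
}

// Execute the ops and return a snapshot of the stack after each one.
function run(ops: Op[], locals: Value[]): string[] {
  const stack: Value[] = [];
  const snapshots: string[] = [];
  for (const instr of ops) {
    switch (instr.op) {
      case "getlocal": {
        const v = locals[instr.index];
        if (v === undefined) throw new Error("no such local");
        stack.push(v);
        break;
      }
      case "getlength": {
        const target = stack.pop();
        if (!Array.isArray(target)) throw new Error("expected an array");
        stack.push(target.length);
        break;
      }
      case "setproperty": {
        // Popped in reverse push order: value, then index, then the object.
        const value = stack.pop();
        const index = stack.pop();
        const target = stack.pop();
        if (!Array.isArray(target) || typeof index !== "number" || typeof value !== "number") {
          throw new Error("expected array, index, value");
        }
        target[index] = value;
        break;
      }
    }
    snapshots.push(stack.map((v) => describe(v)).join(", ") || "[empty]");
  }
  return snapshots;
}

const theArray: number[] = [];
const snapshots = run(
  [
    { op: "getlocal", index: 2 }, // 34
    { op: "getlocal", index: 2 }, // 35
    { op: "getlength" },          // 36
    { op: "getlocal", index: 1 }, // 38
    { op: "setproperty" },        // 39
  ],
  [0, 7, theArray] // local0 unused here, local1 = i (7 for the demo), local2 = the array
);
```

The snapshots follow the same progression as the table, with concrete values in place of theArray.length and variable_i.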
Re: [Flashcoders] faster, longer, better ... for programming maniaks
If you mean decompiling the push method itself, you can't, because it's not ActionScript but native code, implemented directly in the player.

If you mean how the push method is called, it'd be something like this:

ActionScript:

    function test():void {
        var i:int = 0;
        var arr:Array = new Array();
        arr.push(i);
    }

Disassembled bytecode:

    function test():void /* disp_id 0*/
    {
      // local_count=3 max_scope=1 max_stack=2 code_len=26
      0   getlocal0
      1   pushscope
      2   pushbyte       0
      4   setlocal1
      5   pushnull
      6   coerce         Array
      8   setlocal2
      9   pushbyte       0
      11  setlocal1
      12  findpropstrict Array
      14  constructprop  Array (0)
      17  coerce         Array
      19  setlocal2
      20  getlocal2
      21  getlocal1
      22  callpropvoid   http://adobe.com/AS3/2006/builtin::push (1)
      25  returnvoid
    }

The relevant part is this (local2 is the array and local1 the variable i):

      20  getlocal2
      21  getlocal1
      22  callpropvoid   http://adobe.com/AS3/2006/builtin::push (1)

Basically, you push the array onto the stack, then the arguments (the i variable), and then execute the callpropvoid instruction. That instruction pops the stack to get the arguments (the number of arguments is specified by the caller; in this case it's 1, as you can see between the parens), and then it pops the stack again to get the object (the array in this case). Then the player calls the method named in callpropvoid (push) on the array, and passes the arguments to it (the variable i). If the return value of push had been used, callproperty would have been emitted instead of callpropvoid.

___
Flashcoders mailing list
Flashcoders@chattyfig.figleaf.com
http://chattyfig.figleaf.com/mailman/listinfo/flashcoders
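The pop order described for callpropvoid (arguments first, then the receiver) can be modelled in a few lines. This is a toy in TypeScript, not the player's implementation: callPropVoid and the builtins table are hypothetical stand-ins for the opcode and for the native method lookup.

```typescript
// Toy model of the call sequence; not the real AVM2.
type StackValue = number | number[];

// Stand-in for the player's native methods; only push is modelled.
const builtins: { [methodName: string]: (target: number[], args: number[]) => number } = {
  push: (target, args) => {
    for (const arg of args) target.push(arg);
    return target.length; // push reports the new length
  },
};

function callPropVoid(stack: StackValue[], methodName: string, argCount: number): void {
  // Arguments were pushed last, so they come off first, in reverse order.
  const args: number[] = [];
  for (let n = 0; n < argCount; n++) {
    const arg = stack.pop();
    if (typeof arg !== "number") throw new Error("expected a number argument");
    args.unshift(arg);
  }
  // The receiver sits underneath the arguments.
  const receiver = stack.pop();
  if (!Array.isArray(receiver)) throw new Error("expected an array receiver");
  const method = builtins[methodName];
  if (method === undefined) throw new Error("unknown method " + methodName);
  method(receiver, args); // the "void" part: the result is discarded
}

const arr: number[] = [];
const counter = 0;                     // plays the role of i
const operandStack: StackValue[] = [];
operandStack.push(arr);                // 20 getlocal2
operandStack.push(counter);            // 21 getlocal1
callPropVoid(operandStack, "push", 1); // 22 callpropvoid push (1)
```

After the call the operand stack is empty and the array holds the one pushed value, which matches the description of offsets 20 to 22.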
Re: [Flashcoders] faster, longer, better ... for programming maniaks
On Wed, Jul 30, 2008 at 5:44 PM, laurent wrote:

> Can you decompile the push method to see how it use the stack ? :)

I take the :) to mean you already know the answer, but just in case: it's intrinsic, and you can probably look up the C++ code in the Tamarin sources.

I have to agree with Juan that you shouldn't take those results seriously. The numbers are closer to each other than the margin of error. When I set up such a test (sorry, no code on this machine), I wait a few frames so the player is fully initialized, and then run several tests of each variant. I take the best result and discard the rest -- the average or median are worthless, they just tell you how much other processes on the system interfere with the test, and the worst tells you if the garbage collector did its round.

In any case, don't go down the "X does Y the fastest, so I'll only use X from now on" route. If you're writing some selected inner loops or a math library, OK, but most of your code won't get executed thousands of times each frame, so it's much more important that it's readable and easy to understand for others (including future-you). This is especially true if your fastest solution is verbose and repetitive -- it's inconvenient if you have to modify it in the future, add traces for debugging or step through with the debugger.

Also, whether writing something in a single line or two is faster depends on the compiler, and might change when better optimization is introduced in a future version. Common ways of doing something are probably more likely to get optimized.

Stating the obvious, try to find a better algorithm first.

And finally, have a look at haXe. That touches the compiler optimization part again -- the haXe compiler knows much more about your code than the AS3 compilers do, so it can do better optimization. Part of it is explained here:

http://blog.haxe.org/entry/31

Note inlining and haxe.rtti.Generic. For an AS3 vs haXe example, read:

http://blog.haxe.org/entry/35

Mark
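The measuring procedure Mark describes (let things settle, run each variant several times, keep the best) might look roughly like this. This is a TypeScript sketch with assumptions: bestOf, the run count and the warm-up count are invented, Date.now() stands in for getTimer(), and unmeasured warm-up calls stand in for "wait a few frames", which has no direct equivalent outside the player.

```typescript
// Sketch only: bestOf and its parameters are illustrative choices.
function bestOf(run: () => void, runs: number, warmups: number): number {
  // Warm-up calls are executed but not measured.
  for (let w = 0; w < warmups; w++) run();

  let best = Infinity;
  for (let r = 0; r < runs; r++) {
    const start = Date.now();
    run();
    const elapsed = Date.now() - start;
    if (elapsed < best) best = elapsed; // keep only the fastest run
  }
  return best;
}

let calls = 0;
const bestMs = bestOf(() => {
  calls++;
  const a: number[] = [];
  for (let i = 0; i < 50000; i++) a[a.length] = i;
}, 5, 2);
```

Taking the minimum follows the reasoning in the message: slower runs mostly reflect interference from other processes or the garbage collector, not the code under test.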
RE: [Flashcoders] faster, longer, better ... for programming maniaks
Juan Pablo Califano wrote:

> If you mean decompiling the push method itself, you can't because it's not actioscript but a native code, implemented directly in the player.

Nice work, Juan Pablo. The code you have been posting prompts me to comment on the underlying mechanism of Flash. I know, from experience, that a lot of Flash coders (and Director, and Java) don't understand bytecode vs. native code.

If you're writing in a true compiled language like C++, your code will compile to machine language specific to your CPU. Machine code is 1's and 0's, the on/off switches that are the basis of any binary computer.

Flash is cross-platform, though. It has to work on Intel processors, PowerPC, and others. It has to work on different OS's like Windows, Mac, and Unix. The machine code is different for every processor, and the implementation is specific to an OS.

So, the Flash compiler can't compile to machine code. Instead, Macromedia, and now Adobe, have written a player for each of the supported platforms. The player is in machine code (ones and zeros), but our ActionScript code is not.

ActionScript compiles to an intermediate bytecode, or tokens. The player reads these tokens and executes the appropriate machine code. That's what makes Flash slower than C++, and also more secure--it's much more difficult to write malicious code if you don't have direct access to the machine, but have to go through an interpreter.

This idea has been around for 25 years or so. The first implementation I used was UCSD Pascal, which, like Flash, compiled down to an intermediate token which was, in turn, interpreted and executed by the player (we called it a virtual machine back then). It has only been in the last 10 years or so that machines have gotten fast enough to run this sort of code satisfactorily.

If you understand this, you can find the bottlenecks in your code more easily, and optimize it. Loops are often the main culprit, as the bytecode has to be interpreted each time through the loop. Also, if you're working with something with a fixed length like an array or XML nodes (really the same thing), it's faster if you store the length of the array in a local (register) variable. An illustration:

    var arrLen:int = myArray.length;
    for (var i:int = 0; i < arrLen; i++)

works faster than

    for (var i:int = 0; i < myArray.length; i++)

Cordially,

Kerry Thompson
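How much repeated work the uncached form does can be made visible by counting length lookups. This is an illustration in TypeScript, not ActionScript. CountingList is a hypothetical wrapper invented for the purpose, and it exposes the length through a method so each read can be counted.

```typescript
// CountingList is a hypothetical wrapper that counts how often its length is read.
class CountingList {
  public lengthReads = 0;
  private items: number[];

  constructor(items: number[]) {
    this.items = items;
  }

  getLength(): number {
    this.lengthReads++;
    return this.items.length;
  }

  itemAt(index: number): number {
    return this.items[index];
  }
}

// Length is looked up on every evaluation of the loop condition.
function sumReadingLengthEachPass(list: CountingList): number {
  let total = 0;
  for (let i = 0; i < list.getLength(); i++) total += list.itemAt(i);
  return total;
}

// Length is looked up once and kept in a local.
function sumWithCachedLength(list: CountingList): number {
  let total = 0;
  const len = list.getLength();
  for (let i = 0; i < len; i++) total += list.itemAt(i);
  return total;
}

const uncached = new CountingList([1, 2, 3, 4]);
const cached = new CountingList([1, 2, 3, 4]);
const sumUncached = sumReadingLengthEachPass(uncached);
const sumCached = sumWithCachedLength(cached);
```

For four items the first form performs five lookups (one per condition check, including the final failing one) and the second performs one. Whether that shows up as measurable time depends on the runtime.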
Re: [Flashcoders] faster, longer, better ... for programming maniaks
Nitpicking, but just like anything digital, the SWF opcodes essentially are 1s and 0s, too. :)

Anyway, the new VM supports JIT compilation to native machine code. I must admit I don't know if /all/ code gets JIT compiled or only hotspots, and I don't know if it will be recompiled for each use to hardcode variables, but that would also have implications.

Mark
RE: [Flashcoders] faster, longer, better ... for programming maniaks
Mark Winterhalder wrote:

> Nitpicking, but just as anything digital the SWF opcodes essentially are 1s and 0s, too. :)

Fair enough. Following that to its logical conclusion, _everything_ on your computer is 1s and 0s, including the text in this email ^_^

You clearly understand what I was saying, Mark, but just a brief reiteration: compiled ActionScript has to be interpreted by the VM, which is _always_ slower than compiling directly to machine language.

When I was doing Director full time, I ran some tests that showed C++ to run up to 400 times as fast as Lingo. I lobbied for years to get a true machine-language compiler for Lingo, at least for desktop apps. I was struck by how few developers understood the implications, and without other developers clamoring for speed, Macromedia never went there.

Director could have been a major player in the 3D game world. And don't tell me that Director 3D is fast enough. Hard-core gamers buy $8,000 machines to squeeze every last fps out of their games. With lights, shaders, high-poly objects and multiple cameras, Director is just not fast enough for a Quake or Doom LAN party. And, of course, neither is Flash.

> Anyway, the new VM supports JIT compilation to native machine code. I must admit I don't know if /all/ code gets JIT compiled or only hotspots, and I don't know if it will be recompiled for each use to hardcode variables, but that would also have implications.

One major implication would be in loops. The compiler would have no way of knowing if an array would change length in a loop, for example, so it couldn't hard-code the length.

Cordially,

Kerry Thompson
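A concrete case of the loop problem Kerry mentions: when the body appends to the array it iterates over, reading the length live and reading it once give different results, so a compiler cannot swap one for the other on its own. This is a TypeScript illustration with invented function names.

```typescript
// Illustrative only: the loop body grows the array, so the bound is not constant.

// Length is re-read on every pass, so appended items are visited too.
function expandLive(values: number[]): number[] {
  const out = values.slice();
  for (let i = 0; i < out.length; i++) {
    if (out[i] > 1) out.push(out[i] - 2);
  }
  return out;
}

// Length is fixed before the loop, so appended items are never visited.
function expandHoisted(values: number[]): number[] {
  const out = values.slice();
  const len = out.length;
  for (let i = 0; i < len; i++) {
    if (out[i] > 1) out.push(out[i] - 2);
  }
  return out;
}

const live = expandLive([4, 1]);       // visits the appended 2 as well
const hoisted = expandHoisted([4, 1]); // stops after the original two items
```

Caching the length by hand is safe only when the programmer knows the body does not resize the array, which is knowledge the compiler generally lacks.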
Re: [Flashcoders] faster, longer, better ... for programming maniaks
> You clearly understand what I was saying, Mark, but just a brief reiteration: compiled ActionScript has to be interpreted by the VM, which is _always_ slower than compiling directly to machine language.

Yes, I understand and am not even disagreeing. :)

However, there have been benchmarks where Java was actually marginally /faster/ than C++ for some specific tests. This seems counterintuitive, but the JIT compiler knows more at runtime than the traditional compiler can know in advance, and I'm guessing that's why it can (not generally, but in some rare situations) do better optimizations. When you have ideal programmers, then of course compiled languages will be faster, but that difference is shrinking as technology evolves.

But of course we're talking about the Flash Player here, and size, portability and start-up time are more important design goals than execution speed, so we will most definitely always have to live with a very noticeable performance penalty. Then again, we don't have to manage memory ourselves, which is a big plus. (By the way, on memory allocation and optimization: recycling instances where possible is also a good idea.)

If somebody knows a good explanation of the when and how of the AVM2 JIT compiler, I'd be curious. The same goes for a table that shows the relative performance of things the renderer does -- like with alpha vs. without, whether rendering time grows linearly with the number of pixels, how much time is wasted on DisplayObjects outside of the visible Stage, things like that.

Mark
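The aside about recycling instances usually takes the form of an object pool. Here is a minimal sketch in TypeScript, under assumptions: Particle and ParticlePool are hypothetical names, and the reset policy (zeroing the fields on release) is one possible choice among several.

```typescript
// Minimal object pool sketch. Particle and ParticlePool are hypothetical names.
class Particle {
  x = 0;
  y = 0;
}

class ParticlePool {
  private free: Particle[] = [];
  created = 0; // how many instances were actually allocated

  // Hand out a recycled instance when one is available, otherwise allocate.
  acquire(): Particle {
    const recycled = this.free.pop();
    if (recycled !== undefined) return recycled;
    this.created++;
    return new Particle();
  }

  // Reset the instance and keep it for later reuse.
  release(p: Particle): void {
    p.x = 0;
    p.y = 0;
    this.free.push(p);
  }
}

const pool = new ParticlePool();
const first = pool.acquire();
first.x = 5;
pool.release(first);
const second = pool.acquire(); // same instance as first, with its fields reset
```

The point of the pattern is that the second acquire allocates nothing, so there is less garbage for the collector to deal with later.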
Re: [Flashcoders] faster, longer, better ... for programming maniaks
Check these slides: http://www.onflex.org/ACDS/AS3TuningInsideAVM2JIT.pdf From page 43: * We make a simple hotspot-like decision about whether to interpret or JIT * Initialization functions ($init, $cinit) are interpreted * Everything else is JIT * Upshot: Don't put performance-intensive code in class initialization Cheers Juan Pablo Califano 2008/7/30, Mark Winterhalder [EMAIL PROTECTED]: You clearly understand what I was saying, Mark, but just a brief reiteration: compiled ActionScript has to be interpreted by the VM, which is _always_ slower than compiling directly to machine language. Yes, I understand and am not even disagreeing. :) However, there have been benchmarks where Java was actually marginally /faster/ than C++ for some specific tests. This seems counterintuitive, but the JIT compiler knows more at runtime than the traditional compiler can know in advance, and I'm guessing that's why it can (not generally, but in some rare situations) do better optimizations. When you have ideal programmers then of course compiled languages will be faster, but that difference is getting less as technology evolves. But of course we're talking about the Flashplayer here, and size, portability and start-up time are more important design goals than execution speed, so we'll most definitely will always have to live with a very noticeable performance penalty. Then again, we don't have to manage memory ourselves, which is a big plus. (Btw memory allocation and optimization: recycling instances where possible is also a good idea.) If somebody knows a good explanation about the when and how of the AVM2 JIT compiler, I'd be curious. The same goes for a table that shows relative performance of stuff the renderer does -- like with alpha vs. without, if rendering time grows linear with the number of pixels, how much time is wasted on DisplayObjects outside of the visible Stage, stuff like that. 
Mark

On Wed, Jul 30, 2008 at 9:05 PM, Kerry Thompson [EMAIL PROTECTED] wrote:

Mark Winterhalder wrote:

Nitpicking, but just as anything digital the SWF opcodes essentially are 1s and 0s, too. :)

Fair enough.
Re: [Flashcoders] faster, longer, better ... for programming maniaks
On Wed, Jul 30, 2008 at 11:02 PM, Juan Pablo Califano [EMAIL PROTECTED] wrote:

Check these slides: http://www.onflex.org/ACDS/AS3TuningInsideAVM2JIT.pdf

From page 43:

* We make a simple hotspot-like decision about whether to interpret or JIT
* Initialization functions ($init, $cinit) are interpreted
* Everything else is JIT
* Upshot: Don't put performance-intensive code in class initialization

Thanks for the link, but I was hoping for something more specific, like an article that explains when the compilation happens. For example, a method could be compiled initially, when it first runs, or each time it gets called. I'm just curious, it's not important to know.

Mark
[Flashcoders] faster, longer, better ... for programming maniaks
Hi,

I often asked myself whether a[ a.length ] = xxx was faster or slower than a.push( xxx ), so I did some tests on waking up, fresh with coffee.

So now I know the answer, and I learned a bit more about while and for, and about using them with decrementing or incrementing counters.

From the results (> means faster):

for > while (hey yes... oO)
increment > decrement
length > push

Also, doing an increment (or probably any numeric operation) in the same statement that stores the value is slower than separating the two actions, like:

a[ a.length ] = i++;

slower than:

i++; a[ a.length ] = i;

An incrementing while is faster than a decrementing for. That is totally useless information, though: if you can use an incrementing while in place of a decrementing for, then just use an incrementing for. I heard that every loop is compiled to a while loop, so I started coding everything with a while, which is faster to write and more elegant. Now I'll go back to those for loops. I promise I never stopped loving you guys...

Here are some convincing results:

takeLengthMinus : 1695
takeLengthMinusOut : 1598
takeLengthPlus : 1580
takeLengthPlusOut : 1550
takePushMinus : 1860
takePushMinusOut : 1768
takePushPlus : 1756
takePushPlusOut : 1685

Don't compare results across the separate groups, because they were not run together:

takeForWithLengthMinus : 1624
takeForWithLengthPlus : 1581

The for loop inherently does its counter operation outside the assignment, so it has to be compared with the other "Out" variants:

takeLengthMinusOut : 1686
takeLengthPlusOut : 1610
takePushMinusOut : 1788
takePushPlusOut : 1666
takeForWithLengthMinus : 1626
takeForWithLengthPlus : 1563

I guess that's why I learned to use for( i = 0; i < n; i++ ) for my first loops.

So if we want our code to be faster, we have to make it longer and actually more human-readable; at the same time that means more computer-readable, since it gets faster... hm, is the computer so close to the human...?! I [ I mean my brain ] actually use a dichotomic search to find my current client folder in the list of all my work.

cheers.
L

and here is the code:

import flash.utils.getTimer;

// a[ a.length ] = value, decrementing counter
function takeLengthMinus():String{
	var i : int = 1000;
	var a : Array = new Array();
	var t : Number = getTimer();
	while( i-- ){
		a[ a.length ] = i;
	}
	return "takeLengthMinus : " + ( getTimer() - t );
}

function takeLengthMinusOut():String{
	var i : int = 1000;
	var a : Array = new Array();
	var t : Number = getTimer();
	while( i ){
		a[ a.length ] = i;
		i--;
	}
	return "takeLengthMinusOut : " + ( getTimer() - t );
}

// a[ a.length ] = value, incrementing counter
function takeLengthPlus():String{
	var i : int = 0;
	var a : Array = new Array();
	var t : Number = getTimer();
	while( i < 1000 ){
		a[ a.length ] = i++;
	}
	return "takeLengthPlus : " + ( getTimer() - t );
}

function takeLengthPlusOut():String{
	var i : int = 0;
	var a : Array = new Array();
	var t : Number = getTimer();
	while( i < 1000 ){
		a[ a.length ] = i;
		i++;
	}
	return "takeLengthPlusOut : " + ( getTimer() - t );
}

// a.push( value ), decrementing counter
function takePushMinus():String{
	var i : int = 1000;
	var a : Array = new Array();
	var t : Number = getTimer();
	while( i-- ){
		a.push( i );
	}
	return "takePushMinus : " + ( getTimer() - t );
}

function takePushMinusOut():String{
	var i : int = 1000;
	var a : Array = new Array();
	var t : Number = getTimer();
	while( i ){
		i--;
		a.push( i );
	}
	return "takePushMinusOut : " + ( getTimer() - t );
}

// a.push( value ), incrementing counter
function takePushPlus():String{
	var i : int = 0;
	var a : Array = new Array();
	var t : Number = getTimer();
	while( i < 1000 ){
		a.push( i++ );
	}
	return "takePushPlus : " + ( getTimer() - t );
}

function takePushPlusOut():String{
	var i : int = 0;
	var a : Array = new Array();
	var t : Number = getTimer();
	while( i < 1000 ){
		a.push( i );
		i++;
	}
	return "takePushPlusOut : " + ( getTimer() - t );
}

// for loops with a[ a.length ] = value
function takeForWithLengthMinus():String{
	var i : int;
	var a : Array = new Array();
	var t : Number = getTimer();
	for( i = 1000; i > 0; i-- ){
		a[ a.length ] = i;
	}
	return "takeForWithLengthMinus : " + ( getTimer() - t );
}

function takeForWithLengthPlus():String{
	var i : int;
	var a : Array = new Array();
	var t : Number = getTimer();
	for( i = 0; i < 1000; i++ ){
		a[ a.length ] = i;
	}
	return "takeForWithLengthPlus : " + ( getTimer() - t );
}

//trace( takeLengthMinus() );
trace( takeLengthMinusOut() );
//trace( takeLengthPlus() );
trace( takeLengthPlusOut() );
//trace( takePushMinus() );
trace( takePushMinusOut() );
//trace( takePushPlus() );
trace( takePushPlusOut() );
trace( takeForWithLengthMinus() );
trace( takeForWithLengthPlus() );
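Single-pass timings like the ones traced above can be sensitive to the order in which the tests run. A minimal sketch of a harness that repeats each test and rotates the starting test between rounds might look like this. Everything here is hypothetical and not from the original post: the names `fillWithLength`, `fillWithPush` and `compare`, the round and repeat counts, and the choice to report the best time per test.

```actionscript
import flash.utils.getTimer;

// Hypothetical harness for illustration; not part of the original post.
// Each test does its own work and returns nothing; timing happens outside.
function fillWithLength():void {
    var a:Array = new Array();
    for (var i:int = 0; i < 1000; i++) {
        a[a.length] = i;
    }
}

function fillWithPush():void {
    var a:Array = new Array();
    for (var i:int = 0; i < 1000; i++) {
        a.push(i);
    }
}

// Runs every test once per round, starting each round with a different
// test, and keeps the lowest elapsed time seen for each test.
function compare(tests:Array, names:Array, rounds:int, repeats:int):void {
    var best:Array = new Array();
    var k:int;
    for (k = 0; k < tests.length; k++) {
        best[k] = Number.MAX_VALUE;
    }
    for (var r:int = 0; r < rounds; r++) {
        for (var j:int = 0; j < tests.length; j++) {
            k = (r + j) % tests.length; // rotate which test goes first
            var t:int = getTimer();
            for (var n:int = 0; n < repeats; n++) {
                tests[k]();
            }
            var elapsed:int = getTimer() - t;
            if (elapsed < best[k]) {
                best[k] = elapsed;
            }
        }
    }
    for (k = 0; k < tests.length; k++) {
        trace(names[k] + " : " + best[k]);
    }
}

compare([fillWithLength, fillWithPush],
        ["fillWithLength", "fillWithPush"], 6, 10000);
```

Keeping the minimum is one design choice among several; an average or median over rounds would be just as easy to collect and may be more representative if the machine is busy.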
Re: [Flashcoders] faster, longer, better ... for programming maniaks
I'd take these results with a pinch of salt. I don't think they're conclusive, since one test seems to affect the performance of the others (and we are talking about really small differences anyway, which I think could be attributed to other factors than the tested code itself).

Let's take, for instance, the for / while tests. If I run all your tests, I get results similar to yours.

takeLengthPlusOut : 1958
takeForWithLengthPlus : 1788

However, if I run just those two tests, the results change. And if I alter the order in which the tests are run, the first one always seems to take less time.

// running just these two, tracing the results in the same order the loops are executed

takeLengthPlusOut : 1876
takeForWithLengthPlus : 2003

takeLengthPlusOut : 1876
takeForWithLengthPlus : 1935

takeForWithLengthPlus : 1874
takeLengthPlusOut : 1946

takeForWithLengthPlus : 1861
takeLengthPlusOut : 1935

I think this particular for / while case is very illustrative of some external bias, because the execution order consistently affects the results and because if you disassemble both loops, they're nearly identical. The only difference is the order in which one operation previous to the loop is executed (it's not even the body of the loop or its conditional test).
FOR:

function takeForWithLengthPlus():String /* disp_id 0*/
{
  // local_count=4 max_scope=1 max_stack=3 code_len=61
  0   getlocal0
  1   pushscope
  2   pushbyte 0
  4   setlocal1
  5   pushnull
  6   coerce Array
  8   setlocal2
  9   pushnan
  10  setlocal3
  11  findpropstrict Array
  13  constructprop Array (0)
  16  coerce Array
  18  setlocal2
  19  findpropstrict flash.utils::getTimer
  21  callproperty flash.utils::getTimer (0)
  24  convert_d
  25  setlocal3
  26  pushbyte 0
  28  setlocal1
  29  jump L1

  L2:
  33  label
  34  getlocal2
  35  getlocal2
  36  getproperty length
  38  getlocal1
  39  setproperty null
  41  inclocal_i 1

  L1:
  43  getlocal1
  44  pushint 1000 // 0x989680
  46  iflt L2

  50  pushstring "takeForWithLengthPlus : "
  52  findpropstrict flash.utils::getTimer
  54  callproperty flash.utils::getTimer (0)
  57  getlocal3
  58  subtract
  59  add
  60  returnvalue
}

WHILE:

function takeLengthPlusOut():String /* disp_id 0*/
{
  // local_count=4 max_scope=1 max_stack=3 code_len=61
  0   getlocal0
  1   pushscope
  2   pushbyte 0
  4   setlocal1
  5   pushnull
  6   coerce Array
  8   setlocal2
  9   pushnan
  10  setlocal3
  11  pushbyte 0
  13  setlocal1
  14  findpropstrict Array
  16  constructprop Array (0)
  19  coerce Array
  21  setlocal2
  22  findpropstrict flash.utils::getTimer
  24  callproperty flash.utils::getTimer (0)
  27  convert_d
  28  setlocal3
  29  jump L1

  L2:
  33  label
  34  getlocal2
  35  getlocal2
  36  getproperty length
  38  getlocal1
  39  setproperty null
  41  inclocal_i 1

  L1:
  43  getlocal1
  44  pushint 1000 // 0x989680
  46  iflt L2

  50  pushstring "takeLengthPlusOut : "
  52  findpropstrict flash.utils::getTimer
  54  callproperty flash.utils::getTimer (0)
  57  getlocal3
  58  subtract
  59  add
  60  returnvalue
}

The only difference is that in the for loop, i = 0 is run immediately before testing i against 1000:

  26  pushbyte 0
  28  setlocal1

whereas in the while loop, that assignment is executed before constructing the array:

  11  pushbyte 0
  13  setlocal1

Cheers
Juan Pablo Califano

2008/7/29, laurent [EMAIL PROTECTED]:

Hi,

I often asked myself whether a[ a.length ] = xxx was faster or slower than a.push( xxx ), so I did some tests on waking up, fresh with coffee.
So now I know the answer, and I learned a bit more about while and for, and about using them with decrementing or incrementing counters.

From the results (> means faster):

for > while (hey yes... oO)
increment > decrement
length > push

Also, doing an increment (or probably any numeric operation) in the same statement that stores the value is slower than separating the two actions, like:

a[ a.length ] = i++;

slower than:

i++; a[ a.length ] = i;

An incrementing while is faster than a decrementing for. That is totally useless information, though: if you can use an incrementing while in place of a decrementing for, then just use an incrementing for. I heard that every loop is compiled to a while loop, so I started coding everything with a