On 15.08.2011 18:07, Levente Uzonyi wrote: > On Mon, 15 Aug 2011, Philippe Marschall wrote: > >> On 15.08.2011 00:21, Levente Uzonyi wrote: >>> On Sun, 14 Aug 2011, Philippe Marschall wrote: >>> >>>> On 14.08.2011 22:00, Levente Uzonyi wrote: >>>>> On Sun, 14 Aug 2011, Philippe Marschall wrote: >>>>> >>>>>> Hi >>>>>> >>>>>> In Seaside we get a lot of performance gains out of >>>>>> primitiveFindFirstInString. One thing that always annoyed me a bit is >>>>>> that it's not as optimized as it could be. >>>>>> >>>>>> The inclusionMap you give it are 256 consecutive boolean values (0 or >>>>>> 1). There is no need for this to be a 256 element ByteArray when each >>>>>> element can only be 0 or 1. We could as well make it a 32 element >>>>>> ByteArray and each byte holding eight bit values. Instead of using >>>>>> the >>>>>> asciiValue to directly index into the inclusionMap we would use >>>>>> the top >>>>>> five bits to index into the inclusionMap and the bottom 3 bits to >>>>>> "index >>>>>> into the byte". >>>>>> >>>>>> Did that make any sense? >>>>> >>>>> Do you want to save space? >>>> >>>> I'm trying to trade memory access (which is slow) for a bit shift and >>>> two bit ands (which is fast). >>> >>> 256 bytes + object header easily fit into the L1 cache of any recent x86 >>> CPU, even the old P4's had 8kB of it. >> >> Sure once it's in the cache then it's in the cache. But a cache line is >> only 64 bytes. So a 32 element array should fit and a 256 element >> doesn't. So a 32 element array should be there in one memory access >> while a 256 element array probably takes five. > > A 32 element array needs 1-2 cache lines, while the 256 element array > needs 4-5. It's not that big difference IMHO. (Especially if you want to > pass a ByteArray and convert it ot a 32 byte array in C.) > >> >>> Moving less bytes around may make >>> some difference though, but if you create your ByteArrays right before >>> you use them, then it's pretty likely that your object is already in the >>> cache. >> >> I don't of course. >> >>>> >>>>> Are you storing lots of inclusion maps (maybe >>>>> CharacterSets)? If not, then IMHO it's not worth to adding this >>>>> feature >>>>> to this primitive, because runtime performance will be worse on >>>>> both the >>>>> image side and the VM side. >>>> >>>> Why? >>> >>> Filling the contents of a compact 32 sized ByteArray in the image takes >>> more time than a 256 byte ByteArray. On the VM side memory access + >>> shifts + bitands cost more than just memory access. >> >> Uhm, it's a primitive. My understanding is this would result in "C >> shifts + bitands". > > Hm. Are you about to change the primitive to create an array of 32 bytes > from the ByteArray argument and use that during the search?
Of course not. I'm going to create a new primitive that takes a 32 element byte array. Anything else is pointless. >> >>> But (as usual) a >>> benchmark can tell if it's really worth to do it or not. >> >> This there somewhere some documentation how one adds a primitive and >> compiles a VM (Cog ideally)? > > This primitive is in the image, search for the method with the string > 'primitiveFindFirstInString'. The same way you can add others methods to > MiscPrimitivePlugin. IIRC there's a registry (an array in a method on > the plugin's class' side) for these methods where you should add the new > method, look for the method name (as string/symbol) to find it. That part is clear to me. > There are multiple ways to build VMs, but I guess you want to follow the > way developed by Igor. That part is not at all clear to me. Where can I find documentation about this. Cheers Philippe
