On Mon, 15 Aug 2011, Philippe Marschall wrote:

On 15.08.2011 00:21, Levente Uzonyi wrote:
On Sun, 14 Aug 2011, Philippe Marschall wrote:

On 14.08.2011 22:00, Levente Uzonyi wrote:
On Sun, 14 Aug 2011, Philippe Marschall wrote:

Hi

In Seaside we get a lot of performance gains out of
primitiveFindFirstInString. One thing that always annoyed me a bit is
that it's not as optimized as it could be.

The inclusionMap you give it are 256 consecutive boolean values (0 or
1). There is no need for this to be a 256 element ByteArray when each
element can only be 0 or 1. We could as well make it a 32 element
ByteArray and each byte holding eight bit values. Instead of using the
asciiValue to directly index into the inclusionMap we would use the top
five bits to index into the inclusionMap and the bottom 3 bits to
"index
into the byte".

Did that make any sense?

Do you want to save space?

I'm trying to trade memory access (which is slow) for a bit shift and
two bit ands (which is fast).

256 bytes + object header easily fit into the L1 cache of any recent x86
CPU, even the old P4's had 8kB of it.

Sure once it's in the cache then it's in the cache. But a cache line is
only 64 bytes. So a 32 element array should fit and a 256 element
doesn't. So a 32 element array should be there in one memory access
while a 256 element array probably takes five.

A 32 element array needs 1-2 cache lines, while the 256 element array needs 4-5. It's not that big difference IMHO. (Especially if you want to pass a ByteArray and convert it ot a 32 byte array in C.)


Moving less bytes around may make
some difference though, but if you create your ByteArrays right before
you use them, then it's pretty likely that your object is already in the
cache.

I don't of course.


Are you storing lots of inclusion maps (maybe
CharacterSets)? If not, then IMHO it's not worth to adding this feature
to this primitive, because runtime performance will be worse on both the
image side and the VM side.

Why?

Filling the contents of a compact 32 sized ByteArray in the image takes
more time than a 256 byte ByteArray. On the VM side memory access +
shifts + bitands cost more than just memory access.

Uhm, it's a primitive. My understanding is this would result in "C
shifts + bitands".

Hm. Are you about to change the primitive to create an array of 32 bytes from the ByteArray argument and use that during the search?


But (as usual) a
benchmark can tell if it's really worth to do it or not.

This there somewhere some documentation how one adds a primitive and
compiles a VM (Cog ideally)?

This primitive is in the image, search for the method with the string 'primitiveFindFirstInString'. The same way you can add others methods to MiscPrimitivePlugin. IIRC there's a registry (an array in a method on the plugin's class' side) for these methods where you should add the new method, look for the method name (as string/symbol) to find it.

There are multiple ways to build VMs, but I guess you want to follow the way developed by Igor.


Levente


Cheers
Philippe




Reply via email to