On Mon, 15 Aug 2011, Philippe Marschall wrote:

On 15.08.2011 18:07, Levente Uzonyi wrote:
On Mon, 15 Aug 2011, Philippe Marschall wrote:

On 15.08.2011 00:21, Levente Uzonyi wrote:
On Sun, 14 Aug 2011, Philippe Marschall wrote:

On 14.08.2011 22:00, Levente Uzonyi wrote:
On Sun, 14 Aug 2011, Philippe Marschall wrote:

Hi

In Seaside we get a lot of performance gains out of
primitiveFindFirstInString. One thing that always annoyed me a bit is
that it's not as optimized as it could be.

The inclusionMap you give it are 256 consecutive boolean values (0 or
1). There is no need for this to be a 256 element ByteArray when each
element can only be 0 or 1. We could as well make it a 32 element
ByteArray and each byte holding eight bit values. Instead of using
the
asciiValue to directly index into the inclusionMap we would use
the top
five bits to index into the inclusionMap and the bottom 3 bits to
"index
into the byte".

Did that make any sense?

Do you want to save space?

I'm trying to trade memory access (which is slow) for a bit shift and
two bit ands (which is fast).

256 bytes + object header easily fit into the L1 cache of any recent x86
CPU, even the old P4's had 8kB of it.

Sure once it's in the cache then it's in the cache. But a cache line is
only 64 bytes. So a 32 element array should fit and a 256 element
doesn't. So a 32 element array should be there in one memory access
while a 256 element array probably takes five.

A 32 element array needs 1-2 cache lines, while the 256 element array
needs 4-5. It's not that big difference IMHO. (Especially if you want to
pass a ByteArray and convert it ot a 32 byte array in C.)


Moving less bytes around may make
some difference though, but if you create your ByteArrays right before
you use them, then it's pretty likely that your object is already in the
cache.

I don't of course.


Are you storing lots of inclusion maps (maybe
CharacterSets)? If not, then IMHO it's not worth to adding this
feature
to this primitive, because runtime performance will be worse on
both the
image side and the VM side.

Why?

Filling the contents of a compact 32 sized ByteArray in the image takes
more time than a 256 byte ByteArray. On the VM side memory access +
shifts + bitands cost more than just memory access.

Uhm, it's a primitive. My understanding is this would result in "C
shifts + bitands".

Hm. Are you about to change the primitive to create an array of 32 bytes
from the ByteArray argument and use that during the search?

Of course not. I'm going to create a new primitive that takes a 32
element byte array. Anything else is pointless.

Okay, so you will create the 32 sized ByteArrays in the image as I first thought. This means that the initialization of the ByteArrays will take more time. Btw I wouldn't create a new primitive. The current primitive only accepts 256 sized ByteArrays, so adding support for 32 byte ByteArrays won't cause any trouble and you can reuse all existing image side code.



But (as usual) a
benchmark can tell if it's really worth to do it or not.

This there somewhere some documentation how one adds a primitive and
compiles a VM (Cog ideally)?

This primitive is in the image, search for the method with the string
'primitiveFindFirstInString'. The same way you can add others methods to
MiscPrimitivePlugin. IIRC there's a registry (an array in a method on
the plugin's class' side) for these methods where you should add the new
method, look for the method name (as string/symbol) to find it.

That part is clear to me.

There are multiple ways to build VMs, but I guess you want to follow the
way developed by Igor.

That part is not at all clear to me. Where can I find documentation
about this.

I guess this is what you are looking for:
http://marianopeck.wordpress.com/2011/04/10/building-the-vm-from-scratch-using-git-and-cmakevmmaker/


Levente


Cheers
Philippe




Reply via email to