Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3

2019-06-11 Thread Mark H Weaver
Hi Arne,

Arne Babenhauserheide  writes:

> I don’t understand the implications of this, but I realized that I would
> love to have info about the sizes of different datastructures in Guile.

Sure.

> I recently started measuring how much memory is required for a record
> compared to a list, so I could give people concrete guidelines which
> datastructures to use for which algorithm, and for that such low-level
> details would be great to have! (or just to know where to read them up —
> or which keywords to look for)

For core data structures implemented in C, the place to look is in the
corresponding *.h and *.c files in libguile, for example struct.[ch],
strings.[ch], numbers.[ch], etc.  However, pairs receive special
handling, and only part of their implementation is in pairs.[ch].

I will summarize the costs of some important structures below:

In the following, a "word" is the size of a pointer (4 bytes on a 32-bit
system, or 8 bytes on a 64-bit system).

I will _not_ include the SCM word itself in the accounting of the size
of the object represented by that SCM.  That's because for
heap-allocated objects, the SCM is simply a pointer to the object, and
there can be many pointers to the same object.  The cost of SCM words
will instead be included in the cost of the aggregate objects that
contain them.

In this method of accounting, immediate objects cost nothing, because
they are entirely stored within the SCM itself.

Pairs cost 2 words.  Lists cost 2 words per element.  The empty list is
immediate, and therefore costs nothing.

In current 'master', records cost 1 word per field, plus another word,
rounded up to the nearest multiple of 2 words.  In 2.0 and 2.2, they
cost 1 word per field, plus another _2_ words, rounded up

Vectors cost 1 word per element, plus another word, rounded up to the
nearest multiple of 2 words.

Bytevectors cost 4 words plus the data itself, rounded up to the nearest
multiple of 2 words.  As a special case, empty bytevectors cost nothing.

Non-immediate inexact real numbers cost 16 bytes.  Non-immediate complex
numbers cost 24 bytes on 32-bit systems, or 32 bytes on 64-bit systems.

In the common case, strings currently cost 6 words plus the character
data itself, rounded up to the nearest multiple of 2 words.  The
character data costs either 1 byte per character (if all code points are
less than 256, i.e. the string is latin-1), or 4 bytes per character.
(However, I may propose a new string representation for 3.0).

As a special case, the empty string costs nothing.

Finally, I should mention that the ability to share substructure in
lists and pairs make them potentially more memory efficient than
vectors, depending on the use case.  For example, if you need a
structure with 4 fields, and a common operation in your application will
involve copying the structure except for one field which is changed,
then you might be able save memory by using lists, even though a
4-element list requires more memory than a 4-element vector or struct.
If you put the commonly-changed field in the CAR, then you can update
that field by allocating only a single pair (2 words) and sharing the
rest of the structure with the old version.  This is not possible with
vectors or structs, where you must allocate a fresh new object of 5 or 6
words to perform the same operation.

 Regards,
   Mark



Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3

2019-06-11 Thread Ludovic Courtès
Hi,

Mark H Weaver  skribis:

> Ludovic Courtès  writes:
>> Though an immediate, like a fixnum or an iflo, is still something
>> different from a tagged heap object like a pair, right?  So I would
>> expect SCM_THOB_P to be a different test, not a drop-in replacement for
>> SCM_NIMP, is that correct?
>
> That's right.  It's not possible to create a drop-in replacement for
> SCM_NIMP, because it is being used to answer two different questions
> which used to be effectively equivalent, but no longer are:
>
> (1) Is X a pointer to a heap object with a heap tag in the first word?
> (2) Is X a reference to a heap object?
>
> Test (1) needs to be done before checking the heap tag, to implement
> type predicates for heap objects.  Test (2) is needed in relatively few
> places, e.g. to decide whether to register disappearing links when
> adding an entry to a weak hash table.
>
> Actually, in my current branch I've removed the SCM_IMP and SCM_NIMP
> macros outright, because it seems to me they are likely to be misused.
>
> SCM_THOB_P implements test (1) and SCM_HEAP_OBJECT_P implements test (2).

I see.

> There's no masking involved.  Rather, it is subtracted from the pointer,
> which allows the tag to be fused with the field offset.  For example, on
> x86-64, whereas CAR and CDR were previously:
>
>  1c0:   48 8b 07mov(%rdi),%rax  ;old car
> and:
>  1d0:   48 8b 47 08 mov0x8(%rdi),%rax   ;old cdr
>
> Now they become:
>
>  1e0:   48 8b 47 fa mov-0x6(%rdi),%rax  ;new car
> and:
>  1f0:   48 8b 47 02 mov0x2(%rdi),%rax   ;new cdr

Looks reasonable.  :-)

> Fortunately, BDW-GC provides GC_REGISTER_DISPLACEMENT, which allows us
> to register a small offset K, such that BDW-GC should recognize pointers
> that point K bytes into a heap block.  We've been using this in both 2.0
> and 2.2 from scm_init_struct (), and in 'master' it's done in
> scm_storage_prehistory ().
>
> This new approach entails registering one additional displacement.

Cool; if it’s just one displacement, that’s OK.

> What do you think?

It all looks perfectly reasonable to me!

Thanks for explaining,
Ludo’.



Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3

2019-06-11 Thread Mark H Weaver
Ludovic Courtès  writes:
> Though an immediate, like a fixnum or an iflo, is still something
> different from a tagged heap object like a pair, right?

I should clarify that in this new approach, a pair is *not* a tagged
heap object.  Tagged heap objects are those which have a tag in their
first word.  In this new approach, pairs are not tagged in this sense,
and it is no longer possible to distinguish a pair from a non-pair by
looking at their raw heap blocks.

Tagged heap objects (thobs) have 000 in the low 3 bits of the SCM.
Pairs have 0110 in the low 4 bits of the SCM.

Regards,
  Mark



Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3

2019-06-11 Thread Mark H Weaver
Hi Ludovic,

Ludovic Courtès  writes:
> Though an immediate, like a fixnum or an iflo, is still something
> different from a tagged heap object like a pair, right?  So I would
> expect SCM_THOB_P to be a different test, not a drop-in replacement for
> SCM_NIMP, is that correct?

That's right.  It's not possible to create a drop-in replacement for
SCM_NIMP, because it is being used to answer two different questions
which used to be effectively equivalent, but no longer are:

(1) Is X a pointer to a heap object with a heap tag in the first word?
(2) Is X a reference to a heap object?

Test (1) needs to be done before checking the heap tag, to implement
type predicates for heap objects.  Test (2) is needed in relatively few
places, e.g. to decide whether to register disappearing links when
adding an entry to a weak hash table.

Actually, in my current branch I've removed the SCM_IMP and SCM_NIMP
macros outright, because it seems to me they are likely to be misused.

SCM_THOB_P implements test (1) and SCM_HEAP_OBJECT_P implements test (2).

>> (2) Our existing VM instructions almost invariably specify offsets with
>> a granularity of whole words.  To support tagged pair pointers with
>> good performance, I think we need a few new instructions that
>> specify byte offsets, to avoid the expensive extra step of removing
>> the tag before accessing the CAR or CDR of a pair.
>
> So instead of a pointer dereference, SCM_CAR becomes mask + dereference,
> right?

There's no masking involved.  Rather, it is subtracted from the pointer,
which allows the tag to be fused with the field offset.  For example, on
x86-64, whereas CAR and CDR were previously:

 1c0:   48 8b 07mov(%rdi),%rax  ;old car
and:
 1d0:   48 8b 47 08 mov0x8(%rdi),%rax   ;old cdr

Now they become:

 1e0:   48 8b 47 fa mov-0x6(%rdi),%rax  ;new car
and:
 1f0:   48 8b 47 02 mov0x2(%rdi),%rax   ;new cdr

In the VM, I've added four new instructions:

  make-tagged-non-immediate dst:12 tag:12 offset:32
  tagged-allocate-words/immediate dst:8 count:8 tag:8
  tagged-scm-ref/immediate dst:8 obj:8 byte-offset:8
  tagged-scm-set!/immediate obj:8 byte-offset:8 val:8

The last two instructions above are like 'scm-ref/immediate' and
'scm-set!/immediate' except that they accept byte offsets instead of
word offsets.  CAR and CDR become:

 (tagged-scm-ref/immediate DST SRC -6);CAR
and:
 (tagged-scm-ref/immediate DST SRC 2) ;CDR

(although at present the -6 prints as 250 in the disassembler).

> I think we disable GC “interior pointer” scanning.

I agree.

> With this scheme, an SCM for a pair would actually point in the middle
> of a pair; could this be an issue for GC?

It is already the case that Guile has tagged pointers in the first word
of every struct.  The first word of a struct contains a pointer to the
vtable, with scm_tc3_struct added.

Fortunately, BDW-GC provides GC_REGISTER_DISPLACEMENT, which allows us
to register a small offset K, such that BDW-GC should recognize pointers
that point K bytes into a heap block.  We've been using this in both 2.0
and 2.2 from scm_init_struct (), and in 'master' it's done in
scm_storage_prehistory ().

This new approach entails registering one additional displacement.

In 2.0 and 2.2, we register displacements 0, 16, and 17.  The last two
are for struct vtables, which point 2 words into a heap block, even
before adding scm_tc3_struct (which is 1).

In current master, we register 0 and 1 (scm_tc3_struct).  With this new
approach, we would register 0, 1, and 6 (scm_pair_tag).

What do you think?

  Thanks,
Mark



Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3

2019-06-11 Thread Ludovic Courtès
Hello,

Mark H Weaver  skribis:

> Ludovic Courtès  writes:

[...]

>> IIUC, your plan is to have a different tagging on 32-bit platforms,
>> without fixflos, right?  I’m curious to see how much complexity would
>> entail from that.
>
> Yes, although I'm avoiding the term "fixflos" because IEEE doubles are
> also fixed width, and thus the term "fixflos" wouldn't adequately
> distinguish them from IEEE doubles.

Right!

> Anyway, I agree that it's inconvenient to have different tags on
> different targets, and I've been working to minimize the differences.
>
> At present, I'm currently implementing an alternative strategy where
> pairs are tagged in their pointers instead of in their CARs, which
> enables us to separate the heap tags and immediate tags into two
> independent spaces.

At first this sounds rather radical :-), but maybe it’s preferable this
way.

> In this new approach, the heap tags are left unchanged, and the only
> tags that vary with target word size are the fixints, fixrats, iflos,
> and pair pointers.  All other tags will be uniform across targets,
> including the non-number immediates.  Here's the new version:
>
> ;; /* with iflos:   xxx:  iflo (000 < xxx < 110)
> ;;(64-bit) 0111:  fixrat
> ;; :  fixnum
> ;; 0110:  pair
> ;;  000:  tagged heap object (thob)
> ;; 1110:  other immediate
> ;;
> ;; without iflos: 1:  fixnum
> ;;(32-bit)  010:  fixrat
> ;;  100:  pair
> ;;  000:  tagged heap object (thob)
> ;; 1110:  other immediate
>
> This new approach brings its own complications, mainly two:
>
> (1) It breaks the long-standing assumptions in Guile that all
> non-immediates have a tag in their first word and that pointers are
> always untagged.  In my preliminary patch, I introduce a new concept
> called a "tagged heap object" or "thob", and most existing checks
> for SCM_NIMP or !SCM_IMP must be changed to use SCM_THOB_P.

Though an immediate, like a fixnum or an iflo, is still something
different from a tagged heap object like a pair, right?  So I would
expect SCM_THOB_P to be a different test, not a drop-in replacement for
SCM_NIMP, is that correct?

> (2) Our existing VM instructions almost invariably specify offsets with
> a granularity of whole words.  To support tagged pair pointers with
> good performance, I think we need a few new instructions that
> specify byte offsets, to avoid the expensive extra step of removing
> the tag before accessing the CAR or CDR of a pair.

So instead of a pointer dereference, SCM_CAR becomes mask + dereference,
right?

I think we disable GC “interior pointer” scanning.  With this scheme, an
SCM for a pair would actually point in the middle of a pair; could this
be an issue for GC?

Thank you!

Ludo’.