Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
Hi Arne, Arne Babenhauserheide writes: > I don’t understand the implications of this, but I realized that I would > love to have info about the sizes of different datastructures in Guile. Sure. > I recently started measuring how much memory is required for a record > compared to a list, so I could give people concrete guidelines which > datastructures to use for which algorithm, and for that such low-level > details would be great to have! (or just to know where to read them up — > or which keywords to look for) For core data structures implemented in C, the place to look is in the corresponding *.h and *.c files in libguile, for example struct.[ch], strings.[ch], numbers.[ch], etc. However, pairs receive special handling, and only part of their implementation is in pairs.[ch]. I will summarize the costs of some important structures below: In the following, a "word" is the size of a pointer (4 bytes on a 32-bit system, or 8 bytes on a 64-bit system). I will _not_ include the SCM word itself in the accounting of the size of the object represented by that SCM. That's because for heap-allocated objects, the SCM is simply a pointer to the object, and there can be many pointers to the same object. The cost of SCM words will instead be included in the cost of the aggregate objects that contain them. In this method of accounting, immediate objects cost nothing, because they are entirely stored within the SCM itself. Pairs cost 2 words. Lists cost 2 words per element. The empty list is immediate, and therefore costs nothing. In current 'master', records cost 1 word per field, plus another word, rounded up to the nearest multiple of 2 words. In 2.0 and 2.2, they cost 1 word per field, plus another _2_ words, rounded up Vectors cost 1 word per element, plus another word, rounded up to the nearest multiple of 2 words. Bytevectors cost 4 words plus the data itself, rounded up to the nearest multiple of 2 words. As a special case, empty bytevectors cost nothing. Non-immediate inexact real numbers cost 16 bytes. Non-immediate complex numbers cost 24 bytes on 32-bit systems, or 32 bytes on 64-bit systems. In the common case, strings currently cost 6 words plus the character data itself, rounded up to the nearest multiple of 2 words. The character data costs either 1 byte per character (if all code points are less than 256, i.e. the string is latin-1), or 4 bytes per character. (However, I may propose a new string representation for 3.0). As a special case, the empty string costs nothing. Finally, I should mention that the ability to share substructure in lists and pairs make them potentially more memory efficient than vectors, depending on the use case. For example, if you need a structure with 4 fields, and a common operation in your application will involve copying the structure except for one field which is changed, then you might be able save memory by using lists, even though a 4-element list requires more memory than a 4-element vector or struct. If you put the commonly-changed field in the CAR, then you can update that field by allocating only a single pair (2 words) and sharing the rest of the structure with the old version. This is not possible with vectors or structs, where you must allocate a fresh new object of 5 or 6 words to perform the same operation. Regards, Mark
Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
Hi, Mark H Weaver skribis: > Ludovic Courtès writes: >> Though an immediate, like a fixnum or an iflo, is still something >> different from a tagged heap object like a pair, right? So I would >> expect SCM_THOB_P to be a different test, not a drop-in replacement for >> SCM_NIMP, is that correct? > > That's right. It's not possible to create a drop-in replacement for > SCM_NIMP, because it is being used to answer two different questions > which used to be effectively equivalent, but no longer are: > > (1) Is X a pointer to a heap object with a heap tag in the first word? > (2) Is X a reference to a heap object? > > Test (1) needs to be done before checking the heap tag, to implement > type predicates for heap objects. Test (2) is needed in relatively few > places, e.g. to decide whether to register disappearing links when > adding an entry to a weak hash table. > > Actually, in my current branch I've removed the SCM_IMP and SCM_NIMP > macros outright, because it seems to me they are likely to be misused. > > SCM_THOB_P implements test (1) and SCM_HEAP_OBJECT_P implements test (2). I see. > There's no masking involved. Rather, it is subtracted from the pointer, > which allows the tag to be fused with the field offset. For example, on > x86-64, whereas CAR and CDR were previously: > > 1c0: 48 8b 07mov(%rdi),%rax ;old car > and: > 1d0: 48 8b 47 08 mov0x8(%rdi),%rax ;old cdr > > Now they become: > > 1e0: 48 8b 47 fa mov-0x6(%rdi),%rax ;new car > and: > 1f0: 48 8b 47 02 mov0x2(%rdi),%rax ;new cdr Looks reasonable. :-) > Fortunately, BDW-GC provides GC_REGISTER_DISPLACEMENT, which allows us > to register a small offset K, such that BDW-GC should recognize pointers > that point K bytes into a heap block. We've been using this in both 2.0 > and 2.2 from scm_init_struct (), and in 'master' it's done in > scm_storage_prehistory (). > > This new approach entails registering one additional displacement. Cool; if it’s just one displacement, that’s OK. > What do you think? It all looks perfectly reasonable to me! Thanks for explaining, Ludo’.
Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
Ludovic Courtès writes: > Though an immediate, like a fixnum or an iflo, is still something > different from a tagged heap object like a pair, right? I should clarify that in this new approach, a pair is *not* a tagged heap object. Tagged heap objects are those which have a tag in their first word. In this new approach, pairs are not tagged in this sense, and it is no longer possible to distinguish a pair from a non-pair by looking at their raw heap blocks. Tagged heap objects (thobs) have 000 in the low 3 bits of the SCM. Pairs have 0110 in the low 4 bits of the SCM. Regards, Mark
Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
Hi Ludovic, Ludovic Courtès writes: > Though an immediate, like a fixnum or an iflo, is still something > different from a tagged heap object like a pair, right? So I would > expect SCM_THOB_P to be a different test, not a drop-in replacement for > SCM_NIMP, is that correct? That's right. It's not possible to create a drop-in replacement for SCM_NIMP, because it is being used to answer two different questions which used to be effectively equivalent, but no longer are: (1) Is X a pointer to a heap object with a heap tag in the first word? (2) Is X a reference to a heap object? Test (1) needs to be done before checking the heap tag, to implement type predicates for heap objects. Test (2) is needed in relatively few places, e.g. to decide whether to register disappearing links when adding an entry to a weak hash table. Actually, in my current branch I've removed the SCM_IMP and SCM_NIMP macros outright, because it seems to me they are likely to be misused. SCM_THOB_P implements test (1) and SCM_HEAP_OBJECT_P implements test (2). >> (2) Our existing VM instructions almost invariably specify offsets with >> a granularity of whole words. To support tagged pair pointers with >> good performance, I think we need a few new instructions that >> specify byte offsets, to avoid the expensive extra step of removing >> the tag before accessing the CAR or CDR of a pair. > > So instead of a pointer dereference, SCM_CAR becomes mask + dereference, > right? There's no masking involved. Rather, it is subtracted from the pointer, which allows the tag to be fused with the field offset. For example, on x86-64, whereas CAR and CDR were previously: 1c0: 48 8b 07mov(%rdi),%rax ;old car and: 1d0: 48 8b 47 08 mov0x8(%rdi),%rax ;old cdr Now they become: 1e0: 48 8b 47 fa mov-0x6(%rdi),%rax ;new car and: 1f0: 48 8b 47 02 mov0x2(%rdi),%rax ;new cdr In the VM, I've added four new instructions: make-tagged-non-immediate dst:12 tag:12 offset:32 tagged-allocate-words/immediate dst:8 count:8 tag:8 tagged-scm-ref/immediate dst:8 obj:8 byte-offset:8 tagged-scm-set!/immediate obj:8 byte-offset:8 val:8 The last two instructions above are like 'scm-ref/immediate' and 'scm-set!/immediate' except that they accept byte offsets instead of word offsets. CAR and CDR become: (tagged-scm-ref/immediate DST SRC -6);CAR and: (tagged-scm-ref/immediate DST SRC 2) ;CDR (although at present the -6 prints as 250 in the disassembler). > I think we disable GC “interior pointer” scanning. I agree. > With this scheme, an SCM for a pair would actually point in the middle > of a pair; could this be an issue for GC? It is already the case that Guile has tagged pointers in the first word of every struct. The first word of a struct contains a pointer to the vtable, with scm_tc3_struct added. Fortunately, BDW-GC provides GC_REGISTER_DISPLACEMENT, which allows us to register a small offset K, such that BDW-GC should recognize pointers that point K bytes into a heap block. We've been using this in both 2.0 and 2.2 from scm_init_struct (), and in 'master' it's done in scm_storage_prehistory (). This new approach entails registering one additional displacement. In 2.0 and 2.2, we register displacements 0, 16, and 17. The last two are for struct vtables, which point 2 words into a heap block, even before adding scm_tc3_struct (which is 1). In current master, we register 0 and 1 (scm_tc3_struct). With this new approach, we would register 0, 1, and 6 (scm_pair_tag). What do you think? Thanks, Mark
Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
Hello, Mark H Weaver skribis: > Ludovic Courtès writes: [...] >> IIUC, your plan is to have a different tagging on 32-bit platforms, >> without fixflos, right? I’m curious to see how much complexity would >> entail from that. > > Yes, although I'm avoiding the term "fixflos" because IEEE doubles are > also fixed width, and thus the term "fixflos" wouldn't adequately > distinguish them from IEEE doubles. Right! > Anyway, I agree that it's inconvenient to have different tags on > different targets, and I've been working to minimize the differences. > > At present, I'm currently implementing an alternative strategy where > pairs are tagged in their pointers instead of in their CARs, which > enables us to separate the heap tags and immediate tags into two > independent spaces. At first this sounds rather radical :-), but maybe it’s preferable this way. > In this new approach, the heap tags are left unchanged, and the only > tags that vary with target word size are the fixints, fixrats, iflos, > and pair pointers. All other tags will be uniform across targets, > including the non-number immediates. Here's the new version: > > ;; /* with iflos: xxx: iflo (000 < xxx < 110) > ;;(64-bit) 0111: fixrat > ;; : fixnum > ;; 0110: pair > ;; 000: tagged heap object (thob) > ;; 1110: other immediate > ;; > ;; without iflos: 1: fixnum > ;;(32-bit) 010: fixrat > ;; 100: pair > ;; 000: tagged heap object (thob) > ;; 1110: other immediate > > This new approach brings its own complications, mainly two: > > (1) It breaks the long-standing assumptions in Guile that all > non-immediates have a tag in their first word and that pointers are > always untagged. In my preliminary patch, I introduce a new concept > called a "tagged heap object" or "thob", and most existing checks > for SCM_NIMP or !SCM_IMP must be changed to use SCM_THOB_P. Though an immediate, like a fixnum or an iflo, is still something different from a tagged heap object like a pair, right? So I would expect SCM_THOB_P to be a different test, not a drop-in replacement for SCM_NIMP, is that correct? > (2) Our existing VM instructions almost invariably specify offsets with > a granularity of whole words. To support tagged pair pointers with > good performance, I think we need a few new instructions that > specify byte offsets, to avoid the expensive extra step of removing > the tag before accessing the CAR or CDR of a pair. So instead of a pointer dereference, SCM_CAR becomes mask + dereference, right? I think we disable GC “interior pointer” scanning. With this scheme, an SCM for a pair would actually point in the middle of a pair; could this be an issue for GC? Thank you! Ludo’.
Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
I should mention that I'm very open to suggestions Andy might have about any of this. The new approach I'm currently working on with tagged pair pointers requires changes to both the VM and the compiler, and I'm not confident that I've made the best choices there. I've made some preliminary choices for now in order to get something working ASAP, but I fully expect that the final version will look different based on input from Andy and others. I also don't know whether the new approach (with tagged pair pointers) will be preferable to the earlier one (with most tc7 tags changed to tc11). My goal is to present a couple of working alternatives, and then we can decide among them. For now, I went ahead and pushed my new (not yet working) branch 'wip-new-tagging-bis-broken', in case you want to see the details of this new approach-in-progress. The new branch doesn't yet include the iflo- or fixrat-enabling patches, which I'll add later. Regards, Mark
Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
Hi Ludovic, Ludovic Courtès writes: > As Dave Thompson wrote on IRC yesterday, this can make a significant > difference for applications such as graphics and game engines. Yes, it's especially important to be able to avoid heap allocation in games, where GC pauses can be painful. However, any application that needs to do a lot of inexact arithmetic should benefit from the fact that it's a lot cheaper to pack these immediate floats than to allocate and later reclaim a heap-allocated cell. > I hadn’t > read the message yet and thought “hey, why not make instructions for > things like trigonometric functions so you get unboxed floats” but > obviously, as Dave pointed out, that wouldn’t scale well. :-) The unboxing done by our compiler is still the best option where it can be done. However, there are many cases where floats must be boxed, and that seems unavoidable in a language like Scheme. > Fixrats can make rationals more practical in applications where one > would have avoided them for performance. Yes, exactly, that's my motivation for fixrats. I probably should have said so explicitly in my original message, so thanks for mentioning it. I've certainly avoided exact rationals in the past for performance reasons, although they would have made the code much clearer. I suspect I'm not alone. My hope is that in a future, we might be able to use exact rationals more frequently. > IIUC, your plan is to have a different tagging on 32-bit platforms, > without fixflos, right? I’m curious to see how much complexity would > entail from that. Yes, although I'm avoiding the term "fixflos" because IEEE doubles are also fixed width, and thus the term "fixflos" wouldn't adequately distinguish them from IEEE doubles. Anyway, I agree that it's inconvenient to have different tags on different targets, and I've been working to minimize the differences. At present, I'm currently implementing an alternative strategy where pairs are tagged in their pointers instead of in their CARs, which enables us to separate the heap tags and immediate tags into two independent spaces. In this new approach, the heap tags are left unchanged, and the only tags that vary with target word size are the fixints, fixrats, iflos, and pair pointers. All other tags will be uniform across targets, including the non-number immediates. Here's the new version: ;; /* with iflos: xxx: iflo (000 < xxx < 110) ;;(64-bit) 0111: fixrat ;; : fixnum ;; 0110: pair ;; 000: tagged heap object (thob) ;; 1110: other immediate ;; ;; without iflos: 1: fixnum ;;(32-bit) 010: fixrat ;; 100: pair ;; 000: tagged heap object (thob) ;; 1110: other immediate This new approach brings its own complications, mainly two: (1) It breaks the long-standing assumptions in Guile that all non-immediates have a tag in their first word and that pointers are always untagged. In my preliminary patch, I introduce a new concept called a "tagged heap object" or "thob", and most existing checks for SCM_NIMP or !SCM_IMP must be changed to use SCM_THOB_P. (2) Our existing VM instructions almost invariably specify offsets with a granularity of whole words. To support tagged pair pointers with good performance, I think we need a few new instructions that specify byte offsets, to avoid the expensive extra step of removing the tag before accessing the CAR or CDR of a pair. I've already implemented these changes, and am now in another painful round of debugging. I suspect it will be fine, and likely preferable to my earlier approach of changing all tc7 tags to tc11. I'll report back when I have it working. Thanks, Mark
Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
> On 8 Jun 2019, at 11:08, Chris Vine wrote: > > On Sat, 08 Jun 2019 10:07:45 +0200 > Arne Babenhauserheide wrote: > [snip] >> Wow, I didn’t know that you could do that. >> >> However: "The details of that allocation are implementation-defined, and >> it's undefined behavior to read from the member of the union that wasn't >> most recently written." https://en.cppreference.com/w/cpp/language/union >> >> Can you guarantee that this works? > > This is C and not C++ and the provision to which you refer does not > apply. > > Reading from a member of a union other than the one last written to is > implementation defined in C89/90, and defined in C99 (with Technical > Corrigendum 3) and above, although it might include a trap > representation (but wouldn't on any platform supported by guile). You > might want to see in particular footnote 95 of C11 (which isn't > normative but is intended to explain the provisions of §6.2.6.1 which > are). > > gcc and clang have always supported type punning through unions. FYI, the site mentioned above also covers C (there is a link the bottom of the C++ page above): https://en.cppreference.com/w/c/language/union
Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
On Sat, 08 Jun 2019 11:46:10 +0200 Arne Babenhauserheide wrote: > Chris Vine writes: > > On Sat, 08 Jun 2019 10:07:45 +0200 > > Arne Babenhauserheide wrote: > > [snip] > >> Wow, I didn’t know that you could do that. > >> > >> However: "The details of that allocation are implementation-defined, and > >> it's undefined behavior to read from the member of the union that wasn't > >> most recently written." https://en.cppreference.com/w/cpp/language/union > >> > >> Can you guarantee that this works? > > > > This is C and not C++ and the provision to which you refer does not > > apply. > > > > Reading from a member of a union other than the one last written to is > > implementation defined in C89/90, and defined in C99 (with Technical > > Corrigendum 3) and above > > Thank you for the correction and explanation! You have a good point though if visible type transformations were to appear in a header rather than a *.c file, because guile headers are (at the moment) intended to be in the common subset of C and C++ so that libguile.h can be included in a C++ programme. Having said that, gcc and clang support type punning through unions in C++ as well as C. I don't know if guile is supposed to compile with other compilers nowadays: but frankly it would be perverse for some other compiler which supports both C and C++ to invoke different behaviour for unions in such cases. Chris
Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
Chris Vine writes: > On Sat, 08 Jun 2019 10:07:45 +0200 > Arne Babenhauserheide wrote: > [snip] >> Wow, I didn’t know that you could do that. >> >> However: "The details of that allocation are implementation-defined, and >> it's undefined behavior to read from the member of the union that wasn't >> most recently written." https://en.cppreference.com/w/cpp/language/union >> >> Can you guarantee that this works? > > This is C and not C++ and the provision to which you refer does not > apply. > > Reading from a member of a union other than the one last written to is > implementation defined in C89/90, and defined in C99 (with Technical > Corrigendum 3) and above Thank you for the correction and explanation! Best wishes, Arne -- Unpolitisch sein heißt politisch sein ohne es zu merken signature.asc Description: PGP signature
Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
On Sat, 08 Jun 2019 10:07:45 +0200 Arne Babenhauserheide wrote: [snip] > Wow, I didn’t know that you could do that. > > However: "The details of that allocation are implementation-defined, and > it's undefined behavior to read from the member of the union that wasn't > most recently written." https://en.cppreference.com/w/cpp/language/union > > Can you guarantee that this works? This is C and not C++ and the provision to which you refer does not apply. Reading from a member of a union other than the one last written to is implementation defined in C89/90, and defined in C99 (with Technical Corrigendum 3) and above, although it might include a trap representation (but wouldn't on any platform supported by guile). You might want to see in particular footnote 95 of C11 (which isn't normative but is intended to explain the provisions of §6.2.6.1 which are). gcc and clang have always supported type punning through unions. Chris
Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
Mark H Weaver writes: > Mark H Weaver writes: > >> Here's the format of fixrats on 64-bit systems: … > This allows for an elegant packing operation: > > if (SCM_I_INUMP (numerator) && SCM_I_INUMP (denominator)) > { > union { double f; uint64_t u; } u; > u.f = SCM_I_INUM (numerator); > u.u += SCM_I_INUM (denominator); Wow, I didn’t know that you could do that. However: "The details of that allocation are implementation-defined, and it's undefined behavior to read from the member of the union that wasn't most recently written." https://en.cppreference.com/w/cpp/language/union Can you guarantee that this works? > if ((scm_t_inum) u.f == SCM_I_INUM (numerator)) > { > u.u += 0xc010; > u.u = (u.u << 6) | (u.u >> 58); > if ((u.u & 0x1f) == 0) > return SCM_PACK (u.u | scm_fixrat_tag); > } > } > > We start by converting the numerator into an IEEE double. We then use > unsigned integer addition to add the denominator to the 64-bit > representation of that double. We now check that this new IEEE double > has integer part equal to the numerator. If it doesn't, this indicates > either that the numerator is too large to be represented exactly, or > that the denominator is too large to fit in the remaining bits. > > The only thing that remains to be done at this point is to rotate left 6 > bits, so that the 5 highest exponent bits become the lowest 5 bits, > where the fixrat tag will go, and to add a value which adjusts the IEEE > biased exponent field to be 0 when the numerator is 1 or -1. It is really cool to read these deep details — is there a chance that when this lands you could re-vamp the emails you wrote here into a blog-post we can easily link to? Best wishes, Arne -- Unpolitisch sein heißt politisch sein ohne es zu merken signature.asc Description: PGP signature
Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
Mark H Weaver writes: > Here's the format of fixrats on 64-bit systems: > > Srrx0111 > |\/\___/\__/ > | | | | > | rank (6 bits) data (53 bits) tag > sign I've since thought of another way to represent fixrats, which allow simpler packing and unpacking operations, although it would also mean sacrificing one bit of precision: rrS00111 \/\__/|\___/ | || | rank (6 bits) data (52 bits) sign tag In this new representation, the positions of the numerator and denominator are swapped. Now, the numerator occupies the most-significant data bits (with its most-significant bit removed), and the denominator occupies the lowest data bits. The 6-bit rank field is now 1 less than the integer-length of the numerator, i.e. the rank is equal to the number of data bits allocated to the numerator. This allows for an elegant packing operation: if (SCM_I_INUMP (numerator) && SCM_I_INUMP (denominator)) { union { double f; uint64_t u; } u; u.f = SCM_I_INUM (numerator); u.u += SCM_I_INUM (denominator); if ((scm_t_inum) u.f == SCM_I_INUM (numerator)) { u.u += 0xc010; u.u = (u.u << 6) | (u.u >> 58); if ((u.u & 0x1f) == 0) return SCM_PACK (u.u | scm_fixrat_tag); } } We start by converting the numerator into an IEEE double. We then use unsigned integer addition to add the denominator to the 64-bit representation of that double. We now check that this new IEEE double has integer part equal to the numerator. If it doesn't, this indicates either that the numerator is too large to be represented exactly, or that the denominator is too large to fit in the remaining bits. The only thing that remains to be done at this point is to rotate left 6 bits, so that the 5 highest exponent bits become the lowest 5 bits, where the fixrat tag will go, and to add a value which adjusts the IEEE biased exponent field to be 0 when the numerator is 1 or -1. In the code above, I make one final check to make sure the low 5 bits are zeroes, before applying the 5-bit tag. However, it's possible that this final test might be safely omitted. I'll need to take a careful look at that. On 32-bit systems, we can use the same approach, but using IEEE singles (C floats) instead of doubles, and with a 3-bit tag. This would allow fixrats that can be written with 24 binary digits or less. I went ahead and implemented this, and it is _marginally_ faster than the earlier version, but not as much as I hoped for. So, I'm not yet sure whether to make the switch, but I wanted to post about it anyway. Mark
Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
Hi Mark, Mark H Weaver skribis: > I've found a way to efficiently support both immediate IEEE binary-64 > doubles up to ~1.158e77 (with larger ones transparently allocated on the > heap), and also immediate exact rationals with up to 54 binary digits > (~16 decimal digits), without restricting the 64-bit pointer space at > all, and without any loss of arithmetic precision. Woow, impressive, and really clever! As Dave Thompson wrote on IRC yesterday, this can make a significant difference for applications such as graphics and game engines. I hadn’t read the message yet and thought “hey, why not make instructions for things like trigonometric functions so you get unboxed floats” but obviously, as Dave pointed out, that wouldn’t scale well. :-) > Here's the format of fixrats on 64-bit systems: > > Srrx0111 > |\/\___/\__/ > | | | | > | rank (6 bits) data (53 bits) tag > sign [...] > I chose this representation because it allows us to leverage existing > floating-point hardware to efficiently pack fixrats. Simply convert the > denominator to an IEEE double, and now the rank will be in the low bits > of the IEEE exponent field, immediately adjacent to the denominator with > its high bit removed. This simplifies the packing operation. Fun. :-) Fixrats can make rationals more practical in applications where one would have avoided them for performance. IIUC, your plan is to have a different tagging on 32-bit platforms, without fixflos, right? I’m curious to see how much complexity would entail from that. I don’t know what Andy thinks, but if there’s a good way forward for both 32-bit and 64-bit, it’d be a nice bonus for 3.0! Thanks, Ludo’.
Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
On Thu, Jun 06, 2019 at 05:40:39AM -0400, Mark H Weaver wrote: > I've found a way to efficiently support both immediate IEEE binary-64 > doubles up to ~1.158e77 (with larger ones transparently allocated on the > heap), and also immediate exact rationals with up to 54 binary digits > (~16 decimal digits), without restricting the 64-bit pointer space at > all, and without any loss of arithmetic precision. [...] Impressive bitfiddling feat. Thanks for the enjoyable explanation, too! Cheers -- t signature.asc Description: Digital signature
Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
Mark H Weaver writes: > I have a working draft implementation that roughly doubles the speed of > a simple "substract 1.0 until negative" loop for inexact reals less than > 2^256, compared with current 'master' (near 2.9.2). The same loop for > exact rationals runs in ~70% of the time compared with current master. > More importantly, no heap allocation is required when working with these > immediate values. > More precisely, large finite doubles with magnitude >= 2^256 > (1.157920892373162e77) must be heap allocated. All other doubles, > including all Inf/NaNs, are immediate floats, or "iflos". > > I call this approach "high-exponent boxing", because instead of using > the space of ~2^53 NaNs for non-flonum purposes, I use the space of > finite doubles with binary exponent larger than 255. That's 3/8ths of > the total space of IEEE doubles. This gives us a total of (3 * 2^61) > SCM values to use for other things. … > I've also implemented immediate exact rationals, or "fixrats". The > precise rule is that on a 64-bit system, an exact non-integer rational > is immediate if and only if it can be written with 54 binary digits or > less. In terms of decimal digits, any rational that can be written with > 16 decimal digits is immediate, and some that require 17 or 18 decimal > digits are immediate. Wow, that’s awesome! > I should also mention that although the current patch set adds about 4 > bits to the size of all heap tags, e.g. changing most tc7 tags to tc11, > I can see a way to avoid this: we could tag pairs in the low bits of > their pointers, instead of in their CARs. > > More concretely, 1110 would become the "pair pointer" tag, thus > eliminating the need for the 1110 "NOT_SCM" tag (a.k.a. the non-pair > heap object tag). This would eliminate the need to change the heap tags > at all, and dramatically reduce the size of my patch set. Moreover, the > low bit would no longer need to be 1, so we would have many more tc7 > tags to work with. I don’t understand the implications of this, but I realized that I would love to have info about the sizes of different datastructures in Guile. I recently started measuring how much memory is required for a record compared to a list, so I could give people concrete guidelines which datastructures to use for which algorithm, and for that such low-level details would be great to have! (or just to know where to read them up — or which keywords to look for) Best wishes, Arne -- Unpolitisch sein heißt politisch sein ohne es zu merken signature.asc Description: PGP signature
Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
Earlier I wrote: > There's also a nice way to extract the denominator from a fixrat: mask > out the sign bit, shift right 5 bits, and interpret it as an IEEE > double. The denominator will be the integer part of the resulting > value, with the numerator in the fraction bits. Simply cast this double > to an integer to discard the numerator bits. Sorry, I forgot a step in the description above. You must also set the most significant bit of the biased exponent field after shifting right. > Here are the tags used in my draft implementation: > > ;; /* with iflos: xxx: iflo (000 < xxx < 110) > ;;(64-bit) : fixnum > ;; 0111: fixrat > ;; > ;; 000: heap object > ;; 0110: immediate non-number > ;; 1110: [NOT_SCM] > ;;0: [NOT_SCM] struct tag > ;; t101110: [NOT_SCM] non-pair non-struct non-smob tag > ;; x1001110: [NOT_SCM] smob I should also mention that although the current patch set adds about 4 bits to the size of all heap tags, e.g. changing most tc7 tags to tc11, I can see a way to avoid this: we could tag pairs in the low bits of their pointers, instead of in their CARs. More concretely, 1110 would become the "pair pointer" tag, thus eliminating the need for the 1110 "NOT_SCM" tag (a.k.a. the non-pair heap object tag). This would eliminate the need to change the heap tags at all, and dramatically reduce the size of my patch set. Moreover, the low bit would no longer need to be 1, so we would have many more tc7 tags to work with. Mark