Re: A couple of questions about arrays and slices

2023-06-24 Thread Cecil Ward via Digitalmars-d-learn

On Saturday, 24 June 2023 at 16:42:45 UTC, Cecil Ward wrote:
On Saturday, 24 June 2023 at 15:12:14 UTC, Jonathan M Davis 
wrote:

[...]


Yeah, it would take me forever to get my head around that, and 
I only want a crude toy partial parser for certain portions of 
the grammar, and the parsing code is done now. A hand-written 
recursive descent type thing mainly dealing with things like 
comments and literal string that have to be taken account of as 
they prevent hazards to naive straight string searching for 
what you want to find, as comments and eg double-quoted strings 
could have things in them that are red-herrings or the things 
that you want to find items in, depending on circumstances.


[...]


I read an article about just that good strings trick many many 
years back, and the author called it ‘a string universe’, which I 
really liked.


Re: A couple of questions about arrays and slices

2023-06-24 Thread Cecil Ward via Digitalmars-d-learn

On Saturday, 24 June 2023 at 15:12:14 UTC, Jonathan M Davis wrote:
On Saturday, June 24, 2023 8:43:00 AM MDT Cecil Ward via 
Digitalmars-d-learn wrote:
I started out looking into a number of runtime library 
routines, but in the end it seemed quicker to roll my own code 
for a crude recursive descent parser/lexer that parses part of 
D’s grammar for expressions, and (again partial grammar) 
parser for string literal expressions and so on. I find 
certain special elements and execute actions which involve 
doing the AA lookup and replacing variable names with ordinal 
numbers in decimal in the output stream. Admission: The 
parsing is the thing that has to be fast, even though again 
the size of the D language text is not likely to be huge at 
all. But 40 years ago, I came from a world with 2k RAM and 0.9 
MHz clock rates so I have developed a habit of always thinking 
about speed before I do anything, needful or not, to be 
honest. I once wrote a program that took 35 mins to evaluate 
2+2 and print out the answer, so I’m now ashamed of writing 
slow code. Those were bad days, to be honest. 4 GHz+ and ILP 
is nicer.


Well, dmd is open source (and Boost-licensed, so it doesn't 
really have any restrictions), so depending on what you're 
doing, it might make sense to just take code from that (and 
it's very fast). IIRC, it pulls some fun tricks like replacing 
identical strings with pointers to the same string so that it 
can just compare pointers.


- Jonathan M Davis


Yeah, it would take me forever to get my head around that, and I 
only want a crude toy partial parser for certain portions of the 
grammar, and the parsing code is done now. A hand-written 
recursive descent type thing mainly dealing with things like 
comments and literal string that have to be taken account of as 
they prevent hazards to naive straight string searching for what 
you want to find, as comments and eg double-quoted strings could 
have things in them that are red-herrings or the things that you 
want to find items in, depending on circumstances.


I’m trying to get my head round the differences between OSX tools 
and those for Linux relating to LDC and GDC which seems slightly 
inferior in some situations. I’m a serious professional asm 
programmer of old, before compilers were of usable output quality 
for
git-hard applications. (‘a git’ a disreputable person, colloquial 
English English. ‘git hard’ - brain-meltingly hard, like quantum 
gravity.)


Re: A couple of questions about arrays and slices

2023-06-24 Thread Jonathan M Davis via Digitalmars-d-learn
On Saturday, June 24, 2023 8:43:00 AM MDT Cecil Ward via Digitalmars-d-learn 
wrote:
> I started out looking into a number of runtime library routines,
> but in the end it seemed quicker to roll my own code for a crude
> recursive descent parser/lexer that parses part of D’s grammar
> for expressions, and (again partial grammar) parser for string
> literal expressions and so on. I find certain special elements
> and execute actions which involve doing the AA lookup and
> replacing variable names with ordinal numbers in decimal in the
> output stream. Admission: The parsing is the thing that has to be
> fast, even though again the size of the D language text is not
> likely to be huge at all. But 40 years ago, I came from a world
> with 2k RAM and 0.9 MHz clock rates so I have developed a habit
> of always thinking about speed before I do anything, needful or
> not, to be honest. I once wrote a program that took 35 mins to
> evaluate 2+2 and print out the answer, so I’m now ashamed of
> writing slow code. Those were bad days, to be honest. 4 GHz+ and
> ILP is nicer.

Well, dmd is open source (and Boost-licensed, so it doesn't really have any
restrictions), so depending on what you're doing, it might make sense to
just take code from that (and it's very fast). IIRC, it pulls some fun
tricks like replacing identical strings with pointers to the same string so
that it can just compare pointers.

- Jonathan M Davis






Re: A couple of questions about arrays and slices

2023-06-24 Thread Cecil Ward via Digitalmars-d-learn

On Saturday, 24 June 2023 at 12:05:26 UTC, Jonathan M Davis wrote:
On Saturday, June 24, 2023 1:43:53 AM MDT Cecil Ward via 
Digitalmars-d-learn wrote:

On Saturday, 24 June 2023 at 07:36:26 UTC, Cecil Ward wrote:
> [...]

I just realised something, your point about altering the table 
and having to rehash, is well taken. I hadn’t considered that. 
The reason for my foolishness in failing to realise that I’m 
asking the impractical is my pattern of usage. I add all the 
entries into the mapping table and have no interest in any 
lookups until it is fully built. Then a second function starts 
to do lookups while the data remains unchanging and that usage 
pattern can be guaranteed. I could even idup it if that would 
help, as copying < 32 uints wouldn’t take forever. A typical 
value would be a mere 5 or less. I only picked 32 to be 
completely safely ott.


Well, if the key were a struct or a class, the hashing function 
would be opHash. For built-in types, the runtime has hashing 
functions that it uses. Either way, with AAs, you really don't 
worry about managing the memory, because it's completely 
outside of your control. You just put the elements in there 
using their associated keys, and if you want to try to speed it 
up after you've populated it, you use rehash so that the 
runtime can try to move the elements around within the 
container so that lookup speeds will be closer to optimal.


As such, for the most part, when dealing with AAs and worrying 
about efficiency, the question really becomes whether AAs are 
the correct solution rather than much of anything having to do 
with how you manage their memory.


With so few elements, it's also possible that using 
std.algorithm.searching.find would be faster - e.g. having a 
dynamic array of strings where the matching int is at the same 
index in a dynamic array of ints - or you could use 
std.typecons.Tuple!(string, int)[] with something like 
arr.find!(a => a[0] == key)() to find the tuple with the int 
you want.


Simply comparing a small number of strings like that might be 
faster than what goes on with hashing the string and then 
finding the corresponding element within the AA - or it might 
not be. You'd have to test that to know. The AA would 
definitely be faster with a large number of elements, but with 
a small number of elements, the algorithmic complexity doesn't 
really matter, and the extra overhad with the AA lookups could 
actually mean that the search through the dynamic array is 
faster even though it's O(n). But you can only know which is 
faster by testing it out with the actual data that you're 
dealing with.


Regardless, you need to remember that associative arrays are 
not arrays in the C sense. Rather, they're hash tables, so they 
function very differently from dynamic arrays, and the rehash 
function is the closest that you're going to get to affecting 
how the elements are laid out internally or how much memory the 
AA is using.


- Jonathan M Davis


I started out looking into a number of runtime library routines, 
but in the end it seemed quicker to roll my own code for a crude 
recursive descent parser/lexer that parses part of D’s grammar 
for expressions, and (again partial grammar) parser for string 
literal expressions and so on. I find certain special elements 
and execute actions which involve doing the AA lookup and 
replacing variable names with ordinal numbers in decimal in the 
output stream. Admission: The parsing is the thing that has to be 
fast, even though again the size of the D language text is not 
likely to be huge at all. But 40 years ago, I came from a world 
with 2k RAM and 0.9 MHz clock rates so I have developed a habit 
of always thinking about speed before I do anything, needful or 
not, to be honest. I once wrote a program that took 35 mins to 
evaluate 2+2 and print out the answer, so I’m now ashamed of 
writing slow code. Those were bad days, to be honest. 4 GHz+ and 
ILP is nicer.


Re: A couple of questions about arrays and slices

2023-06-24 Thread Jonathan M Davis via Digitalmars-d-learn
On Saturday, June 24, 2023 1:43:53 AM MDT Cecil Ward via Digitalmars-d-learn 
wrote:
> On Saturday, 24 June 2023 at 07:36:26 UTC, Cecil Ward wrote:
> > Jonathan, is it possible that I wanted one thing and got
> > another? My description in the earlier post was of the _aim_ of
> > the program. What I ended up with might be something else? I
> > wanted an array of uints whose values are the results/outputs
> > of the mapping function. Since it is keyed by strings I assumed
> > that the runtime generates some kind of hash for fast lookup
> > when I ask it to retrieve an entry by the string (key)
> > associated with it. I assumed that in some sense the hashing
> > was sort of separate with some degree of independence from the
> > underlying array, if that makes sense. The lookup is just
> > assumed to be fast but how it is done we don’t really care. I
> > just wanted to expand the array as I did successfully elsewhere
> > with reserve, as I built this structure by successive additions
> > of data. I have a number of strings and the map is meant to
> > output the ordinal number in which I first saw them,
> > zero-based. Then I want to come back and randomly look up one
> > ordinal given a string preferably with a very fast lookup. The
> > number of entries can not practically be more than 30, and even
> > that would be highly unusual, maybe ten is the practical limit
> > in my particular case, so it’s hardly MySQL.
>
> I just realised something, your point about altering the table
> and having to rehash, is well taken. I hadn’t considered that.
> The reason for my foolishness in failing to realise that I’m
> asking the impractical is my pattern of usage. I add all the
> entries into the mapping table and have no interest in any
> lookups until it is fully built. Then a second function starts to
> do lookups while the data remains unchanging and that usage
> pattern can be guaranteed. I could even idup it if that would
> help, as copying < 32 uints wouldn’t take forever. A typical
> value would be a mere 5 or less. I only picked 32 to be
> completely safely ott.

Well, if the key were a struct or a class, the hashing function would be
opHash. For built-in types, the runtime has hashing functions that it uses.
Either way, with AAs, you really don't worry about managing the memory,
because it's completely outside of your control. You just put the elements
in there using their associated keys, and if you want to try to speed it up
after you've populated it, you use rehash so that the runtime can try to
move the elements around within the container so that lookup speeds will be
closer to optimal.

As such, for the most part, when dealing with AAs and worrying about
efficiency, the question really becomes whether AAs are the correct solution
rather than much of anything having to do with how you manage their memory.

With so few elements, it's also possible that using
std.algorithm.searching.find would be faster - e.g. having a dynamic array
of strings where the matching int is at the same index in a dynamic array of
ints - or you could use std.typecons.Tuple!(string, int)[] with something
like arr.find!(a => a[0] == key)() to find the tuple with the int you want.

Simply comparing a small number of strings like that might be faster than
what goes on with hashing the string and then finding the corresponding
element within the AA - or it might not be. You'd have to test that to know.
The AA would definitely be faster with a large number of elements, but with
a small number of elements, the algorithmic complexity doesn't really
matter, and the extra overhad with the AA lookups could actually mean that
the search through the dynamic array is faster even though it's O(n). But
you can only know which is faster by testing it out with the actual data
that you're dealing with.

Regardless, you need to remember that associative arrays are not arrays in
the C sense. Rather, they're hash tables, so they function very differently
from dynamic arrays, and the rehash function is the closest that you're
going to get to affecting how the elements are laid out internally or how
much memory the AA is using.

- Jonathan M Davis






Re: A couple of questions about arrays and slices

2023-06-24 Thread Cecil Ward via Digitalmars-d-learn

On Saturday, 24 June 2023 at 07:36:26 UTC, Cecil Ward wrote:
Jonathan, is it possible that I wanted one thing and got 
another? My description in the earlier post was of the _aim_ of 
the program. What I ended up with might be something else? I 
wanted an array of uints whose values are the results/outputs 
of the mapping function. Since it is keyed by strings I assumed 
that the runtime generates some kind of hash for fast lookup 
when I ask it to retrieve an entry by the string (key) 
associated with it. I assumed that in some sense the hashing 
was sort of separate with some degree of independence from the 
underlying array, if that makes sense. The lookup is just 
assumed to be fast but how it is done we don’t really care. I 
just wanted to expand the array as I did successfully elsewhere 
with reserve, as I built this structure by successive additions 
of data. I have a number of strings and the map is meant to 
output the ordinal number in which I first saw them, 
zero-based. Then I want to come back and randomly look up one 
ordinal given a string preferably with a very fast lookup. The 
number of entries can not practically be more than 30, and even 
that would be highly unusual, maybe ten is the practical limit 
in my particular case, so it’s hardly MySQL.


I just realised something, your point about altering the table 
and having to rehash, is well taken. I hadn’t considered that. 
The reason for my foolishness in failing to realise that I’m 
asking the impractical is my pattern of usage. I add all the 
entries into the mapping table and have no interest in any 
lookups until it is fully built. Then a second function starts to 
do lookups while the data remains unchanging and that usage 
pattern can be guaranteed. I could even idup it if that would 
help, as copying < 32 uints wouldn’t take forever. A typical 
value would be a mere 5 or less. I only picked 32 to be 
completely safely ott.


Re: A couple of questions about arrays and slices

2023-06-24 Thread Cecil Ward via Digitalmars-d-learn

On Saturday, 24 June 2023 at 01:28:03 UTC, Jonathan M Davis wrote:
On Friday, June 23, 2023 7:02:12 PM MDT Cecil Ward via 
Digitalmars-d-learn wrote:
I just had a fight with LDC over the following code when I 
tried out reserve. I have an associative array that maps 
strings to ‘ordinals’ ie uints that are unique, and the 
compiler hates the call to reserve.


==



struct decls_t
  {
  uintn_entries = 0;
  uint[ dstring ] ordinals;   // Associative array maps 
variable

names to ordinals
  }

static decls_t Decls;

enum NPreAllocEntries = 32;
Decls.ordinals.reserve( NPreAllocEntries );

source>(82): Error: none of the overloads of template
`object.reserve` are callable using argument types
`!()(uint[dstring], ulong)`
/opt/compiler-explorer/ldc1.32.1/ldc2-1.32.1-linux-x86_64/bin/../import/obje
ct.d(3983):Candidate is: `reserve(T)(ref T[] arr, 
size_t

newcapacity)` Compiler returned: 1


Associative arrays and dynamic arrays are completely different 
things. Associative arrays are hash tables, and reserve really 
doesn't make sense for them. reserve is for telling the GC to 
make sure that a dynamic array has at least a specific amount 
of room to grow into before the GC needs to do a reallocation 
so that the dynamic array refers to a different memory block 
with enough memory to hold the data, whereas if and when 
associative arrays have to reallocate any of their internals is 
largely implementation-defined.


Any time that you add or remove elements from an AA, it might 
reallocate some of its internals depending on its current state 
and what the key of the element is - and that could be 
different between different compiler releases (though it's 
unlikely to change very often, since I don't think that the AA 
implementation gets messed with much).


You can use the rehash function on AAs to tell the GC to try to 
reorder how it's structured all of its buckets so that lookups 
are more efficient with the data that's currently in there, and 
you can call clear to remove all its elements, but in general, 
you don't do much to manage an AA's memory. It's a much more 
complicated data structure than an array.


https://dlang.org/spec/hash-map.html

- Jonathan M Davis


Jonathan, is it possible that I wanted one thing and got another? 
My description in the earlier post was of the _aim_ of the 
program. What I ended up with might be something else? I wanted 
an array of uints whose values are the results/outputs of the 
mapping function. Since it is keyed by strings I assumed that the 
runtime generates some kind of hash for fast lookup when I ask it 
to retrieve an entry by the string (key) associated with it. I 
assumed that in some sense the hashing was sort of separate with 
some degree of independence from the underlying array, if that 
makes sense. The lookup is just assumed to be fast but how it is 
done we don’t really care. I just wanted to expand the array as I 
did successfully elsewhere with reserve, as I built this 
structure by successive additions of data. I have a number of 
strings and the map is meant to output the ordinal number in 
which I first saw them, zero-based. Then I want to come back and 
randomly look up one ordinal given a string preferably with a 
very fast lookup. The number of entries can not practically be 
more than 30, and even that would be highly unusual, maybe ten is 
the practical limit in my particular case, so it’s hardly MySQL.


Re: A couple of questions about arrays and slices

2023-06-24 Thread Ali Çehreli via Digitalmars-d-learn

On 6/20/23 19:09, Cecil Ward wrote:

> 2.) I have a dynamic array and I wish to preinitialise its alloc cell to
> be a certain large size so that I don’t need to reallocate often

To be complete, 'assumeSafeAppend' must be mentioned here as well. 
Without it, there will be cases where the GC cannot guarantee that there 
are no slices to this particular one; so it has to reallocate:


import std;

void main() {
// An array with room for 100 elements
int[] arr;
arr.reserve(100);

// Take note of current address of the elements
auto ptr = arr.ptr;

foreach (i; 0 .. 80) {
// Add elements
arr ~= i;

// Was there a reallocation?
if (arr.ptr != ptr) {
writeln("relocated to ", arr.ptr, " at ", i);
ptr = arr.ptr;
}

// Let's say our algorithm shrinks the array
if (i == 50) {
arr.length = 0;
// assumeSafeAppend(arr);
}
}
}

Although the array has room for 100 elements, the program will print 
something similar to the following:


relocated to 7F058B02B000 at 51
relocated to 7F058B02C000 at 54
relocated to 7F058B02D000 at 58
relocated to 7F058B02E000 at 62
relocated to 7F058B02F000 at 66
relocated to 7F058B03 at 74

When it's known that there is no other slice to the old elements, the 
programmer calls assumeSafeAppend() by uncommenting that line :o). Now 
there are no relocations. Sweet!


Ali



Re: A couple of questions about arrays and slices

2023-06-23 Thread Jonathan M Davis via Digitalmars-d-learn
On Friday, June 23, 2023 7:02:12 PM MDT Cecil Ward via Digitalmars-d-learn 
wrote:
> I just had a fight with LDC over the following code when I tried
> out reserve. I have an associative array that maps strings to
> ‘ordinals’ ie uints that are unique, and the compiler hates the
> call to reserve.
>
> ==
>
>
>
> struct decls_t
>   {
>   uintn_entries = 0;
>   uint[ dstring ] ordinals;   // Associative array maps variable
> names to ordinals
>   }
>
> static decls_t Decls;
>
> enum NPreAllocEntries = 32;
> Decls.ordinals.reserve( NPreAllocEntries );
>
> source>(82): Error: none of the overloads of template
> `object.reserve` are callable using argument types
> `!()(uint[dstring], ulong)`
> /opt/compiler-explorer/ldc1.32.1/ldc2-1.32.1-linux-x86_64/bin/../import/obje
> ct.d(3983):Candidate is: `reserve(T)(ref T[] arr, size_t
> newcapacity)` Compiler returned: 1

Associative arrays and dynamic arrays are completely different things.
Associative arrays are hash tables, and reserve really doesn't make sense
for them. reserve is for telling the GC to make sure that a dynamic array
has at least a specific amount of room to grow into before the GC needs to
do a reallocation so that the dynamic array refers to a different memory
block with enough memory to hold the data, whereas if and when associative
arrays have to reallocate any of their internals is largely
implementation-defined.

Any time that you add or remove elements from an AA, it might reallocate
some of its internals depending on its current state and what the key of the
element is - and that could be different between different compiler releases
(though it's unlikely to change very often, since I don't think that the AA
implementation gets messed with much).

You can use the rehash function on AAs to tell the GC to try to reorder how
it's structured all of its buckets so that lookups are more efficient with
the data that's currently in there, and you can call clear to remove all its
elements, but in general, you don't do much to manage an AA's memory. It's a
much more complicated data structure than an array.

https://dlang.org/spec/hash-map.html

- Jonathan M Davis






Re: A couple of questions about arrays and slices

2023-06-23 Thread Cecil Ward via Digitalmars-d-learn

On Thursday, 22 June 2023 at 05:21:52 UTC, Cecil Ward wrote:
On Thursday, 22 June 2023 at 01:44:22 UTC, Jonathan M Davis 
wrote:
On Wednesday, June 21, 2023 7:05:28 PM MDT Paul Backus via 
Digitalmars-d-learn wrote:

[...]


To add to that, it _has_ to know the element type, because 
aside from anything related to a type's size, it bit-blits the 
type's init value onto the new elements when it increases the 
length of the dynamic array.


You'd probably be dealing with bytes if you were explicitly 
asking for memory and the like (e.g. with malloc), but a 
dynamic array is properly typed, and everything you do with it 
in @safe code is going to deal with it as properly typed. For 
it to be otherwise would require @system casts.


- Jonathan M Davis


Thankyou Jonathan!


I just had a fight with LDC over the following code when I tried 
out reserve. I have an associative array that maps strings to 
‘ordinals’ ie uints that are unique, and the compiler hates the 
call to reserve.


==



struct decls_t
{
uintn_entries = 0;
	uint[ dstring ]		ordinals;	// Associative array maps variable 
names to ordinals

}

static decls_t Decls;

enum NPreAllocEntries = 32;
Decls.ordinals.reserve( NPreAllocEntries );

source>(82): Error: none of the overloads of template 
`object.reserve` are callable using argument types 
`!()(uint[dstring], ulong)`

/opt/compiler-explorer/ldc1.32.1/ldc2-1.32.1-linux-x86_64/bin/../import/object.d(3983):
Candidate is: `reserve(T)(ref T[] arr, size_t newcapacity)`
Compiler returned: 1



Re: A couple of questions about arrays and slices

2023-06-21 Thread Cecil Ward via Digitalmars-d-learn

On Thursday, 22 June 2023 at 01:44:22 UTC, Jonathan M Davis wrote:
On Wednesday, June 21, 2023 7:05:28 PM MDT Paul Backus via 
Digitalmars-d-learn wrote:

[...]


To add to that, it _has_ to know the element type, because 
aside from anything related to a type's size, it bit-blits the 
type's init value onto the new elements when it increases the 
length of the dynamic array.


You'd probably be dealing with bytes if you were explicitly 
asking for memory and the like (e.g. with malloc), but a 
dynamic array is properly typed, and everything you do with it 
in @safe code is going to deal with it as properly typed. For 
it to be otherwise would require @system casts.


- Jonathan M Davis


Thankyou Jonathan!


Re: A couple of questions about arrays and slices

2023-06-21 Thread Cecil Ward via Digitalmars-d-learn

On Thursday, 22 June 2023 at 01:05:28 UTC, Paul Backus wrote:

On Thursday, 22 June 2023 at 00:10:19 UTC, Cecil Ward wrote:
Is .reserve()’s argument scaled by the entry size after it is 
supplied, that is it is quoted in elements or is it in bytes? 
I’m not sure whether the runtime has a knowledge of the 
element type so maybe it doesn’t know anything about scale 
factors, not sure.


length, reserve, and capacity all use the same unit, which is 
elements.


reserve passes a TypeInfo instance to the runtime so that it 
knows the size of the elements. You can see the implementation 
here:


https://github.com/dlang/dmd/blob/v2.104.0/druntime/src/object.d#L3910


Many thanks Paul, I should have found that!


Re: A couple of questions about arrays and slices

2023-06-21 Thread Jonathan M Davis via Digitalmars-d-learn
On Wednesday, June 21, 2023 7:05:28 PM MDT Paul Backus via Digitalmars-d-learn 
wrote:
> On Thursday, 22 June 2023 at 00:10:19 UTC, Cecil Ward wrote:
> > Is .reserve()’s argument scaled by the entry size after it is
> > supplied, that is it is quoted in elements or is it in bytes?
> > I’m not sure whether the runtime has a knowledge of the element
> > type so maybe it doesn’t know anything about scale factors, not
> > sure.
>
> length, reserve, and capacity all use the same unit, which is
> elements.
>
> reserve passes a TypeInfo instance to the runtime so that it
> knows the size of the elements. You can see the implementation
> here:
>
> https://github.com/dlang/dmd/blob/v2.104.0/druntime/src/object.d#L3910

To add to that, it _has_ to know the element type, because aside from
anything related to a type's size, it bit-blits the type's init value onto
the new elements when it increases the length of the dynamic array.

You'd probably be dealing with bytes if you were explicitly asking for
memory and the like (e.g. with malloc), but a dynamic array is properly
typed, and everything you do with it in @safe code is going to deal with it
as properly typed. For it to be otherwise would require @system casts.

- Jonathan M Davis






Re: A couple of questions about arrays and slices

2023-06-21 Thread Paul Backus via Digitalmars-d-learn

On Thursday, 22 June 2023 at 00:10:19 UTC, Cecil Ward wrote:
Is .reserve()’s argument scaled by the entry size after it is 
supplied, that is it is quoted in elements or is it in bytes? 
I’m not sure whether the runtime has a knowledge of the element 
type so maybe it doesn’t know anything about scale factors, not 
sure.


length, reserve, and capacity all use the same unit, which is 
elements.


reserve passes a TypeInfo instance to the runtime so that it 
knows the size of the elements. You can see the implementation 
here:


https://github.com/dlang/dmd/blob/v2.104.0/druntime/src/object.d#L3910


Re: A couple of questions about arrays and slices

2023-06-21 Thread Cecil Ward via Digitalmars-d-learn

On Wednesday, 21 June 2023 at 15:48:56 UTC, H. S. Teoh wrote:
On Wed, Jun 21, 2023 at 02:09:26AM +, Cecil Ward via 
Digitalmars-d-learn wrote:

First is an easy one:

1.) I have a large array and a sub-slice which I want to set 
up to be pointing into a sub-range of it. What do I write if I 
know the start and end indices ? Concerned about an off-by-one 
error, I have start_index and past_end_index (exclusive).


array[start_idx .. one_past_end_idx]


2.) I have a dynamic array and I wish to preinitialise its 
alloc cell to be a certain large size so that I don’t need to 
reallocate often initially. I tell myself that I can set the 
.length property. Is that true?


You can use `array.reserve(preinitSize);`.


2a.) And what happens when the cell is extended, is the 
remainder zero-filled or remaining full of garbage, or is the 
size of the alloc cell something separate from the dynamic 
array’s knowledge of the number of valid elements in it ?


The size of the allocated cell is managed by druntime. On the 
user code side, all you know is the slice (start pointer + 
length).  The allocated region outside the current array length 
is not initialized.  Assigning to array.length initializes the 
area that the array occupies after the length has been extended.



T


Is .reserve()’s argument scaled by the entry size after it is 
supplied, that is it is quoted in elements or is it in bytes? I’m 
not sure whether the runtime has a knowledge of the element type 
so maybe it doesn’t know anything about scale factors, not sure.


Re: A couple of questions about arrays and slices

2023-06-21 Thread Cecil Ward via Digitalmars-d-learn
On Wednesday, 21 June 2023 at 04:52:06 UTC, Jonathan M Davis 
wrote:
On Tuesday, June 20, 2023 8:09:26 PM MDT Cecil Ward via 
Digitalmars-d-learn wrote:

[...]


When slicing, the end index is exclusive. e.g.

[...]


Actually concerning the garbage, I was rather hoping that it 
_would_ be garbage so as to avoid the cost of a zero-fill in time 
and also possibly in cache pollution, unless it uses the 
non-temporal cache-friendliness management tech that is found on 
eg x86.


Re: A couple of questions about arrays and slices

2023-06-21 Thread Cecil Ward via Digitalmars-d-learn
On Wednesday, 21 June 2023 at 04:52:06 UTC, Jonathan M Davis 
wrote:
On Tuesday, June 20, 2023 8:09:26 PM MDT Cecil Ward via 
Digitalmars-d-learn wrote:

[...]


When slicing, the end index is exclusive. e.g.

[...]


Thankyou to both posters for your exceptionally helpful and 
generous replies, which will serve as a help to others if I made 
the title searchable. I need to look at the appended and try to 
find some (more) example code.


Re: A couple of questions about arrays and slices

2023-06-21 Thread H. S. Teoh via Digitalmars-d-learn
On Wed, Jun 21, 2023 at 02:09:26AM +, Cecil Ward via Digitalmars-d-learn 
wrote:
> First is an easy one:
> 
> 1.) I have a large array and a sub-slice which I want to set up to be
> pointing into a sub-range of it. What do I write if I know the start
> and end indices ? Concerned about an off-by-one error, I have
> start_index and past_end_index (exclusive).

array[start_idx .. one_past_end_idx]


> 2.) I have a dynamic array and I wish to preinitialise its alloc cell
> to be a certain large size so that I don’t need to reallocate often
> initially. I tell myself that I can set the .length property. Is that
> true?

You can use `array.reserve(preinitSize);`.


> 2a.) And what happens when the cell is extended, is the remainder
> zero-filled or remaining full of garbage, or is the size of the alloc
> cell something separate from the dynamic array’s knowledge of the
> number of valid elements in it ?

The size of the allocated cell is managed by druntime. On the user code
side, all you know is the slice (start pointer + length).  The allocated
region outside the current array length is not initialized.  Assigning
to array.length initializes the area that the array occupies after the
length has been extended.


T

-- 
Ph.D. = Permanent head Damage


Re: A couple of questions about arrays and slices

2023-06-20 Thread Jonathan M Davis via Digitalmars-d-learn
On Tuesday, June 20, 2023 8:09:26 PM MDT Cecil Ward via Digitalmars-d-learn 
wrote:
> First is an easy one:
>
> 1.) I have a large array and a sub-slice which I want to set up
> to be pointing into a sub-range of it. What do I write if I know
> the start and end indices ? Concerned about an off-by-one error,
> I have start_index and past_end_index (exclusive).

When slicing, the end index is exclusive. e.g.

auto a = "0123456789abcdef";
assert(a[3 .. 8] == "34567");

As a consequence of that, the end index minus the start index is the length
of the resulting slice. It also means that $ always refers to one past the
end of the array.

> 2.) I have a dynamic array and I wish to preinitialise its alloc
> cell to be a certain large size so that I don’t need to
> reallocate often initially. I tell myself that I can set the
> .length property. Is that true?

Well, dynamic arrays have a length and a capacity. The length is the actual
length of the slice that you're operating on. e.g.

auto a = new int[](100);
assert(a.length == 100);

or

int[] a;
a.length = 100;
assert(a.length == 100);

However, the underlying memory that the dynamic array is a slice of will
generally be larger than the length of the array - in part because of how
the allocator works and in part so that appending to the array will be more
efficient. So, you can see how much space the dynamic array has to grow
before it has to be reallocated when appending to it by using the capacity
property. E.G. When I run this on my system

auto a = new int[](100);
writeln(a.length);
writeln(a.capacity);

I get

100
127

This means that I could append 27 more elements to the array (or otherwise
increase its length by up to 27 elements) without needing a reallocation.
But if the array tries to grow beyond that, then the GC will allocate a new
block of memory, copy the elements over to that, and adjust the slice to
point to the new block of memory (of course leaving any other slices
pointing to the old block of memory).

One thing to note about that is that a dynamic array can only have a
capacity that's larger than its length if it's the furthest slice into that
block of memory, and no other slices have gone further into it (since it's
only safe for the runtime to allow you to append to a dynamic array in-place
when no other dynamic array can possibly refer to the memory after the array
that you're trying to append to). If the dynamic array is not at the end,
then it ends up with a capacity of 0. E.G.

auto a = new int[](100);
auto b = a[0 .. $ - 1];
assert(b.length == 99);
assert(b.capacity == 0);

So, capacity is a bit more complex than the max length that the array could
grow to without needing a reallocation, but you can use it see if appending
to a dynamic array will result in a reallocation.

Another part of this that you of course need to remember is that dynamic
arrays can refer to memory that is not GC-allocated for dynamic arrays (be
it stack-allocated or malloc-allocated or even GC-allocated for a different
purpose), in which case, the capacity will be 0, because the underlying
memory block is not a GC-allocated memory block for dynamic arrays, and
appending anything to the array then has to result in a reallocation so that
it then does point to a GC-allocated block of memory for dynamic arrays.

If you want to create a dynamic array of a specific length while also
ensuring that that there is at least a specific amount of memory in the
underlying memory block for it to grow into, then you need the reserve
function. It allows you to tell the GC how much memory you would like to be
allocated underneath the hood. E.G.

int[] a;
a.reserve(500);
assert(a.length == 0);
writeln(a.capacity);

prints 511 on my system.

https://dlang.org/spec/arrays.html#capacity-reserve

All that being said, if you're trying to minimize reallocations when
appending to a dynamic array, you should consider using
https://dlang.org/phobos/std_array.html#Appender since it has some other
tricks to make it so that it queries the GC less and speeds the whole
process up. But the basic idea of reserving capacity is the same, and when
you're done appending, you get a normal dynamic array out of the deal.

> 2a.) And what happens when the cell is extended, is the remainder
> zero-filled or remaining full of garbage, or is the size of the
> alloc cell something separate from the dynamic array’s knowledge
> of the number of valid elements in it ?

A dynamic array is basically this:

struct DynamicArray(T)
{
size_t length;
T* ptr;
}

It's just a slice of memory and has no clue whatsoever about what it points
to. All of the logic for that comes from the druntime functions that you
call to operate on dynamic arrays (e.g. ~= or capacity). It could be a slice
of a static array, GC-allocated memory, malloced memory, etc. And the GC
doesn't do anything to keep track of all of the dynamic arrays. They're
basically just simple structs sitting on the stack or inside of the memory

A couple of questions about arrays and slices

2023-06-20 Thread Cecil Ward via Digitalmars-d-learn

First is an easy one:

1.) I have a large array and a sub-slice which I want to set up 
to be pointing into a sub-range of it. What do I write if I know 
the start and end indices ? Concerned about an off-by-one error, 
I have start_index and past_end_index (exclusive).


2.) I have a dynamic array and I wish to preinitialise its alloc 
cell to be a certain large size so that I don’t need to 
reallocate often initially. I tell myself that I can set the 
.length property. Is that true?


2a.) And what happens when the cell is extended, is the remainder 
zero-filled or remaining full of garbage, or is the size of the 
alloc cell something separate from the dynamic array’s knowledge 
of the number of valid elements in it ?