Re: byte and short data types use cases

2023-06-09 Thread Cecil Ward via Digitalmars-d-learn

On Friday, 9 June 2023 at 15:07:54 UTC, Murloc wrote:

On Friday, 9 June 2023 at 12:56:20 UTC, Cecil Ward wrote:

On Friday, 9 June 2023 at 11:24:38 UTC, Murloc wrote:

If you have four ubyte variables in a struct and then
an array of them, then you are getting optimal memory usage.


Is this some kind of property? Where can I read more about this?

So you can optimize memory usage by using arrays of things 
smaller than `int` if these are enough for your purposes, but 
what about using these instead of single variables, for example 
as an iterator in a loop, if range of such a data type is 
enough for me? Is there any advantages on doing that?


Read up on ‘structs’ and the ‘align’ attribute in the main D 
docs, on this website. Using smaller fields in a struct that is 
in memory saves RAM if there is an array of such structs. Even in 
the case where there is only one struct, let’s say that you are 
returning a struct by value from some function. If the struct is 
fairly small in total, and the compiler is good (ldc or gdc, not 
dmd - see godbolt.org) then the returned struct can fit into a 
register sometimes, rather than being placed in RAM, when it is 
returned to the function’s caller. Yesterday I returned a struct 
containing four uint32_t fields from a function and it came back 
to the caller in two 64-bit registers, not in RAM. Clearly, using 
smaller fields where possible may bring the whole struct under 
the size limit for being returned in registers.
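
To make the size difference concrete, here is a minimal sketch. The struct names `Wide` and `Narrow` and the function `makeNarrow` are made up for illustration; whether a given struct really comes back in registers depends on the target ABI and the compiler, so this only checks the sizes involved.

```d
import std.stdio : writeln;

// Hypothetical structs, named only for this illustration.
struct Wide   { uint  a, b, c, d; } // four 32-bit fields
struct Narrow { ubyte a, b, c, d; } // four 8-bit fields

static assert(Wide.sizeof == 16);  // as much as two 64-bit registers hold
static assert(Narrow.sizeof == 4); // small enough for one 32-bit register

// Returned by value; a good compiler may keep this out of RAM entirely.
Narrow makeNarrow()
{
    return Narrow(1, 2, 3, 4);
}

void main()
{
    auto n = makeNarrow();
    writeln("Wide: ", Wide.sizeof, " bytes, Narrow: ", Narrow.sizeof, " bytes");
    assert(n.a == 1 && n.d == 4);
}
```

Pasting something like this into godbolt.org with ldc or gdc is one way to see where the returned value actually ends up.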


As for your question about single variables: the answer is very 
definitely no. Rather, the opposite: always use the CPU's 
‘natural’ types, the widths that are most natural to the 
processor in question. 64-bit CPUs will sometimes favour 32-bit 
types. One example is x86-64/AMD64, where code handling 32-bit 
ints is smaller (it saves bytes in the code segment), but the 
speed and the number of instructions are the same whether you 
are dealing with 32-bit or 64-bit types. Always use size_t for 
index variables into arrays or for the size of anything in bytes, 
never int or uint. On a 64-bit machine such as x86-64, size_t is 
64-bit, not 32. By using int/uint where you should have used 
size_t you could in theory get a very rare bug when dealing with 
e.g. file sizes or vast amounts of (virtual) memory, say bigger 
than 2GB (int limit) or 4GB (uint limit), when the 32-bit types 
overflow. There is also ptrdiff_t, which is 64-bit on a 64-bit 
CPU; it is probably not worth bothering with, as its raison 
d’être was historical (the early-80s 80286 segmented 
architecture, before the 32-bit 386 blew it away).
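
A small sketch of the size_t advice, assuming nothing beyond the language itself (the array `data` is just sample input):

```d
void main()
{
    // size_t always matches the pointer width of the target.
    static assert(size_t.sizeof == (void*).sizeof);

    int[] data = [10, 20, 30];

    // .length is already a size_t, so indexing with size_t needs no conversion.
    static assert(is(typeof(data.length) == size_t));

    long sum = 0;
    for (size_t i = 0; i < data.length; ++i)
        sum += data[i];

    assert(sum == 60);
}
```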


Re: byte and short data types use cases

2023-06-09 Thread Ali Çehreli via Digitalmars-d-learn

On 6/9/23 08:07, Murloc wrote:

> Where can I read more about this?

I had written something related:

  http://ddili.org/ders/d.en/memory.html#ix_memory..offsetof

The .offsetof appears at that point. The printObjectLayout() function 
example there attempts to visualize the layout of the members of a struct.
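
For a quick taste without following the link, here is a much smaller sketch than the function in the book. The struct `Sample` is made up for this post, and the offsets in the comments assume the usual C-compatible layout that D uses by default.

```d
import std.stdio : writefln;

// Hypothetical struct, used only to show .offsetof in action.
struct Sample
{
    ubyte  a; // offset 0, followed by 3 bytes of padding
    uint   b; // offset 4, must be 4-byte aligned
    ushort c; // offset 8, followed by 2 bytes of tail padding
}

static assert(Sample.a.offsetof == 0);
static assert(Sample.b.offsetof == 4);
static assert(Sample.c.offsetof == 8);
static assert(Sample.sizeof == 12);
static assert(Sample.alignof == 4);

void main()
{
    writefln("a at %s, b at %s, c at %s, total %s bytes",
             Sample.a.offsetof, Sample.b.offsetof,
             Sample.c.offsetof, Sample.sizeof);
}
```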


Ali



Re: byte and short data types use cases

2023-06-09 Thread Basile B. via Digitalmars-d-learn

On Friday, 9 June 2023 at 15:07:54 UTC, Murloc wrote:

On Friday, 9 June 2023 at 12:56:20 UTC, Cecil Ward wrote:

On Friday, 9 June 2023 at 11:24:38 UTC, Murloc wrote:

If you have four ubyte variables in a struct and then
an array of them, then you are getting optimal memory usage.


Is this some kind of property? Where can I read more about this?


Yes, a classic resource is 
http://www.catb.org/esr/structure-packing/


So you can optimize memory usage by using arrays of things 
smaller than `int` if these are enough for your purposes,


It's not only for arrays, it's also for members

```d
struct S1
{
    ubyte a; // offs 0
    ulong b; // offs 8
    ubyte c; // offs 16
}

struct S2
{
    ubyte a; // offs 0
    ubyte c; // offs 1
    ulong b; // offs 8
}

static assert(S1.sizeof > S2.sizeof); // 24 VS 16
```

This is because `b` has to be 8-byte aligned, so the compiler 
inserts padding in front of it, whereas `a` and `c` can sit at 
any offset.


but what about using these instead of single variables, for 
example as an iterator in a loop, if range of such a data type 
is enough for me? Is there any advantages on doing that?


Not really; the loop variable takes a marginal part of the stack 
space in the current function. You can just use `auto` and let 
the compiler choose the best type.
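
A minimal sketch of what the inference gives you in `foreach`, where you can omit the type altogether (the array `arr` is just sample data):

```d
void main()
{
    int[] arr = [3, 1, 4];

    // The index of an array foreach is inferred as size_t.
    foreach (i, x; arr)
    {
        static assert(is(typeof(i) == size_t));
    }

    // A range bounded by .length also yields size_t.
    foreach (i; 0 .. arr.length)
    {
        static assert(is(typeof(i) == size_t));
    }

    // Two int literals give a plain int loop variable.
    foreach (i; 0 .. 10)
    {
        static assert(is(typeof(i) == int));
    }
}
```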





Re: byte and short data types use cases

2023-06-09 Thread H. S. Teoh via Digitalmars-d-learn
On Fri, Jun 09, 2023 at 11:24:38AM +, Murloc via Digitalmars-d-learn wrote:
[...]
> Which raised another question: since objects of types smaller than
> `int` are promoted to `int` to use integer arithmetic on them anyway,
> is there any point in using anything of integer type less than `int`
> other than to limit the range of values that can be assigned to a
> variable at compile time?

Not just at compile time: at runtime they will also be fixed to that
width (mapped to a hardware register of that size) and will not be able
to contain a larger value.


[...]
> People say that there is no advantage for using `byte`/`short` type
> for integer objects over an int for a single variable, however, as
> they say, this is not true for arrays, where you can save some memory
> space by using `byte`/`short` instead of `int`.

That's correct.


> But isn't any further manipulations with these array objects will
> produce results of type `int` anyway? Don't you have to cast these
> objects over and over again after manipulating them to write them back
> into that array or for some other manipulations with these smaller
> types objects?

Yes you will have to cast them back.  Casting often translates to a
no-op or just a single instruction in the machine code; you just write
part of a 32-bit register back to memory instead of the whole thing, and
this automatically truncates the value to the narrow int.

The general advice is, perform computations with int or wider, then
truncate when writing back to storage for storage efficiency. So
generally you wouldn't cast the value to short/byte until the very end
when you're about to store the final result back to the array.  At that
point you'd probably also want to do a range check to catch any
potential overflows.
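
As a rough sketch of that pattern: the function name `brighten` and the sample data are made up for illustration, and `std.conv.to` is only one of several ways to do the range check (it throws `ConvOverflowException` when the value does not fit).

```d
import std.conv : to, ConvOverflowException;

// Hypothetical helper: adds delta to every element of a ubyte array.
void brighten(ubyte[] pixels, int delta)
{
    foreach (ref p; pixels)
    {
        int v = p + delta; // the arithmetic happens in int
        p = v.to!ubyte;    // range-checked narrowing on the final store
    }
}

void main()
{
    ubyte[] px = [10, 20, 30];
    ubyte[] expected = [15, 25, 35];

    brighten(px, 5);
    assert(px == expected);

    // 15 + 250 does not fit in a ubyte, so the checked conversion throws.
    bool threw = false;
    try
    {
        brighten(px, 250);
    }
    catch (ConvOverflowException)
    {
        threw = true;
    }
    assert(threw);
}
```

If you want silent truncation instead of a check, `cast(ubyte) v` does that.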


> Some people say that these promoting and casting operations in summary
> may have an even slower overall effect than simply using int, so I'm
> kind of confused about the use cases of these data types... (I think
> that my misunderstanding comes from not knowing how things happen at a
> slightly lower level of abstractions, like which operations require
> memory allocation, which do not, etc. Maybe some resource
> recommendations on that?) Thanks!

I highly recommend taking an introductory course to assembly language,
or finding a book / online tutorial on the subject.  Understanding how
the machine actually works under the hood will help answer a lot of
these questions, even if you'll never actually write a single line of
assembly code.

But in a nutshell: integer data types do not allocate, unless you
explicitly ask for it (e.g. `int* p = new int;` -- but you almost never
want to do this). They are held in machine registers or stored on the
runtime stack, and always occupy a fixed size, so almost no memory
management is needed for them. (Which is also why they're preferred when
you don't need anything more fancy, because they're also super-fast.)
Promoting a narrow integer to int takes at most 1 machine instruction, or, in the case of
unsigned values, sometimes zero instructions. Casting back to a narrow
int is often a no-op (the subsequent code just ignores the upper bits).
The performance difference is negligible, unless you're doing expensive
things like range checking after every operation (generally you don't
need to anyway, usually it's sufficient to check range at the end of a
computation, not at every intermediate step -- unless you have reason to
believe that an intermediate step is liable to overflow or wrap around).
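
A tiny illustration of the promotion and the cast back (the variable names are arbitrary):

```d
void main()
{
    ubyte a = 200, b = 100;

    // Arithmetic on two ubytes is carried out in int.
    static assert(is(typeof(a + b) == int));

    int wide = a + b;                  // 300, no wrap-around
    ubyte narrow = cast(ubyte)(a + b); // keeps only the low 8 bits

    assert(wide == 300);
    assert(narrow == 44); // 300 - 256

    // ubyte bad = a + b; // rejected: int does not implicitly convert to ubyte
}
```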


T

-- 
People who are more than casually interested in computers should have at
least some idea of what the underlying hardware is like. Otherwise the
programs they write will be pretty weird. -- D. Knuth


Re: byte and short data types use cases

2023-06-09 Thread Murloc via Digitalmars-d-learn

On Friday, 9 June 2023 at 12:56:20 UTC, Cecil Ward wrote:

On Friday, 9 June 2023 at 11:24:38 UTC, Murloc wrote:

If you have four ubyte variables in a struct and then
an array of them, then you are getting optimal memory usage.


Is this some kind of property? Where can I read more about this?

So you can optimize memory usage by using arrays of things 
smaller than `int` if these are enough for your purposes, but 
what about using these instead of single variables, for example 
as an iterator in a loop, if range of such a data type is enough 
for me? Is there any advantages on doing that?


Re: byte and short data types use cases

2023-06-09 Thread Cecil Ward via Digitalmars-d-learn

On Friday, 9 June 2023 at 11:24:38 UTC, Murloc wrote:
Hi, I was interested why, for example, `byte` and `short` 
literals do not have their own unique suffixes (like `L` for 
`long` or `u` for `unsigned int` literals) and found the 
following explanation:


- "I guess short literal is not supported solely due to the 
fact that anything less than `int` will be "promoted" to `int` 
during evaluation. `int` has the most natural size. This is 
called integer promotion in C++."


Which raised another question: since objects of types smaller 
than `int` are promoted to `int` to use integer arithmetic on 
them anyway, is there any point in using anything of integer 
type less than `int` other than to limit the range of values 
that can be assigned to a variable at compile time? Are these 
data types there because of some historical reasons (maybe 
`byte` and/or `short` were "natural" for some architectures 
before)?


People say that there is no advantage for using `byte`/`short` 
type for integer objects over an int for a single variable, 
however, as they say, this is not true for arrays, where you 
can save some memory space by using `byte`/`short` instead of 
`int`. But isn't any further manipulations with these array 
objects will produce results of type `int` anyway? Don't you 
have to cast these objects over and over again after 
manipulating them to write them back into that array or for 
some other manipulations with these smaller types objects? Or 
is this only useful if you're storing some array of constants 
for reading purposes?


Some people say that these promoting and casting operations in 
summary may have an even slower overall effect than simply 
using int, so I'm kind of confused about the use cases of these 
data types... (I think that my misunderstanding comes from not 
knowing how things happen at a slightly lower level of 
abstractions, like which operations require memory allocation, 
which do not, etc. Maybe some resource recommendations on 
that?) Thanks!


For me there are two use cases for using byte and short, ubyte 
and ushort.


The first is simply to save memory in a large array or neatly fit 
into a ‘hole’ in a struct, say next to a bool which is also a 
byte. If you have four ubyte variables in a struct and then an 
array of them, then you are getting optimal memory usage. On 
x86, for example, the casting operations from ubyte to uint use 
instructions that have zero added cost compared to a normal uint 
fetch, and casting to a ubyte generates no code at all. So the 
total cost of casting is zero.
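
A sketch of the ‘hole’ idea, using a made-up struct `Item`; the offsets assume the default C-compatible layout:

```d
// Hypothetical record type, named only for this illustration.
struct Item
{
    uint   id;     // offset 0
    bool   active; // offset 4, a bool occupies one byte
    ubyte  kind;   // offset 5, sits in the hole right after the bool
    ushort count;  // offset 6
}

static assert(bool.sizeof == 1);
static assert(Item.kind.offsetof == 5);
static assert(Item.sizeof == 8); // no padding left over

void main()
{
    auto items = new Item[1000];
    assert(items.length * Item.sizeof == 8000);

    // Widening on read is implicit; narrowing on write needs a cast.
    uint k = items[0].kind;
    items[0].kind = cast(ubyte)(k + 1);
    assert(items[0].kind == 1);
}
```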


The second use case is where you need to interface to external 
specifications that demand uint8_t (ubyte) or uint16_t (ushort), 
where I am using the standard definitions from std.stdint. These 
types are the same as those in C. Interfacing to externally 
defined structs, in data structures in RAM or in messages, is one 
example. The second example is where you need to interface to 
machine code that has registers or operands of 8-bit or 16-bit 
types. I like to use the stdint types for the purposes of 
documentation, as it rams home the point that these are truly 
fixed-width types and cannot change. (And I do know that in D, 
unlike C, int, long etc. are of defined fixed widths. C doesn’t 
have those guarantees, which is why stdint.h is needed in C.) As 
well as machine code, we could add other high-level languages, 
where interfaces are defined in the other language and you have 
to hope that the other language’s type widths don’t change.
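
As an illustration only, here is what such an externally defined layout might look like. The struct `PacketHeader` and its fields are invented for this post and do not come from any real specification; the import uses `core.stdc.stdint`, which is where the druntime definitions live.

```d
import core.stdc.stdint : uint8_t, uint16_t, uint32_t;

// Hypothetical message header, invented purely for illustration.
struct PacketHeader
{
    uint8_t  ver;      // offset 0
    uint8_t  flags;    // offset 1
    uint16_t length;   // offset 2
    uint32_t sequence; // offset 4
}

// In D the stdint names are plain aliases for the built-in types.
static assert(is(uint8_t == ubyte));
static assert(is(uint16_t == ushort));

// Pin the layout down so a later edit cannot silently change it.
static assert(PacketHeader.length.offsetof == 2);
static assert(PacketHeader.sequence.offsetof == 4);
static assert(PacketHeader.sizeof == 8);

void main()
{
    auto h = PacketHeader(1, 0, 8, 42);
    assert(h.sequence == 42);
}
```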


byte and short data types use cases

2023-06-09 Thread Murloc via Digitalmars-d-learn
Hi, I was interested why, for example, `byte` and `short` 
literals do not have their own unique suffixes (like `L` for 
`long` or `u` for `unsigned int` literals) and found the 
following explanation:


- "I guess short literal is not supported solely due to the fact 
that anything less than `int` will be "promoted" to `int` during 
evaluation. `int` has the most natural size. This is called 
integer promotion in C++."


Which raised another question: since objects of types smaller 
than `int` are promoted to `int` to use integer arithmetic on 
them anyway, is there any point in using anything of integer type 
less than `int` other than to limit the range of values that can 
be assigned to a variable at compile time? Are these data types 
there because of some historical reasons (maybe `byte` and/or 
`short` were "natural" for some architectures before)?


People say that there is no advantage for using `byte`/`short` 
type for integer objects over an int for a single variable, 
however, as they say, this is not true for arrays, where you can 
save some memory space by using `byte`/`short` instead of `int`. 
But isn't any further manipulations with these array objects will 
produce results of type `int` anyway? Don't you have to cast 
these objects over and over again after manipulating them to 
write them back into that array or for some other manipulations 
with these smaller types objects? Or is this only useful if 
you're storing some array of constants for reading purposes?


Some people say that these promoting and casting operations in 
summary may have an even slower overall effect than simply using 
int, so I'm kind of confused about the use cases of these data 
types... (I think that my misunderstanding comes from not knowing 
how things happen at a slightly lower level of abstractions, like 
which operations require memory allocation, which do not, etc. 
Maybe some resource recommendations on that?) Thanks!