Re: Range handling difficulties

2024-04-24 Thread H. S. Teoh via Digitalmars-d-learn
On Wed, Apr 24, 2024 at 08:08:06AM +, Menjanahary R. R. via 
Digitalmars-d-learn wrote:
> I tried to solve Project Euler [problem
> #2](https://projecteuler.net/problem=2) using
> [Recurrence/recurrence](https://dlang.org/library/std/range/recurrence.html).
> 
> Assuming `genEvenFibonacci` is the appropriate function in explicit
> form, I got what I need like so:
> 
> ```
> auto evenfib = recurrence!genEvenFibonacci(2uL, 8uL);
> 
> writeln;
> evenfib.take(11).sum.writeln;
> ```
> 
> But that's like cheating because there is no prior knowledge of `11`.
> 
> I just got it manually by peeking at the sequence `[2, 8, 34, 144,
> 610, 2584, 10946, 46368, 196418, 832040, 3524578, 14930352]`.
> 
> `14930352` must be filtered out because it is beyond the set limit!
> 
> How to fix that properly and programmatically, using all the standard
> library capabilities?
> 
> I'm thinking of Range and/or std.algorithm.

evenfib.until!(n => n > 4_000_000).sum.writeln;
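
For reference, here's the whole thing as a small self-contained program.
The lambda below is only a stand-in for your `genEvenFibonacci`, using the
even-Fibonacci recurrence E(n) = 4*E(n-1) + E(n-2):

```d
import std.algorithm : sum, until;
import std.range : recurrence;
import std.stdio : writeln;

void main()
{
    // Even Fibonacci numbers satisfy E(n) = 4*E(n-1) + E(n-2), starting at 2, 8.
    auto evenfib = recurrence!((a, n) => 4 * a[n - 1] + a[n - 2])(2uL, 8uL);

    // Lazily stop once a term exceeds the limit, then sum what came before.
    evenfib.until!(n => n > 4_000_000).sum.writeln;   // 4613732
}
```

No prior knowledge of how many terms are needed; `until` cuts the infinite
range off at the right place.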


T

-- 
The trouble with TCP jokes is that it's like hearing the same joke over and 
over.


Re: Unittests pass, and then an invalid memory operation happens after?

2024-04-06 Thread H. S. Teoh via Digitalmars-d-learn
On Wed, Apr 03, 2024 at 09:57:00PM +, Liam McGillivray via 
Digitalmars-d-learn wrote:
> On Friday, 29 March 2024 at 01:18:22 UTC, H. S. Teoh wrote:
> > Take a look at the docs for core.memory.GC.  There *is* a method
> > GC.free that you can use to manually deallocate GC-allocated memory
> > if you so wish.  Keep in mind, though, that manually managing memory
> > in this way invites memory-related errors. That's not something I
> > recommend unless you're adamant about doing everything the manual
> > way.
> 
> Was this function removed from the library? I don't see it in [the
> document](https://dlang.org/phobos/core_memory.html).

https://dlang.org/phobos/core_memory.html#.GC.free


> How is `GC.free` different from `destroy`?

GC.free is low-level. It does not invoke dtors.


[...]
> When you mention a "flag" to indicate whether they are "live", do you
> mean like a boolean member variable for the `Unit` object? Like `bool
> alive;`?

Yes.
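
Roughly something like this (a bare sketch; only `Unit` and `die` come from
your own description, the rest is made up):

```d
class Unit
{
    bool alive = true;   // the "live" flag

    // Logical death: detach the unit from the game world, but let the GC
    // reclaim the memory whenever it gets around to it.
    void die()
    {
        alive = false;
        // e.g. unregister from map/faction/tile here
    }
}
```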


> > My advice remains the same: just let the GC do its job. Don't
> > "optimize" prematurely.  Use a profiler to test your program and
> > identify its real bottlenecks before embarking on these often
> > needlessly complicated premature optimizations that may turn out to
> > be completely unnecessary.
> 
> Alright. I suppose that some of the optimization decisions I have made
> so far may have resulted in less readable code for little performance
> benefit.  Now I'm trying to worry less about optimization. Everything
> has been very fast so far.
> 
> I haven't used a profiler yet, but I may like to try it.

Never make any optimization decisions without a profiler. I learned the
hard way that more often than not, I'm wrong about where my program's
bottleneck is, and that I spend far too much time and effort
"optimizing" things that don't need to be optimized, while totally
missing optimizations where it really matters.  Life is too short to be
wasted on optimizing things that don't really matter.  When it comes to
optimizations, always profile, profile, profile!


[...]
> It's unlikely that I will have multiple maps running simultaneously,
> unless if I do the AI thing mentioned above. I've had a dilemma of
> passing around references to the tile object vs passing around the
> coordinates, as is mentioned in an earlier thread that I started. In
> what way do references slow down performance? Would passing around a
> pair of coordinates to functions be better?

It's not references themselves that slow things down; it's the
likelihood that using reference types when you don't need to can lead to
excessive GC allocations, which in turn causes longer GC pauses.  Well,
excessive dereferencing can also reduce cache coherence, but if you're
already at the level where this actually makes a difference, you don't
need my advice anymore. :-D

Generally, if a piece of data is transient and not expected to last very
long (e.g., past the current frame), it probably should be a struct
rather than a class.  There are exceptions, of course, but generally
that's how I'd decide whether something should be a by-value type or a
by-reference type.


T

-- 
"The number you have dialed is imaginary. Please rotate your phone 90 degrees 
and try again."


Re: Inconsistent chain (implicitly converts to int)

2024-04-05 Thread H. S. Teoh via Digitalmars-d-learn
On Fri, Apr 05, 2024 at 03:18:09PM +, Salih Dincer via Digitalmars-d-learn 
wrote:
> Hi everyone,
> 
> Technically r1 and r2 are different types of range. Isn't it
> inconsistent to chain both? If not, why is the char type converted to
> int?
[...]

It's not inconsistent if there exists a common type that both range
element types implicit convert to.

The real problem is the implicit conversion of char to int, which I have
been against for a long time.  Walter, however, disagrees.
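
For example (a small illustration, not taken from your code):

```d
import std.range : chain;
import std.stdio : writeln;

void main()
{
    int[]  ints  = [65, 66];
    char[] chars = ['C', 'D'];

    // char (decoded to dchar for narrow ranges) and int share a common
    // integer type, so chain() accepts the mix -- but the elements come
    // out as that common numeric type, not as characters.
    auto mixed = chain(ints, chars);
    writeln(mixed);   // [65, 66, 67, 68]
}
```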


T

-- 
What's worse than raining cats and dogs?  Hailing taxis!


Re: Unittests pass, and then an invalid memory operation happens after?

2024-03-28 Thread H. S. Teoh via Digitalmars-d-learn
On Thu, Mar 28, 2024 at 11:49:19PM +, Liam McGillivray via 
Digitalmars-d-learn wrote:
> On Thursday, 28 March 2024 at 04:46:27 UTC, H. S. Teoh wrote:
> > The whole point of a GC is that you leave everything up to it to
> > clean up.  If you want to manage your own memory, don't use the GC.
> > D does not force you to use it; you can import core.stdc.stdlib and
> > use malloc/free to your heart's content.
> > 
> > Unpredictable order of collection is an inherent property of GCs.
> > It's not going away.  If you don't like it, use malloc/free instead.
> > (Or write your own memory management scheme.)
> 
> I disagree with this attitude on how the GC should work. Having to
> jump immediately from leaving everything behind for the GC to fully
> manual memory allocation whenever the GC becomes a problem is a
> problem, which gives legitimacy to the common complaint of D being
> "garbage-collected". It would be much better if the garbage collector
> could be there as a backup for when it's needed, while allowing the
> programmer to write code for object destruction when they want to
> optimize.

Take a look at the docs for core.memory.GC.  There *is* a method GC.free
that you can use to manually deallocate GC-allocated memory if you so
wish.  Keep in mind, though, that manually managing memory in this way
invites memory-related errors. That's not something I recommend unless
you're adamant about doing everything the manual way.


> > > Anyway, I suppose I'll have to experiment with either manually
> > > destroying every object at the end of every unittest, or just
> > > leaving more to the GC. Maybe I'll make a separate `die` function
> > > for the units, if you think it's a good idea.
> > 
> > I think you're approaching this from a totally wrong angle. (Which I
> > sympathize with, having come from a C/C++ background myself.)  The whole
> > point of having a GC is that you *don't* worry about when an object is
> > collected.  You just allocate whatever you need, and let the GC worry
> > about cleaning up after you. The more you let the GC do its job, the
> > better it will be.
> 
> Now you're giving me conflicting advice. I was told that my current
> destructor functions aren't acceptable with the garbage collector, and you
> specifically tell me to leave things to the GC. But then I suggest that I
> "leave more to the GC" and move everything from the Unit destructor to a
> specialized `die` function that can be called instead of `destroy` whenever
> they must be removed from the game, which as far as I can see is the only
> way to achieve the desired game functionality while following your and
> Steve's advice and not having dangling references. But in response to that,
> you tell me "I think you're approaching this from the wrong angle". And then
> right after that, you *again* tell me to "just let the GC worry about
> cleaning up after you"? Even if I didn't call `destroy` at all during my
> program, as far as I can see, I would still need the `die` function
> mentioned to remove a unit on death.

I think you're conflating two separate concepts, and it would help to
distinguish between them.  There's the lifetime of a memory-allocated
object, which is how long an object remains in the part of the heap
that's allocated to it.  It begins when you allocate the object with
`new`, and ends with the GC finds that it's no longer referenced and
collects it.

There's a different lifetime that you appear to be talking about: the
logical lifetime of an in-game object (not to be confused with an
"object" in the OO sense, though the two may overlap).  The (game)
object gets created (comes into existence in the simulated game world)
at a certain point in game time, until something in the game simulation
decides that it should no longer exist (it got destroyed, replaced with
another object, whatever). At that point, it should be removed from the
game simulation, and that's probably also what you have in mind when you
mentioned your "die" function.

And here's the important point: the two *do not need to coincide*.
Here's a concrete example of what I mean. Suppose in your game there's
some in-game mechanic that's creating N objects per M turns, and another
mechanic that's destroying some of these objects every L turns.  If you
map these creations/destructions with the object lifetime, you're
looking at a *lot* of memory allocations and deallocations throughout
the course of your game.  Memory allocations and deallocations can be
costly; this can become a problem if you're talking about a large number
of objects, or if they're being created/destroyed very rapidly (e.g.,
they are fragments flying out from explosions).  Since most of these
objects are identical in type, one way of optimizing the code is to
preallocate them: before starting your main loop, say you allocate an
array of, say, 100 objects. Or 1000 or 10000, however many you anticipate
you'll need. These objects aren't actually in the game world yet; you're
just holding them in reserve: when the game mechanic needs to create one,
it takes an unused object from this pool, and when the object is destroyed
it simply goes back to the pool to be reused later.
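
A rough sketch of that pattern (every name here is invented for
illustration):

```d
struct Particle
{
    bool alive;
    float x, y;
}

struct ParticlePool
{
    Particle[1000] slots;   // preallocated once, before the main loop

    // "Create" an object by recycling a dead slot -- no GC allocation.
    Particle* spawn(float x, float y)
    {
        foreach (ref p; slots)
        {
            if (!p.alive)
            {
                p = Particle(true, x, y);
                return &p;
            }
        }
        return null;   // pool exhausted
    }

    // "Destroy" an object by marking it dead; the slot gets reused later.
    void kill(Particle* p) { p.alive = false; }
}
```
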

Re: Difference between chunks(stdin, 1) and stdin.rawRead?

2024-03-28 Thread H. S. Teoh via Digitalmars-d-learn
On Thu, Mar 28, 2024 at 10:10:43PM +, jms via Digitalmars-d-learn wrote:
> On Thursday, 28 March 2024 at 02:30:11 UTC, jms wrote:
[...]
> I think I figured it out and the difference is probably in the mode.
> This documentation
> https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/fread?view=msvc-170
> mentions that "If the given stream is opened in text mode,
> Windows-style newlines are converted into Unix-style newlines. That
> is, carriage return-line feed (CRLF) pairs are replaced by single line
> feed (LF) characters."
> 
> And rawRead's documention mentions that "rawRead always reads in
> binary mode on Windows.", which I guess should have given me a clue.
> chunks must be using text-mode.

It's not so much that chunks is using text-mode, but that you opened the
file in text mode.  On Windows, if you don't want crlf translation you
need to open your file with File(filename, "rb"), not just File(filename
"r"), because the latter defaults to text mode.


T

-- 
There's light at the end of the tunnel. It's the oncoming train.


Re: Opinions on iterating a struct to absorb the decoding of a CSV?

2024-03-28 Thread H. S. Teoh via Digitalmars-d-learn
On Thu, Mar 28, 2024 at 05:23:39PM +, Andy Valencia via Digitalmars-d-learn 
wrote:
[...]
> auto t = T();
> foreach (i, ref val; t.tupleof) {
> static if (is(typeof(val) == int)) {
> val = this.get_int();
> } else {
> val = this.get_str();
> }
> }
> return t;
> 
> So you cue off the type of the struct field, and decode the next CSV
> field, and put the value into the new struct.
> 
> Is there a cleaner way to do this?  This _does_ work, and gives me
> very compact code.

This is pretty clean, and is a good example of DbI. I use the same
method in my fastcsv experimental module to transcribe csv to an array
of structs:

https://github.com/quickfur/fastcsv
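
For illustration, a tiny self-contained sketch of the same pattern (the
struct and the CSV line here are invented, and the field splitting is a
naive `split(',')`):

```d
import std.array : split;
import std.conv : to;
import std.stdio : writeln;

struct Row
{
    int id;
    string name;
    int score;
}

// Cue off each field's type to decode the next CSV field into the struct.
T fromCsvLine(T)(string line)
{
    auto fields = line.split(',');
    auto t = T();
    foreach (i, ref val; t.tupleof)
    {
        static if (is(typeof(val) == int))
            val = fields[i].to!int;
        else
            val = fields[i];
    }
    return t;
}

void main()
{
    writeln(fromCsvLine!Row("42,alice,97"));   // Row(42, "alice", 97)
}
```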


T

-- 
Today's society is one of specialization: as you grow, you learn more and more 
about less and less. Eventually, you know everything about nothing.


Re: Unittests pass, and then an invalid memory operation happens after?

2024-03-27 Thread H. S. Teoh via Digitalmars-d-learn
On Thu, Mar 28, 2024 at 03:56:10AM +, Liam McGillivray via 
Digitalmars-d-learn wrote:
[...]
> I may be now starting to see why the use of a garbage collector is
> such a point of contention for D. Not being able to predict how the
> garbage collection process will happen seems like a major problem.

If you want it to be predictable, simply:

	import core.memory;
	GC.disable();
	... // insert your code here
	if (timeToCleanup()) {
		GC.collect();   // now you know exactly when this happens
	}

Of course, you'll have to know exactly how timeToCleanup() should decide
when it's time to collect.  Simple possibilities are once every N units
of time, once every N iterations of some main loop, etc..  Or use a
profiler to decide.


> > As mentioned, GCs do not work this way -- you do not need to worry
> > about cascading removal of anything.
> 
> Wanting to avoid the GC pauses that I hear about, I was trying to
> optimize object deletion so that the GC doesn't have to look for every
> object individually. It sounds like what I'm hearing is that I should
> just leave everything to the GC. While I can do this without really
> hurting the performance of my program (for now), I don't like this.

The whole point of a GC is that you leave everything up to it to clean
up.  If you want to manage your own memory, don't use the GC. D does not
force you to use it; you can import core.stdc.stdlib and use malloc/free
to your heart's content.


> I hope that solving the unpredictable destruction pattern is a
> priority for the developers of the language. This problem in my
> program wouldn't be happening if either *all* of the objects had their
> destructors called or *none* of them did.

Unpredictable order of collection is an inherent property of GCs. It's
not going away.  If you don't like it, use malloc/free instead. (Or
write your own memory management scheme.)


> Anyway, I suppose I'll have to experiment with either manually
> destroying every object at the end of every unittest, or just leaving
> more to the GC.  Maybe I'll make a separate `die` function for the
> units, if you think it's a good idea.
[...]

I think you're approaching this from a totally wrong angle. (Which I
sympathize with, having come from a C/C++ background myself.)  The whole
point of having a GC is that you *don't* worry about when an object is
collected.  You just allocate whatever you need, and let the GC worry
about cleaning up after you. The more you let the GC do its job, the
better it will be.

Now of course there are situations where you need deterministic
destruction, such as freeing up system resources as soon as they're no
longer needed (file descriptors, OS shared memory segments allocations,
etc.). For these you would manage the memory manually (e.g. with a
struct that implements reference counting or whatever is appropriate).

As far as performance is concerned, a GC actually has higher throughput
than manually freeing objects, because in a fragmented heap situation,
freeing objects immediately when they go out of use incurs a lot of
random access RAM roundtrip costs, whereas a GC that scans memory for
references can amortize some of this cost to a single period of time.

Now somebody coming from C/C++ would immediately cringe at the thought
that a major GC collection might strike at the least opportune time. For
that, I'd say:

(1) don't fret about it until it actually becomes a problem. I.e., your
program is slow and/or has bad response times, and the profiler is
pointing to GC collections as the cause. Then you optimize appropriately
with the usual practices for GC optimization: preallocate before your
main loop, avoid frequent allocations of small objects (prefer to use
structs rather than classes), reuse previous allocations instead of
allocating new memory when you know that an existing object is no longer
used.  In D, you can also selectively allocate certain troublesome
objects with malloc/free instead (mixing both types of allocations is
perfectly fine in D; we are not Java where you're forced to use the GC
no matter what).

(2) Use D's GC control mechanisms to exercise some control over when
collections happen. By default, collections ONLY ever get triggered if
you try to allocate something and the heap has run out of memory.  Ergo,
if you don't allocate anything, GC collections are guaranteed not to
happen.  Use GC.disable and GC.collect to control when collections
happen.  In one of my projects, I got a 40% performance boost by using
GC.disable and using my own schedule of GC.collect, because the profiler
revealed that collections were happening too frequently.  The exact
details how what to do will depend on your project, of course, but my
point is, there are plenty of tools at your disposal to exercise some
degree of control.
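
A minimal sketch of point (2); the loop and the collection schedule here
are placeholders, the real schedule should come out of profiling:

```d
import core.memory : GC;

void main()
{
    GC.disable();              // no automatic collections from here on
    scope (exit) GC.enable();

    foreach (frame; 0 .. 100_000)
    {
        // ... per-frame work that may allocate ...

        if (frame % 600 == 0)  // a schedule you pick, not the GC
            GC.collect();      // collection happens where *you* decided
    }
}
```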

Or if (1) and (2) are not enough for your particular case, you can
always resort to the nuclear option: slap @nogc on main() and use
malloc/free (or your own allocators) throughout; the compiler will then
reject any hidden GC allocation at compile time.


Re: Unittests pass, and then an invalid memory operation happens after?

2024-03-27 Thread H. S. Teoh via Digitalmars-d-learn
On Wed, Mar 27, 2024 at 09:43:48PM +, Liam McGillivray via 
Digitalmars-d-learn wrote:
[...]
> ```
> ~this() {
> this.alive = false;
> if (this.map !is null) this.map.removeUnit(this);
> if (this.faction !is null) this.faction.removeUnit(this);
> if (this.currentTile !is null) this.currentTile.occupant = null;
> }
> ```
[...]

What's the definition of this.map, this.faction, and this.currentTile?

If any of them are class objects, this would be the cause of your
problem.  Basically, when the dtor runs, there is no guarantee that any
referenced class objects haven't already been collected by the GC. So
if you try to access them, it will crash with an invalid memory access.

In general, it's a bad idea to do anything that relies on the dtor being
run in a particular order, because the GC can collect dead objects in
any order.  It's also illegal to perform any GC memory-related
operations inside a dtor (like allocate memory, free memory, etc.)
because the GC is not reentrant.

If you need deterministic clean up of your objects, you should do it
before the last reference to the object is deleted.


T

-- 
He who does not appreciate the beauty of language is not worthy to bemoan its 
flaws.


Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow

2024-03-16 Thread H. S. Teoh via Digitalmars-d-learn
On Sat, Mar 16, 2024 at 09:16:51PM +, Liam McGillivray via 
Digitalmars-d-learn wrote:
> On Friday, 15 March 2024 at 00:21:42 UTC, H. S. Teoh wrote:
[...]
> > When dealing with units of data smaller than a byte, you generally
> > need to do it manually, because memory is not addressable by
> > individual bits, making it difficult to implement things like
> > slicing an array of bool.
[...]
> I'm curious as to what "manual implementation" would mean, since
> clearly making my own struct with `bool[3]` doesn't count. Does D have
> features for precise memory manipulation?

Manual implementation as in you would deal with the machine
representation in terms of bytes, or more likely, uints (on modern CPUs,
even though bytes are individually addressable, the hardware actually
works in terms of a larger unit, typically a 4-byte 32-bit unit or an
8-byte 64-bit unit), using bitwise operators to manipulate the bits the
way you want to.


> Anyway, I'm surprised that D has a special operator `&=` for doing bit
> manipulation on integers, especially given that the steps to convert
> an int into a bool array is more complicated. I would imagine the
> former would be a rather niche thing.

You should understand that bitwise operators are directly implemented in
hardware, and thus operators like &, |, ^, <<, >>, ~, etc., typically
map directly to individual CPU instructions. As such, they are very
fast, and preferred when you're doing bit-level manipulations.  At this
level, you typically do not work with individual bits per se, but with
machine words (typically 32-bit or 64-bit units).  Bitwise operators
operate on all 32 or 64 bits at once, so performance-aware code
typically manipulates all these bits simultaneously rather than
individually.  Of course, using suitable bit-masking you *can* address
individual bits, but the hardware instructions themselves typically work
with all 32/64 bits at once.

Here's a simple example. Suppose you have 3 bits you want to store.
Since the machine doesn't have a 3-bit built-in type, you typically just
use the next larger available size, either a ubyte (8 bits) if you want
compact storage, or if compactness isn't a big issue just a uint (32
bits, you just ignore the other 29 bits that you don't need). So you'd
declare the storage something like this:

uint myBits;

Bits are usually indexed from 0, so bit 0 is the first position, bit 1
is the second position, and so on.  So to set the first bit to 1, you'd
do:

myBits |= 0b001;

Note that at the machine level, this operator works on all 32 bits at
the same time. Most of the bits remain unchanged, though, because
bitwise OR does not change the original value if the operand is 0. So
the overall effect is that the first bit is set.

To set the first bit to 0, there isn't a direct operator that does that,
but you can take advantage of the behaviour of bitwise AND, in which
any bit which is 0 in the operand will get cleared, everything else
remains unchanged. So you'd do this:

myBits &= 0b110;

Now, since we don't really care about the other 29 bits, we could write
this as follows instead, to make our intent clearer:

myBits &= ~0b001;

The ~ operator flips all the bits, so this is equivalent to writing:

myBits &= 0b1111_1111_1111_1111_1111_1111_1111_1110;

Writing it with ~ also has the advantage that should we later decide to
add another bit to our "bit array", we don't have to update the code;
whereas if we'd used `myBits &= 0b110;` then we'd need to change it to
`myBits &= 0b1110;` otherwise our new 4th bit may get unexpectedly
cleared when we only wanted to clear the first bit.

Now, what if we wanted to set both the 1st and 3rd bits?  In a
hypothetical bit array implementation, we'd do the equivalent of:

bool[3] myBits;
myBits[0] = 1;
myBits[2] = 1;

However, in our uint approach, we can cut the number of operations by
half, because the CPU is already operating on the entire 32 bits of the
uint at once -- so there's no need to have two instructions to set two
individual bits when we could just do it all in one:

myBits |= 0b101; // look, ma! both bits set at once!

Similarly, to clear the 1st and 3rd bits simultaneously, we simply
write:

myBits &= ~0b101; // clear both bits in 1 instruction!

Of course, when we only have 3 bits to work with, the savings isn't that
significant.  However, if you have a larger bit array, say you need an
array of 32 bits, this can speed your code up by 32x, because you're
taking advantage of the fact that the hardware is already operating on
all 32 bits at the same time.  On 64-bit CPUs, you can speed it up by
64x because the CPU operates on all 64 bits simultaneously, so you can
manipulate an entire array of 64 bits in a single instruction, which is
64x faster than if you looped over an array of bool with 64 iterations.
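
Applied to your original challenge, a minimal sketch (the type and its
layout are my own invention, adjust to taste):

```d
struct Direction
{
    ubyte bits;   // only the low 3 bits are meaningful (0 .. 7)

    ref Direction opUnary(string op : "++")()
    {
        bits = (bits + 1) & 0b111;   // 7 + 1 wraps around to 0
        return this;
    }

    ref Direction opUnary(string op : "--")()
    {
        bits = (bits + 7) & 0b111;   // adding 7 mod 8 == subtracting 1
        return this;
    }
}

unittest
{
    Direction d;          // starts at 0
    d--;
    assert(d.bits == 7);  // wrapped around
    d++;
    assert(d.bits == 0);
}
```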


T

-- 
Without outlines, life would be pointless.


Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow

2024-03-14 Thread H. S. Teoh via Digitalmars-d-learn
On Thu, Mar 14, 2024 at 11:39:33PM +, Liam McGillivray via 
Digitalmars-d-learn wrote:
[...]
> I tried to rework the functions to use bitwise operations, but it was
> difficult to figure out the correct logic. I decided that it's not
> worth the hassle, so I just changed the value storage from `bool[3]`
> to `ubyte`.
[...]

Just wanted to note that while in theory bool[3] could be optimized by
the compiler for compact storage, what you're most likely to get is 3
bytes, one for each bool, or perhaps even 3 machine words (24 bytes). When
dealing with units of data smaller than a byte, you generally need to do
it manually, because memory is not addressable by individual bits,
making it difficult to implement things like slicing an array of bool.
So the compiler is most likely to simplify things by making it an array
of bytes rather than emit complex bit manipulation code to make up for
the lack of bit-addressability in the underlying hardware.

Using bit operators like others have pointed out in this thread is
probably the best way to implement what you want.


T

-- 
LINUX = Lousy Interface for Nefarious Unix Xenophobes.


Re: varargs when they're not all the same type?

2024-03-14 Thread H. S. Teoh via Digitalmars-d-learn
On Thu, Mar 14, 2024 at 08:58:21PM +, Andy Valencia via Digitalmars-d-learn 
wrote:
> On Thursday, 14 March 2024 at 18:05:59 UTC, H. S. Teoh wrote:
> > ...
> > The best way to do multi-type varags in D is to use templates:
> > 
> > import std;
> > void myFunc(Args...)(Args args) {
> 
> Thank you.  The first parenthetical list is of types, is it not?  I
> can't find anywhere which says what "type" is inferred for "Args..."?
> (gdb pretends like "arg" is not a known symbol.)  Is it basically a
> tuple of the suitable type?
[...]

The first set of parentheses specifies the compile-time arguments. The
specification `Args...` means "zero or more types".  So it could be any
list of types, which naturally would be chosen according to the
arguments given. For example, to pass an int and a float, you'd do
something like:

myFunc!(int, float)(123, 3.14159f);

and to pass a string, two ints, and a char, you'd write:

myFunc!(string, int, int, char)("abc", 123, 456, 'z');

Having to specify types manually, of course, is a lot of unnecessary
typing, since the compiler already knows what the types are based on
what you write in the second pair of parentheses.  For this reason,
typical D code will omit the first pair of parentheses (the `!(...)`,
that is, the compile-time arguments) and just let the compiler infer the
types automatically:

myFunc(123, 3.14159f); // compiler figures out Args = (int, float)
myFunc("abc", 123, 456, 'z'); // compiler figures out Args = (string, 
int, int, char)


T

-- 
A program should be written to model the concepts of the task it performs 
rather than the physical world or a process because this maximizes the 
potential for it to be applied to tasks that are conceptually similar and, more 
important, to tasks that have not yet been conceived. -- Michael B. Allen


Re: varargs when they're not all the same type?

2024-03-14 Thread H. S. Teoh via Digitalmars-d-learn
On Thu, Mar 14, 2024 at 05:57:21PM +, Andy Valencia via Digitalmars-d-learn 
wrote:
> Can somebody give me a starting point for understanding variadic
> functions?  I know that we can declare them
> 
>   int[] args...
> 
> and pick through whatever the caller provided.  But if the caller
> wants to pass two int's and a _string_?  That declaration won't permit
> it.
> 
> I've looked into the formatter, and also the varargs implementation.
> But it's a bit of a trip through a funhouse full of mirrors.  Can
> somebody describe the basic language approach to non-uniform varargs,
> and then I can take it the rest of the way reading the library.
[...]

The best way to do multi-type varags in D is to use templates:

import std;
void myFunc(Args...)(Args args) {
    foreach (i, arg; args) {
        // .stringof turns the type into a printable string
        writefln("parameter %d is a %s with value %s",
                 i, typeof(arg).stringof, arg);
    }
}

void main() {
    myFunc(123, 3.14159, "blah blah", [ 1, 2, 3 ], new Object());
}

D also supports C-style varags (without templates), but it's not
recommended because it's not type-safe. You can find the description in
the language docs.


T

-- 
"Maybe" is a strange word.  When mom or dad says it it means "yes", but when my 
big brothers say it it means "no"! -- PJ jr.


Re: Hidden members of Class objects

2024-03-06 Thread H. S. Teoh via Digitalmars-d-learn
On Wed, Mar 06, 2024 at 11:39:13PM +, Carl Sturtivant via 
Digitalmars-d-learn wrote:
> I notice that a class with no data members has a size of two words (at
> 64 bits). Presumably there's a pointer to a table of virtual
> functions, and one more. Is the Vtable first?
[...]
> What is actually in these objects using that space?

In D, there's a pointer to the vtable and another pointer to a Monitor
object (used for synchronized methods).  There was talk about getting
rid of the Monitor field years ago, but nothing has happened yet.
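
You can see both hidden words being counted (a quick check, nothing more):

```d
class Empty {}

void main()
{
    import std.stdio : writeln;

    // Two hidden pointer-sized fields: the vtable pointer and the monitor.
    writeln(__traits(classInstanceSize, Empty));   // 16 on 64-bit targets
}
```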


T

-- 
MAS = Mana Ada Sistem?


Re: Array types and index types

2024-02-27 Thread H. S. Teoh via Digitalmars-d-learn
On Wed, Feb 28, 2024 at 03:00:55AM +, Liam McGillivray via 
Digitalmars-d-learn wrote:
> In D, it appears that dynamic arrays (at least by default) use a ulong
> as their key type. They are declared like this:
> ```
> string[] dynamicArray;
> ```
> 
> I imagine that using a 64-bit value as the key would be slower than
> using 32 bits or 16 bits,

Wrong. The machine uses 64 bits internally anyway.  Well, 48 on i386.
But the point is that there is no speed difference.

Also, on 32-bit architectures size_t is aliased to uint, which is 32
bits.


[...]
> So I have some questions:
> 
> Is there a way to declare a dynamic array with a uint, ushort, or ubyte key?

No.


> If there was, would it really be faster?

No.


> Is an associative array with a ushort key faster than a dynamic array
> with a ulong key?

No.


T

-- 
Error: Keyboard not attached. Press F1 to continue. -- Yoon Ha Lee, CONLANG


Re: what was the problem with the old post blit operator already ?

2024-02-14 Thread H. S. Teoh via Digitalmars-d-learn
On Thu, Feb 15, 2024 at 02:17:15AM +, Basile B. via Digitalmars-d-learn 
wrote:
> From what I remember, it was that there was no reference to the
> source.  Things got blitted and you had to fix the copy, already
> blitted. Was that the only issue ?

I don't quite remember all of the reasons now. But yeah, one of the
problems with postblit was that you don't have access to the original
copy. That precludes some applications where you need to look up data
from the original or update the original.

And if you have immutable fields they've already been blitted and you
can't fix them anymore, not without casting away immutable and putting
yourself in UB zone.

There may have been other issues with postblit, I don't quite remember
now.
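
Roughly, the difference looks like this (a sketch, not a complete
treatment):

```d
struct S
{
    int[] data;

    // Postblit: runs *after* the field-by-field copy, with no access to
    // the source object -- you can only patch up the already-blitted copy.
    this(this) { data = data.dup; }

    // Its replacement, the copy constructor, does get the source:
    //     this(ref return scope const S src) { data = src.data.dup; }
    // (shown as a comment, because a struct can't define both at once)
}
```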


T

-- 
Beware of bugs in the above code; I have only proved it correct, not tried it. 
-- Donald Knuth


Re: length's type.

2024-02-13 Thread H. S. Teoh via Digitalmars-d-learn
On Tue, Feb 13, 2024 at 06:36:22PM +, Nick Treleaven via 
Digitalmars-d-learn wrote:
> On Monday, 12 February 2024 at 18:22:46 UTC, H. S. Teoh wrote:
[...]
> > Honestly, I think this issue is blown completely out of proportion.
> > The length of stuff in any language needs to be some type. D decided
> > on an unsigned type. You just learn that and adapt your code
> > accordingly, end of story.  Issues like these can always be argued
> > both ways, and the amount of energy spent in these debates far
> > outweigh the trivial workarounds in code, of which there are many
> > (use std.conv.to for bounds checks, just outright cast it if you
> > know what you're doing (or just foolhardy), use CheckedInt, etc.).
> > And the cost of any change to the type now also far, far outweighs
> > any meager benefits it may have brought.  It's just not worth it,
> > IMNSHO.
> 
> I don't want the type of .length to change, that indeed would be too
> disruptive.  What I want is proper diagnostics like any well-regarded
> C compiler when I mix/implicit convert unsigned and signed types.

I agree, mixing signed/unsigned types in the same expression ought to
require a cast, and error out otherwise. Allowing them to be freely
mixed, or worse, implicitly convert to each other, is just too
error-prone.


> Due to D's generic abilities, it's easier to make wrong assumptions
> about whether some integer is signed or unsigned. But even without
> that, C compilers accepted that this is a task for the compiler to
> diagnose rather than humans, because it is too bug-prone for humans.

Indeed.


T

-- 
Живёшь только однажды.


Re: length's type.

2024-02-12 Thread H. S. Teoh via Digitalmars-d-learn
On Mon, Feb 12, 2024 at 07:34:36PM +, bachmeier via Digitalmars-d-learn 
wrote:
> On Monday, 12 February 2024 at 18:22:46 UTC, H. S. Teoh wrote:
> 
> > Honestly, I think this issue is blown completely out of proportion.
> 
> Only for people that don't have to deal with the problems it causes.

I've run into size_t vs int issues many times.  About half the time it
exposed fallacious assumptions on my part about value types. The other
half of the time a simple cast or std.conv.to invocation solved the
problem.

My guess is that the most common use of .length in typical D code is in
(1) passing it to code that expects a length for various reasons, and (2)
in loop conditions to avoid overrunning a buffer or overshooting some
range. (1) is a non-problem, 90% of (2) is solved by using constructs
like foreach() and/or ranges instead of overly-clever arithmetic
involving length, which is almost always wrong or unnecessary.  If you
need to do subtraction with lengths, that's a big red flag that you're
approaching your problem from the wrong POV. About the only time you
need to do arithmetic with lengths is in low-level code like allocators
or array copying, for which you really should be using higher-level
constructs instead.
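
For concreteness, the kind of length arithmetic that bites people:

```d
import std.stdio : writeln;

void main()
{
    int[] a = [];

    // Classic trap: .length is size_t (unsigned), so for an empty array
    // a.length - 1 wraps around to size_t.max instead of -1.
    writeln(a.length - 1);   // 18446744073709551615 on a 64-bit build

    // The usual fix is to avoid the subtraction altogether, e.g.
    // foreach (i; 1 .. a.length), or just foreach over the slice itself.
}
```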


> > D decided on an unsigned type. You just learn that and adapt your
> > code accordingly, end of story.  Issues like these can always be
> > argued both ways, and the amount of energy spent in these debates
> > far outweigh the trivial workarounds in code, of which there are
> > many (use std.conv.to for bounds checks, just outright cast it if
> > you know what you're doing (or just foolhardy), use CheckedInt,
> > etc.).
> 
> A terrible language is one that makes you expend your energy thinking
> about workarounds rather than solving your problems. The default
> should be code that works. The workarounds should be for cases where
> you want to do something extremely unusual like subtracting from an
> unsigned type and having it wrap around.

Yes, if I had my way, implicit conversions to/from unsigned types should
be a compile error. As should comparisons between signed/unsigned
values.

But regardless, IMNSHO any programmer worth his wages ought to learn
what an unsigned type is and how it works. A person should not be
writing code if he can't even be bothered to learn how the machine
he's programming actually works.  To quote Knuth:

People who are more than casually interested in computers should
have at least some idea of what the underlying hardware is like.
Otherwise the programs they write will be pretty weird. -- D.
Knuth

One of the reasons Walter settled on size_t being unsigned is that this
reflects how the hardware actually works.  Computer arithmetic is NOT
highschool arithmetic; you do not have infinite width nor infinite
precision, and you're working with binary, not decimal. This has
consequences, and having the language pretend the distinction doesn't
exist does not solve any problems.

If an architectural astronaut works at such a high level of abstraction
that he doesn't even understand basic things about the hardware,
like how uint or ulong work and how to use them correctly, maybe he
should be promoted to a managerial role instead of writing code.


T

-- 
You are only young once, but you can stay immature indefinitely. -- azephrahel


Re: length's type.

2024-02-12 Thread H. S. Teoh via Digitalmars-d-learn
On Mon, Feb 12, 2024 at 05:26:25PM +, Nick Treleaven via 
Digitalmars-d-learn wrote:
> On Friday, 9 February 2024 at 15:19:32 UTC, bachmeier wrote:
> > It's been discussed many, many times. The behavior is not going to
> > change - there won't even be a compiler warning. (You'll have to
> > check with the leadership for their reasons.)
> 
> Was (part of) the reason because it would disrupt existing code? If
> that was the blocker then editions are the solution.

Honestly, I think this issue is blown completely out of proportion. The
length of stuff in any language needs to be some type. D decided on an
unsigned type. You just learn that and adapt your code accordingly, end
of story.  Issues like these can always be argued both ways, and the
amount of energy spent in these debates far outweigh the trivial
workarounds in code, of which there are many (use std.conv.to for bounds
checks, just outright cast it if you know what you're doing (or just
foolhardy), use CheckedInt, etc.). And the cost of any change to the
type now also far, far outweighs any meager benefits it may have
brought.  It's just not worth it, IMNSHO.


T

-- 
Verbing weirds language. -- Calvin (& Hobbes)


Re: The difference between the dates in years

2024-02-10 Thread H. S. Teoh via Digitalmars-d-learn
On Sat, Feb 10, 2024 at 03:53:09PM +, Alexander Zhirov via 
Digitalmars-d-learn wrote:
> Is it possible to calculate the difference between dates in years
> using regular means? Something like that
> 
> 
> ```
> writeln(Date(1999, 3, 1).diffMonths(Date(1999, 1, 1)));
> ```
> 
> At the same time, keep in mind that the month and day matter, because
> the difference between the year, taking into account the month that
> has not come, will be less.
> 
> My abilities are not yet enough to figure it out more elegantly.

IIRC you can just subtract two DateTime's to get a Duration that you can
then convert into whatever units you want.  Only thing is, in this case
conversion to months may not work because months don't have a fixed
duration (they can vary from 28 days to 31 days) so there is no
"correct" way of computing it, you need to program it yourself according
to the exact calculation you want.
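
Since Phobos has no ready-made "difference in years" function, a
hand-rolled sketch along those lines (the function name is made up):

```d
import std.datetime.date : Date;
import std.stdio : writeln;

// Whole-year difference that respects month and day: subtract one year
// if b's month/day hasn't been reached yet within a's year.
int diffYears(Date a, Date b)
{
    int years = a.year - b.year;
    if (a.month < b.month || (a.month == b.month && a.day < b.day))
        years--;
    return years;
}

void main()
{
    writeln(diffYears(Date(1999, 3, 1), Date(1998, 4, 1)));  // 0
    writeln(diffYears(Date(1999, 3, 1), Date(1998, 1, 1)));  // 1
}
```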


T

-- 
PNP = Plug 'N' Pray


Re: std.uni CodepointSet toString

2024-02-08 Thread H. S. Teoh via Digitalmars-d-learn
On Thu, Feb 08, 2024 at 06:22:29PM +, Carl Sturtivant via 
Digitalmars-d-learn wrote:
> On Wednesday, 7 February 2024 at 17:11:30 UTC, H. S. Teoh wrote:
> > Do we know why the compiler isn't getting it right?  Shouldn't we be
> > fixing it instead of just turning off elision completely?
> 
> This matter seems to have been an issue for some time.
> https://forum.dlang.org/post/l5e5hm$1177$1...@digitalmars.com

11 years and we still haven't fixed all the problems?!  That's ... wow.

I've recently run into the same problem myself and had to use -allinst
in order to compile my project.  Maybe I should dustmite it and
submit a report.  But given that it's been 11 years, I'm not sure if
this is worth my time.


T

-- 
"No, John.  I want formats that are actually useful, rather than over-featured 
megaliths that address all questions by piling on ridiculous internal links in 
forms which are hideously over-complex." -- Simon St. Laurent on xml-dev


Re: std.uni CodepointSet toString

2024-02-07 Thread H. S. Teoh via Digitalmars-d-learn
On Thu, Feb 08, 2024 at 05:44:59AM +1300, Richard (Rikki) Andrew Cattermole via 
Digitalmars-d-learn wrote:
> On 08/02/2024 5:36 AM, Carl Sturtivant wrote:
[...]
> > ```
> > $ dmd --help | grep allinst
> >    -allinst  generate code for all template instantiations
> > ```
> > Unclear exactly how -allinst does this, given type parameters, and
> > it will affect all of the many templates I use in source with
> > CodepointSet.
> > 
> > Can you shed any light?
> 
> Basically the compiler will by default try to elide templates it
> thinks isn't used.
> 
> However it doesn't always get this right, which this flag overrides by
> turning it off.

Do we know why the compiler isn't getting it right?  Shouldn't we be
fixing it instead of just turning off elision completely?


T

-- 
Let's call it an accidental feature. -- Larry Wall


Re: trouble with associative Arrays

2024-01-20 Thread H. S. Teoh via Digitalmars-d-learn
On Sat, Jan 20, 2024 at 02:33:24PM +, atzensepp via Digitalmars-d-learn 
wrote:
> Hello,
> 
> I am new with D and want to convert a c program for a csv file manipulation
> with exhaustive dynamic memory mechanics to D .
> 
> When reading a CSV-file line by line I would like to create an associative
> array to get the row values by the value in the second column.
> Although I save the rows in an array (to get different pointers to the
> values) the program below always gives the last row.
[...]

Because .byLine reuses its line buffer.  You want .byLineCopy instead.
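
A minimal sketch of that change (the file name and the comma splitting
are placeholders for your real parsing):

```d
import std.array : split;
import std.stdio : File;

void main()
{
    string[][string] rows;   // row indexed by the value of its 2nd column

    // byLineCopy hands out a fresh string per line; byLine keeps reusing
    // one buffer, so every AA entry would end up pointing at whatever the
    // *last* line left in that buffer.
    foreach (line; File("data.csv").byLineCopy)
    {
        auto fields = line.split(",");
        if (fields.length > 1)
            rows[fields[1]] = fields;
    }
}
```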


T

-- 
Everybody talks about it, but nobody does anything about it!  -- Mark Twain


Re: Help optimize D solution to phone encoding problem: extremely slow performace.

2024-01-19 Thread H. S. Teoh via Digitalmars-d-learn
On Sat, Jan 20, 2024 at 01:35:44AM +0100, Daniel Kozak via Digitalmars-d-learn 
wrote:
[...]
>> Try addressing the points I wrote above and see if it makes a
>> difference.
> 
>I have tried it (all of it) even before you wrote it here, because
>I have completely the same ideas, but to be fair it has almost zero
>effect on speed.
>There is my version (It still use OOP, but I have try it wit
>Printer and Counter to be structs and it has no effect at
>all) [2]https://paste.ofcode.org/38vKWLS8DHRazpv6MTidRJY
>The only difference in speed in the end is caused by hash
>implementation of dlang associative arrays and rust HashMap,
>actually if you modify rust to not used ahash it has almost same
>speed as D
[...]

I'm confused by the chained hashing of the digits. Why is that
necessary?  I would have thought it'd be faster to hash the entire key
instead of computing the hash of each digit and chaining them together.

I looked up Rust's ahash algorithm. Apparently they leverage the CPU's
hardware AES instruction to compute a collision-resistant hash very
quickly.

Somebody should file a bug on druntime to implement this where the
hardware supports it, instead of the current hashOf. For relatively
small keys this would be a big performance boost.


T

-- 
Valentine's Day: an occasion for florists to reach into the wallets of nominal 
lovers in dire need of being reminded to profess their hypothetical love for 
their long-forgotten.


Re: Help optimize D solution to phone encoding problem: extremely slow performace.

2024-01-19 Thread H. S. Teoh via Digitalmars-d-learn
On Fri, Jan 19, 2024 at 01:40:39PM +, Renato via Digitalmars-d-learn wrote:
> On Friday, 19 January 2024 at 10:15:57 UTC, evilrat wrote:
[...]
> > Additionally if you comparing D by measuring DMD performance -
> > don't.  It is valuable in developing for fast iterations, but it
> > lacks many modern optimization techniques, for that we have LDC and
> > GDC.
> 
> I tried with DMD again, and yeah, it's much slower.

For anything where performance is even remotely important, I wouldn't
even consider DMD.  It's a well-known fact that it produces suboptimal
executables.  Its only redeeming factor is really only its fast
turnaround time.  If fast turnaround is not important, I would always
use LDC or GDC instead.


> Here's the [current implementation in
> D](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/dlang-key-hash-incremental/src/d/src/dencoder.d),
> and the roughly [equivalent Rust
> implementation](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/dlang-key-hash-incremental/src/rust/phone_encoder/src/main.rs).

Taking a look at this code:

One of the most important thing I found is that every call to
printTranslations allocates a new array (`auto keyValue = new
ubyte[...];`).  Given the large number of recursions involved in this
algorithm, this will add up to quite a lot.  If I were optimizing this
code, I'd look into ways of reducing, if not eliminating, this
allocation.  Observe that this allocation is needed each time
printTranslations recurses, so instead of making separate allocations,
you could put it on a stack. Either with alloca, or with my appendPath()
trick in my version of the code: preallocate a reasonably large buffer
and take slices of it each time you need a new keyValue array.
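
The general shape of that trick, very roughly (names and sizes are
invented; the real thing has to match your recursion):

```d
// Preallocate once, then hand out growing slices of the same buffer
// instead of allocating a new keyValue array per recursion step.
struct PathBuf
{
    ubyte[] storage;
    size_t used;

    this(size_t capacity) { storage = new ubyte[capacity]; }

    ubyte[] push(ubyte b)        // append without allocating
    {
        storage[used++] = b;
        return storage[0 .. used];
    }

    void pop() { --used; }       // undo on the way back up the recursion
}
```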

Secondly, your hash function looks suspicious. Why are you chaining your
hash per digit?  That's a lot of hash computations.  Shouldn't you just
hash the entire key each time?  That would eliminate the need to store a
custom hash in your key, you could just lookup the entire key at once.

Next, what stood out is ISolutionHandler.  If I were to write this, I
wouldn't use OO for this at all, and especially not interfaces, because
they involve a double indirection.  I'd just return a delegate instead
(single indirection, no object lookup).  This is a relatively small
matter, but when it's being used inside a hot inner loop, it could be
important.

Then a minor point: I wouldn't use Array in printTranslations. It's
overkill for what you need to do; a built-in array would work just fine.
Take a look at the implementation of Array and you'll see lots of
function calls and locks and GC root-adding and all that stuff. Most of
it doesn't apply here, of course, and is compiled out. Nevertheless, it
uses wrapped integer operations and range checks, etc.. Again, these are
all minor issues, but in a hot inner loop they do add up. Built-in
arrays let you literally just bump the pointer when adding an element.
Just a couple of instructions as opposed to several function calls.
Important difference when you're on the hot path.  Now, as I mentioned
earlier w.r.t. my own code, appending to built-in arrays comes with a
cost. So here's where you'd optimize by creating your own buffer and
custom push/pop operations. Something like appendPath() in my version of
the code would do the job.

Finally, a very a small point: in loadDictionary, you do an AA lookup
with `n in result`, and then if that returns null, you create a new
entry. This does two AA lookups, once unsuccessfully, and the second
time to insert the missing key.  You could use the .require operation
with a delegate instead of `in` followed by `if (... is null)`, which
only requires a single lookup.  Probably not an important point, but for
a large dictionary this might make a difference.
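
For reference, the `.require` idiom looks like this (the dictionary
contents are invented):

```d
void main()
{
    string[][string] dict;

    // .require looks the key up once: it returns a ref to the slot,
    // inserting a default-initialized value first if the key is missing.
    dict.require("228") ~= "bat";
    dict.require("228") ~= "cat";

    assert(dict["228"] == ["bat", "cat"]);
}
```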


> The only "significant" difference is that in Rust, an enum
> `WordOrDigit` is used to represent currently known "words"... I [did
> try using that in
> D](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/dlang-int128-word-and-digit/src/d/src/dencoder.d),
> but it made the algorithm slower.
> 
> If you see anything in D that's not as efficient as it should be, or
> somehow "inferior" to what the Rust version is doing , please let me
> know.

Couldn't tell you, I don't know Rust. :-D


> Notice that almost all of the time is spent in the for-loop inside
> `printTranslations` (which is misnamed as it doesn't necessarily
> "print" anything, like it did earlier) - the rest of the code almost
> doesn't matter.
[...]

Of course, that's where your hot path is.  And that loop makes recursive
calls to printTranslations, so the entire body of the function could use
some optimization. ;-)

Try addressing the points I wrote above and see if it makes a
difference.


T

-- 
The two rules of success: 1. Don't tell everything you know. -- YHL


Re: Help optimize D solution to phone encoding problem: extremely slow performace.

2024-01-18 Thread H. S. Teoh via Digitalmars-d-learn
On Thu, Jan 18, 2024 at 04:23:16PM +, Renato via Digitalmars-d-learn wrote:
[...]
> Ok, last time I'm running this for someone else :D
> 
> ```
> Proc,Run,Memory(bytes),Time(ms)
> ===> ./rust
> ./rust,23920640,30
> ./rust,24018944,147
> ./rust,24068096,592
> ./rust,24150016,1187
> ./rust,7766016,4972
> ./rust,8011776,46101
> ===> src/d/dencoder
> src/d/dencoder,44154880,42
> src/d/dencoder,51347456,87
> src/d/dencoder,51380224,273
> src/d/dencoder,51462144,441
> src/d/dencoder,18644992,4414
> src/d/dencoder,18710528,43548
> ```

OK, this piqued my interest enough that I decided to install rust using
rustup instead of my distro's package manager.  Here are the numbers I
got for my machine:

===> ./rust
./rust,22896640,35
./rust,22896640,137
./rust,22384640,542
./rust,22896640,1034
./rust,8785920,2489
./rust,8785920,12157
===> src/d/dencoder
src/d/dencoder,1066799104,36
src/d/dencoder,1066799104,72
src/d/dencoder,1066799104,198
src/d/dencoder,1066799104,344
src/d/dencoder,1035292672,2372
src/d/dencoder,1035292672,13867

Looks like we lost out to Rust for larger inputs. :-D  Probably due to
environmental factors (and the fact that std.stdio is slow).  I re-ran
it again and got this:

===> ./rust
./rust,22896640,30
./rust,22896640,131
./rust,22896640,511
./rust,22896640,983
./rust,8785920,3102
./rust,8785920,9912
===> src/d/dencoder
src/d/dencoder,1066799104,36
src/d/dencoder,1066799104,71
src/d/dencoder,1066799104,197
src/d/dencoder,1066799104,355
src/d/dencoder,1035292672,3441
src/d/dencoder,1035292672,9471

Notice the significant discrepancy between the two runs; this seems to
show that the benchmark is only accurate up to about ±1.5 seconds.

Anyway, oddly enough, Java seems to beat Rust on larger inputs.  Maybe
my Java compiler has a better JIT implementation? :-P


> Congratulations on beating Rust :D but remember: you're using a much
> more efficient algorithm! I must conclude that the Rust translation of
> the Trie algorithm would be much faster still, unfortunately (you may
> have noticed that I am on D's side here!).

At this point, it's not really about the difference between languages
anymore; it's about the programmer's skill at optimizing his code.

Traditionally Java is thought to be the slowest, because it runs in a VM
and generally tends to use more heap allocations.  In recent times,
however, JIT and advanced GC implementations have significantly levelled
that out, so you're probably not going to see the difference unless you
hand-tweak your code down to the bare metal.

Surprisingly, at least on my machine, Lisp actually performed the worst.
I'd have thought it would at least beat Java, but I was quite wrong. :-D
Perhaps the Lisp implementation I'm using is suboptimal, I don't know.
Or perhaps modern JVMs have really overtaken Lisp.

Now I'm really curious how a Rust version of the trie algorithm would
perform.  Unfortunately I don't know Rust so I wouldn't be able to write
it myself. (Hint, hint, nudge, nudge ;-)).

As far as the performance of my D version is concerned, I still haven't
squeezed out all the performance I could yet.  Going into this, my
intention was to take the lazy way of optimizing only what the profiler
points out to me, with the slight ulterior motive of proving that a
relatively small amount of targeted optimizations can go a long way at
making the GC a non-problem in your typical D code. ;-)  I haven't
pulled out all the optimization guns at my disposal yet.

If I were to go the next step, I'd split up the impl() function so that
I get a better profile of where it's spending most of its time, and then
optimize that.  My current suspicion is that the traversal of the trie
could be improved by caching intermediate results to eliminate a good
proportion of recursive calls in impl().

Also, the `print` mode of operation is quite slow, probably because
writefln() allocates. (It allocates less than if I had used .format like
I did before, but it nevertheless still allocates.) To alleviate this
cost, I'd allocate an output buffer and write to that, flushing only
once it filled up.

Another thing I could do is to use std.parallelism.parallel to run
searches on batches of phone numbers in parallel. This is kinda
cheating, though, since it's the same algorithm with the same cost,
we're just putting more CPU cores to work. :-P  But in D this is quite
easy to do, often as easy as simply adding .parallel to your outer
foreach loop. In this particular case it will need some additional
refactoring due to the fact that the input is being read line by line.
But it's relatively easy to load the input into a buffer by chunks
instead, and just run the searches on all the numbers found in the
buffer in parallel.


On Thu, Jan 18, 2024 at 04:25:45PM +, Renato via Digitalmars-d-learn wrote:
[...]
> BTW here's you main function so it can run on the benchmark:
[...]

Thanks, I've adapted my code accordingly and pushed to my github repo.


T

-- 
This is a tpyo.


Re: Datetime format?

2024-01-18 Thread H. S. Teoh via Digitalmars-d-learn
On Thu, Jan 18, 2024 at 11:58:32PM +, zoujiaqing via Digitalmars-d-learn 
wrote:
> On Thursday, 18 January 2024 at 23:43:13 UTC, Jonathan M Davis wrote:
> > On Thursday, January 18, 2024 4:26:42 PM MST zoujiaqing via
> > Digitalmars-d- learn wrote:
> > > ```D
> > > import std.datetime : Clock, format;
> > > import std.stdio : writeln;
> > > 
> > > void main()
> > > {
> > >  auto currentTime = Clock.currTime;
> > > 
> > >  auto formattedTime = currentTime.format("%Y-%m-%d %H:%M:%S");
> > > 
> > >  writeln("Formatted Time: ", formattedTime);
> > > }
> > > ```
[...]
> So shame! The standard library doesn't have date formatting.
[...]

It's easy to write your own:

```d
import std;

void main() {
    auto curTime = Clock.currTime;
    auto dt = cast(DateTime) curTime;
    auto fmtTime = format("%04d-%02d-%02d %02d:%02d:%02d",
            dt.year, dt.month, dt.day, dt.hour, dt.minute,
            dt.second);
    writeln(fmtTime);
}
```


Output:
2024-01-18 16:21:51

You have maximum flexibility to format it however you like.


T

-- 
Computers aren't intelligent; they only think they are.


Re: Help optimize D solution to phone encoding problem: extremely slow performace.

2024-01-17 Thread H. S. Teoh via Digitalmars-d-learn
On Wed, Jan 17, 2024 at 07:57:02AM -0800, H. S. Teoh via Digitalmars-d-learn 
wrote:
[...]
> I'll push the code to github.
[...]

Here: https://github.com/quickfur/prechelt/blob/master/encode_phone.d


T

-- 
Why do conspiracy theories always come from the same people??


Re: Help optimize D solution to phone encoding problem: extremely slow performace.

2024-01-17 Thread H. S. Teoh via Digitalmars-d-learn
On Wed, Jan 17, 2024 at 07:19:39AM +, Renato via Digitalmars-d-learn wrote:
[...]
> But pls run the benchmarks yourself as I am not going to keep running
> it for you, and would be nice if you posted your solution on a Gist
> for example, pasting lots of code in the forum makes it difficult to
> follow.

I can't. I spent half an hour trying to get ./benchmark.sh to run, but
no matter what it could not compile benchmark_runner. It complains that
my rustc is too old and some dependencies do not support it. I tried
running the suggested cargo update command to pin the versions but none
of them worked.  Since I'm not a Rust user, I'm not feeling particularly
motivated right now to spend any more time on this.  Upgrading my rustc
isn't really an option because that's the version currently in my distro
and I really don't feel like spending more time to install a custom
version of rustc just for this benchmark.


T

-- 
Today's society is one of specialization: as you grow, you learn more and more 
about less and less. Eventually, you know everything about nothing.


Re: Help optimize D solution to phone encoding problem: extremely slow performace.

2024-01-17 Thread H. S. Teoh via Digitalmars-d-learn
On Wed, Jan 17, 2024 at 07:19:39AM +, Renato via Digitalmars-d-learn wrote:
> On Tuesday, 16 January 2024 at 22:13:55 UTC, H. S. Teoh wrote:
> > used for the recursive calls. Getting rid of the .format ought to
> > speed it up a bit. Will try that now...
> > 
> 
> That will make no difference for the `count` option which is where
> your solution was very slow.

Of course it will. Passing the data directly to the callback that bumps
a counter is faster than allocating a new string, formatting the data,
and then passing it to the callback that bumps a counter.  It may not
look like much, but avoiding unnecessary GC allocations means the GC
will have less work to do later when a collection is run, thus you save
time over the long term.


> To run the slow test manually use the `words_quarter.txt` dictionary
> (the phone numbers file doesn't matter much - it's all in the
> dictionary).
> 
> But pls run the benchmarks yourself as I am not going to keep running
> it for you, and would be nice if you posted your solution on a Gist
> for example, pasting lots of code in the forum makes it difficult to
> follow.

I'll push the code to github.


T

-- 
"No, John.  I want formats that are actually useful, rather than over-featured 
megaliths that address all questions by piling on ridiculous internal links in 
forms which are hideously over-complex." -- Simon St. Laurent on xml-dev


Re: Help optimize D solution to phone encoding problem: extremely slow performance.

2024-01-16 Thread H. S. Teoh via Digitalmars-d-learn
On Tue, Jan 16, 2024 at 10:15:04PM +, Siarhei Siamashka via 
Digitalmars-d-learn wrote:
> On Tuesday, 16 January 2024 at 21:15:19 UTC, Renato wrote:
[...]
> > ... what I am really curious about is what the code I wrote is doing
> > wrong that causes it to run 4x slower than Rust despite doing "the
> > same thing"...
> 
> It's a GC allocations fest.

Indeed.

I have just completed 2 rounds of optimizations of my version of the
code, and both times the profiler also showed the problem to be
excessive allocations in the inner loop.  So, I did the following
optimizations:

1) Get rid of .format in the inner loop. Not only does .format cause a
lot of allocations, it is also a known performance hog.  So instead
of constructing the output string in the search function, I changed it
to take a delegate instead, and the delegate either counts the result or
prints it directly (bypassing the construction of an intermediate
string).  This improved performance quite a bit for the count-only runs,
but also wins some performance even when output is generated.  Overall,
however, this optimization only gave me some minor savings.

2) Changed the `path` parameter from string[] to string, since I didn't
really need it to be an array of strings anyway. This in itself only
improved performance marginally, barely noticeable, but it led to (3),
which gave a huge performance boost.

3) Basically, in the earlier version of the code, the `path` parameter
was appended to every time I recursed, and furthermore the same initial
segment gets appended to many times with different trailers as the
algorithm walks the trie. As a result, this triggers a lot of array
reallocations to store the new strings.  Most of these allocations are
unnecessary, because we already know that the initial segment of the
string will stay constant, only the tail end changes. Furthermore, we
only ever have a single instance of .path at any point in time in the
algorithm.  So we could use a single buffer to hold all of these
instances of .path, and simply return slices to it as we go along,
overwriting the tail end each time we need to append something.

This significantly cut down on the number of allocations, and along
with (1) and (2), performance improved by about 3x (!).  It didn't
completely remove all allocations, but I'm reasonably happy with the
performance now that I probably won't try to optimize it more unless
it's still losing out to another language. ;-)

(I'm especially curious to see if this beats the Rust version. :-P)
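
To make (3) a bit more concrete, here's a tiny standalone sketch of the
reusable-buffer idea (mine, not the actual code below, which is more
involved):

```d
import std.stdio;

void main()
{
    char[64] buf;                       // one buffer for the whole traversal
    size_t len = 0;

    // "Append" a character: overwrite the tail, no allocation, no copying.
    const(char)[] push(char c)
    {
        buf[len++] = c;
        return buf[0 .. len];
    }

    auto a = push('4');
    auto b = push('8');
    writeln(b);                         // "48"
    len = 1;                            // backtrack: just shrink the length
    writeln(push('2'));                 // "42"
}
```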

Optimized version of the code:

--snip--
/**
 * Encoding phone numbers according to a dictionary.
 */
import std;

/**
 * Table of digit mappings.
 */
static immutable ubyte[dchar] digitOf;
shared static this()
{
digitOf = [
'E': 0,
'J': 1, 'N': 1, 'Q': 1,
'R': 2, 'W': 2, 'X': 2,
'D': 3, 'S': 3, 'Y': 3,
'F': 4, 'T': 4,
'A': 5, 'M': 5,
'C': 6, 'I': 6, 'V': 6,
'B': 7, 'K': 7, 'U': 7,
'L': 8, 'O': 8, 'P': 8,
'G': 9, 'H': 9, 'Z': 9,
];
}

/**
 * Trie for storing dictionary words according to the phone number mapping.
 */
class Trie
{
Trie[10] edges;
string[] words;

private void insert(string word, string suffix)
{
const(ubyte)* dig;
while (!suffix.empty &&
   (dig = std.ascii.toUpper(suffix[0]) in digitOf) is null)
{
suffix = suffix[1 .. $];
}

if (suffix.empty)
{
words ~= word;
return;
}

auto node = new Trie;
auto idx = *dig;
if (edges[idx] is null)
{
edges[idx] = new Trie;
}
edges[idx].insert(word, suffix[1 .. $]);
}

/**
 * Insert a word into the Trie.
 *
 * Characters that don't map to any digit are ignored in building the Trie.
 * However, the original form of the word will be retained as-is in the
 * leaf node.
 */
void insert(string word)
{
insert(word, word[]);
}

/**
 * Iterate over all words stored in this Trie.
 */
void foreachEntry(void delegate(string path, string word) cb)
{
void impl(Trie node, string path = "")
{
if (node is null) return;
foreach (word; node.words)
{
cb(path, word);
}
foreach (i, child; node.edges)
{
impl(child, path ~ cast(char)('0' + i));
}
}
impl(this);
}
}

/**
 * Loads the given dictionary into a Trie.
 */
Trie loadDictionary(R)(R lines)
if (isInputRange!R && is(ElementType!R : const(char)[]))
{
Trie result = new Trie;
foreach (line; lines)
{
result.insert(line.idup);
}
return result;
}

///
unittest
{
auto dict = loadDictionary(q"ENDDICT
an
blau
Bo"
Boot
bo"s
da
Fee
fern
Fest
fort
je
jemand
mir
Mix
Mixer

Re: Help optimize D solution to phone encoding problem: extremely slow performance.

2024-01-16 Thread H. S. Teoh via Digitalmars-d-learn
On Tue, Jan 16, 2024 at 09:15:19PM +, Renato via Digitalmars-d-learn wrote:
> On Tuesday, 16 January 2024 at 20:34:48 UTC, H. S. Teoh wrote:
> > On Tue, Jan 16, 2024 at 12:28:49PM -0800, H. S. Teoh via
> > Digitalmars-d-learn wrote: [...]
> > > Anyway, I've fixed the problem, now my program produces the exact
> > > same output as Renato's repo. Code is posted below.
> > [...]
> > 
> 
> Great, I ran the benchmarks for you :)
> 
> I had to change how you accept arguments, even though you did "the
> right thing" using `getopt`, the solutions should just take a `count`
> or `print` argument first...

Oops, haha :-P


> Anyway, here's your result:
> 
> ```
> ===> ./rust
> ./rust,24133632,25
> ./rust,24739840,130
> ./rust,24477696,536
> ./rust,25247744,1064
> ./rust,8175616,6148
> ./rust,8306688,8315
> ===> src/d/dencoder
> src/d/dencoder,46055424,43
> src/d/dencoder,96337920,146
> src/d/dencoder,102350848,542
> src/d/dencoder,102268928,1032
> src/d/dencoder,40206336,99936
> ^C
> ```
> 
> It took too long with the `count` option, so I had to abort before the
> last run ended... there's probably some bug there, otherwise the Trie
> runs very fast, as I had expected.
[...]

Do you have the problematic data file handy?  I'd like to look into any
potential bugs.

Also, the profiler revealed that a lot of time was spent in the GC and
in small allocations.  The cause is in all likelihood the .format() call
for each found match, and array append being used for the recursive
calls. Getting rid of the .format ought to speed it up a bit. Will try
that now...


T

-- 
If the comments and the code disagree, it's likely that *both* are wrong. -- 
Christopher


Re: Help optimize D solution to phone encoding problem: extremely slow performance.

2024-01-16 Thread H. S. Teoh via Digitalmars-d-learn
On Tue, Jan 16, 2024 at 12:28:49PM -0800, H. S. Teoh via Digitalmars-d-learn 
wrote:
[...]
> Anyway, I've fixed the problem, now my program produces the exact same
> output as Renato's repo. Code is posted below.
[...]

Oops, forgot to actually paste the code. Here it is:

--snip--
/**
 * Encoding phone numbers according to a dictionary.
 */
import std;

/**
 * Table of digit mappings.
 */
static immutable ubyte[dchar] digitOf;
shared static this()
{
digitOf = [
'E': 0,
'J': 1, 'N': 1, 'Q': 1,
'R': 2, 'W': 2, 'X': 2,
'D': 3, 'S': 3, 'Y': 3,
'F': 4, 'T': 4,
'A': 5, 'M': 5,
'C': 6, 'I': 6, 'V': 6,
'B': 7, 'K': 7, 'U': 7,
'L': 8, 'O': 8, 'P': 8,
'G': 9, 'H': 9, 'Z': 9,
];
}

/**
 * Trie for storing dictionary words according to the phone number mapping.
 */
class Trie
{
Trie[10] edges;
string[] words;

private void insert(string word, string suffix)
{
const(ubyte)* dig;
while (!suffix.empty &&
   (dig = std.ascii.toUpper(suffix[0]) in digitOf) is null)
{
suffix = suffix[1 .. $];
}

if (suffix.empty)
{
words ~= word;
return;
}

auto node = new Trie;
auto idx = *dig;
if (edges[idx] is null)
{
edges[idx] = new Trie;
}
edges[idx].insert(word, suffix[1 .. $]);
}

/**
 * Insert a word into the Trie.
 *
 * Characters that don't map to any digit are ignored in building the Trie.
 * However, the original form of the word will be retained as-is in the
 * leaf node.
 */
void insert(string word)
{
insert(word, word[]);
}

/**
 * Iterate over all words stored in this Trie.
 */
void foreachEntry(void delegate(string path, string word) cb)
{
void impl(Trie node, string path = "")
{
if (node is null) return;
foreach (word; node.words)
{
cb(path, word);
}
foreach (i, child; node.edges)
{
impl(child, path ~ cast(char)('0' + i));
}
}
impl(this);
}
}

/**
 * Loads the given dictionary into a Trie.
 */
Trie loadDictionary(R)(R lines)
if (isInputRange!R && is(ElementType!R : const(char)[]))
{
Trie result = new Trie;
foreach (line; lines)
{
result.insert(line.idup);
}
return result;
}

///
unittest
{
auto dict = loadDictionary(q"ENDDICT
an
blau
Bo"
Boot
bo"s
da
Fee
fern
Fest
fort
je
jemand
mir
Mix
Mixer
Name
neu
o"d
Ort
so
Tor
Torf
Wasser
ENDDICT".splitLines);

auto app = appender!(string[]);
dict.foreachEntry((path, word) { app ~= format("%s: %s", path, word); });
assert(app.data == [
"10: je",
"105513: jemand",
"107: neu",
"1550: Name",
"253302: Wasser",
"35: da",
"38: so",
"400: Fee",
"4021: fern",
"4034: Fest",
"482: Tor",
"4824: fort",
"4824: Torf",
"51: an",
"562: mir",
"562: Mix",
"56202: Mixer",
"78: Bo\"",
"783: bo\"s",
"7857: blau",
"7884: Boot",
"824: Ort",
"83: o\"d"
]);
}

/**
 * Find all encodings of the given phoneNumber according to the given
 * dictionary, and write each encoding to the given sink.
 */
void findMatches(W)(Trie dict, const(char)[] phoneNumber, W sink)
if (isOutputRange!(W, string))
{
bool impl(Trie node, const(char)[] suffix, string[] path, bool allowDigit)
{
if (node is null)
return false;

// Ignore non-digit characters in phone number
while (!suffix.empty && (suffix[0] < '0' || suffix[0] > '9'))
suffix = suffix[1 .. $];

if (suffix.empty)
{
// Found a match, print result
foreach (word; node.words)
{
put(sink, format("%s: %-(%s %)", phoneNumber,
                 path.chain(only(word))));
}
return !node.words.empty;
}

bool ret;
foreach (word; node.words)
{
// Found a matching word, try to match the rest of the phone
// number.
ret = true;
if (impl(dict, suffix, path ~ word, true))
allowDigit = false;
}

if (impl(node.edges[suffix[0] - '0'], suffix[1 .. $], path, false))
{
allowDigit = false

Re: Help optimize D solution to phone encoding problem: extremely slow performance.

2024-01-16 Thread H. S. Teoh via Digitalmars-d-learn
On Tue, Jan 16, 2024 at 06:54:56PM +, Renato via Digitalmars-d-learn wrote:
> On Tuesday, 16 January 2024 at 16:56:04 UTC, Siarhei Siamashka wrote:
[...]
> > You are not allowed to emit "1" as the first token in the output as
> > long as there are any dictionary word matches at that position. The
> > relevant paragraph from the problem statement:

Ohhh now I get it.  Initially I misunderstood that as saying that if the
rest of the phone number has at least one match, then a digit is not
allowed.  Now I see that what it's actually saying is that even if some
random dictionary word matches at that position, even if it does not
lead to any full matches, then a digit is excluded.


[...]
> > I also spent a bit of time trying to figure out this nuance when
> > implementing my solution. It doesn't make much sense visually (no
> > back-to-back digits in the output either way), but that's how it is.
> 
> Exactly, this is one of the things that make this problem a bit
> annoying to solve :)

It's a strange requirement, for sure, but I don't think it's annoying.
It makes the problem more Interesting(tm). ;-)

Anyway, I've fixed the problem, now my program produces the exact same
output as Renato's repo. Code is posted below. Interestingly enough, the
running time has now halved to about 0.9 seconds for 1 million phone
numbers. I guess that's caused by the more stringent requirement
excluding many more matching possibilities, effectively pruning away
large parts of the search tree.


> @"H. S. Teoh" you implemented the solution as a Trie!! Nice, that's
> also what I did when I "participated" in the study. Here's [my Trie
> solution in
> Java](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/fastest-implementations-print-or-count/src/java/Main.java).
> 
> These are basically the two common approaches to the problem: a Trie
> or a numeric-based table. According to the study, people who use
> scripting languages almost always go with the numeric approach, while
> people coming from lower level languages tend to use a data structure
> like Trie (if they don't know Trie, they come up with something
> similar which is fascinating), which is harder to implement but more
> efficient in general.

Interesting.  I guess my C/C++ background is showing. ;-)

I'm not sure what exactly motivated me to go this route; I guess it was
just my default preference of choosing the path of least work as far as
the algorithm is concerned: I chose the algorithm that did the least
amount of work needed to produce the right answer.  Scanning through
sections of the dictionary to find a match was therefore excluded; so my
first thought was an AA. But then how much of the initial prefix to look
up an in AA?  Since it can't be known beforehand, I'd have to gradually
lengthen the prefix to search for, which does a lot of repetitive work
(we keep looking up the first few digits repeatedly each time we search
for a longer prefix). Plus, multiple consecutive AA lookups is not
cache-friendly.  So my next thought was, what should I do such that I
don't have to look at the initial digits anymore once I already
processed it?  This line of thought naturally led to a trie structure.

Once I arrived at a trie structure, the next question was how exactly
dictionary entries would be stored in it.  Again, in the vein of doing
the least amount of work I could get away with, I thought, if I stored
words in the trie directly, with each edge encoding a letter, then
during the search I'd have to repeatedly convert letters to the
corresponding phone number digit and vice versa.  So why not do this
conversion beforehand, and store only phone digits in the trie?  This
also had the additional benefit of letting me effectively search
multiple letters simultaneously, since multiple letters map to the same
digit, so scanning a digit is equivalent to searching multiple letters
at the same time.  The output, of course, required the original form of
the words -- so the obvious solution was to attach the original words as
a list of words attached to the trie node representing the end of that
word.

Once this was all decided, the only remaining question was the search
algorithm. This turned out to take the most time in solving this
problem, due to the recursive nature of the search, I had to grapple
with where and how to make the recursive calls, and how to propagate
return values correctly.  The initial implementation only found word
matches, and did not allow the single digits.  Happily, the recursive
algorithm turned out to have enough structure to encode the single digit
requirements as well, although it took a bit of trial and error to find
the correct implementation.


> Can I ask you why didn't you use the [D stdlib
> Trie](https://dlang.org/phobos/std_uni.html#codepointTrie)? Not sure
> that would've worked, but did you consider that?

Haha, I didn't even think of that. :-D  I wouldn't have wanted to use it
anyway, because it was optimized for 

Re: Help optimize D solution to phone encoding problem: extremely slow performance.

2024-01-16 Thread H. S. Teoh via Digitalmars-d-learn
P.S. Compiling my program with `ldc -O2`, it runs so fast that I
couldn't measure any meaningful running time that's greater than startup
overhead.  So I wrote a helper program to generate random phone numbers
up to 50 characters long, and found that it could encode 1 million phone
numbers in 2.2 seconds (using the 75,000 entry dictionary from your
repository).  Counting vs. printing the results made no significant
difference to this.


T

-- 
People tell me that I'm skeptical, but I don't believe them.


Re: Help optimize D solution to phone encoding problem: extremely slow performance.

2024-01-16 Thread H. S. Teoh via Digitalmars-d-learn
On Tue, Jan 16, 2024 at 07:50:35AM -0800, H. S. Teoh via Digitalmars-d-learn 
wrote:
[...]
> Unfortunately there seems to be some discrepancy between the output I
> got and the prescribed output in your repository. For example, in your
> output the number 1556/0 does not have an encoding, but isn't "1 Mai 0"
> a valid encoding according to your dictionary and the original problem
> description?
[...]

Also, found a bug in my program that misses some solutions when the
phone number has trailing non-digits. Here's the updated code.  It still
finds extra encodings from the output in your repo, though.  Maybe I
misunderstood part of the requirements?


--snip--
/**
 * Encoding phone numbers according to a dictionary.
 */
import std;

/**
 * Table of digit mappings.
 */
static immutable ubyte[dchar] digitOf;
shared static this()
{
digitOf = [
'E': 0,
'J': 1, 'N': 1, 'Q': 1,
'R': 2, 'W': 2, 'X': 2,
'D': 3, 'S': 3, 'Y': 3,
'F': 4, 'T': 4,
'A': 5, 'M': 5,
'C': 6, 'I': 6, 'V': 6,
'B': 7, 'K': 7, 'U': 7,
'L': 8, 'O': 8, 'P': 8,
'G': 9, 'H': 9, 'Z': 9,
];
}

/**
 * Trie for storing dictionary words according to the phone number mapping.
 */
class Trie
{
Trie[10] edges;
string[] words;

private void insert(string word, string suffix)
{
const(ubyte)* dig;
while (!suffix.empty &&
   (dig = std.ascii.toUpper(suffix[0]) in digitOf) is null)
{
suffix = suffix[1 .. $];
}

if (suffix.empty)
{
words ~= word;
return;
}

auto node = new Trie;
auto idx = *dig;
if (edges[idx] is null)
{
edges[idx] = new Trie;
}
edges[idx].insert(word, suffix[1 .. $]);
}

/**
 * Insert a word into the Trie.
 *
 * Characters that don't map to any digit are ignored in building the Trie.
 * However, the original form of the word will be retained as-is in the
 * leaf node.
 */
void insert(string word)
{
insert(word, word[]);
}

/**
 * Iterate over all words stored in this Trie.
 */
void foreachEntry(void delegate(string path, string word) cb)
{
void impl(Trie node, string path = "")
{
if (node is null) return;
foreach (word; node.words)
{
cb(path, word);
}
foreach (i, child; node.edges)
{
impl(child, path ~ cast(char)('0' + i));
}
}
impl(this);
}
}

/**
 * Loads the given dictionary into a Trie.
 */
Trie loadDictionary(R)(R lines)
if (isInputRange!R && is(ElementType!R : const(char)[]))
{
Trie result = new Trie;
foreach (line; lines)
{
result.insert(line.idup);
}
return result;
}

///
unittest
{
auto dict = loadDictionary(q"ENDDICT
an
blau
Bo"
Boot
bo"s
da
Fee
fern
Fest
fort
je
jemand
mir
Mix
Mixer
Name
neu
o"d
Ort
so
Tor
Torf
Wasser
ENDDICT".splitLines);

auto app = appender!(string[]);
dict.foreachEntry((path, word) { app ~= format("%s: %s", path, word); });
assert(app.data == [
"10: je",
"105513: jemand",
"107: neu",
"1550: Name",
"253302: Wasser",
"35: da",
"38: so",
"400: Fee",
"4021: fern",
"4034: Fest",
"482: Tor",
"4824: fort",
"4824: Torf",
"51: an",
"562: mir",
"562: Mix",
"56202: Mixer",
"78: Bo\"",
"783: bo\"s",
"7857: blau",
"7884: Boot",
"824: Ort",
"83: o\"d"
]);
}

/**
 * Find all encodings of the given phoneNumber according to the given
 * dictionary, and write each encoding to the given sink.
 */
void findMatches(W)(Trie dict, const(char)[] phoneNumber, W sink)
if (isOutputRange!(W, string))
{
bool impl(Trie node, const(char)[] suffix, string[] path, bool allowDigit)
{
if (node is null)
return false;

// Ignore non-digit characters in phone number
while (!suffix.empty && (suffix[0] < '0' || suffix[0] > '9'))
suffix = suffix[1 .. $];

if (suffix.empty)
{
// Found a match, print result
foreach (word; node.words)
{
put(sink, format("%s: %-(%s %)", phoneNumber,
                 path.chain(only(word))));
}
return !node.words.empty;
}


Re: Help optimize D solution to phone encoding problem: extremely slow performance.

2024-01-16 Thread H. S. Teoh via Digitalmars-d-learn
On Mon, Jan 15, 2024 at 08:10:55PM +, Renato via Digitalmars-d-learn wrote:
> On Monday, 15 January 2024 at 01:10:14 UTC, Sergey wrote:
> > On Sunday, 14 January 2024 at 17:11:27 UTC, Renato wrote:
> > > If anyone can find any flaw in my methodology or optmise my code so
> > > that it can still get a couple of times faster, approaching Rust's
> > > performance, I would greatly appreciate that! But for now, my
> > > understanding is that the most promising way to get there would be
> > > to write D in `betterC` style?!
> > 
> > I've added port from Rust in the PR comment. Can you please check
> > this solution?
> > Most probably it need to be optimized with profiler. Just
> > interesting how close-enough port will work.
> 
> As discussed on GitHub, the line-by-line port of the Rust code is 5x
> slower than [my latest solution using
> int128](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/0cbfd41a072718bfb0c0d0af8bb7266471e7e94c/src/d/src/dencoder.d),
> which is itself 3 to 4x slower than the Rust implementation (at around
> the same order of magnitude as algorithm-equivalent Java and Common
> Lisp implementations, D is perhaps 15% faster).
> 
> I did the best I could to make D run faster, but we hit a limit that's
> a bit hard to get past now. Happy to be given suggestions (see
> profiling information in previous messages), but I've run out of ideas
> myself.

This problem piqued my interest, so yesterday and today I worked on it
and came up with my own solution (I did not look at existing solutions
in order to prevent bias).  I have not profiled it or anything, but the
runtime seems quite promising.

Here it is:

--snip--
/**
 * Encoding phone numbers according to a dictionary.
 */
import std;

/**
 * Table of digit mappings.
 */
static immutable ubyte[dchar] digitOf;
shared static this()
{
digitOf = [
'E': 0,
'J': 1, 'N': 1, 'Q': 1,
'R': 2, 'W': 2, 'X': 2,
'D': 3, 'S': 3, 'Y': 3,
'F': 4, 'T': 4,
'A': 5, 'M': 5,
'C': 6, 'I': 6, 'V': 6,
'B': 7, 'K': 7, 'U': 7,
'L': 8, 'O': 8, 'P': 8,
'G': 9, 'H': 9, 'Z': 9,
];
}

/**
 * Trie for storing dictionary words according to the phone number mapping.
 */
class Trie
{
Trie[10] edges;
string[] words;

private void insert(string word, string suffix)
{
const(ubyte)* dig;
while (!suffix.empty &&
   (dig = std.ascii.toUpper(suffix[0]) in digitOf) is null)
{
suffix = suffix[1 .. $];
}

if (suffix.empty)
{
words ~= word;
return;
}

auto node = new Trie;
auto idx = *dig;
if (edges[idx] is null)
{
edges[idx] = new Trie;
}
edges[idx].insert(word, suffix[1 .. $]);
}

/**
 * Insert a word into the Trie.
 *
 * Characters that don't map to any digit are ignored in building the Trie.
 * However, the original form of the word will be retained as-is in the
 * leaf node.
 */
void insert(string word)
{
insert(word, word[]);
}

/**
 * Iterate over all words stored in this Trie.
 */
void foreachEntry(void delegate(string path, string word) cb)
{
void impl(Trie node, string path = "")
{
if (node is null) return;
foreach (word; node.words)
{
cb(path, word);
}
foreach (i, child; node.edges)
{
impl(child, path ~ cast(char)('0' + i));
}
}
impl(this);
}
}

/**
 * Loads the given dictionary into a Trie.
 */
Trie loadDictionary(R)(R lines)
if (isInputRange!R && is(ElementType!R : const(char)[]))
{
Trie result = new Trie;
foreach (line; lines)
{
result.insert(line.idup);
}
return result;
}

///
unittest
{
auto dict = loadDictionary(q"ENDDICT
an
blau
Bo"
Boot
bo"s
da
Fee
fern
Fest
fort
je
jemand
mir
Mix
Mixer
Name
neu
o"d
Ort
so
Tor
Torf
Wasser
ENDDICT".splitLines);

auto app = appender!(string[]);
dict.foreachEntry((path, word) { app ~= format("%s: %s", path, word); });
assert(app.data == [
"10: je",
"105513: jemand",
"107: neu",
"1550: Name",
"253302: Wasser",
"35: da",
"38: so",
"400: Fee",
"4021: fern",
"4034: Fest",
"482: Tor",
"4824: fort",
"4824: Torf",
"51: an",
"562: mir",
"562: Mix",
"56202: Mixer",
"78: Bo\"",
"783: bo\"s",
"7857: blau",
"7884: Boot",
"824: Ort",
"83: o\"d"
]);
}

/**
 * Find all encodings of the given phoneNumber according to the given
 * dictionary, and write each encoding to the given sink.
 */
void findMatches(W)(Trie dict, const(char)[] phoneNumber, W sink)

Re: `static` function ... cannot access variable in frame of ...

2024-01-15 Thread H. S. Teoh via Digitalmars-d-learn
On Mon, Jan 15, 2024 at 06:16:44PM +, Bastiaan Veelo via 
Digitalmars-d-learn wrote:
> Hey people, I can use some help understanding why the last line
> produces a compile error.
> 
> ```d
> import std.stdio;
> 
> struct S
> {
> static void foo(alias len)()
[...]

The trouble is with the `static` here.  A context pointer is necessary
in order to have access to the context of main() from the body of this
function; but `static` precludes this possibility.
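
For comparison, here's a minimal sketch (mine, not the original snippet) of a
variant that does work: a free function template, whose instantiation can
carry a context pointer into main's frame.

```d
import std.stdio;

void foo(alias len)()
{
    writeln("len = ", len);   // reads main's local through the frame pointer
}

void main()
{
    int len = 3;
    foo!len();                // prints "len = 3"
}
```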


T

-- 
It is of the new things that men tire --- of fashions and proposals and 
improvements and change. It is the old things that startle and intoxicate. It 
is the old things that are young. -- G.K. Chesterton


Re: Doubt about Struct and members

2024-01-08 Thread H. S. Teoh via Digitalmars-d-learn
On Mon, Jan 08, 2024 at 05:28:50PM +, matheus via Digitalmars-d-learn wrote:
> Hi,
> 
> I was doing some tests and this code:
> 
> import std;
> 
> struct S{
> string[] s = ["ABC"];
> int i = 123;
> }
[...]

It's not recommended to use initializers to initialize mutable
array-valued members, because it probably does not do what you think it
does.  What the above code does is to store the array ["ABC"] somewhere
in the program's pre-initialized data segment and set s to point to that
by default. It does NOT allocate a new array literal every time you
create a new instance of S; every instance of S will *share* the same
array value unless you reassign it.  As such, altering the contents of
the array may cause the new contents to show up in other instances of S.

This behaviour is generally harmless if your array is immutable. In
fact, it saves space in your executable by reusing the same data for
multiple instances of S. It also avoids repeated GC allocations at
runtime.

However, if you're banking on each instance of S getting its own copy of
the array, you're in for a surprise. In this case, what you want is to
use a ctor to initialize it rather than the above initializer.
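
Here's a small sketch (not from the original post) that makes the difference
observable by comparing the underlying pointers:

```d
import std.stdio;

struct Shared
{
    int[] a = [1, 2, 3];            // one array in the data segment, shared
}

struct Separate
{
    int[] a;
    this(bool) { a = [1, 2, 3]; }   // allocates a fresh array per instance
}

void main()
{
    Shared s1, s2;
    writeln(s1.a.ptr == s2.a.ptr);  // true: both slices point at the same data

    auto t1 = Separate(true), t2 = Separate(true);
    writeln(t1.a.ptr == t2.a.ptr);  // false: independent allocations
}
```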


T

-- 
Right now I'm having amnesia and deja vu at the same time. I think I've 
forgotten this before.


Re: Trying to understand map being a template

2024-01-05 Thread H. S. Teoh via Digitalmars-d-learn
On Fri, Jan 05, 2024 at 08:41:53PM +, Noé Falzon via Digitalmars-d-learn 
wrote:
> On the subject of `map` taking the function as template parameter, I
> was surprised to see it could still be used with functions determined
> at runtime, even closures, etc. I am trying to understand the
> mechanism behind it.

That's simple, if the argument is a runtime function, it is treated as a
function pointer (or delegate).


[...]
> In fact, how can the template be instantiated at all in the following
> example, where no functions can possibly be known at compile time:
> 
> ```
> auto do_random_map(int delegate(int)[] funcs, int[] values)
> {
>   auto func = funcs.choice;
>   return values.map!func;
> }
> ```
[...]

The argument is taken to be a delegate to be bound at runtime. In the
instantiation a shim is inserted to pass along the delegate from the
caller's context.
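
For example, a minimal sketch (mine) of that mechanism: `func` is an ordinary
runtime variable; the instantiation binds to the variable, and its current
value (the delegate) is what gets called per element.

```d
import std.algorithm : map;
import std.array : array;
import std.stdio;

void main()
{
    int delegate(int)[] funcs;
    int offset = 10;
    funcs ~= (int x) => x + offset;       // closure over a local => delegate
    funcs ~= (int x) => x * 2 + offset;

    auto func = funcs[1];                 // "chosen at runtime"
    writeln([1, 2, 3].map!func.array);    // [12, 14, 16]
}
```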


T

-- 
Creativity is not an excuse for sloppiness.


Re: Pick a class at random

2024-01-03 Thread H. S. Teoh via Digitalmars-d-learn
On Wed, Jan 03, 2024 at 04:50:57PM +, axricard via Digitalmars-d-learn 
wrote:
> I have an interface that is implemented by many classes, and I want to
> pick one of these implementations at random. There are two more
> constraints : first the distribution is not uniform, all classes can
> define the chance they have to be picked (this is reflected by the
> function 'weight()' below).  And all classes are not always available,
> this depends on some runtime information.

I would tag each implementation with a compile-time enum and use
compile-time introspection with CRTP[1] to auto-generate the code for
choosing a class according to the desired distribution.

[1] https://en.wikipedia.org/wiki/Curiously_recurring_template_pattern


Something like this:


--SNIP---
import std.stdio;

interface MyIntf {
void work();
}

struct ImplemInfo {
int weight;
MyIntf function() instantiate;
}

ImplemInfo[] implems; // list of implementations
int totalWeight;

MyIntf chooseImplem() {
import std.random;
auto pick = uniform(0, totalWeight);
auto slice = implems[];
assert(slice.length > 0);
while (slice[0].weight <= pick) {
pick -= slice[0].weight;
slice = slice[1 .. $];
}
return slice[0].instantiate();
}

// Base class that uses CRTP to auto-register implementations in
// .implems without needing too much boilerplate in every
// subclass.
class Base(C) : MyIntf {
// Derived class must define a .weight member readable
// at compile-time.
static assert(is(typeof(C.weight) : int),
"Derived class must define .weight");

static this() {
implems ~= ImplemInfo(C.weight, () {
return cast(MyIntf) new C;
});
totalWeight += C.weight;
}

// Derived classes must implement this
abstract void work();
}

// These classes can be anywhere
class Implem1 : Base!Implem1 {
enum weight = 1;
override void work() { writeln(typeof(this).stringof); }
}

class Implem2 : Base!Implem2 {
enum weight = 2;
override void work() { writeln(typeof(this).stringof); }
}

class Implem3 : Base!Implem3 {
enum weight = 3;
override void work() { writeln(typeof(this).stringof); }
}

void main() {
// pipe output of program to `sort | uniq -c` to verify that the
// required distribution is generated correctly.
foreach (_; 0 .. 100) {
auto impl = chooseImplem();
impl.work();
}
}
--SNIP---


T


Re: D is nice whats really wrong with gc??

2023-12-22 Thread H. S. Teoh via Digitalmars-d-learn
On Fri, Dec 22, 2023 at 09:40:03PM +, bomat via Digitalmars-d-learn wrote:
> On Friday, 22 December 2023 at 16:51:11 UTC, bachmeier wrote:
> > Given how fast computers are today, the folks that focus on memory
> > and optimizing for performance might want to apply for jobs as
> > flooring inspectors, because they're often solving problems from the
> > 1990s.
> 
> *Generally* speaking, I disagree. Think of the case of GTA V where
> several *minutes* of loading time were burned just because they
> botched the implementation of a JSON parser.

IMNSHO, if I had very large data files to load, I wouldn't use JSON.
Precompile the data into a more compact binary form that's already ready
to use, and just mmap() it at runtime.
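
A hedged sketch of what I mean (the file name and record layout are made up):

```d
import std.mmfile : MmFile;
import std.stdio;

struct Record { int id; float value; }   // hypothetical fixed-size layout

void main()
{
    auto mm = new MmFile("data.bin");            // assumes the file exists
    auto records = cast(const(Record)[]) mm[];   // reinterpret, no parsing,
                                                 // no copying at load time
    writeln(records.length, " records mapped");
}
```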


> Of course, this was unrelated to memory management. But it goes to
> show that today's hardware being super fast doesn't absolve you from
> knowing what you're doing... or at least question your implementation
> once you notice that it's slow.

My favorite example in this area is the poor selection of algorithms, a
very common mistake being choosing an O(n²) algorithm because it's
easier to implement than the equivalent O(n) algorithm, and not very
noticeable on small inputs. But on large inputs it slows to an unusable
crawl. "But I wrote it in C, why isn't it fast?!" Because O(n²) is
O(n²), and that's independent of language. Given large enough input, an
O(n) Java program will beat the heck out of an O(n²) C program.
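
A small D illustration of the same trap (mine, not from anyone's codebase):
duplicate detection with a nested scan is O(n^2), while the same check with an
associative array is O(n).

```d
import std.algorithm : canFind;

bool hasDupQuadratic(int[] xs)
{
    foreach (i, x; xs)
        if (xs[i + 1 .. $].canFind(x))   // linear scan inside a linear loop
            return true;
    return false;
}

bool hasDupLinear(int[] xs)
{
    bool[int] seen;
    foreach (x; xs)
    {
        if (x in seen) return true;
        seen[x] = true;                  // amortized O(1) lookup and insert
    }
    return false;
}

void main()
{
    assert(hasDupQuadratic([1, 2, 3, 2]) && hasDupLinear([1, 2, 3, 2]));
    assert(!hasDupQuadratic([1, 2, 3]) && !hasDupLinear([1, 2, 3]));
}
```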


> But that is true for any language, obviously.
>
> I think there is a big danger of people programming in C/C++ and
> thinking that it *must* be performing well just because it's C/C++.
> The C++ codebase I have to maintain in my day job is a really bad
> example for that as well.

"Elegant or ugly code as well as fine or rude sentences have something
in common: they don't depend on the language." -- Luca De Vitis

:-)


> > I say this as I'm in the midst of porting C code to D. The biggest
> > change by far is deleting line after line of manual memory
> > management.  Changing anything in that codebase would be miserable.
> 
> I actually hate C with a passion.

Me too. :-D


> I have to be fair though: What you describe doesn't sound like a
> problem of the codebase being C, but the codebase being crap. :)

Yeah, I've seen my fair share of crap C and C++ codebases. C code that
makes you do a double take and stare real hard at the screen to
ascertain whether it's actually C and not some jokelang or exolang
purposely designed to be unreadable/unmaintainable. (Or maybe it would
qualify as an IOCCC entry. :-D)  And C++ code that looks like ... I
dunno what.  When business logic is being executed inside of a dtor, you
*know* that your codebase has Problems(tm), real big ones at that.



> If you have to delete "line after line" of manual memory management, I
> assume you're dealing with micro-allocations on the heap - which are
> performance poison in any language.

Depends on what you're dealing with.  Some micro-allocations are totally
avoidable, but if you're manipulating a complex object graph composed of
nodes of diverse types, it's hard to avoid. At least, not without
uglifying your APIs significantly and introducing long-term
maintainability issues.  One of my favorite GC "lightbulb" moments is
when I realized that having a GC allowed me to simplify my internal APIs
significantly, resulting in much cleaner code that's easy to debug and
easy to maintain. Whereas the equivalent bit of code in the original C++
codebase would have required disproportionate amounts of effort just to
navigate the complex allocation requirements.

These days my motto is: use the GC by default, when it becomes a
problem, then use a more manual memory management scheme, but *only
where the bottleneck is* (as proven by an actual profiler, not where you
"know" (i.e., imagine) it is).  A lot of C/C++ folk (and I speak from my
own experience as one of them) spend far too much time and energy
optimizing things that don't need to be optimized, because they are
nowhere near the bottleneck, resulting in lots of sunk cost and added
maintenance burden with no meaningful benefit.


[...]
> Of course, this directly leads to the favorite argument of C
> defenders, which I absolutely hate: "Why, it's not a problem if you're
> doing it *right*."
> 
> By this logic, you have to do all these terrible mistakes while
> learning your terrible language, and then you'll be a good programmer
> and can actually be trusted with writing production software - after
> like, what, 20 years of shooting yourself in the foot and learning
> everything the hard way?  :) And even then, the slightest slipup will
> give you dramatic vulnerabilities.  Such a great concept.

Year after year I see reports of security vulnerabilities, the most
common of which are buffer overflows, use-after-free, and double-free.
All of which are caused directly by using a language that forces you to
manage memory manually.  If C were only 10 

Re: D is nice whats really wrong with gc??

2023-12-22 Thread H. S. Teoh via Digitalmars-d-learn
On Fri, Dec 22, 2023 at 07:22:15PM +, Dmitry Ponyatov via 
Digitalmars-d-learn wrote:
> > It's called GC phobia, a knee-jerk reaction malady common among
> > C/C++ programmers
> 
> I'd like to use D in hard realtime apps (gaming can be thought as one
> of them, but I mostly mean realtime dynamic multimedia and digital
> signal processing).

For digital signal processing, couldn't you just preallocate beforehand?
Even if we had a top-of-the-line incremental GC I wouldn't want to
allocate wantonly in my realtime code. I'd preallocate whatever I can,
and use region allocators for the rest.
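
A minimal sketch (mine) of the preallocate-up-front pattern, with the
per-block code kept allocation-free so it can even be marked @nogc:

```d
float[] scratch;

void setup(size_t maxFrames)
{
    scratch = new float[maxFrames];         // the only GC allocation, up front
}

void processBlock(const(float)[] input, float[] output) @nogc nothrow
{
    auto tmp = scratch[0 .. input.length];  // slicing allocates nothing
    foreach (i, x; input)
        tmp[i] = x * 0.5f;                  // stand-in for the real DSP work
    output[] = tmp[];                       // plain copy, no allocation
}

void main()
{
    setup(1024);
    auto inBuf = new float[256], outBuf = new float[256];
    processBlock(inBuf, outBuf);
}
```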


> So, GC in such applications commonly supposed unacceptable. In
> contrast, I can find some PhD theses speaking about realtime GC,
> prioritized message passing and maybe RDMA-based clustering.

I'm always skeptical of general claims like this. Until you actually
profile and identify the real hotspots, it's just speculation.


> Unfortunately, I have no hope that D lang is popular enough that
> somebody in the topic can rewrite its runtime and gc to be usable in
> more or less hard RT apps.

Popularity has nothing to do with it. The primary showstopper here is
the lack of write barriers (and Walter's reluctance to change this).
If we had write barriers a lot more GC options would open up.


T

-- 
What is Matter, what is Mind? Never Mind, it doesn't Matter.


Re: D is nice whats really wrong with gc??

2023-12-18 Thread H. S. Teoh via Digitalmars-d-learn
On Mon, Dec 18, 2023 at 04:44:11PM +, Bkoie via Digitalmars-d-learn wrote:
[...]
> but what is with these ppl and the gc?
[...]

It's called GC phobia, a knee-jerk reaction malady common among C/C++
programmers (I'm one of them, though I got cured of GC phobia thanks to
D :-P).  95% of the time the GC helps far more than it hurts.  And the
5% of the time when it hurts, there are plenty of options for avoiding
it in D.  It's not shoved down your throat like in Java, there's no need
to get all worked up about it.


T

-- 
Computerese Irregular Verb Conjugation: I have preferences.  You have biases.  
He/She has prejudices. -- Gene Wirchenko


Re: Is it possible to set/override the name of the source file when piping it into DMD via stdin?

2023-12-13 Thread H. S. Teoh via Digitalmars-d-learn
On Wed, Dec 13, 2023 at 11:58:42AM -0800, H. S. Teoh via Digitalmars-d-learn 
wrote:
[...]
> Add a module declaration to your source file. For example:
> 
>   echo 'module abc; import std; void main(){writefln(__MODULE__);}' | dmd 
> -run -
> 
> Output:
>   abc
> 
> `__stdin` is used as a placeholder when no module declaration is
> present, and dmd doesn't know the filename (which is what it would
> normally have used for the module name in this case).
[...]

Hmm, apparently the module declaration doesn't change the placeholder
filename. Using `#line 1 "abc.d"` does the trick, as Adam suggests.


T

-- 
People tell me I'm stubborn, but I refuse to accept it!


Re: Is it possible to set/override the name of the source file when piping it into DMD via stdin?

2023-12-13 Thread H. S. Teoh via Digitalmars-d-learn
On Wed, Dec 13, 2023 at 07:37:09PM +, Siarhei Siamashka via 
Digitalmars-d-learn wrote:
> Example:
> 
> ```D
> import std;
> void main() {
>   deliberate syntax error here
> }
> ```
> 
> ```bash
> $ cat example.d | dmd -run -
> __stdin.d(3): Error: found `error` when expecting `;` or `=`, did you mean
> `deliberate syntax = here`?
> __stdin.d(3): Error: found `}` when expecting `;` or `=`, did you mean
> `error here = End of File`?
> ```
> 
> Now I'm curious. Is it possible to somehow communicate the real source
> file name to `dmd`, so that it shows up in the error log instead of
> "__stdin.d"?

Add a module declaration to your source file. For example:

echo 'module abc; import std; void main(){writefln(__MODULE__);}' | dmd 
-run -

Output:
abc

`__stdin` is used as a placeholder when no module declaration is
present, and dmd doesn't know the filename (which is what it would
normally have used for the module name in this case).


T

-- 
Live a century, learn a century -- and you'll still die a fool. (Russian proverb)


Re: union default initialization values

2023-12-05 Thread H. S. Teoh via Digitalmars-d-learn
On Wed, Dec 06, 2023 at 04:24:51AM +0900, confuzzled via Digitalmars-d-learn 
wrote:
[...]
> import std.stdio;
> void main()
> {
> F fp;
> fp.lo.writeln; // Why is this not zero? How is this value derived?
> fp.hi.writeln; // expected
> fp.x.writeln;  // expected
> 
> fp.x = 
> 19716939937510315926535.148979323846264338327950288458209749445923078164062862089986280348253421170679;
> fp.lo.writeln;
> fp.hi.writeln;
> fp.x.writefln!"%20.98f"; // Also, why is precision completely lost after
> 16 digits (18 if I change the type of x to real)?
> }
> 
> Sorry if this seem like noise but I genuinely do not understand. What
> changes would I need to make to retain the precision of the value
> provided in the assignment above?
[...]

A `double` type is stored as an IEEE double-precision floating-point
number, which is a 64-bit value containing 1 sign bit, 11 exponent bits,
and 53 mantissa bits (52 stored, 1 implied).  A mantissa of 53 bits can
store up to 2^53 distinct values, which corresponds with log_10(2^53) ≈
15.95 decimal digits. So around 15-16 decimal digits.  (The exponent
bits only affect the position of the decimal point, not the precision of
the value, so they are not relevant here.)

In D, you can use the .dig property to find out approximately how many
decimal digits of precision a format has (e.g., `writeln(double.dig);` or
`writeln(real.dig);`).
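
For example (the values below assume a typical x86-64 build where real is the
80-bit x87 format):

```d
import std.stdio;

void main()
{
    writeln(double.dig);        // 15
    writeln(real.dig);          // 18
    writefln("%.30f", 0.1);     // digits beyond ~16 are not meaningful
}
```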

The number you have above is WAY beyond the storage capacity of the
double-precision floating-point format or the 80-bit extended precision
format of `real`.  If you need that level of precision, you probably
want to use an arbitrary-precision floating point library like libgmp
instead of the built-in `double` or `real`.  (Keep in mind that the
performance will be significantly slower, because the hardware only
works with IEEE 64-bit / x87 80-bit extended-precision numbers.
Anything beyond that has to be implemented in software, and will incur
memory management costs as well since the storage size of the number
will not be fixed.)

Also, if you don't understand how floating-point in computers work, I
highly recommend reading this:

https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

It's a bit long, but well worth the time to read to understand why
floating-point behaves the way it does.


T

-- 
It is of the new things that men tire --- of fashions and proposals and 
improvements and change. It is the old things that startle and intoxicate. It 
is the old things that are young. -- G.K. Chesterton


Re: anonymous structs within structs

2023-12-04 Thread H. S. Teoh via Digitalmars-d-learn
On Mon, Dec 04, 2023 at 11:46:45PM +, DLearner via Digitalmars-d-learn 
wrote:
[...]
> Basically, B corresponds to the whole record (and only a whole record
> can be read).
> But the task only requires Var1 and Var2, the last two fields on the record.
> By putting all the irrelevant fields into A, and defining B as above,
> program remains unpolluted with data it does not need.
[...]

Sounds like what you need is something like this:

struct Record {
struct UnimportantStuff {
...
}
UnimportantStuff unimportant;

struct ImportantStuff {
...
}
ImportantStuff important;
}

ImportantStuff readData() {
Record rec = readData(...); // read entire record
return rec.important; // discard unimportant stuff
}

int main() {
...
ImportantStuff data = readData(); // only important stuff 
returned
processData(data);
...
}


T

-- 
Let X be the set not defined by this sentence...


Re: D: Declaring empty pointer variables that return address inside function calls?

2023-11-23 Thread H. S. Teoh via Digitalmars-d-learn
On Thu, Nov 23, 2023 at 07:22:22PM +, BoQsc via Digitalmars-d-learn wrote:
> Is it possible to declare empty pointer variable inside function calls
> and pass its address to the function?
> 
> These are sometimes required while using Win32 - Windows Operating
> System API.
> 
> * Empty pointer variables are used by functions to return information
> after the function is done.
> 
> My own horrible **suggestion** of empty pointer declaration inside
> function call:
> `someFunction(uint & passingEmptyVariableForWrite);`
> 
> What it would do:
> * A new variable is declared inside function call.
> * Address of that variable is passed to the function.
> * After function is done, you can refer to it for returned value.

What's wrong with:

uint* result;
	someFunction(&result);
// use *result

?


T

-- 
One Word to write them all, One Access to find them, One Excel to count them 
all, And thus to Windows bind them. -- Mike Champion


Re: Keyword "package" prevents from importing a package module "package.d"

2023-11-02 Thread H. S. Teoh via Digitalmars-d-learn
On Fri, Nov 03, 2023 at 12:19:48AM +, Andrey Zherikov via 
Digitalmars-d-learn wrote:
> On Thursday, 2 November 2023 at 19:43:01 UTC, Adam D Ruppe wrote:
> > On Thursday, 2 November 2023 at 19:30:58 UTC, Jonathan M Davis wrote:
> > > The entire reason that it was added to the language was to be able
> > > to split up existing modules without breaking code. And it does that
> > > well.
> > 
> > No, it doesn't do that well at all. In fact, it does that so extremely
> > poorly that (as you might recall) there were a very large number of
> > support requests shortly after Phobos started using it about broken
> > builds, since it would keep the old file and the new file when you
> > updated and this stupid, idiotic design can't handle that situation.
> > 
> > This only subsided because enough time has passed that nobody tries
> > using it to break up existing modules anymore.
> > 
> > It is just a *terrible* design that never should have passed review. It
> > is randomly inconsistent with the rest of the language and this
> > manifests as several bugs.
> > 
> > (including but not limited to:
> > 
> > https://issues.dlang.org/show_bug.cgi?id=14687 doesn't work with .di
> > https://issues.dlang.org/show_bug.cgi?id=17699 breaks if you try to use
> > it for its intended purpose
> > https://issues.dlang.org/show_bug.cgi?id=20563 error messages hit random
> > problems
> >  all-at-once vs separate compilation of package
> > leads to inconsistent reflection results
> > 
> > im sure the list went on if i spent a few more minutes looking for my
> > archives)
> > 
> > 
> > > package.d is indeed completely unnecessary for creating a module
> > > that publicly imports other modules in order to be able to import a
> > > single module and get several modules.
> > 
> > Yeah, it is a terrible feature that is poorly designed, hackily
> > implemented, and serves no legitimate use case.
> 
> Is there any guide how one can refactor single-module package into
> multi-module package with distinction between public and private
> modules?

Supposedly you can do this:

/* Original: */

// pkg/mymodule.d
module mymodule;
... // code here

// main.d
import mymodule;
void main() { ... }

/* Split */

// pkg/mymodule/pub_submod.d
module mymodule.pub_submod;
... // code here

// pkg/mymodule/priv_submod.d
module mymodule.priv_submod;
... // code here

// pkg/mymodule/package.d
module mymodule;
	public import mymodule.pub_submod;

// main.d
import mymodule;
void main() { ... }

Barring the issues listed above, of course.


T

-- 
"The number you have dialed is imaginary. Please rotate your phone 90 degrees 
and try again."


Re: is the array literal in a loop stack or heap allocated?

2023-10-10 Thread H. S. Teoh via Digitalmars-d-learn
On Wed, Oct 11, 2023 at 02:54:53AM +, mw via Digitalmars-d-learn wrote:
> Hi,
> 
> I want to confirm: in the following loop, is the array literal `a` vs.
> `b` stack or heap allocated? and how many times?
> 
> void main() {
> 
> int[2] a;

This is stack-allocated. Once per call to the function.


> int[] b;

This is an empty slice. It can refer to either stack or heap memory,
depending on what's assigned to it.


> int i;
> While(++i <=100) {
> 
>   a = [i, i+1];  // array literal

`a` is overwritten in-place once per loop.


>   b = [i, i+1];
[...]

A new array consisting of 2 elements is allocated, once per loop, and
assigned to b each time. Any arrays from previous iterations will be
collected by the GC eventually.


T

-- 
They pretend to pay us, and we pretend to work. -- Russian saying


Re: array setting : Whats going in here?

2023-10-06 Thread H. S. Teoh via Digitalmars-d-learn
On Sat, Oct 07, 2023 at 12:00:48AM +, claptrap via Digitalmars-d-learn 
wrote:
> 
> char[] foo;
> foo.length = 4;
> foo[] = 'a'; // ok sets all elements
> foo[] = "a"; // range error at runtime?
> foo[] = "ab"; // range error at runtime?
> 
> So I meant to init with a char literal but accidently used double
> quotes.  Should that even compile? Shouldn't the compiler at least
> complain when trying to init with "ab"?

If you want initialization, don't slice the target array. For example:

char[] foo = "a";

Or:

char[] foo;
...
foo = "a";

When you write `foo[]` you're taking a slice of the array, and in that
case if the lengths of both sides of the assignment don't match, you'll
get a runtime error.
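
For example (a small sketch of the cases above):

```d
void main()
{
    char[] foo = new char[4];
    foo[] = 'a';            // fills all four elements
    foo[0 .. 1] = "a";      // fine: both sides have length 1
    // foo[] = "a";         // compiles, but fails at run time: length 4 vs 1
}
```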


T

-- 
Always remember that you are unique. Just like everybody else. -- despair.com


Re: Setting struct as default parameter of a function using struct literal?

2023-09-11 Thread H. S. Teoh via Digitalmars-d-learn
On Mon, Sep 11, 2023 at 10:39:00PM +, Salih Dincer via Digitalmars-d-learn 
wrote:
> On Monday, 11 September 2023 at 22:13:25 UTC, H. S. Teoh wrote:
> > 
> > Because sometimes I want a specific type.
> > 
> 
> it's possible...
> 
> ```d
> alias ST = Options;
> void specificType(ST option = ST())
[...]

This is missing the point.  The point is that I don't want to have to
type `Options` or `ST` twice.  Since the type of the parameter is
already known, the compiler does not need me to repeat the type name. It
already knows enough to figure it out on its own.  "Don't Repeat
Yourself" (DRY).


T

-- 
"Holy war is an oxymoron." -- Lazarus Long


Re: Setting struct as default parameter of a function using struct literal?

2023-09-11 Thread H. S. Teoh via Digitalmars-d-learn
On Mon, Sep 11, 2023 at 10:05:11PM +, Salih Dincer via Digitalmars-d-learn 
wrote:
> On Monday, 11 September 2023 at 20:17:09 UTC, H. S. Teoh wrote:
> > 
> > Someone should seriously come up with a way of eliminating the
> > repeated type name in default parameters.
> 
> Why not allow it to be flexible enough by using a template parameter?

Because sometimes I want a specific type.


T

-- 
What did the alien say to Schubert? "Take me to your lieder."


Re: Setting struct as default parameter of a function using struct literal?

2023-09-11 Thread H. S. Teoh via Digitalmars-d-learn
On Mon, Sep 11, 2023 at 07:59:37PM +, ryuukk_ via Digitalmars-d-learn wrote:
[...]
> Recent version of D added named arguments so you can do something
> like:
> 
> ```D
> void someFunction(Options option = Options(silenceErrors: false))
> ```
> 
> I don't like the useless repeating "option option option", but that's
> D for you

Someone should seriously come up with a way of eliminating the repeated
type name in default parameters.  It's a constant fly in my otherwise
tasty soup of D.  Every time I have to type that I think about how nice
it would be if we could just write

void someFunction(Options option = .init) {...}

and be done with it.  Or else:

void someFunction(auto options = Options.init) {}

though this is not as good because the `auto` may make it hard to parse
function declarations.


T

-- 
Life would be easier if I had the source code. -- YHL


Re: malloc error when trying to assign the returned pointer to a struct field

2023-09-09 Thread H. S. Teoh via Digitalmars-d-learn
On Sat, Sep 09, 2023 at 09:21:32AM +, rempas via Digitalmars-d-learn wrote:
> On Saturday, 9 September 2023 at 08:54:14 UTC, Brad Roberts wrote:
> > I'm pretty sure this is your problem.  You're allocating size bytes
> > which is only going to work where sizeof(T) == 1.  Changing to
> > malloc(size * sizeof(T)) is likely going to work better.
> 
> Oh man, that was it! I had forgotten about that! Funny enough, the
> reallocation tests I do later when expanding the vector do include
> that, but I had forgotten to place it in the new constructor I had
> made that only allocates memory (because I had an old one and it
> included this)!
> 
> Now, if only someone could explain how and why "libc" knows that and
> doesn't just give me the memory I asked it for? Or could it be that D
> does something additional without telling us? Which could explain why
> this only shows up when I assign the value to the `this._ptr` field!

libc doesn't know what you intended. All it knows is that you asked it
for 20 bytes (even though you actually needed 40), then later on its
internal structures are corrupted (because you thought you got 40 bytes;
storing data past the 20 bytes overwrote some of malloc's internal data
-- this is the buffer overrun / buffer overflow I referred to). So it
aborts the program instead of continuing to run in a compromised state.
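
A small sketch (mine, not the original code) of the fix Brad suggested:
allocate count * T.sizeof bytes, not count bytes.

```d
import core.stdc.stdlib : malloc, free;

T* allocElems(T)(size_t count)
{
    auto p = cast(T*) malloc(count * T.sizeof);   // bytes = count * elem size
    assert(p !is null);
    return p;
}

void main()
{
    auto p = allocElems!long(10);   // 80 bytes; malloc(10) would overrun later
    scope(exit) free(p);
    p[0 .. 10] = 42;                // now safely within the allocation
}
```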


T

-- 
There are four kinds of lies: lies, damn lies, and statistics.


Re: malloc error when trying to assign the returned pointer to a struct field

2023-09-08 Thread H. S. Teoh via Digitalmars-d-learn
On Fri, Sep 08, 2023 at 06:59:21PM +, rempas via Digitalmars-d-learn wrote:
> On Friday, 8 September 2023 at 16:02:36 UTC, Basile B. wrote:
> > 
> > Could this be a problem of copy construction ?
> 
> I don't think so. The assertion seems to be violated when `malloc` is used.
> And when I assert the result in the `_ptr` field. Really weird...

The error message looks to me like a corruption of the malloc heap.
These kinds of bugs are very hard to trace, because they may go
undetected and only show up in specific circumstances, so small
perturbations of completely unrelated code may make the bug appear or
disappear -- just because the bug doesn't show up when you disable some
code does not prove that that's where the problem is; it could be that
corruption is still happening, it just so happens that it goes unnoticed
when the behaviour of the code changes slightly.

My guess is that you have a double-free somewhere, or there's a buffer
overrun. Or maybe some bad interaction with the GC, e.g. if you tried to
free a pointer from the GC heap. (Note that this may not immediately
show up; free() could've assumed that everything was OK when it has in
fact messed up its internal data structures; the problem would only show
up later on in code that's actually unrelated to the real problem.)

If I were in your shoes I'd use Valgrind / Memcheck to try to find the
real cause of the problem.  Chances are, it may have nothing to do with
the bit of code you quoted at all.  You could try to insert extra
malloc/free's in various places around the code (in places along the
code path, but unrelated to the problematic code) to see if that changes
the behaviour of the bug. If it does, your corruption is likely
somewhere other than the _ptr code you showed.


T

-- 
If the comments and the code disagree, it's likely that *both* are wrong. -- 
Christopher


Re: Keeping data from memory mapped files

2023-09-01 Thread H. S. Teoh via Digitalmars-d-learn
On Fri, Sep 01, 2023 at 03:53:42AM +, Alexibu via Digitalmars-d-learn wrote:
> Why do I need to copy data out of memory mapped files to avoid seg faults.
> This defeats the purpose of memory mapped files.
> Shouldn't the GC be able to manage it if I keep a pointer into it.

The GC does not manage memory-mapped files. That's the job of the OS.


> I am closing them because the OS has a limit in how many it can open,
> either way the memory is still there isn't it ?

No, once you close it, the OS will remove the mapping. So when you try
to access that address, it will segfault.  This has nothing to do with
the GC, the page tables that map the memory addresses to the file are
managed by the OS.  By closing it you're basically telling the OS "I
don't need the mapping anymore", so it removes the mapping from your
page tables and that address no longer exists in your process' address
space.  So the next time you try to access it, you will get a segfault.


T

-- 
Heuristics are bug-ridden by definition. If they didn't have bugs, they'd be 
algorithms.


Re: Unicode in strings

2023-07-27 Thread H. S. Teoh via Digitalmars-d-learn
On Thu, Jul 27, 2023 at 10:15:47PM +, Cecil Ward via Digitalmars-d-learn 
wrote:
> How do I get a wstring or dstring with a code point of 0xA0 in it ?
> That’s a type of space, is it? I keep getting a message from the LDC
> compiler something like "Outside Unicode code space" in my unittests
> when this is the first character in a wstring. I’ve tried all sorts of
> escape sequences but I must simply be misunderstanding the docs. I
> could always copy-paste a real live one into a double quoted string
> and be done with it, I suppose.

D strings are assumed to be encoded in UTF-8 / UTF-16 / UTF-32. So if
you wrote something like `\xA0` in your string will likely generate an
invalid encoding.  Try instead `\u00A0`.
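
For example (a small sketch):

```d
void main()
{
    wstring w = "\u00A0no-break space"w;   // U+00A0 as a code point escape
    dstring d = "\u00A0no-break space"d;
    assert(w[0] == 0x00A0 && d[0] == 0x00A0);
}
```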


T

-- 
Ph.D. = Permanent head Damage


Re: Pre-expanding alloc cell(s) / reserving space for an associative array

2023-07-10 Thread H. S. Teoh via Digitalmars-d-learn
On Mon, Jul 10, 2023 at 09:30:57AM +, IchorDev via Digitalmars-d-learn 
wrote:
[...]
> From the spec it sounds as though (but good luck testing for sure)
> that if you have (for example) 6 big dummy key-value pairs in the AA
> to begin with, then if you use `.clear` it "Removes all remaining keys
> and values from [the] associative array. The array is not rehashed
> after removal, __to allow for the existing storage to be reused.__"
[...]

This is not an accurate understanding of what actually happens.  The AA
implementation consists of a primary hashtable (an array), each slot of
which points to a list of buckets. Clearing the AA does not discard the
hashtable, but does dispose of the buckets, so adding new keys
afterwards will allocate new buckets.  So the buckets used by the dummy
key-value pairs do not get reused without a reallocation.


T

-- 
This is a tpyo.


Re: Dynamic array of strings and appending a zero length array

2023-07-08 Thread H. S. Teoh via Digitalmars-d-learn
On Sat, Jul 08, 2023 at 05:15:26PM +, Cecil Ward via Digitalmars-d-learn 
wrote:
> I have a dynamic array of dstrings and I’m spending dstrings to it. At
> one point I need to append a zero-length string just to increase the
> length of the array by one but I can’t have a slot containing garbage.
> I thought about ++arr.length - would that work, while giving me valid
> contents to the final slot ?

Unlike C/C++, the D runtime always ensures that things are initialized
unless you explicitly tell it not to (via void-initialization). So
++arr.length will work; the new element will be initialized to
dstring.init (which is the empty string).
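A quick illustration (the array contents are made up):

```d
unittest
{
    dstring[] arr = [ "foo"d, "bar"d ];
    ++arr.length;                  // grows the array by one
    assert(arr.length == 3);
    assert(arr[$ - 1] == ""d);     // new slot is dstring.init, i.e. the empty string
}
```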


T

-- 
If Java had true garbage collection, most programs would delete themselves upon 
execution. -- Robert Sewell


Re: Bug in usage of associative array: dynamic array with string as a key

2023-06-30 Thread H. S. Teoh via Digitalmars-d-learn
On Fri, Jun 30, 2023 at 07:05:23PM +, Cecil Ward via Digitalmars-d-learn 
wrote:
[...]

It would help if you could post the complete code that reproduces the
problem. Or, if you do not wish to reveal your code, reduce it to a
minimal case that still exhibits the same problem, so that we can see it
for ourselves.  The snippets you provided do not provide enough
information to identify the problem.


T

-- 
What's the difference between a 4D tube and an overweight Dutchman?  One
is a hollow spherinder, and the other is a spherical Hollander.


Re: Debugging by old fashioned trace log printfs / writefln

2023-06-30 Thread H. S. Teoh via Digitalmars-d-learn
On Fri, Jun 30, 2023 at 03:43:14PM +, Cecil Ward via Digitalmars-d-learn 
wrote:
[...]
> Since I can pass my main function some compile-time-defined input, the
> whole program should be capable of being executed with CTFE, no? So in
> that case pragma( msg ) should suffice for a test situation? Would
> pragma(message) have the advantage over writefln that I don’t have to
> pervert the function attributes like nogc nothrow pure ?

Just use the `debug` statement:

auto pureFunc(Args args) pure {
...
debug writefln("debug info: %s", ...);
...
}

Compile with `-debug` to enable the writefln during development. When
not compiling with `-debug`, the writefln will not be compiled and the
function will actually be pure.

The problem with pragma(msg) is that it happens very early in the
compilation process; some things may not be available to it, such as the
value of variables in CTFE. This may limit its usefulness in some
situations.  For more details on this, see:

https://wiki.dlang.org/Compile-time_vs._compile-time


T

-- 
He who sacrifices functionality for ease of use, loses both and deserves 
neither. -- Slashdotter


Re: is Dlang support Uniform initialization like c++

2023-06-30 Thread H. S. Teoh via Digitalmars-d-learn
On Fri, Jun 30, 2023 at 03:18:41PM +, lili via Digitalmars-d-learn wrote:
> struct Point {
>  int x;
>  int y;
>   this(int x, int y) { this.x =x; this.y=y;}
> }
> 
> void addPoint(Point a, Point b) {
>...
> }
> 
> How too wirte this: addPoint({4,5}, {4,6})

addPoint(Point(4,5), Point(4,6));


T

-- 
"No, John.  I want formats that are actually useful, rather than over-featured 
megaliths that address all questions by piling on ridiculous internal links in 
forms which are hideously over-complex." -- Simon St. Laurent on xml-dev


Re: static if - unexpected results

2023-06-23 Thread H. S. Teoh via Digitalmars-d-learn

On Friday, 23 June 2023 at 15:22:36 UTC, DLearner wrote:
> On Friday, 23 June 2023 at 14:31:45 UTC, FeepingCreature wrote:
> > On Friday, 23 June 2023 at 14:22:24 UTC, DLearner wrote:
> > > [...]
> > 
> > ```
> > static assert(__traits(isPOD, int)); // ok.
> > static assert(__traits(isPOD, byte)); // ok.
> > ```
> > It's a bug in either the spec or the compiler.
> 
> I am using
> ```
> DMD64 D Compiler v2.103.0-dirty
> ```
> under
> ```
> Windows [Version 10.0.19045.3086]
> ```
> 
> Do I need to report this anywhere?

Tested your original code on latest dmd git master, here's the output:

```
char1  is a char
int1  is a struct
foovar1  is a struct
byte1  is a struct
```

Looks like there isn't a problem? Or at least, it's now fixed in git master.

Which exact version of dmd are you using?  Did you download from dlang.org
or did you build your own?


T


Re: A couple of questions about arrays and slices

2023-06-21 Thread H. S. Teoh via Digitalmars-d-learn
On Wed, Jun 21, 2023 at 02:09:26AM +, Cecil Ward via Digitalmars-d-learn 
wrote:
> First is an easy one:
> 
> 1.) I have a large array and a sub-slice which I want to set up to be
> pointing into a sub-range of it. What do I write if I know the start
> and end indices ? Concerned about an off-by-one error, I have
> start_index and past_end_index (exclusive).

array[start_idx .. one_past_end_idx]


> 2.) I have a dynamic array and I wish to preinitialise its alloc cell
> to be a certain large size so that I don’t need to reallocate often
> initially. I tell myself that I can set the .length property. Is that
> true?

You can use `array.reserve(preinitSize);`.


> 2a.) And what happens when the cell is extended, is the remainder
> zero-filled or remaining full of garbage, or is the size of the alloc
> cell something separate from the dynamic array’s knowledge of the
> number of valid elements in it ?

The size of the allocated cell is managed by druntime. On the user code
side, all you know is the slice (start pointer + length).  The allocated
region beyond the current array length is not initialized.  Assigning a
larger value to array.length initializes the newly-exposed elements (to
the element type's .init).
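For example, a minimal sketch of how reserve and .length interact:

```d
unittest
{
    int[] arr;
    arr.reserve(1000);             // pre-size the allocation; length stays 0
    assert(arr.length == 0);
    assert(arr.capacity >= 1000);  // druntime may round the request up

    arr.length = 10;               // newly exposed elements are initialized
    assert(arr[9] == int.init);    // i.e. 0 for int
}
```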


T

-- 
Ph.D. = Permanent head Damage


Re: How does D’s ‘import’ work?

2023-06-18 Thread H. S. Teoh via Digitalmars-d-learn
On Sun, Jun 18, 2023 at 03:51:14PM -0600, Jonathan M Davis via 
Digitalmars-d-learn wrote:
> On Sunday, June 18, 2023 2:24:10 PM MDT Cecil Ward via Digitalmars-d-learn 
> wrote:
> > I wasn’t intending to use DMD, rather ldc if possible or GDC
> > because of their excellent optimisation, in which DMD seems
> > lacking, is that fair? (Have only briefly looked at dmd+x86 and
> > haven’t given DMD’s back end a fair trial.)

My experience with D for the past decade or so has consistently shown
that executables produced by LDC or GDC generally run about 40% faster
than those produced by DMD. Especially with CPU-intensive computations.
This is just the hard fact.

Of course, for some applications like shell-script replacements (which,
incidentally, D is really good at -- once your script passes the level
of complexity beyond which writing a shell script just becomes
unmanageable), the difference doesn't really matter, and I'd use DMD
just for faster compile times.

The one thing the DMD backend is really good at, is compiling stuff
*really* fast. LDC has been catching up in this department, but
currently DMD still wins the fast compilation time race, by quite a lot.
So it's very useful for fast turnaround when you're coding.  But for
release builds, LDC and GDC are your ticket.


> In general, dmd is fantastic for its fast compilation speed. So, it
> works really well for developing whatever software you're working on
> (whereas ldc and gdc are typically going to be slower at compiling).
> And depending on what you're doing, the code is plenty fast. However,
> if you want to maximize the efficiency of your code, then you
> definitely want to be building the binaries that you actually use or
> release with ldc or gdc.
[...]

Yeah, LDC/GDC are really good at producing optimized executables, but
they do take a long time to do it. (Probably 'cos it's a hard problem!)
So for development -- DMD.  For final release build -- GDC/LDC.


T

-- 
If it tastes good, it's probably bad for you.


Re: ldc link error on new machine: undefined reference to `_D6object9Throwable7messageMxFNbNfZAxa'

2023-06-14 Thread H. S. Teoh via Digitalmars-d-learn
On Thu, Jun 15, 2023 at 12:49:30AM +, mw via Digitalmars-d-learn wrote:
> Hi,
> 
> I switched to a different machine to build my project, suddenly I got
> lots of link errors. (It builds fine on the old machine, and my
> software version are the same on both machines LDC - the LLVM D
> compiler (1.32.2))

Recently encountered a similar problem, ultimately the cause was that my
library paths turned out to be wrongly set, so it was picking up the
wrong version of the precompiled libraries.  Probably you could check
whether the library paths in ldc2.conf are set correctly, and also
double-check whether the libraries at those paths are actually the
correct ones for your compiler version (you may have installed the wrong
libraries in the right paths).  Mixing up libraries from different LDC
releases tend to show up as link errors of this kind.


T

-- 
The computer is only a tool. Unfortunately, so is the user. -- Armaphine, K5


Re: byte and short data types use cases

2023-06-10 Thread H. S. Teoh via Digitalmars-d-learn
On Sat, Jun 10, 2023 at 09:58:12PM +, Cecil Ward via Digitalmars-d-learn 
wrote:
> On Friday, 9 June 2023 at 15:07:54 UTC, Murloc wrote:
[...]
> > So you can optimize memory usage by using arrays of things smaller
> > than `int` if these are enough for your purposes, but what about
> > using these instead of single variables, for example as an iterator
> > in a loop, if range of such a data type is enough for me? Is there
> > any advantages on doing that?
> 
> A couple of other important use-cases came to me. The first one is
> unicode which has three main representations, utf-8 which is a stream
> of bytes each character can be several bytes, utf-16 where a character
> can be one or rarely two 16-bit words, and utf32 - a stream of 32-bit
> words, one per character. The simplicity of the latter is a huge deal
> in speed efficiency, but utf32 takes up almost four times as memory as
> utf-8 for western european languages like english or french. The
> four-to-one ratio means that the processor has to pull in four times
> the amount of memory so that’s a slowdown, but on the other hand it is
> processing the same amount of characters whichever way you look at it,
> and in utf8 the cpu is having to parse more bytes than characters
> unless the text is entirely ASCII-like.
[...]

On contemporary machines, the CPU is so fast that memory access is a
much bigger bottleneck than processing speed. So unless an operation is
being run hundreds of thousands of times, you're not likely to notice
the difference. OTOH, accessing memory is slow (that's why the memory
cache hierarchy exists). So utf8 is actually advantageous here: it fits
in a smaller space, so it's faster to fetch from memory; more of it can
fit in the CPU cache, so fewer DRAM roundtrips are needed, which is
faster.  Yes, you need extra processing because of the variable-width
encoding, but it happens mostly inside the CPU, which is fast enough
that it generally outstrips the memory roundtrip overhead. So unless
you're doing something *really* complex with the utf8 data, it's an
overall win in terms of performance. The CPU gets to do what it's good
at -- running complex code -- and the memory cache gets to do what it's
good at: minimizing the amount of slow DRAM roundtrips.


T

-- 
It said to install Windows 2000 or better, so I installed Linux instead.


Re: byte and short data types use cases

2023-06-09 Thread H. S. Teoh via Digitalmars-d-learn
On Fri, Jun 09, 2023 at 11:24:38AM +, Murloc via Digitalmars-d-learn wrote:
[...]
> Which raised another question: since objects of types smaller than
> `int` are promoted to `int` to use integer arithmetic on them anyway,
> is there any point in using anything of integer type less than `int`
> other than to limit the range of values that can be assigned to a
> variable at compile time?

Not just at compile time, at runtime they will also be fixed to that
width (mapped to a hardware register of that size) and will not be able
to contain a larger value.


[...]
> People say that there is no advantage for using `byte`/`short` type
> for integer objects over an int for a single variable, however, as
> they say, this is not true for arrays, where you can save some memory
> space by using `byte`/`short` instead of `int`.

That's correct.


> But isn't any further manipulations with these array objects will
> produce results of type `int` anyway? Don't you have to cast these
> objects over and over again after manipulating them to write them back
> into that array or for some other manipulations with these smaller
> types objects?

Yes you will have to cast them back.  Casting often translates to a
no-op or just a single instruction in the machine code; you just write
part of a 32-bit register back to memory instead of the whole thing, and
this automatically truncates the value to the narrow int.

The general advice is, perform computations with int or wider, then
truncate when writing back to storage for storage efficiency. So
generally you wouldn't cast the value to short/byte until the very end
when you're about to store the final result back to the array.  At that
point you'd probably also want to do a range check to catch any
potential overflows.
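A small sketch of that pattern (the numbers are made up, just to illustrate
the widen/check/truncate sequence):

```d
unittest
{
    short[] samples = [ 100, 200, 300 ];
    foreach (ref s; samples)
    {
        int widened = s * 3 + 7;                                // arithmetic at int width
        assert(widened >= short.min && widened <= short.max);   // range check at the end
        s = cast(short) widened;                                // truncate when storing back
    }
    assert(samples[0] == 307 && samples[2] == 907);
}
```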


> Some people say that these promoting and casting operations in summary
> may have an even slower overall effect than simply using int, so I'm
> kind of confused about the use cases of these data types... (I think
> that my misunderstanding comes from not knowing how things happen at a
> slightly lower level of abstractions, like which operations require
> memory allocation, which do not, etc. Maybe some resource
> recommendations on that?) Thanks!

I highly recommend taking an introductory course to assembly language,
or finding a book / online tutorial on the subject.  Understanding how
the machine actually works under the hood will help answer a lot of
these questions, even if you'll never actually write a single line of
assembly code.

But in a nutshell: integer data types do not allocate, unless you
explicitly ask for it (e.g. `int* p = new int;` -- but you almost never
want to do this). They are held in machine registers or stored on the
runtime stack, and always occupy a fixed size, so almost no memory
management is needed for them. (Which is also why they're preferred when
you don't need anything more fancy, because they're also super-fast.)
Promoting a narrow integer to int takes at most 1 machine instruction, or, in the case of
unsigned values, sometimes zero instructions. Casting back to a narrow
int is often a no-op (the subsequent code just ignores the upper bits).
The performance difference is negligible, unless you're doing expensive
things like range checking after every operation (generally you don't
need to anyway, usually it's sufficient to check range at the end of a
computation, not at every intermediate step -- unless you have reason to
believe that an intermediate step is liable to overflow or wrap around).


T

-- 
People who are more than casually interested in computers should have at
least some idea of what the underlying hardware is like. Otherwise the
programs they write will be pretty weird. -- D. Knuth


Re: How does D’s ‘import’ work?

2023-05-31 Thread H. S. Teoh via Digitalmars-d-learn
On Wed, May 31, 2023 at 06:43:52PM +, Cecil Ward via Digitalmars-d-learn 
wrote:
> Is there an explanation of how D’s ‘import’ works somewhere? I’m
> trying to understand the comparison with the inclusion of .h files,
> similarities if any and differences with the process.

Unlike C's #include, `import` does NOT paste the contents of the
imported file into the context of `import`, like #include would do.
Instead, it causes the compiler to load and parse the imported file,
placing the parsed symbols into a separate symbol table dedicated for
that module (in D, a file == a module). These symbols are then pulled
into the local symbol table so that they become available to code
containing the import declaration.

(There's a variation, `static import`, that does the same thing except
the last step of pulling symbols into the local symbol table. So the
symbols will not "pollute" the current namespace, but are still
accessible via their fully-qualified name (FQN), i.e., by the form
`pkg.mod.mysymbol`, for a symbol `mysymbol` defined in the module
`pkg.mod`, which in turn is a module under the package `pkg`.)
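A tiny sketch of the difference (std.stdio/std.math chosen arbitrarily):

```d
import std.stdio;            // writeln lands in the local symbol table
static import std.math;      // symbols reachable only through the FQN

void main()
{
    writeln("unqualified name works");     // thanks to the plain import
    writeln(std.math.sqrt(2.0));           // static import: FQN required
    // sqrt(2.0);                          // error: not pulled into this scope
}
```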

For more information:

https://tour.dlang.org/tour/en/basics/imports-and-modules
https://dlang.org/spec/module.html


T

-- 
People who are more than casually interested in computers should have at least 
some idea of what the underlying hardware is like. Otherwise the programs they 
write will be pretty weird. -- D. Knuth


Re: How get struct value by member name string ?

2023-05-29 Thread H. S. Teoh via Digitalmars-d-learn
On Tue, May 30, 2023 at 01:24:46AM +, John Xu via Digitalmars-d-learn wrote:
> On Monday, 29 May 2023 at 11:21:11 UTC, Adam D Ruppe wrote:
> > On Monday, 29 May 2023 at 09:35:11 UTC, John Xu wrote:
> > > Error: variable `column` cannot be read at compile time
> > 
> > you should generally getMember on a variable
> > 
> > T t;
> > __traits(getMember, t, "name")
> > 
> > like that, that's as if you wrote t.name
> 
> It seems I can't use variable as member name:
> 
> struct T {int a; string name;}
> T t;
> string s = "name";
> writeln(__traits(getMember, t, s));
> 
> Above code fails to compile. Any help?

Short answer:

`s` must be known at compile-time.  Or more precisely, known at the time
of template expansion. In this case, use `enum`:

enum s = "name";
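
Putting that together with the struct from your snippet, something like
this should compile:

```d
struct T { int a; string name; }

void main()
{
    import std.stdio : writeln;

    T t = T(1, "hello");
    enum s = "name";                       // known at compile time
    writeln(__traits(getMember, t, s));    // prints: hello
}
```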


Long answer:
https://wiki.dlang.org/Compile-time_vs._compile-time


T

-- 
Which is worse: ignorance or apathy? Who knows? Who cares? -- Erich Schubert


Re: Concepts like c++20 with specialized overload resolution.

2023-05-27 Thread H. S. Teoh via Digitalmars-d-learn
On Sat, May 27, 2023 at 05:49:27PM +, vushu via Digitalmars-d-learn wrote:
> On Saturday, 27 May 2023 at 16:38:43 UTC, Steven Schveighoffer wrote:
[...]
> > void make_lava(T)(ref T lava) if (hasMagma!T) {
> > lava.magma();
> > }
> > 
> > void make_lava(T)(ref T lava_thing) if (!hasMagma!T){
> > lava_thing.try_making_lava();
> > }
[...]
> I see thanks for the example :), I think this probably the closest
> equivalent i dlang.

You can also use static if inside the function, which will give you an
if-then-else structure:

	void make_lava(T)(ref T lava) {
	    static if (hasMagma!T) {
	        lava.magma();
	    } else {
	        lava.try_making_lava();
	    }
	}


T

-- 
Written on the window of a clothing store: No shirt, no shoes, no service.


Re: Proper way to handle "alias this" deprecation for classes

2023-05-10 Thread H. S. Teoh via Digitalmars-d-learn
On Wed, May 10, 2023 at 10:57:13PM +, Chris Piker via Digitalmars-d-learn 
wrote:
> On Wednesday, 10 May 2023 at 20:25:48 UTC, H. S. Teoh wrote:
> > On Wed, May 10, 2023 at 07:56:10PM +, Chris Piker via
> > Digitalmars-d-learn wrote: [...]
> > I also suffer from left/right confusion, and always have to pause to
> > think about which is the right(!) word before uttering it.
> Oh, I though was the only one with that difficulty.  Glad to hear I'm
> not alone. :-)

:-)


> I have a tendency to think of things by their purpose when programming
> but not by their location on the line or page.  So terms such as
> "writable" versus "ephemeral" or "addressable" versus "temporary" (or
> "register"), make so much more sense to me.

Yeah TBH I was never a fan of the lvalue/rvalue terminology. In a
hypothetical language where the arguments to an assignment operator are
reversed, the terminology would become needlessly confusing. E.g., if
there were an operator `X => Y;` that meant "assign the value of X to Y",
then the roles of lvalues/rvalues would be reversed.


> Back on the ref issue for a moment... I'd imagine that asking the
> compiler to delay creating a writable variable until it finds out that
> a storage location is actually needed by subsequent statements, is a
> tall order. So D chose to introduce programmers to lvalues and rvalues
> head-on, instead of creating a leaky abstraction.

It depends on how you look at it. The very concept of a variable in
memory is actually already an abstraction. Modern compilers may
enregister variables or even completely elide them. Assignments may be
reordered, and the CPU may execute things out-of-order (as long as
semantics are preserved). Intermediate values may not get stored at all,
but get folded into the larger computation and perhaps merged with some
other operation with the resulting compound operation mapped to a single
CPU instruction, etc.. So in that sense the compiler is quite capable of
figuring out what to do...

But what it can't do is read the programmer's mind to deduce the intent
of his code. Exact semantics must be somehow conveyed to the compiler,
and sad to say humans aren't very good at being exact. Often we *think*
we know exactly what the computation is, but in reality we gloss over
low-level details that will make a big difference in the outcome of the
computation in the corner cases. The whole rvalue/lvalue business is
really more a way of conveying to the compiler what exactly must happen,
rather than directly corresponding to any actual feature in the
underlying physical machine.


T

-- 
Computerese Irregular Verb Conjugation: I have preferences.  You have biases.  
He/She has prejudices. -- Gene Wirchenko


Re: Proper way to handle "alias this" deprecation for classes

2023-05-10 Thread H. S. Teoh via Digitalmars-d-learn
On Wed, May 10, 2023 at 07:56:10PM +, Chris Piker via Digitalmars-d-learn 
wrote:
[...]
> My problem with the terms lvalue and rvalue is much more basic, and is
> just a personal one that only affects probably 0.1% of people.  I just
> can't keep left vs. right straight in real life.  "Right" in my head
> always means "correct".
> 
> My daughter hates it when I'm telling her which way to turn the car
> since I've said the wrong direction so many times. :)

I also suffer from left/right confusion, and always have to pause to
think about which is the right(!) word before uttering it. :-D  Would
compass directions be more helpful? (wvalue vs. evalue) Or would it
suffer from the same problem?

(One could retroactively rationalize it as *w*ritable value vs.
*e*phemeral value. :-P)


T

-- 
By understanding a machine-oriented language, the programmer will tend to use a 
much more efficient method; it is much closer to reality. -- D. Knuth


Re: Proper way to handle "alias this" deprecation for classes

2023-05-10 Thread H. S. Teoh via Digitalmars-d-learn
On Wed, May 10, 2023 at 03:24:48PM +, Chris Piker via Digitalmars-d-learn 
wrote:
[...]
> It's off topic, but I forget why managing memory for rvalues* was
> pushed onto the programmer and not handled by the compiler.  I'm sure
> there is a good reason but it does seem like a symmetry breaking
> requirement.
> 
> --
> *or was it lvalues, I can never keep the two separate.  Wish the some
> other terminology was adopted long ago, such as "named" vs.
> "ephemeral".

	x      =      y;
	^             ^
	|             |
	lvalue        rvalue

An lvalue is simply something that can appear on the *l*eft side of an
assignment statement, and an rvalue is something that appears on the
*r*ight side of an assignment statement.

It seems trivially obvious, but has far-reaching consequences. For one
thing, to be an lvalue means that you must be able to assign a value to
it. I.e., it must be a variable that exists somewhere in memory; `1 =
x;` is illegal because `1` is a literal with no memory associated with
it, so you cannot assign a new value to it.

For something to be an rvalue means that it's a value like `1` that may
not necessarily have a memory address associated with it. For example,
the value of a computation is an rvalue:

// This is OK:
x = y + 1;

// This is not OK:
(y + 1) = x;

The value of a computation cannot be assigned to, it makes no sense.
Therefore, given an rvalue, you are not guaranteed that assignment is
legal.

Note however, that given an lvalue, you can always get an rvalue out of
it. In the first example above, `y` can be an lvalue because it's a
variable with a memory location. However, it can also be used as an
rvalue.  Or, if you like, `x = y;` contains an implicit "cast" of y to
an rvalue.  But you can never turn an rvalue back into an lvalue.


T

-- 
It's bad luck to be superstitious. -- YHL


Re: quick question, probably of little importance...

2023-04-26 Thread H. S. Teoh via Digitalmars-d-learn
On Wed, Apr 26, 2023 at 11:07:39PM +, WhatMeWorry via Digitalmars-d-learn 
wrote:
> On Wednesday, 26 April 2023 at 23:02:07 UTC, Richard (Rikki) Andrew
> Cattermole wrote:
> > Don't forget ``num % 2 == 0``.
> > 
> > None should matter, pretty much all production compilers within the
> > last 30 years should recognize all forms of this and do the right
> > thing.
> 
> Thanks. Fastest reply ever! And I believe across the world?   I
> suppose my examples required overhead of a function call. So maybe num
> % 2 == 0 is fastest?

If performance matters, you'd be using an optimizing compiler. And
unless you're hiding your function implementation behind a .di, almost
all optimizing compilers would inline it, so you shouldn't even be able
to tell the difference.


T

-- 
Without outlines, life would be pointless.


Re: D style - member functions

2023-04-26 Thread H. S. Teoh via Digitalmars-d-learn
On Wed, Apr 26, 2023 at 06:24:08PM +, DLearner via Digitalmars-d-learn 
wrote:
> Consider:
> ```
> struct S1 {
>int A;
>int B;
>int foo() {
>   return(A+B);
>}
> }
> 
> struct S2 {
>int A;
>int B;
> }
> int fnAddS2(S2 X) {
>return (X.A + X.B);
> }
> 
> void main() {
>import std.stdio : writeln;
> 
>S1 Var1 = S1(1, 2);
>writeln("Total Var1 = ", Var1.foo());
> 
>S2 Var2 = S2(1, 2);
>writeln("Total Var2 = ", fnAddS2(Var2));
> 
>return;
> }
> ```
> 
> Of the two ways shown of producing the total from the same underlying
> structure, which is the better style?

Either way works, it doesn't really matter.  The slight difference is
that the member function is preferred when resolving a symbol, so if
there's a module-level function called `foo` that takes S1 as a
parameter, the member function would be called instead.


> Further, do we care about the situation where there are many variables
> of type 'S', which presumably means the function code generated from
> S1 gets duplicated many times, but not so with S2?

This is false. The code for a member function is generated only once in
the entire program, not once per instance of S.

(Template functions may be instantiated more than once, but that's still
only once per combination of template arguments, not once per instance
of the enclosing type.)


T

-- 
MAS = Mana Ada Sistem?


Re: How can a function pointer required to be extern(C)?

2023-04-12 Thread H. S. Teoh via Digitalmars-d-learn
On Wed, Apr 12, 2023 at 08:23:51PM +, rempas via Digitalmars-d-learn wrote:
> Sorry if the title doesn't make any sense, let me explain. So, I do have the
> following code that does not compile:
> 
> ```d
> import core.sys.posix.pthread; /* The library */
> 
> struct Thread {
> private:
>   pthread_t thread_id;
> 
> public:
>   this(void* function(void*) func, void* arg = null, scope
> const(pthread_attr_t*) attr = null) {
> pthread_create(&this.thread_id, attr, func, arg);
>   }
> 
>   @property:
> pthread_t id() { return this.thread_id; }
> }
> 
> ```
> 
> Yes, I'm trying to "encapsulate" the Pthread (POSIX threads) API.
> Normally, the function pointer that is passed to "pthread_create" must
> be "extern(C)" and this is the complaining that the compile does. So,
> I'm thinking to replace the constructor to this:
> 
> ```d
> this(extern(C) void* function(void*) func, void* arg = null,
>  scope const(pthread_attr_t*) attr = null)
> { pthread_create(&this.thread_id, attr, func, arg); }
> ```
> 
> I just added "extern(C)" before the type. This is how it looks in the
> error message so it must work right? Well... it doesn't. And here I am
> wondering why. Any ideas?

IMO this is a bug either in D's syntax or in the parser.  I'd file an
enhancement request.

In the meantime, you can use alias as a workaround:


---snip---
extern(C) void* abc(void*) {return null;}

alias FuncPtr = typeof(&abc);
pragma(msg, typeof(abc));
pragma(msg, typeof(&abc));

//void wrapper(extern(C) void* function(void*) callback) {} // NG
void wrapper(FuncPtr callback) {} // OK

pragma(msg, typeof(wrapper));
---snip---


T

-- 
A programming language should be a toolbox for the programmer to draw
upon, not a minefield of dangerous explosives that you have to very
carefully avoid touching in the wrong way.


Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-05 Thread H. S. Teoh via Digitalmars-d-learn
On Thu, Apr 06, 2023 at 01:20:28AM +, Paul via Digitalmars-d-learn wrote:
[...]
> Yes I understand, basically, what's going on in hardware.  I just
> wasn't sure if the access type was linked to the container type.  It
> seems obvious now, since you've both made it clear, that it also
> depends on how I'm accessing my container.
> 
> Having said all of this, isn't a D 'range' fundamentally a sequential
> access container (i.e popFront) ?

D ranges are conceptually sequential, but the actual underlying memory
access patterns depends on the concrete type at runtime. An array's
elements are stored sequentially in memory, and arrays are ranges.  But
a linked-list can also have a range interface, yet its elements may be
stored in non-consecutive memory locations.  So the concrete type
matters here; the range API only gives you conceptual sequentiality, it
does not guarantee physically sequential memory access.


T

-- 
Many open minds should be closed for repairs. -- K5 user


Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-05 Thread H. S. Teoh via Digitalmars-d-learn
On Wed, Apr 05, 2023 at 10:34:22PM +, Paul via Digitalmars-d-learn wrote:
> On Tuesday, 4 April 2023 at 22:20:52 UTC, H. S. Teoh wrote:
> 
> > Best practices for arrays in hot loops:
[...]
> > - Where possible, prefer sequential access over random access (take
> >   advantage of the CPU cache hierarchy).
> 
> Thanks for sharing Teoh!  Very helpful.
> 
> would this be random access? for(size_t i; i < arr.length; i++) using indices?  ...and this be sequential foreach(a;arr) ?
> 
> or would they have to be completely different kinds of containers?  a
> dlang 'range' vs arr[]?
[...]

The exact syntactic construct you use is not important; under the hood,
for(i; i < arr.length; i++) with arr[i] and foreach(a; arr) compile down
to essentially the same sequential walk over the array's memory, so both
count as sequential access.


T

Re: foreach (i; taskPool.parallel(0..2_000_000)

2023-04-04 Thread H. S. Teoh via Digitalmars-d-learn
On Tue, Apr 04, 2023 at 09:35:29PM +, Paul via Digitalmars-d-learn wrote:
[...]
> Well Steven just making the change you said reduced the execution time
> from ~6-7 secs to ~3 secs.  Then, including the 'parallel' in the
> foreach statement took it down to ~1 sec.
> 
> Boy lesson learned in appending-to and zeroing dynamic arrays in a hot
> loop!

Best practices for arrays in hot loops:
- Avoid appending if possible; instead, pre-allocate outside the loop.
- Where possible, reuse existing arrays instead of discarding old ones
  and allocating new ones.
- Use slices where possible instead of making copies of subarrays (this
  esp. applies to strings).
- Where possible, prefer sequential access over random access (take
  advantage of the CPU cache hierarchy).
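To illustrate the first two points, here's a minimal sketch (the buffer
size, the data source, and the process() function are made up):

```d
import std.range : chunks, iota;

void process(const int[] buf) { /* stand-in for real work */ }

void main()
{
    int[] buf;
    buf.reserve(64);                     // pre-allocate once, outside the loop

    foreach (chunk; iota(0, 1_000).chunks(64))
    {
        buf.length = 0;                  // reset the length...
        buf.assumeSafeAppend();          // ...but keep the existing allocation
        foreach (x; chunk)
            buf ~= x * 2;                // appends now reuse the reserved block
        process(buf);
    }
}
```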


T

-- 
Famous last words: I *think* this will work...


Re: better video rendering in d

2023-03-21 Thread H. S. Teoh via Digitalmars-d-learn
On Tue, Mar 21, 2023 at 05:29:22PM +, monkyyy via Digitalmars-d-learn wrote:
> On Tuesday, 21 March 2023 at 17:18:15 UTC, H. S. Teoh wrote:
> > On Tue, Mar 21, 2023 at 04:57:49PM +, monkyyy via
> > Digitalmars-d-learn wrote:
> > > My current method of making videos of using raylib to generate
> > > screenshots, throwing those screenshots into a folder and calling
> > > a magic ffmpeg command is ... slow.
> > [...]
> > 
> > How slow is it now, and how fast do you want it to be?
> > T
> 
> I vaguely remember an hour and half for 5 minutes of video when its
> extremely lightweight and raylib trivially does real-time to display
> it normally and realistically I wouldn't be surprised if it could do
> 1000 frames a second.
> 
> Coping several gb of data to disk(that probably asking the gpu one
> pixel at a time) to be compressed down into a dozen mb of video is
> just... temp shit.  I should just do something that isnt stressing
> hard drives extremely unnecessarily.

You could try to feed the frames to ffmpeg over stdin instead of storing
the frames on disk. See this, for example:


https://stackoverflow.com/questions/45899585/pipe-input-in-to-ffmpeg-stdin

Then you can just feed live data to it in the background while you
generate frames in the foreground.
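Something along these lines should work with std.process (the frame size,
frame count, and the rawvideo options below are placeholders; adjust them
to match your actual pixel format and resolution):

```d
import std.process : pipeProcess, wait, Redirect;

void main()
{
    enum width = 640, height = 480, numFrames = 120;

    auto pipes = pipeProcess([
        "ffmpeg", "-y",
        "-f", "rawvideo", "-pix_fmt", "rgb24",
        "-s", "640x480", "-r", "30",
        "-i", "-",                       // read raw frames from stdin
        "out.mp4"
    ], Redirect.stdin);

    auto frame = new ubyte[](width * height * 3);
    foreach (i; 0 .. numFrames)
    {
        frame[] = cast(ubyte) i;         // stand-in for actual rendering
        pipes.stdin.rawWrite(frame);     // stream it; nothing touches the disk
    }
    pipes.stdin.close();                 // EOF tells ffmpeg to finish up
    wait(pipes.pid);
}
```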


T

-- 
Lottery: tax on the stupid. -- Slashdotter


Re: better video rendering in d

2023-03-21 Thread H. S. Teoh via Digitalmars-d-learn
On Tue, Mar 21, 2023 at 04:57:49PM +, monkyyy via Digitalmars-d-learn wrote:
> My current method of making videos of using raylib to generate screenshots,
> throwing those screenshots into a folder and calling a magic ffmpeg command
> is ... slow.
[...]

How slow is it now, and how fast do you want it to be?

One possibility is to generate frames in parallel... though if you're
recording a video of a sequence of operations, each of which depends on
the previous, it may not be possible to parallelize.

I have a toy project that generates animations of a 3D model
parametrized over time. It generates .pov files and runs POVRay to
generate frames, then calls ffmpeg to make the video.  This is
parallelized with std.parallelism.parallel, and is reasonably fast.
However, ffmpeg will take a long time no matter what (encoding a video
is a non-trivial operation).


T

-- 
Try to keep an open mind, but not so open your brain falls out. -- theboz


Re: @nogc and Phobos

2023-03-11 Thread H. S. Teoh via Digitalmars-d-learn
On Sat, Mar 11, 2023 at 04:21:40PM +, bomat via Digitalmars-d-learn wrote:
[...]
> Although I come from a C++ background, I'm not exactly a fanboy of
> that language (you can probably tell, otherwise I wouldn't be here).
> But after hearing praise for D for being a cleaner and better version
> of C/C++, I am a bit disappointed so far, tbh. I don't want to go into
> too much detail here to not derail the thread entirely, but I think it
> repeats too many old sins, like implicit type conversions, the `for`
> loop syntax (although I guess one wouldn't need it that often because
> of `foreach`), the `switch` `case` fallthrough, and the cancerous
> `const` (as far as I can tell, `immutable` is an even worse flavor of
> it).
[...]

I also came from a C/C++ background.  The GC turned me off D for a long
time, until one day I decided to just give it a try to see if it was all
as bad as people made it sound.  I have to admit that GC phobia stuck
with me for a long time, but once I actually started using the language
seriously, I discovered to my surprise that it wasn't *that* big of a
deal as I had thought. In fact, I found that I quite liked it, because
it made my APIs cleaner. A LOT cleaner because I didn't have to pollute
every function call with memory management paraphernalia; they can be
nice and clean with no extraneous detritus and things Just Work(tm).
Also, the amount of time/effort spent (i.e., wasted) debugging memory
problems was gone, and I was a LOT more productive than I ever was in
C++.  True, I have to relinquish 100% control of my memory, and as an
ex-C++ fanboy I totally understand that it's not a pleasant feeling. But
I have to say that I was pleasantly surprised at how much D's GC
*didn't* get in my way, once I actually started using it for real (toy
examples can be misleading).

Why am I saying all this?  Because to be frank, you haven't really used
D if you've been avoiding its standard library like the plague. Not all
of Phobos is GC-dependent; the range-based stuff, for example, lets you
avoid GC use most of the time. True, for exceptions you need GC, but
exceptions are supposed to be ... exceptions ... not the norm, and in
practice it isn't really *that* big of a deal. You shouldn't be catching
exceptions inside performance-sensitive inner loops anyway.  D's strong
points don't really show until you start using range-based stuff with
UFCS chains -- now *that's* power.  Even if you dislike the GC you can
still mostly manage your own memory, and let the GC be the fallback
mechanism for stuff you missed.

As for implicit type conversions: I don't know where you got your
information from, but D's implicit conversions are a WHOLE different
world from C++. Walter has resisted adding implicit conversion
mechanisms in spite of harsh criticism and constant pressure, and in
practice, you aren't gonna see a lot of it in D code, if at all. It's
not even close to C++ where SFINAE + Koenig lookup gang up on you from
behind and you don't even know what hit you. Issues with implicit
conversions in D only really come up if you go out of your way to abuse
alias this and/or use short ints a little too zealously. Otherwise in
practice it's not even an issue IME.

For-loop syntax: I can't remember the last time I wrote one in D. Maybe
I did like 1 or 2 times (across my 20+ D projects) when I really needed
to do something weird with my loops. But foreach covers 90% of my
looping needs, and while loops take care of the remaining 9.9% of cases.
Besides, once you start getting used to UFCS chains and Phobos
algorithms, most of the time you won't even be writing any loops at all.
You won't believe how much more readable your code becomes when you can
finally stop worrying about pesky fragile loop conditions and just tack
on a couple more components to your UFCS chain and it just automagically
takes care of itself.  Again, not something you'll understand if you
never tried to use D in a serious way. I recommend actually trying to
write D, not as transplanted C++, but the way D code is meant to be
written.

As for switch: yeah D switch has some crazy parts (like Duff's device --
you can actually write that in D). But I've never needed to use it...
also, final switch + enums = awesome.

As for const: I hardly ever use it. It's occasionally useful for low-level
code, but not much beyond that.  My advice: don't bother. Just pretend
it doesn't exist, and your life will be happier. Well OK, once in a
while you do need to deal with it. But if it were me, I'd avoid it
unless I have to.  It doesn't mix well with high-level code, I'll put it
that way. Immutable is the same thing, I only use it as `static
immutable` just so the compiler would put my data in the preinitialized
segment. Other than that, I don't bother.


T

-- 
If you're not part of the solution, you're part of the precipitate.


Re: Bug in DMD?

2023-03-02 Thread H. S. Teoh via Digitalmars-d-learn
On Thu, Mar 02, 2023 at 09:55:55PM +, ryuukk_ via Digitalmars-d-learn wrote:
> On Thursday, 2 March 2023 at 21:38:23 UTC, ryuukk_ wrote:
> > On Thursday, 2 March 2023 at 21:21:14 UTC, Richard (Rikki) Andrew
> > Cattermole wrote:
[...]
> > > 2. Dustmite, so we have something we can work with.
> > 
> > [...] 2. do you have a link for a guide how to setup "dustmite"?

https://dlang.org/blog/2020/04/13/dustmite-the-general-purpose-data-reduction-tool/

Dustmite automatically reduces your code to a minimal example that still
exhibits the same problem, good for bug reports that are easily
reproducible.  Also useful if you don't want to publicly share the code
for whatever reason, but still want to provide enough information so
that the dmd devs can find the problem and fix it.


[...]
> That's is not something i like doing, it should just work, i shouldn't
> have to debug DMD, that aint my job

Dustmite can run in the background on a temporary copy of your code, you
don't have to babysit it and can work on other things while it's doing
its thing.


T

-- 
Written on the window of a clothing store: No shirt, no shoes, no service.


Re: Transform static immutable string array to tuple.

2023-02-19 Thread H. S. Teoh via Digitalmars-d-learn
On Sun, Feb 19, 2023 at 11:08:34AM +, realhet via Digitalmars-d-learn wrote:
> Hello,
> 
> Is there a better way to transform a string array to a tuple or to an
> AliasSeq?
> 
> ```
> mixin(customSyntaxPrefixes.format!`tuple(%(%s,%))`)
> ```
> 
> I'd like to use this as variable length arguments passed to the
> startsWith() std function (as multiple needles).

In situations like this it's often better to define your data as an
AliasSeq to begin with, since it's easier to convert that to an array
than the other way round.
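For instance (the prefixes here are made up; the point is just to make the
AliasSeq the source of truth):

```d
import std.algorithm.searching : startsWith;
import std.meta : AliasSeq;

alias customSyntaxPrefixes = AliasSeq!("##", "//", ";");   // define the data as an AliasSeq
enum string[] prefixArray = [ customSyntaxPrefixes ];      // array view, if you still need one

bool hasCustomPrefix(string line)
{
    // startsWith accepts multiple needles directly
    return line.startsWith(customSyntaxPrefixes) != 0;
}

unittest
{
    assert(hasCustomPrefix("## heading"));
    assert(!hasCustomPrefix("plain text"));
}
```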


T

-- 
I don't trust computers, I've spent too long programming to think that they can 
get anything right. -- James Miller


Re: Deciding one member of iteration chain at runtime

2023-02-17 Thread H. S. Teoh via Digitalmars-d-learn
On Fri, Feb 17, 2023 at 05:30:40PM +, Chris Piker via Digitalmars-d-learn 
wrote:
[...]
> In order to handle new functionality it turns out that operatorG needs
> to be of one of two different types at runtime.  How would I do
> something like the following:
> 
> ```d
> auto virtualG;  // <-- probably invalid code, illustrating the idea
> if(runtime_condition)
>virtualG = operatorG1;
> else
>virtualG = operatorG2;
[...]
> ```
> ?
> 
> I've tried various usages of `range.InputRangeObject` but haven't been
> able to get the syntax right.  Any suggestions on the best way to
> proceed?  Maybe the whole chain should be wrapped in InputRangeObject
> classes, I don't know.
[...]

Here's an actual function taken from my own code, that returns a
different range type depending on a runtime condition, maybe this will
help you?

```d
/**
 * Expands '@'-directives in a range of strings.
 *
 * Returns: A range of strings with lines that begin with '@'
 * substituted with the contents of the file named by the rest of the
 * line.
 */
auto expandFileDirectives(File = std.stdio.File, R)(R args)
if (isInputRange!R && is(ElementType!R : const(char)[]))
{
import std.algorithm.iteration : joiner, map;
import std.algorithm.searching : startsWith;
import std.range : only;
import std.range.interfaces : InputRange, inputRangeObject;
import std.typecons : No;

return args.map!(arg => arg.startsWith('@') ?
cast(InputRange!string) inputRangeObject(
File(arg[1 .. $]).byLineCopy(No.keepTerminator)) :
cast(InputRange!string) inputRangeObject(only(arg)))
   .joiner;
}
```

Note that the cast is to a common base class of the two different
subclasses returned by inputRangeObject().

This function is used in the rest of the code as part of a UFCS chain of
ranges.


T

-- 
Long, long ago, the ancient Chinese invented a device that lets them see 
through walls. It was called the "window".


Re: Non-ugly ways to implement a 'static' class or namespace?

2023-02-16 Thread H. S. Teoh via Digitalmars-d-learn
On Thu, Feb 16, 2023 at 08:51:39AM +, FeepingCreature via 
Digitalmars-d-learn wrote:
[...]
> Springboarding off this post:
> 
> This thread is vastly dominated by some people who care very much
> about this issue. Comparatively, for instance, I care very little
> because I think D already does it right.
> 
> But then the thread will look unbalanced. This is a fundamental design
> flaw in forum software.
> 
> So let me just say: I think D does it right. D does not have class
> encapsulation; it has module encapsulation. This is by design, and the
> design is good.

+1, this issue is wayyy overblown by a vocal minority. D's design
diverges from other languages, but that in itself does not make it a bad
design.  In the context of D it actually makes sense. Saying that D's
design is bad because language X does it differently is logically
fallacious (X is good, Y is not X, therefore Y is bad).


T

-- 
He who sacrifices functionality for ease of use, loses both and deserves 
neither. -- Slashdotter


Re: Simplest way to convert an array into a set

2023-02-13 Thread H. S. Teoh via Digitalmars-d-learn
On Mon, Feb 13, 2023 at 06:04:40PM +, Matt via Digitalmars-d-learn wrote:
> Obviously, there is no "set" object in D,

Actually, bool[T] could be used as a set object of sorts. Or even
void[0][T], though that's a little more verbose to type. But this can be
aliased to something nicer (see below).
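E.g., a quick sketch of the bool[T] idea:

```d
unittest
{
    bool[string] seen;             // poor man's set
    seen["apple"] = true;
    seen["banana"] = true;

    assert("apple" in seen);
    assert("cherry" !in seen);
    seen.remove("banana");         // delete an element from the "set"
    assert("banana" !in seen);
}
```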


> but I was wondering what the quickest way to remove duplicates from an
> array would be. I was convinced I'd seen a "unique" method somewhere,
> but I've looked through the documentation for std.array, std.algorithm
> AND std.range, and I've either missed it, or my memory is playing
> tricks on me. That, or I'm looking in the wrong place entirely, of
> course

Try this:

-snip-
import std;
auto deduplicate(R)(R input)
if (isInputRange!R)
{
alias Unit = void[0];
enum unit = Unit.init;
Unit[ElementType!R] seen;
return input.filter!((e) {
if (e in seen) return false;
seen[e] = unit;
return true;
});
}
unittest {
assert([ 1, 2, 3, 4, 2, 5, 6, 4, 7 ].deduplicate.array ==
[ 1, 2, 3, 4, 5, 6, 7 ]);
assert([ "abc", "def", "def", "ghi", "abc", "jkl" ].deduplicate.array ==
[ "abc", "def", "ghi", "jkl" ]);
}
-snip-


T

-- 
Маленькие детки - маленькие бедки.


Re: betterC DLL in Windows

2023-02-06 Thread H. S. Teoh via Digitalmars-d-learn
On Mon, Feb 06, 2023 at 03:54:40PM +, bachmeier via Digitalmars-d-learn 
wrote:
> On Sunday, 5 February 2023 at 08:48:34 UTC, Tamas wrote:
[...]
> > This is the specification for the D Programming Language.
> 
> I've been bitten by that a few times over the years, though to be
> honest, I'm not sure of the relationship of the spec to documentation.
> The Phobos documentation and compiler documentation appear to be
> actual documentation, in the sense that you can trust it to be
> accurate, and if not it's a bug.  Maybe someone that has been around
> from the earliest days understands the goal of the spec.

IIRC the spec was started as part of an ongoing effort to fully specify
the language so that, at least in theory, someone could read the spec
and implement a D compiler completely independent of the current ones.


T

-- 
INTEL = Only half of "intelligence".


Re: Which TOML package, or SDLang?

2023-01-30 Thread H. S. Teoh via Digitalmars-d-learn
On Mon, Jan 30, 2023 at 03:59:52PM +, Adam D Ruppe via Digitalmars-d-learn 
wrote:
> On Monday, 30 January 2023 at 15:37:56 UTC, Guillaume Piolat wrote:
> > Why not XML? :) It has comments, you can use backslashes too.
> 
> no kidding, xml is an underrated format.

XML is evil.

Let me qualify that statement.  XML, as specified by the XML spec, is
pure evil.  It has some absolutely nasty corners that has pathological
behaviours like recursive expansion of entities (exploitable for DOS
attacks or to induce OOM crashes in XML parsers), which includes
token-pasting style pathology like C's preprocessor, and remote fetching
of arbitrary network resources (which, no thanks to pathological
entities, can be easily obfuscated).

XML as used by casual users, however, is a not-bad format for markup
text.  It's far too verbose for my tastes, but for some applications it
could be a good fit.  As far as implementation is concerned, a
(non-compliant) XML parser that implements the subset of XML employed
for "normal" use, i.e., without the pathological bits, would be a good
thing, e.g., Jonathan's dxml.  A fully-compliant XML parser that
includes the pathological bits, however, I wouldn't touch with a 10-foot
pole.


T

-- 
Customer support: the art of getting your clients to pay for your own 
incompetence.


Re: Where I download Digital Mars C Preprocessor sppn.exe?

2023-01-23 Thread H. S. Teoh via Digitalmars-d-learn
On Mon, Jan 23, 2023 at 08:06:28PM +, Alain De Vos via Digitalmars-d-learn 
wrote:
> Mixing D with C or C++ or Python is looking for problems.
> Better write something in D.
> And write something in C/C++/Python.
> And have some form of communication between both.

I don't know about Python, but I regularly write D code that interacts
with external C libraries and have not encountered any major problems.
You just have to put `extern(C)` in the right places and make sure you
link the right objects / libraries, and you're good to go.
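For example, a minimal sketch of calling straight into libc:

```d
// Declaration matching the C prototype; the linker resolves it from libc.
extern(C) int puts(const(char)* s);

void main()
{
    // D string literals are NUL-terminated, so .ptr is safe to pass here.
    puts("hello from D".ptr);
}
```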

So far I haven't actually tried integrating non-trivial C++ libraries
with D yet, but I expect it will be similar unless you're dealing with
C++ templates (which are not compatible with D templates) or multiple
inheritance, which D doesn't support.


T

-- 
Right now I'm having amnesia and deja vu at the same time. I think I've 
forgotten this before.


Re: Non-ugly ways to implement a 'static' class or namespace?

2023-01-20 Thread H. S. Teoh via Digitalmars-d-learn
On Fri, Jan 20, 2023 at 01:32:22PM -0800, Ali Çehreli via Digitalmars-d-learn 
wrote:
> On 1/20/23 07:01, torhu wrote:
> 
> > But why not have drawLine just be a free function?
> 
> Exactly.
> 
> If I'm not mistaken, and please teach me if I am wrong, they are
> practically free functions in Java as well. That Java class is working
> as a namespace.

Exactly. Every time you see a static singleton class, you're essentially
looking at a namespace. Only, in OO circles non-class namespaces are
taboo, it's not OO-correct to call them what they are, instead you have
to do lip service to OO by calling them static singleton classes
instead.  And free functions are taboo in OO; OO doctrine declares them
unclean affronts to OO purity and requires that you dress them in more
OO-appropriate clothing, like putting them inside a namesp^W excuse me,
static singleton class.

;-)


> So, the function above is the same as the following free-standing
> function in D, C++, C, and many other languages:
> 
>   void Algo_drawLine(Canvas c, Pos from, Pos to) { .. };
[...]

That way of naming a global function is essentially a poor
man's^W^Wexcuse me, I mean, C's way of working around the lack of a
proper namespacing / module system. In D, we do have a proper module
system, so you could just call the function `drawLine` and put it in a
file named Algo.d, then you can just use D's symbol resolution rules to
disambiguate between Algo.drawLine and PersonalSpace.drawLine, for
example. :-P


T

-- 
Public parking: euphemism for paid parking. -- Flora


Re: What is the 'Result' type even for?

2023-01-20 Thread H. S. Teoh via Digitalmars-d-learn
On Fri, Jan 20, 2023 at 12:49:54PM +, Ruby The Roobster via 
Digitalmars-d-learn wrote:
[...]
> Thank you.  I didn't know that there was such a property `.array`.

It's not a property, it's a Phobos function from std.array.


T

-- 
INTEL = Only half of "intelligence".


Re: What is the 'Result' type even for?

2023-01-19 Thread H. S. Teoh via Digitalmars-d-learn
On Fri, Jan 20, 2023 at 03:34:43AM +, Ruby The Roobster via 
Digitalmars-d-learn wrote:
> On Friday, 20 January 2023 at 03:30:56 UTC, Steven Schveighoffer wrote:
> > On 1/19/23 10:11 PM, Ruby The Roobster wrote:
> > ...
> > 
> > The point is to be a range over the original input, evaluated
> > lazily. Using this building block, you can create an array, or use
> > some other algorithm, or whatever you want. All without allocating
> > more space to hold an array.
[...]
> I get the point that it is supposed to be lazy.  But why are these
> basic cases not implemented?  I shouldn't have to go write a wrapper
> for something as simple as casting this type to the original type.
> This is one of the things that one expects the standard library to do
> for you.

There's no need to write any wrappers.  Just tack `.array` to the end of
your pipeline, and you're good to go.


T

-- 
My father told me I wasn't at all afraid of hard work. I could lie down right 
next to it and go to sleep. -- Walter Bright


Re: What is the 'Result' type even for?

2023-01-19 Thread H. S. Teoh via Digitalmars-d-learn
On Fri, Jan 20, 2023 at 03:11:33AM +, Ruby The Roobster via 
Digitalmars-d-learn wrote:
> Take this example:
> 
> ```d
> import std;
> void main()
> {
> auto c = "a|b|c|d|e".splitter('|');
> c.writeln;
> string[] e = ["a", "b", "c", "d", "e"];
> assert(c.equal(e));
> typeof(c).stringof.writeln;
> }
> ```
> 
> The program prints:
> 
> ["a", "b", "c", "d", "e"]
> Result
> 
> What is the purpose of this 'Result' type?  To serve as a generic
> range?

It's a Voldemort type, representing a range that iterates over its
elements lazily.


> Because, it seems to only cause problems.  For example, you cannot
> assign or cast the result type into a range, even when the type has
> the same inherent function:
> 
> ```d
> string[] c = "a|b|c|d|e".splitter('|'); // fails
> string[] d = cast(string[])"a|b|c|d|e".splitter('|'); // also fails
> ```

You're confusing arrays and ranges.  A "range" isn't any specific type,
it refers to *any* type that behaves a certain way (behaves like a
range).  Each `Result` you get back has its own unique type (arguably,
it's a compiler bug to display it as merely `Result` without
distinguishing it from other identically-named but distinct Voldemort
types), so you cannot just assign it back to an array.

You can either create an array from it using std.array.array, use a
function that eagerly creates its results instead of a lazy result (in
the above instance, use std.string.split instead of .splitter), or use
std.algorithm.copy to copy the contents of the lazy range into an array:

	// Option 1
	string[] c = "a|b|c|d|e".splitter('|').array;

// Option 2
string[] c = "a|b|c|d|e".split('|');

// Option 3
// Caveat: .copy expects you to have prepared the buffer
// beforehand to be large enough to hold the contents; it does
// not reallocate the result array for you.
string[] result = new string[5];
"a|b|c|d|e".splitter('|').copy(result);


[...]
> Then what is the point of this type, if not to just make things
> difficult?  It cannot be casted, and vector operations cannot be
> performed, and it seems to just serve as an unnecessary
> generalization.

It serves to chain further range operations into a pipeline:

	string[] c = "a|b|c|d|e".splitter('|')
			.filter!(s => s >= "b" && s <= "d")
			.map!(s => s ~ "!")
			.array;

Because ranges are lazily iterated, the .array line only allocates the 3
elements that got through the .filter. Whereas if you created the
intermediate result array eagerly, you'd have to allocate space for 5
elements only to discard 2 of them afterwards.

One way to think about this is that the intermediate Result ranges are
like the middle part of a long pipe; you cannot get stuff from the
middle of the pipe without breaking it, you need to terminate the pipe
with a sink (like .array, .copy, etc.) first.


T

-- 
I am Pentium of Borg. Division is futile; you will be approximated.


Re: Coding Challenges - Dlang or Generic

2023-01-17 Thread H. S. Teoh via Digitalmars-d-learn
On Tue, Jan 17, 2023 at 11:08:19PM +, Siarhei Siamashka via 
Digitalmars-d-learn wrote:
> On Tuesday, 17 January 2023 at 21:50:06 UTC, matheus wrote:
> > Question: Have you compared the timings between this way (With
> > ranges) and a normal way (Without ranges)?
> 
> If you are intensively using ranges, UFCS or the other convenient high
> level language features, then the compiler choice does matter a lot.
> And only LDC compiler is able to produce fast binaries from such
> source code.
> 
> GDC compiler has severe performance problems with inlining, unless LTO
> is enabled. And it also allocates closures on stack. This may or may
> not be fixed in the future, but today I can't recommend GDC if you
> really care about performance.

Interesting, I didn't know GDC has issues with inlining. I thought it
was more-or-less on par with LDC in terms of the quality of code
generation.  Do you have a concrete example of this problem?


> DMD compiler uses an outdated code generation backend from Digital
> Mars C++ and will never be able to produce fast binaries. It
> prioritizes fast compilation speed over everything else.
[...]

For anything performance related, I wouldn't even consider DMD. For all
the 10+ years I've been using D, it has consistently produced
executables that run about 20-30% slower than those produced by LDC or
GDC, sometimes even up to 40%.  For script-like programs or interactive
apps that don't care about performance, DMD is fine for convenience and
fast compile turnaround times.  But as soon as performance matters, DMD
is not even on my radar.


T

-- 
Heuristics are bug-ridden by definition. If they didn't have bugs, they'd be 
algorithms.


Re: Creating a pointer/slice to a specific-size buffer? (Handing out a page/frame from a memory manager)

2023-01-13 Thread H. S. Teoh via Digitalmars-d-learn
On Fri, Jan 13, 2023 at 08:31:17AM -0800, Ali Çehreli via Digitalmars-d-learn 
wrote:
> On 1/13/23 07:07, Gavin Ray wrote:
> 
> > This is "valid" D I hope?
> 
> Yes because static arrays are just elements side-by-side in memory.
> You can cast any piece of memory to a static array provided the length
> and alignment are correct.
[...]

Or to be more precise, cast the memory to a *pointer* to a static array
of the right size.  Static arrays are by-value types; passing around the
raw array will cause the array to be copied every time, which is
probably not what is intended.
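
For instance, a minimal sketch (mine, not from the original post;
pageSize is a made-up constant for illustration):

import core.stdc.stdlib : free, malloc;

enum pageSize = 4096;   // hypothetical: whatever size the manager hands out

void main()
{
    void* raw = malloc(pageSize);
    scope(exit) free(raw);

    // Cast to a *pointer* to a static array: no copy is made, and the
    // length is part of the type.
    auto page = cast(ubyte[pageSize]*) raw;
    (*page)[0] = 42;
    assert((*page)[0] == 42);
    assert((*page).length == pageSize);
}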


T

-- 
"You are a very disagreeable person." "NO."


Re: Why not allow elementwise operations on tuples?

2023-01-13 Thread H. S. Teoh via Digitalmars-d-learn
On Fri, Jan 13, 2023 at 02:22:34PM +, Sergei Nosov via Digitalmars-d-learn 
wrote:
> Hey, everyone!
> 
> I was wondering if there's a strong reason behind not implementing
> elementwise operations on tuples?
> 
> Say, I've decided to store 2d points in a `Tuple!(int, int)`. It would
> be convenient to just write `a + b` to yield another `Tuple!(int,
> int)`.

I've written a Vec type that implements precisely this, using tuples
behind the scenes as the implementation, and operator overloading to
allow nice syntax for vector arithmetic.

---snip
// Notes added for completeness: toString's constraint needs isOutputRange,
// and isScalar!U is a helper trait from the poster's own code (not part of
// Phobos), assumed to be defined elsewhere in the module.
import std.range.primitives : isOutputRange;

/**
 * Represents an n-dimensional vector of values.
 */
struct Vec(T, size_t n)
{
    T[n] impl;
    alias impl this;

    /**
     * Per-element unary operations.
     */
    Vec opUnary(string op)()
        if (is(typeof((T t) => mixin(op ~ "t"))))
    {
        Vec result;
        foreach (i, ref x; result.impl)
            x = mixin(op ~ "this[i]");
        return result;
    }

    /**
     * Per-element binary operations.
     */
    Vec opBinary(string op, U)(Vec!(U,n) v)
        if (is(typeof(mixin("T.init" ~ op ~ "U.init"))))
    {
        Vec result;
        foreach (i, ref x; result.impl)
            x = mixin("this[i]" ~ op ~ "v[i]");
        return result;
    }

    /// ditto
    Vec opBinary(string op, U)(U y)
        if (isScalar!U &&
            is(typeof(mixin("T.init" ~ op ~ "U.init"))))
    {
        Vec result;
        foreach (i, ref x; result.impl)
            x = mixin("this[i]" ~ op ~ "y");
        return result;
    }

    /// ditto
    Vec opBinaryRight(string op, U)(U y)
        if (isScalar!U &&
            is(typeof(mixin("U.init" ~ op ~ "T.init"))))
    {
        Vec result;
        foreach (i, ref x; result.impl)
            x = mixin("y" ~ op ~ "this[i]");
        return result;
    }

    /**
     * Per-element assignment operators.
     */
    void opOpAssign(string op, U)(Vec!(U,n) v)
        if (is(typeof({ T t; mixin("t " ~ op ~ "= U.init;"); })))
    {
        foreach (i, ref x; impl)
            mixin("x " ~ op ~ "= v[i];");
    }

    void toString(W)(W sink) const
        if (isOutputRange!(W, char))
    {
        import std.format : formattedWrite;
        formattedWrite(sink, "(%-(%s,%))", impl[]);
    }
}

/**
 * Convenience function for creating vectors.
 * Returns: Vec!(U,n) instance where n = args.length, and U is the common type
 * of the elements given in args. A compile-time error results if the arguments
 * have no common type.
 */
auto vec(T...)(T args)
{
    static if (args.length == 1 && is(T[0] == U[n], U, size_t n))
        return Vec!(U, n)(args);
    else static if (is(typeof([args]) : U[], U))
        return Vec!(U, args.length)([ args ]);
    else
        static assert(false, "No common type for " ~ T.stringof);
}

///
unittest
{
    // Basic vector construction
    auto v1 = vec(1,2,3);
    static assert(is(typeof(v1) == Vec!(int,3)));
    assert(v1[0] == 1 && v1[1] == 2 && v1[2] == 3);

    // Vector comparison
    auto v2 = vec(1,2,3);
    assert(v1 == v2);

    // Unary operations
    assert(-v1 == vec(-1, -2, -3));
    assert(++v2 == vec(2,3,4));
    assert(v2 == vec(2,3,4));
    assert(v2-- == vec(2,3,4));
    assert(v2 == vec(1,2,3));

    // Binary vector operations
    auto v3 = vec(2,3,1);
    assert(v1 + v3 == vec(3,5,4));

    auto v4 = vec(1.1, 2.2, 3.3);
    static assert(is(typeof(v4) == Vec!(double,3)));
    assert(v4 + v1 == vec(2.1, 4.2, 6.3));

    // Binary operations with scalars
    assert(vec(1,2,3)*2 == vec(2,4,6));
    assert(vec(4,2,6)/2 == vec(2,1,3));
    assert(3*vec(1,2,3) == vec(3,6,9));

    // Non-numeric vectors
    auto sv1 = vec("a", "b");
    static assert(is(typeof(sv1) == Vec!(string,2)));
    assert(sv1 ~ vec("c", "d") == vec("ac", "bd"));
    assert(sv1 ~ "post" == vec("apost", "bpost"));
    assert("pre" ~ sv1 == vec("prea", "preb"));
}

unittest
{
    // Test opOpAssign.
    auto v = vec(1,2,3);
    auto w = vec(4,5,6);
    v += w;
    assert(v == vec(5,7,9));
}

unittest
{
    int[4] z = [ 1, 2, 3, 4 ];
    auto v = vec(z);
    static assert(is(typeof(v) == Vec!(int,4)));
    assert(v == vec(1, 2, 3, 4));
}

unittest
{
    import std.format : format;
    auto v = vec(1,2,3,4);
    assert(format("%s", v) == "(1,2,3,4)");
}
---snip


T

-- 
Never ascribe to malice that which is adequately explained by incompetence. -- 
Napoleon Bonaparte


Re: append - too many files

2023-01-10 Thread H. S. Teoh via Digitalmars-d-learn
On Wed, Jan 11, 2023 at 02:15:13AM +, Joel via Digitalmars-d-learn wrote:
> I get this error after a while (seems append doesn't close the file
> each time):
> std.file.FileException@std/file.d(836): history.txt: Too many open files
> 
> ```d
> auto jm_addToHistory(T...)(T args) {
>   import std.conv : text;
>   import std.file : append;
> 
>   auto txt = text(args);
>   append("history.txt", txt);
> 
>   return txt;
> }
> ```

This is a bug; please file an issue on the D bug tracker.  Phobos
functions should not leak file descriptors.
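
In the meantime, a possible workaround (my sketch, not something from the
thread) is to manage the File explicitly, so the descriptor is closed
deterministically on every call:

auto jm_addToHistory(T...)(T args) {
    import std.conv : text;
    import std.stdio : File;

    auto txt = text(args);
    // f is closed when it goes out of scope (File is reference-counted),
    // so no descriptor lingers across calls.
    auto f = File("history.txt", "a");
    f.write(txt);

    return txt;
}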


T

-- 
"No, John.  I want formats that are actually useful, rather than over-featured 
megaliths that address all questions by piling on ridiculous internal links in 
forms which are hideously over-complex." -- Simon St. Laurent on xml-dev

