Re: Reimplementing software building blocks like malloc and free in D

2018-08-13 Thread bpr via Digitalmars-d

On Monday, 13 August 2018 at 01:49:35 UTC, Mike Franklin wrote:

On Sunday, 12 August 2018 at 06:35:17 UTC, Eugene Wissner wrote:

P.S. In the last weeks I had thoughts to split low-level stuff 
from tanya and put it into a separate library, kind of native, 
gc-free x86-64 (and maybe aarch64) Linux runtime for D. 
Probably I should go for it :)


In recent months, I've been thinking about something like that 
as well.


I think it's a very promising idea. Why not start with DasBetterC 
and build a Phobos like library from there?





Re: Reimplementing software building blocks like malloc and free in D

2018-08-12 Thread Mike Franklin via Digitalmars-d

On Sunday, 12 August 2018 at 06:35:17 UTC, Eugene Wissner wrote:

P.S. In the last weeks I had thoughts to split low-level stuff 
from tanya and put it into a separate library, kind of native, 
gc-free x86-64 (and maybe aarch64) Linux runtime for D. 
Probably I should go for it :)


In recent months, I've been thinking about something like that as 
well.


As I work to improve implementations in DMD and DRuntime, I'm 
always running into situations where I need utility functions or 
other convenience implementations that currently exists in 
Phobos, but I can't use Phobos in DMD or DRuntime.  I'd like to 
have a small dependency-less library that has implementations 
that don't require much of a runtime implementation (just 
depending on the language itself), so it can actually be used to 
create the DRuntime implementations.  These include things type 
conversion, formatted IO, compile-time type information like that 
found in std.traits and std.meta, and generally useful algorithms 
like those in std.algorithm.  It might even be quite nice to have 
some features from `std.experimental.allocator` available for 
implementing memory allocation in Druntime.


I started explore this idea with 
https://github.com/JinShil/utiliD, but found that the Phobos 
implementations are so interconnected, it's quite a burden to 
decouple them.  That, and I began encountering chicken-and-egg 
issues with DRuntime implementations.  There was a discussion 
recently about refactoring out the compiler dependencies from the 
platform dependencies in DRuntime 
(https://forum.dlang.org/post/pjqepc$2sfv$1...@digitalmars.com), and 
I think that will have to be done first before continuing with 
the idea.


Mike


Re: Reimplementing software building blocks like malloc and free in D

2018-08-12 Thread Aruna Maurya via Digitalmars-d

On Sunday, 12 August 2018 at 11:37:54 UTC, Mike Parker wrote:

On Sunday, 12 August 2018 at 11:23:55 UTC, Aruna Maurya wrote:



So I'll be taking Eugene's code as reference to try and 
implement malloc free and realloc in dlang.


Be sure to send your proposal in by August 15!


Yes. Definitely!I'll do my best!


Re: Reimplementing software building blocks like malloc and free in D

2018-08-12 Thread Mike Parker via Digitalmars-d

On Sunday, 12 August 2018 at 11:23:55 UTC, Aruna Maurya wrote:



So I'll be taking Eugene's code as reference to try and 
implement malloc free and realloc in dlang.


Be sure to send your proposal in by August 15!


Re: Reimplementing software building blocks like malloc and free in D

2018-08-12 Thread Aruna Maurya via Digitalmars-d

On Sunday, 12 August 2018 at 08:34:46 UTC, Mike Franklin wrote:

On Sunday, 12 August 2018 at 07:00:30 UTC, Eugene Wissner wrote:


[...]


Inline ASM is a feature of D, so "idiomatic D" includes 
assembly implementations.  Where D shines here is with it's 
metaprogramming capabilities and the ability to select an 
implementation at compile-time or compose implementations.  See 
https://github.com/JinShil/memcpyD/blob/master/memcpyd.d  There 
you will that the compiler selects an implementation based on 
size in powers of 2.  Sizes that aren't powers of 2 can be 
composed of the powers of 2 implementations.  For example a 6 
byte implementation would be comprised of a 4-byte 
implementation plus a the 2 byte implementation.


[...]


So I'll be taking Eugene's code as reference to try and implement 
malloc free and realloc in dlang.


Re: Reimplementing software building blocks like malloc and free in D

2018-08-12 Thread Mike Franklin via Digitalmars-d

On Sunday, 12 August 2018 at 06:44:37 UTC, Aruna Maurya wrote:

Also what about other implementations like memset or memcmp. 
Have they been implemented or require work from scratch?


I don't know of any D implementations of these other than the 
trivial and naive.  I think it's a lot of work for one person to 
re-implement all of the software building blocks in D, so I 
envisioned the Autumn of Code participant choosing one or two to 
focus on.


Mike


Re: Reimplementing software building blocks like malloc and free in D

2018-08-12 Thread Mike Franklin via Digitalmars-d

On Sunday, 12 August 2018 at 07:00:30 UTC, Eugene Wissner wrote:

Also what about other implementations like memset or memcmp. 
Have they been implemented or require work from scratch?


These functions are still mostly implemented in asm, so I'm not 
sure there is an "idiomatic D" way. I would only wrap them into 
a function working with slices and checking the length. Mike?


It is also not only the implementation that's "idiomatic D", but 
also the interface.  For example, I find `void copy(T)(const ref 
T from, ref T to)` to be much more "idiomatic D" than `void* 
memcpy(void* to, const void* from, size_t size)`.


Mike


Re: Reimplementing software building blocks like malloc and free in D

2018-08-12 Thread Mike Franklin via Digitalmars-d

On Sunday, 12 August 2018 at 07:00:30 UTC, Eugene Wissner wrote:

Also what about other implementations like memset or memcmp. 
Have they been implemented or require work from scratch?


These functions are still mostly implemented in asm, so I'm not 
sure there is an "idiomatic D" way. I would only wrap them into 
a function working with slices and checking the length. Mike?


Inline ASM is a feature of D, so "idiomatic D" includes assembly 
implementations.  Where D shines here is with it's 
metaprogramming capabilities and the ability to select an 
implementation at compile-time or compose implementations.  See 
https://github.com/JinShil/memcpyD/blob/master/memcpyd.d  There 
you will that the compiler selects an implementation based on 
size in powers of 2.  Sizes that aren't powers of 2 can be 
composed of the powers of 2 implementations.  For example a 6 
byte implementation would be comprised of a 4-byte implementation 
plus a the 2 byte implementation.


I have more code than what's currently check into that 
repository, but I stalled because I need AVX512 to do an 
optimized implementation, and that doesn't seem to be a feature 
of DMD at the moment.  That was one reason I posted 
https://wiki.dlang.org/SAOC_2018_ideas#Implement_AVX2_and_AVX-512_in_DMD.2FDRuntime on the wiki.


Agner Fog had this to say in 
https://agner.org/optimize/optimizing_assembly.pdf


---
There are several ways of moving large blocks of data. The most 
common methods are:

1. REP MOVS instruction.
2. If data are aligned: Read and write in a loop with the largest 
available register size.

3. If size is constant: inline move instructions.
165
4. If data are misaligned: First move as many bytes as required 
to make the destination aligned. Then read unaligned and write 
aligned in a loop with the largest available register size.
5. If data are misaligned: Read aligned, shift to compensate for 
misalignment and write aligned.
6. If the data size is too big for caching, use non-temporal 
writes to bypass the cache. Shift to compensate for misalignment, 
if necessary.


As you can see, it can be very difficult to choose the optimal 
method in a given situation. The best advice I can give for a 
universal memcpy function, based on my testing, is as follows:
* On Intel Wolfdale and earlier, Intel Atom, AMD K8 and earlier, 
and VIA Nano, use the aligned read - shift - aligned write method 
(5).
* On Intel Nehalem and later, method (4) is up to 33% faster than 
method (5).

166
* On AMD K10 and later and Bobcat, use the unaligned read - 
aligned write method (4).
* The non-temporal write method (6) can be used for data blocks 
bigger than half the size of the largest-level cache.

---

I think D is well suited for an implementation that takes all 
these things into consideration, selecting an appropriate 
implementation, and composing complex implementations of the 
simpler implementations.


Mike


Re: Reimplementing software building blocks like malloc and free in D

2018-08-12 Thread Eugene Wissner via Digitalmars-d

On Sunday, 12 August 2018 at 06:44:37 UTC, Aruna Maurya wrote:

On Sunday, 12 August 2018 at 05:24:56 UTC, Mike Franklin wrote:

On Sunday, 12 August 2018 at 04:16:24 UTC, Aruna Maurya wrote:


[...]


I'd say there is only 1 requirement:  implementing `malloc`, 
`realloc`, and `free` in idiomatic D without any dependencies 
on other libraries from other languages.


[...]
Also what about other implementations like memset or memcmp. 
Have they been implemented or require work from scratch?


These functions are still mostly implemented in asm, so I'm not 
sure there is an "idiomatic D" way. I would only wrap them into a 
function working with slices and checking the length. Mike?


So the only option to implement the above is to go through 
Eugene's code, take some measurements and see if it can be 
improved.


As said my allocator is nothing official but I would be overhappy 
if someone finds it useful. I can also advice to look at Doug Lea 
malloc:


http://g.oswego.edu/dl/html/malloc.html

It is like a book. It's one of the best documented programs I've 
ever seen.


Re: Reimplementing software building blocks like malloc and free in D

2018-08-12 Thread Aruna Maurya via Digitalmars-d

On Sunday, 12 August 2018 at 05:24:56 UTC, Mike Franklin wrote:

On Sunday, 12 August 2018 at 04:16:24 UTC, Aruna Maurya wrote:


[...]


I'd say there is only 1 requirement:  implementing `malloc`, 
`realloc`, and `free` in idiomatic D without any dependencies 
on other libraries from other languages.


[...]
Also what about other implementations like memset or memcmp. Have 
they been implemented or require work from scratch?


But, how does it perform?  In order to make the case for using 
these D implementations in druntime, we'll have to gather some 
metrics on the implementations in terms of performance, memory 
fragmentation, etc. and ensure they are on par with the C 
standard library implementations or alternative implementations 
like jemalloc and tcmalloc?


[...]


So the only option to implement the above is to go through 
Eugene's code, take some measurements and see if it can be 
improved.


Perhaps the D implementations can be enhanced to provide 
compile-time or runtime tuning parameters to achieve better 
performance in different use cases.


[...]
That's totally okay. Your every comment is making the idea more 
clearer!


[...]




Re: Reimplementing software building blocks like malloc and free in D

2018-08-12 Thread Eugene Wissner via Digitalmars-d

On Sunday, 12 August 2018 at 05:24:56 UTC, Mike Franklin wrote:
I wasn't aware of Eugene's implementation.  At first glance it 
looks real nice.  It appears to have some dependencies on other 
features of his Tanya library, so at a minimum, those will have 
to be removed or replaced.




It depends only on internal libc-free syscalls that can be 
replaced with mmap/munmap and on copy()/copyBackward() that are 
just memcpy and memmove respectively.


But, how does it perform?  In order to make the case for using 
these D implementations in druntime, we'll have to gather some 
metrics on the implementations in terms of performance, memory 
fragmentation, etc. and ensure they are on par with the C 
standard library implementations or alternative implementations 
like jemalloc and tcmalloc?


Some years ago I had a BigInt implementation that allocated and 
deallocated memory like crazy. I compared how it works with 
malloc and my own allocator and my allocator was slightly faster 
than glibc (probably because of thread-safety). But yes, a good 
implementation needs benchmarks which test it systematically 
under various conditions.


Tanya's allocator is also tested only on x86-64 Linux, but is 
trivially to port to Windows and other architectures. I actually 
had a Windows version before but removed it because I was too 
lazy to read msdn.


Perhaps, with Eugene's permission, you can use his 
implementation as a base, take some measurements, and see if it 
can be improved upon with either more features or better 
performance.


You can use it as reference when implementing something else or 
as a starting point for the further work. Anyway feel free to use 
it as you like.


P.S. In the last weeks I had thoughts to split low-level stuff 
from tanya and put it into a separate library, kind of native, 
gc-free x86-64 (and maybe aarch64) Linux runtime for D. Probably 
I should go for it :)


Re: Reimplementing software building blocks like malloc and free in D

2018-08-11 Thread Mike Franklin via Digitalmars-d

On Sunday, 12 August 2018 at 04:16:24 UTC, Aruna Maurya wrote:

So there are essentially two requirements here. Building the 
memory management functionalities from scratch in idiomatic D 
and implementing the existing runtime hooks as templates. The 
latter isn't quite a part of the idea as mentioned on the page.


I'd say there is only 1 requirement:  implementing `malloc`, 
`realloc`, and `free` in idiomatic D without any dependencies on 
other libraries from other languages.


The "idiomatic D" part means you don't even have to name them 
"malloc", "realloc", and "free", nor use the same type-unsafe 
signatures found in the C standard library.  Do what makes sense 
for D, for example, you may want to implement an overload set 
like `T[] heapAllocate(T)(size_t length)` and `T 
heapAllocate(T)()`, or something totally different.  You may even 
want to provide some other features for gathering runtime 
statistics on the heap.  The world's your oyster here.


The only reason I mentioned the runtime hooks is to help make the 
case for why these D implementations are needed and provide some 
perspective about how they may be used.  I don't envision the 
Autumn of Code participant learning about the compiler/runtime 
interface and refactoring the D runtime with the D 
implementations (unless they really want to).


Regardless of whether we decide to incorporate these D 
implementations into the official D runtime or not, they will 
still be worth gold to those who desire using D in freestanding 
systems programming, so don't concern yourself too much with the 
d runtime; just make a kick-ass dynamic heap allocator in D, and 
people will use it.



Also I see in one of the above replies that malloc and free 
have been already implemented. So is that off the shelf or 
still available for implementation?


I wasn't aware of Eugene's implementation.  At first glance it 
looks real nice.  It appears to have some dependencies on other 
features of his Tanya library, so at a minimum, those will have 
to be removed or replaced.


But, how does it perform?  In order to make the case for using 
these D implementations in druntime, we'll have to gather some 
metrics on the implementations in terms of performance, memory 
fragmentation, etc. and ensure they are on par with the C 
standard library implementations or alternative implementations 
like jemalloc and tcmalloc?


Also, can Eugene's implementation be used in `@safe` code and is 
it compatible with the DIP1000? (See 
https://github.com/dlang/DIPs/blob/master/DIPs/DIP1000.md and the 
-dip1000 compiler switch at 
https://github.com/dlang/DIPs/blob/master/DIPs/DIP1000.md)


Perhaps the D implementations can be enhanced to provide 
compile-time or runtime tuning parameters to achieve better 
performance in different use cases.


Perhaps, with Eugene's permission, you can use his implementation 
as a base, take some measurements, and see if it can be improved 
upon with either more features or better performance.


I'm sorry if I'm just muddying the waters with all this talk.  
All I want is a high-quality, idiomatic D implementation of a 
heap allocator that I can use in my own freestanding software and 
as a substitute for the C standard library implementations in the 
D runtime.


Mike


Re: Reimplementing software building blocks like malloc and free in D

2018-08-11 Thread Aruna Maurya via Digitalmars-d

On Saturday, 11 August 2018 at 20:15:06 UTC, Mike Franklin wrote:

On Saturday, 11 August 2018 at 18:59:00 UTC, Aruna Maurya wrote:


[...]


The compiler recognizes certain expressions and lowers them to 
functions in the D runtime that we call runtime hooks.  See 
https://wiki.dlang.org/Runtime_Hooks for an unmaintained, but 
still relevant, list of some of them.


[...]


So there are essentially two requirements here. Building the 
memory management functionalities from scratch in idiomatic D and 
implementing the existing runtime hooks as templates. The latter 
isn't quite a part of the idea as mentioned on the page.


Also I see in one of the above replies that malloc and free have 
been already implemented. So is that off the shelf or still 
available for implementation?


[...]


Thanks for the resource! Will go through it and get back in case 
of any doubts.


[...]




Re: Reimplementing software building blocks like malloc and free in D

2018-08-11 Thread Mike Franklin via Digitalmars-d

On Saturday, 11 August 2018 at 18:59:00 UTC, Aruna Maurya wrote:

1. If it would have been C++, the above can be implemented by 
using the mmap() syscall to  allocate memory 
from the heap. However, something that I am not able to 
understand in the idea description is 'runtime hooks'.


The compiler recognizes certain expressions and lowers them to 
functions in the D runtime that we call runtime hooks.  See 
https://wiki.dlang.org/Runtime_Hooks for an unmaintained, but 
still relevant, list of some of them.


For example, the expression `cast(int[])a`, where `a` is an array 
of another type will get lowered to `_d_arraycast`.


We're trying to rewrite many of these hooks as templates.  D has 
changed quite a bit over the past few years, and at the time the 
existing runtime hooks were written D didn't have templates and 
many other features.  See 
https://github.com/dlang/druntime/pull/2264 for an example 
converting `_d_arraycast` to a template.


Many of the existing runtime hooks leverage software building 
blocks like `memcpy`, `memcmp`, `malloc`, `free`, etc. in their 
implementation.  If the runtime hooks are implemented as 
templates, we will have access to the type information and other 
information at compile-time rather than run-time.  This will give 
us opportunities to specialize implementations at compile-time.  
See https://github.com/JinShil/memcpyD/blob/master/memcpyd.d for 
a proof of concept exploring such opportunities for `memcpy`.


2. Also as far as I can understand, this idea is about 
implementing the above from scratch, and not tweaking the 
existing codebase. Do let me know if I am wrong.


I'm the one who proposed the idea, and, yes, I envisioned 
implementing these building blocks from scratch, translating an 
existing implementation from another language (watch the 
license!), or implementing an existing published algorithm in D.


There might even be a way to leverage implementation code in 
https://dlang.org/phobos/std_experimental_allocator.html, but if 
that code were used in the D runtime, it couldn't have 
dependencies on Phobos (D's standard library), so it'd probably 
have to be modified to make it free-standing.  I don't know, 
though, I haven't looked into it in any detail.


I didn't envision the participant modifying any existing code.  
In the case of `malloc`, `realloc`, and `free`, I'd just like to 
have an idiomatic D implementation that we might be able to 
leverage when re-implementing the runtime hooks as templates so 
we no longer have to depend on the C standard library.  I think 
implementing them in D will make D more portable to other 
platforms, including freestanding bare-metal platforms like 
operating systems or microcontrollers, and D's unique feature set 
may provide opportunities to improve upon existing 
implementations in terms of performance and compile-time 
guarantees.


It's difficult for me to articulate all this, as it's a complex 
vision in my head, so if the above just raised more questions, 
keep asking.


Also see https://wiki.dlang.org/Memory_Management for some 
perspective, especially if you're new to D.


The ideas on the wiki page are just to plant the seeds of 
creativity.  Feel free to deviate from the idea as you wish, or 
submit a proposal formulated from your own ideas.


Mike




Re: Reimplementing software building blocks like malloc and free in D

2018-08-11 Thread Eugene Wissner via Digitalmars-d

On Saturday, 11 August 2018 at 18:59:00 UTC, Aruna Maurya wrote:
Hello all! I am a Computer Science undergraduate student. 
Currently a junior, I find the idea of 're-implementing 
software building blocks like malloc and free in D' listed in 
the wiki page of SAOC(Symmetry Autumn of Code) quite 
interesting.


I don't have experience of coding in D however, I have 4 years 
of coding in C++ and have a good hold on pointers and the 
standard template library. I have also successfully completed a 
course of Operating Systems and Data Structures in my 4th 
semester. I believe I can pick up D in due course of time.


I had few queries regarding the same and would be glad if 
anyone could pitch in to help me out.


1. If it would have been C++, the above can be implemented by 
using the mmap() syscall to  allocate memory 
from the heap. However, something that I am not able to 
understand in the idea description is 'runtime hooks'.


2. Also as far as I can understand, this idea is about 
implementing the above from scratch, and not tweaking the 
existing codebase. Do let me know if I am wrong.


Thank you for your time!


I missed this project; To let everyone know, I've implemented 
malloc and free:


https://github.com/caraus-ecms/tanya/blob/master/source/tanya/memory/mmappool.d

(see allocate() and deallocate() for a starting point). My 
implementation is just not thread-safe.


Re: Reimplementing software building blocks like malloc and free in D

2018-08-11 Thread rikki cattermole via Digitalmars-d

On 12/08/2018 6:59 AM, Aruna Maurya wrote:
1. If it would have been C++, the above can be implemented by using the 
mmap() syscall to  allocate memory from the heap. 
However, something that I am not able to understand in the idea 
description is 'runtime hooks'.


A runtime hook is a function that the compiler injects a call to, to 
perform some function. Like allocate a class.


By reimplementing a lot of these primitives, the compiler will be able 
to optimize them better e.g. inlining and removing dependencies on other 
runtime information.


2. Also as far as I can understand, this idea is about implementing the 
above from scratch, and not tweaking the existing codebase. Do let me 
know if I am wrong.


It is modifying existing druntime and with it dmd (frontend) after 
developing these building blocks.


This project is just an idea, you can set your own set of goals and 
scope in the proposal. For example, you may choose to rewrite the memory 
management stuff in libc in D and support it across Windows, Linux and 
OSX. Instead of dealing with integration into druntime/dmd.