Re: Making regex replace CTFE by removing malloc

2015-04-10 Thread Dmitry Olshansky via Digitalmars-d

On 07-Apr-2015 02:48, Pierre Krafft wrote:

I got some help from the IRC, poked the code a bit and have some details
I can share. The code uses a reference counted memory pool, that's the
part I linked. It's possible to avoid the malloc but that's not the hard
part. For me it looks like memory pools aren't compatible with CTFE.
This is because CTFE disallows most pointer casts.


Just noticed this thread, I've been detached from D lately. Cool idea, 
but it wasn't considered in its design, so memchr/malloc and other nice 
optimized C primitives are all around the engine sources.


Indeed there is a fair amount of custom memory allocation.
One engine uses a freelist and the other (+ compile-time generated one) 
uses segmented stack.


On top of that both engines use adhoc ref-counting for internal data 
structures. Initially I delayed memory allocation problem until we get 
memory allocators. Current memory allocation scheme is further obscured 
by trying hard to allocate just once and reusing/splitting the initial 
chunk.



Allocations and frees have fortunately been moved to common functions so
it should be possible to change the allocation code without too much
hassle. I would prefer to have the same code for runtime and CTFE, but
for performance reasons that might not be possible if CTFE has a too
limited feature set.


I learned the hard way that CTFE-able code and high-performance code 
look very much unlike each other. Unless everything is new-ed and not 
deleted we can't have the same code paths. A fair amount of __ctfe is in 
order.



I think a good idea would be to extract the memory pool code out of the
regex module. It's not a core part of the problem and could be reused
elsewhere. Having something named memory pool would also make the code
clearer. It might be impossible to implement memory pools at CTFE so  we
could hide that detail away by simulating a memory pool (at CTFE) until
it becomes possible.


Would be nice to have I guess, std.regex though uses somewhat 
specialized intrusive free-list so it's too niche for generic solution.



The code in general is quite clear, but low level and more about how to
do it than what to do. This is not the D-style I know and love, but it
might be needed for performance.


Yeah, sadly ranges and algorithms were underused. Reasons were multiple, 
for parser it had more to do with CTFE-ablity, for engines more of 
performance concerns.



Therefore I probably won't be the one
that starts the implementation of this, but I will try to help if
someone more experienced in this kind of code takes action.


I might pull this off on occasion. Compared to compilation of regex 
itself it might even run in sensible amount of time.


Anyhow I'm curious what use cases you have in mind for running regexes 
at CTFE.


--
Dmitry Olshansky


Re: Making regex replace CTFE by removing malloc

2015-04-07 Thread ketmar via Digitalmars-d
thank you.

and i read the code a little, and found that matching engine using stream-
like interface to work with data, so it wouldn't be very hard to use 
ranges instead of strings. and for real regexps (those without 
backtracking) range seems to doesn't even require random access.

signature.asc
Description: PGP signature


Re: Making regex replace CTFE by removing malloc

2015-04-06 Thread Pierre Krafft via Digitalmars-d
I got some help from the IRC, poked the code a bit and have some 
details I can share. The code uses a reference counted memory 
pool, that's the part I linked. It's possible to avoid the malloc 
but that's not the hard part. For me it looks like memory pools 
aren't compatible with CTFE. This is because CTFE disallows most 
pointer casts.
Allocations and frees have fortunately been moved to common 
functions so it should be possible to change the allocation code 
without too much hassle. I would prefer to have the same code for 
runtime and CTFE, but for performance reasons that might not be 
possible if CTFE has a too limited feature set.
I think a good idea would be to extract the memory pool code out 
of the regex module. It's not a core part of the problem and 
could be reused elsewhere. Having something named memory pool 
would also make the code clearer. It might be impossible to 
implement memory pools at CTFE so  we could hide that detail away 
by simulating a memory pool (at CTFE) until it becomes possible.
The code in general is quite clear, but low level and more about 
how to do it than what to do. This is not the D-style I know and 
love, but it might be needed for performance. Therefore I 
probably won't be the one that starts the implementation of this, 
but I will try to help if someone more experienced in this kind 
of code takes action.


Re: Making regex replace CTFE by removing malloc

2015-04-05 Thread ketmar via Digitalmars-d
On Sat, 04 Apr 2015 01:50:00 +, Jakob Ovrum wrote:

 On Friday, 3 April 2015 at 03:58:33 UTC, ketmar wrote:
 p.s. i don't think that this is the only problem, though. but i never
 read std.regexp source. it's bad, 'cause i want to make it work with
 any range, not only with strings. this will allow to run regexp on
 anything -- and open way to my rbtree-based document system.
 
 I've wanted this too. I really hope we can figure this out in the
 future, it would be another completely unique feature of D's already
 excellent regex engine.

i hope to do some work on that in this month. but don't hold your breath, 
i didn't even read the code yet, so i don't know how hard it will be.

signature.asc
Description: PGP signature


Re: Making regex replace CTFE by removing malloc

2015-04-05 Thread ketmar via Digitalmars-d
On Fri, 03 Apr 2015 15:04:38 +, Pierre Krafft wrote:

 It seems like I have treaded into something which is outside my
 knowledge domain. The malloc is indeed one of the least problems with
 that code. The code makes use of completely unsafe code with pointer
 casts that are disallowed in CTFE code. If someone knows how to replace
 the pointer casts that would probably be everything needed. Otherwise I
 think it would need a rewrite to make it use safe D. The new code would
 probably be slower so for most people it would be a step back. A
 solution would be to have different code for CTFE and runtime but that
 seems unmaintainable and subpar.

i believe that pointer casts are there for speed reasons. or maybe the 
engine just didn't designed for CTFE in the first place.

anyway, you can ask your questions here or in D.learn. even if you will 
not be able to convert the current engine to CTFE, you may be able to use 
a subset of it as a base for separate CTFE regexp engine. i believe that 
it is possible to write a moderately good Thompson-based code, which 
covers alot of needs.

signature.asc
Description: PGP signature


Re: Making regex replace CTFE by removing malloc

2015-04-03 Thread Pierre Krafft via Digitalmars-d

On Friday, 3 April 2015 at 03:58:33 UTC, ketmar wrote:

On Thu, 02 Apr 2015 17:14:24 +, Pierre Krafft wrote:

What can replace malloc that can run on compile time and won't 
make it

slower at run time?


this is actually two questions, so i'll answer to two questions.

1. What can replace malloc that can run on compile time?
new ubyte[](size)

2. ...and won't make it slower at run time?
but we can still use malloc in runtime! `if (_ctfe)` allows us 
to select

the necessary code branch.

p.s. i don't think that this is the only problem, though. but i 
never
read std.regexp source. it's bad, 'cause i want to make it 
work with
any range, not only with strings. this will allow to run regexp 
on

anything -- and open way to my rbtree-based document system.


Thanks!
I'll take a look and see if I can make it work.


Re: Making regex replace CTFE by removing malloc

2015-04-03 Thread Pierre Krafft via Digitalmars-d

On Friday, 3 April 2015 at 03:58:33 UTC, ketmar wrote:

On Thu, 02 Apr 2015 17:14:24 +, Pierre Krafft wrote:

What can replace malloc that can run on compile time and won't 
make it

slower at run time?


this is actually two questions, so i'll answer to two questions.

1. What can replace malloc that can run on compile time?
new ubyte[](size)

2. ...and won't make it slower at run time?
but we can still use malloc in runtime! `if (_ctfe)` allows us 
to select

the necessary code branch.

p.s. i don't think that this is the only problem, though. but i 
never
read std.regexp source. it's bad, 'cause i want to make it 
work with
any range, not only with strings. this will allow to run regexp 
on

anything -- and open way to my rbtree-based document system.


It seems like I have treaded into something which is outside my 
knowledge domain. The malloc is indeed one of the least problems 
with that code. The code makes use of completely unsafe code with 
pointer casts that are disallowed in CTFE code. If someone knows 
how to replace the pointer casts that would probably be 
everything needed. Otherwise I think it would need a rewrite to 
make it use safe D. The new code would probably be slower so for 
most people it would be a step back. A solution would be to have 
different code for CTFE and runtime but that seems unmaintainable 
and subpar.


Re: Making regex replace CTFE by removing malloc

2015-04-03 Thread Jakob Ovrum via Digitalmars-d

On Friday, 3 April 2015 at 03:58:33 UTC, ketmar wrote:
p.s. i don't think that this is the only problem, though. but i 
never
read std.regexp source. it's bad, 'cause i want to make it 
work with
any range, not only with strings. this will allow to run regexp 
on

anything -- and open way to my rbtree-based document system.


I've wanted this too. I really hope we can figure this out in the 
future, it would be another completely unique feature of D's 
already excellent regex engine.


Re: Making regex replace CTFE by removing malloc

2015-04-02 Thread ketmar via Digitalmars-d
On Thu, 02 Apr 2015 17:14:24 +, Pierre Krafft wrote:

 What can replace malloc that can run on compile time and won't make it
 slower at run time?

this is actually two questions, so i'll answer to two questions.

1. What can replace malloc that can run on compile time?
new ubyte[](size)

2. ...and won't make it slower at run time?
but we can still use malloc in runtime! `if (_ctfe)` allows us to select 
the necessary code branch.

p.s. i don't think that this is the only problem, though. but i never 
read std.regexp source. it's bad, 'cause i want to make it work with 
any range, not only with strings. this will allow to run regexp on 
anything -- and open way to my rbtree-based document system.

signature.asc
Description: PGP signature


Making regex replace CTFE by removing malloc

2015-04-02 Thread Pierre Krafft via Digitalmars-d
We have CTFE regex wich creates an optimized regex engine at 
compile time. I would want to expand on that and make the 
generated machine CTFE as well. I think the only thing that is 
stopping this from working is the use of malloc as seen at 
https://github.com/D-Programming-Language/phobos/blob/cb044b02aa3abd0bddc5e48a91faebfd146cab3e/std/regex/package.d#L565
What can replace malloc that can run on compile time and won't 
make it slower at run time?


Here is an example of what I would want to work:

unittest {
enum hello_world = camelCaseToUnderscore(helloWorld);
assertEqual(hello_world, hello_world);
}

private string camelCaseToUnderscore(string input){
import std.regex;
auto ctr = ctRegex!(`([a-z])([A-Z])`);
auto replaceString = $1_$2;
return replaceAll(input, ctr, replaceString).toLower();
}