Re: std.regex is fat

2018-10-14 Thread Chris Katko via Digitalmars-d-learn

On Sunday, 14 October 2018 at 03:26:33 UTC, Adam D. Ruppe wrote:

On Sunday, 14 October 2018 at 03:07:59 UTC, Chris Katko wrote:
For comparison, I just tested and grep uses about 4 MB of RAM 
to run.


Running and compiling are two entirely different things. 
Running the D regex code should be comparable, but compiling it 
is slow, in great part because of internal templates...


There was an effort to speed up the template code, but it is 
still not complete.


I know that. I figured people might miss my point, though, so I 
should have clarified. That's why I said it's likely the 
templates/DMD that are exploding, not the actual regex matching.


A simple program takes ~100-150MB of RAM to compile. Adding a 
single regex (a plain regex, not ctRegex) balloons that to 550MB 
and 5 seconds of compile time.


---

Anyhow, I wrote my own simple "dgrep" and compared it with grep, 
and it's very competitive (NOT to be confused with the RAM stats 
above, which are for COMPILING):



Command being timed: "sh -c cat dgrep.d | ./dgrep 'write' "
User time (seconds): 0.00
System time (seconds): 0.00
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 3192
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 301
Voluntary context switches: 5
Involuntary context switches: 124
Swaps: 0
File system inputs: 8
File system outputs: 8
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

Command being timed: "sh -c cat dgrep.d | grep 'write'"
User time (seconds): 0.00
System time (seconds): 0.00
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2224
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 2
Minor (reclaiming a frame) page faults: 282
Voluntary context switches: 10
Involuntary context switches: 0
Swaps: 0
File system inputs: 760
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

So I have to say I'm impressed with the actual performance of the 
regular expression engine, especially considering grep is, IIRC, 
a finely tuned beast.
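For illustration, a minimal dgrep in this spirit might look like the sketch below. It is hypothetical (the actual dgrep source isn't part of this thread): it assumes the pattern is the first command-line argument and the input arrives on stdin, and it uses a runtime `regex`, since the pattern isn't known until the program runs.

```d
import std.regex : matchFirst, regex;
import std.stdio : stderr, stdin, writeln;

int main(string[] args)
{
    if (args.length != 2)
    {
        stderr.writeln("usage: dgrep PATTERN");
        return 2;
    }

    // Runtime regex: the pattern comes from the command line, so it is
    // parsed and compiled here, when the program runs.
    auto re = regex(args[1]);

    // byLine reuses its buffer between iterations; each line is only
    // matched and printed before the next read, so no copy is needed.
    foreach (line; stdin.byLine)
    {
        if (matchFirst(line, re))
            writeln(line);
    }
    return 0;
}
```

Usage would mirror the timing run above, e.g. `cat dgrep.d | ./dgrep 'write'`.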


Re: std.regex is fat

2018-10-13 Thread Adam D. Ruppe via Digitalmars-d-learn

On Sunday, 14 October 2018 at 02:44:55 UTC, Chris Katko wrote:
So wait, their solution was to simply REMOVE std.regex from 
isEmail?


That was ctRegex, which is different from regex.

That doesn't solve the regex problem at all. And from what I 
read in that thread, this penalty is paid per template 
INSTANTIATION, which could explode.


Template instantiation is a big issue for ctRegex, but not for 
regular regex.
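To make the distinction concrete, here is a small sketch (the pattern and input string are made up for illustration). Both `regex` and `ctRegex` come from std.regex: `regex` parses the pattern at run time, while `ctRegex` does that work during compilation, via CTFE and template instantiation, producing a matcher specialized for the pattern. That is why, as I understand it, `ctRegex` is the much heavier of the two to compile.

```d
import std.regex : ctRegex, matchFirst, regex;
import std.stdio : writeln;

void main()
{
    // Runtime regex: the pattern string is parsed and compiled when this
    // line executes. The compiler still instantiates the generic regex
    // engine templates, but nothing specific to this pattern.
    auto rt = regex(`wr[a-z]+`);

    // Compile-time regex: the pattern is processed during compilation
    // (CTFE plus template instantiation) into a specialized matcher.
    auto ct = ctRegex!(`wr[a-z]+`);

    // Both find the same match at run time.
    writeln(matchFirst("please write this", rt).hit); // prints: write
    writeln(matchFirst("please write this", ct).hit); // prints: write
}
```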


Re: std.regex is fat

2018-10-13 Thread Adam D. Ruppe via Digitalmars-d-learn

On Sunday, 14 October 2018 at 03:07:59 UTC, Chris Katko wrote:
For comparison, I just tested and grep uses about 4 MB of RAM 
to run.


Running and compiling are two entirely different things. Running 
the D regex code should be comparable, but compiling it is slow, 
in great part because of internal templates...


There was an effort to speed up the template code, but it is 
still not complete.


Re: std.regex is fat

2018-10-13 Thread Chris Katko via Digitalmars-d-learn

On Sunday, 14 October 2018 at 02:44:55 UTC, Chris Katko wrote:

On Friday, 12 October 2018 at 13:42:34 UTC, Alex wrote:

[...]


So wait, their solution was to simply REMOVE std.regex from 
isEmail? That doesn't solve the regex problem at all. And from 
what I read in that thread, this penalty is paid per template 
INSTANTIATION, which could explode.


[...]


For comparison, I just tested and grep uses about 4 MB of RAM to 
run.


So it's not the regex. It's the dmd / templates / CTFE, right?


Re: std.regex is fat

2018-10-13 Thread Chris Katko via Digitalmars-d-learn

On Friday, 12 October 2018 at 13:42:34 UTC, Alex wrote:

On Friday, 12 October 2018 at 13:25:33 UTC, Chris Katko wrote:

Like, insanely fat.

All I wanted was a simple regex. The second I included a regex 
function, my program would no longer compile: "out of memory 
for fork".


/usr/bin/time -v reports it went from 150MB of RAM for D, 
DAllegro, and Allegro5 to over 650MB of RAM, and from 1.5 
seconds to >5.5 seconds to compile. Now I have to close all my 
Chrome tabs just to compile.


Just for one line of regex. And I get it: it's the overhead of 
the library import, not the single line. But good gosh, more 
than 3X the RAM of the entire project for a single library 
import?


Something doesn't add up!


Hm... maybe you ran into this:
https://forum.dlang.org/post/mailman.3091.1517866806.9493.digitalmar...@puremagic.com


So wait, their solution was to simply REMOVE std.regex from 
isEmail? That doesn't solve the regex problem at all. And from 
what I read in that thread, this penalty is paid per template 
INSTANTIATION, which could explode.


 1 - Does anyone know WHY it's so incredibly fat?

 2 - If this isn't going to be fixed anytime soon, shouldn't 
there be a DISCLAIMER in the documentation? (Plus potential 
workarounds, like keeping regex queries in their own file.)


I mean, this kind of thing shouldn't require looking through 
forums. It's a clear bug, and if it's a WONTFIX (even 
temporarily), it should be documented clearly as such.


If I'm running into this issue, how many other people already 
have, and possibly even given up on using D?





Re: std.regex is fat

2018-10-12 Thread Alex via Digitalmars-d-learn

On Friday, 12 October 2018 at 13:25:33 UTC, Chris Katko wrote:

Like, insanely fat.

All I wanted was a simple regex. The second I included a regex 
function, my program would no longer compile: "out of memory for 
fork".


/usr/bin/time -v reports it went from 150MB of RAM for D, 
DAllegro, and Allegro5 to over 650MB of RAM, and from 1.5 seconds 
to >5.5 seconds to compile. Now I have to close all my Chrome tabs 
just to compile.


Just for one line of regex. And I get it: it's the overhead of 
the library import, not the single line. But good gosh, more 
than 3X the RAM of the entire project for a single library 
import?


Something doesn't add up!


Hm... maybe you ran into this:
https://forum.dlang.org/post/mailman.3091.1517866806.9493.digitalmar...@puremagic.com