Re: s///ge; consumes PL_tmps_stack in its loop
From: YAMASHINA Hio [EMAIL PROTECTED]
Subject: Re: s///ge; consumes PL_tmps_stack in its loop
Date: Fri, 02 Sep 2005 18:00:38 +0900

> I also benchmarked again, with ITERS_BEFORE_FREETMPS_IN_SGE
> from 1<<0 to 1<<31.
>
>            memory      time      time
>   1<< 0   26.33 MB   5.5850 s   ++
>   1<< 1   26.33 MB   5.6777 s   +
>   1<< 2   26.33 MB   5.6570 s   +
>   1<< 4   26.33 MB   5.6020 s   +++
>   1<< 9   26.35 MB   5.6070 s   +++
>   1<<10   26.38 MB   5.6340 s
>   1<<13   26.97 MB   5.6653 s   +
>   1<<18   46.30 MB   5.8337 s   +++
>
> XEON 3.2G ht x86_64, 1G mem, SUSE Linux 9.3.
> detail is at http://fleur.hio.jp/~hio/p5p/.

Sorry X(, that test was run with another version of the patch,
one without SAVETMPS.

SAVETMPS may not be needed in this place.
SAVETMPS consumes PL_savestack, in place of the mortals consuming
PL_tmps_stack.
SAVETMPS also sets PL_tmps_floor = PL_tmps_ix, but FREETMPS already
has the same effect.
The PL_savestack entries are kept until LEAVE.

In this round of tests there is no clear difference below 1<<12.
detail is at http://fleur.hio.jp/~hio/p5p/report3.html

--- perl-5.8.x.orig/pp_ctl.c 2005-04-22 23:29:48.0 +0900
+++ perl-5.8.x-f-0/pp_ctl.c 2005-09-04 02:24:55.0 +0900
@@ -158,6 +158,8 @@
     }
     RETURN;
 }
+#define HIOS_HACK_FREETMPS_IN_SGE 1
+#define ITERS_BEFORE_FREETMPS_IN_SGE (1<<0)

 PP(pp_substcont)
 {
@@ -188,6 +190,12 @@
        if (!(cx->sb_rxtainted & 2) && SvTAINTED(TOPs))
            cx->sb_rxtainted |= 2;
        sv_catsv(dstr, POPs);
+#ifdef HIOS_HACK_FREETMPS_IN_SGE
+       if( (cx->sb_iters % ITERS_BEFORE_FREETMPS_IN_SGE)==0 ) {
+           /* prevent excess tmp stack */
+           FREETMPS;
+       }
+#endif

        /* Are we done */
        if (cx->sb_once || !CALLREGEXEC(aTHX_ rx, s, cx->sb_strend, orig,
END_OF_PATCH

[scope.h]
#define SAVETMPS save_int((int*)&PL_tmps_floor), PL_tmps_floor = PL_tmps_ix
#define SSPUSHINT(i) (PL_savestack[PL_savestack_ix++].any_i32 = (I32)(i))

[scope.c]
Perl_save_int(pTHX_ int *intp)
{
    SSCHECK(3);
    SSPUSHINT(*intp);
    SSPUSHPTR(intp);
    SSPUSHINT(SAVEt_INT);
}

--
YAMASHINA Hio [EMAIL PROTECTED]
Re: s///ge; consumes PL_tmps_stack in its loop
From: Dan Kogai [EMAIL PROTECTED]
Subject: Re: s///ge; consumes PL_tmps_stack in its loop

> On Sep 01, 2005, at 22:21 , Rafael Garcia-Suarez wrote:
> >> +if( (cx->sb_iters&0xffff)==0 ) {
> >
> > OK, so if I understand correctly, you're doing that every 65536th
> > loop ? Just trying to understand your patch a bit more.
>
> I too wondered if 65536 was the optimal value so I benchmarked
> (result below)
> Looks like the optimal value is 1024, not 65536.  Sounds natural
> since on most platforms the page size is 4k, or sizeof(pointer)*1024.

I also benchmarked again, with ITERS_BEFORE_FREETMPS_IN_SGE
from 1<<0 to 1<<31.

           memory      time      time
  1<< 0   26.33 MB   5.5850 s   ++
  1<< 1   26.33 MB   5.6777 s   +
  1<< 2   26.33 MB   5.6570 s   +
  1<< 4   26.33 MB   5.6020 s   +++
  1<< 9   26.35 MB   5.6070 s   +++
  1<<10   26.38 MB   5.6340 s
  1<<13   26.97 MB   5.6653 s   +
  1<<18   46.30 MB   5.8337 s   +++

XEON 3.2G ht x86_64, 1G mem, SUSE Linux 9.3.
detail is at http://fleur.hio.jp/~hio/p5p/.

1<<0 is the same as "always", because the optimizer removes
`if( (iter%1)==0 )'.

By this result, the best way is just to invoke FREETMPS, always.

--
YAMASHINA Hio [EMAIL PROTECTED]
Re: s///ge; consumes PL_tmps_stack in its loop
YAMASHINA Hio wrote:
> Hi.
>
> An s///ge; with a large number of matches consumes PL_tmps_stack
> in its loop.
> This occurs when the REPLACEMENT (right) part contains statements
> (e.g. s//$x;$x/ge;).
>
> The patch follows:
>
> diff -urN perl-5.8.7.orig/pp_ctl.c perl-5.8.7/pp_ctl.c
> --- perl-5.8.7.orig/pp_ctl.c 2005-04-22 23:12:38.0 +0900
> +++ perl-5.8.7/pp_ctl.c 2005-08-30 10:55:05.0 +0900
> @@ -188,6 +188,11 @@
>         if (!(cx->sb_rxtainted & 2) && SvTAINTED(TOPs))
>             cx->sb_rxtainted |= 2;
>         sv_catsv(dstr, POPs);
> +       if( (cx->sb_iters&0xffff)==0 ) {

OK, so if I understand correctly, you're doing that every 65536th loop ?
Just trying to understand your patch a bit more.

> +           /* shrink tmps stack */
> +           FREETMPS;
> +           SAVETMPS;
> +       }
>
>         /* Are we done */
>         if (cx->sb_once || !CALLREGEXEC(aTHX_ rx, s, cx->sb_strend, orig,
Re: s///ge; consumes PL_tmps_stack in its loop
RGS,

On Sep 01, 2005, at 22:21 , Rafael Garcia-Suarez wrote:
>> +if( (cx->sb_iters&0xffff)==0 ) {
>
> OK, so if I understand correctly, you're doing that every 65536th
> loop ? Just trying to understand your patch a bit more.

I too wondered if 65536 was the optimal value so I benchmarked
(result below)
Looks like the optimal value is 1024, not 65536.  Sounds natural
since on most platforms the page size is 4k, or sizeof(pointer)*1024.

The modified patch (against maintperl) and benchmark script and its
result right after the signature.

Dan the (Perl5 Porter|Friend of Hers)

--- perl-5.8.x/pp_ctl.c Fri Apr 22 23:29:48 2005
+++ perl-5.8.x.d/pp_ctl.c Fri Sep 2 06:29:36 2005
@@ -159,6 +159,9 @@
     RETURN;
 }

+#define HIOS_HACK_FREETMPS_IN_SGE 1
+#define ITERS_BEFORE_FREETMPS_IN_SGE 1024
+
 PP(pp_substcont)
 {
     dSP;
@@ -189,6 +192,13 @@
            cx->sb_rxtainted |= 2;
        sv_catsv(dstr, POPs);

+#ifdef HIOS_HACK_FREETMPS_IN_SGE
+       if( (cx->sb_iters % ITERS_BEFORE_FREETMPS_IN_SGE) == 0 ) {
+           /* shrink tmps stack */
+           FREETMPS;
+           SAVETMPS;
+       }
+#endif
        /* Are we done */
        if (cx->sb_once || !CALLREGEXEC(aTHX_ rx, s, cx->sb_strend, orig,
                                     s == m, cx->sb_targ, NULL,
__END_OF_PATCH__

# benchmark script -- modified so it runs on BSD-ish platforms
use strict;
use Time::HiRes qw/time gettimeofday tv_interval/;
my $t = [ gettimeofday() ];
my $i = 0;
my $s = "." x 1_000_000;
printf "length: %d\n", length($s);
my $started = time();
$s=~ s{
    .
}
{
    my $x=".";
    ++$i % 100_000 or ps();
    $x
}gex;
printf "Total: %f seconds\n", time()-$started;

#  0    1   2    3    4   5
# USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
sub ps{
    my ($vst, $rss);
    for my $ps (split /\n/, `ps awwux`){
        my @ps = split /\s+/, $ps;
        next if $ps[1] != $$;
        ($vst, $rss) = @ps[4,5];
    }
    my $tt = $t; $t=[ gettimeofday ];
    printf "i=%d, interval=%f, vst=%d, rss=%d\n",
        $i, tv_interval($tt,$t), $vst, $rss;
}
__END__

# Benchmark Result on FreeBSD 5.4-STABLE i386, 2GB RAM, Dual Xeon 2.66GHz
# Got similar results on Mac OS X v10.4.2

# hack turned off
length: 1000000
i=100000, interval=0.335474, vst=9584, rss=9084
i=200000, interval=0.421420, vst=15064, rss=14572
i=300000, interval=0.535432, vst=20292, rss=19808
i=400000, interval=0.629692, vst=25204, rss=24720
i=500000, interval=0.723880, vst=30156, rss=29676
i=600000, interval=0.978336, vst=32688, rss=32216
i=700000, interval=0.972722, vst=36660, rss=36188
i=800000, interval=1.044227, vst=47692, rss=47232
i=900000, interval=1.154285, vst=48768, rss=48312
i=1000000, interval=1.204723, vst=58036, rss=57588
Total: 8.211410 seconds

#define ITERS_BEFORE_FREETMPS_IN_SGE 8
length: 1000000
i=100000, interval=0.309985, vst=5920, rss=5396
i=200000, interval=0.426167, vst=6504, rss=5892
i=300000, interval=0.328450, vst=7560, rss=6888
i=400000, interval=0.319155, vst=9132, rss=8232
i=500000, interval=0.322011, vst=9132, rss=8524
i=600000, interval=0.334513, vst=11488, rss=10396
i=700000, interval=0.333703, vst=11488, rss=10688
i=800000, interval=0.344401, vst=11488, rss=10980
i=900000, interval=0.353685, vst=15028, rss=13636
i=1000000, interval=0.356945, vst=15028, rss=13928
Total: 3.426557 seconds

#define ITERS_BEFORE_FREETMPS_IN_SGE 256
length: 1000000
i=100000, interval=0.263990, vst=5272, rss=4772
i=200000, interval=0.288537, vst=5316, rss=4820
i=300000, interval=0.298329, vst=5708, rss=5212
i=400000, interval=0.306359, vst=5848, rss=5352
i=500000, interval=0.316509, vst=6136, rss=5640
i=600000, interval=0.321965, vst=6328, rss=5832
i=700000, interval=0.330730, vst=5940, rss=5440
i=800000, interval=0.372460, vst=6820, rss=6324
i=900000, interval=0.443334, vst=7012, rss=6516
i=1000000, interval=0.374078, vst=6376, rss=5828
Total: 3.311563 seconds

#define ITERS_BEFORE_FREETMPS_IN_SGE 1024
length: 1000000
i=100000, interval=0.262606, vst=5276, rss=4776
i=200000, interval=0.287707, vst=5504, rss=5004
i=300000, interval=0.296727, vst=5412, rss=4912
i=400000, interval=0.305621, vst=5508, rss=5008
i=500000, interval=0.312880, vst=6096, rss=5596
i=600000, interval=0.318939, vst=6244, rss=5744
i=700000, interval=0.327908, vst=6340, rss=5840
i=800000, interval=0.337890, vst=6716, rss=6220
i=900000, interval=0.345272, vst=6908, rss=6412
i=1000000, interval=0.351104, vst=6132, rss=5632
Total: 3.141935 seconds

#define ITERS_BEFORE_FREETMPS_IN_SGE 4096
length: 1000000
i=100000, interval=0.264656, vst=5412, rss=4912
i=200000, interval=0.288359, vst=5632, rss=5132
i=300000, interval=0.298809, vst=5540, rss=5040
i=400000, interval=0.347351, vst=5636, rss=5136
i=500000, interval=0.421626, vst=6224, rss=5724
i=600000, interval=0.368024, vst=6416, rss=5920
i=700000, interval=0.328497, vst=6608, rss=6112
i=800000, interval=0.337559, vst=6036, rss=5532
i=900000, interval=0.345616, vst=6132, rss=5628
i=1000000, interval=0.353223,
s///ge; consumes PL_tmps_stack in its loop
Hi.

An s///ge; with a large number of matches consumes PL_tmps_stack
in its loop.
This occurs when the REPLACEMENT (right) part contains statements
(e.g. s//$x;$x/ge;).

The patch follows:

diff -urN perl-5.8.7.orig/pp_ctl.c perl-5.8.7/pp_ctl.c
--- perl-5.8.7.orig/pp_ctl.c 2005-04-22 23:12:38.0 +0900
+++ perl-5.8.7/pp_ctl.c 2005-08-30 10:55:05.0 +0900
@@ -188,6 +188,11 @@
        if (!(cx->sb_rxtainted & 2) && SvTAINTED(TOPs))
            cx->sb_rxtainted |= 2;
        sv_catsv(dstr, POPs);
+       if( (cx->sb_iters&0xffff)==0 ) {
+           /* shrink tmps stack */
+           FREETMPS;
+           SAVETMPS;
+       }

        /* Are we done */
        if (cx->sb_once || !CALLREGEXEC(aTHX_ rx, s, cx->sb_strend, orig,

make test also passes.

Sample code follows:

$ time ./perl -Ilib -MTime::HiRes=gettimeofday,tv_interval -le '
  my$t=[gettimeofday];my$i=0;&s;my$s="."x5_000_000;
  print "length: ".length($s);&s;
  $s=~s/./my$x=".";++$i%1000000 or &s;$x/ge;
  &s;
  sub s{
    system("grep VmSize /proc/$$/status");
    my$tt=$t;$t=[gettimeofday];
    print "i=$i, interval=".tv_interval($tt,$t)
  }'

The original perl gives:

VmSize:    22,024 kB
i=0, interval=0.018394
VmSize:   101,408 kB
i=1000000, interval=1.159893
VmSize:   180,948 kB
i=2000000, interval=1.224288
VmSize:   260,356 kB
i=3000000, interval=1.241251
VmSize:   339,764 kB
i=4000000, interval=1.237665
VmSize:   419,304 kB
i=5000000, interval=1.23893
VmSize:   414,420 kB
i=5000000, interval=0.60233

real    0m9.858s
user    0m7.952s
sys     0m1.898s

The patched perl gives:

VmSize:    22,020 kB
i=0, interval=0.018664
VmSize:    28,192 kB
i=1000000, interval=1.131531
VmSize:    29,168 kB
i=2000000, interval=1.145311
VmSize:    30,144 kB
i=3000000, interval=1.143441
VmSize:    31,120 kB
i=4000000, interval=1.151553
VmSize:    32,096 kB
i=5000000, interval=1.152435
VmSize:    27,212 kB
i=5000000, interval=0.007292

real    0m5.774s
user    0m4.679s
sys     0m1.083s

The result of each evaluation is put on the stack, and it is a mortal.
The stack entry is removed immediately when it is concatenated into
the substitution result, but the mortal stays alive in PL_tmps_stack.
All of them are released only at the end of the s///ge; statement
(pp_nextstate).

A small replacement such as s//1/ge; does not trigger this problem.
At least s//1;1/ge; is needed.

./perl -Ilib -MO=Terse -le '$_="."x5_000_000; s/./$x/ge;'
PMOP (0x6119a0) subst
    LOGOP (0x62db80) substcont
        UNOP (0x62f2c0) null
            LISTOP (0x611cb0) scope
                OP (0x6305f0) null [174]
                UNOP (0x611b60) null [15]
                    SVOP (0x611c70) gvsv  GV (0x628430) *x

./perl -Ilib -MO=Terse -le '$_="."x5_000_000; s/./$x;$x/ge;'
PMOP (0x6119a0) subst
    LOGOP (0x62db80) substcont
        UNOP (0x62fa90) null
            LISTOP (0x611cb0) leave
                OP (0x61db50) enter
                COP (0x6305f0) nextstate
                UNOP (0x611b60) null [15]
                    SVOP (0x611c70) gvsv  GV (0x628430) *x
                COP (0x630660) nextstate
                UNOP (0x62db00) null [15]
                    SVOP (0x62f2c0) gvsv  GV (0x628430) *x

With
  ./perl -le '$_="."x5_000_000;s/./1;1/ge;'
the patch costs about 0.5% in time, while memory use drops
from 450M to 25M.
But s/./$x;$x/ge; shows a 15% speedup.
Running many s///ge on small strings seems to cost no extra time.

Regards.

--
YAMASHINA Hio [EMAIL PROTECTED]