Re: s///ge; consumes PL_tmps_stack in its loop

2005-09-04 Thread YAMASHINA Hio

From: YAMASHINA Hio [EMAIL PROTECTED]
Subject: Re: s///ge; consumes PL_tmps_stack in its loop
Date: Fri, 02 Sep 2005 18:00:38 +0900
 
   I also benchmarked again,
   with ITERS_BEFORE_FREETMPS_IN_SGE from 1<<0 to 1<<31.
   
   iters    memory     time
   1<< 0    26.33 MB   5.5850 s
   1<< 1    26.33 MB   5.6777 s
   1<< 2    26.33 MB   5.6570 s
   1<< 4    26.33 MB   5.6020 s
   1<< 9    26.35 MB   5.6070 s
   1<<10    26.38 MB   5.6340 s
   1<<13    26.97 MB   5.6653 s
   1<<18    46.30 MB   5.8337 s
   
   XEON 3.2G ht x86_64, 1G mem, SUSE Linux 9.3.
   detail is at http://fleur.hio.jp/~hio/p5p/.
  
  Sorry X(, this test was run with another version of the patch,
  one with no SAVETMPS.
  SAVETMPS is probably not needed in this place.
  Each SAVETMPS consumes PL_savestack, in place of the mortals
  consuming PL_tmps_stack.
  SAVETMPS also sets PL_tmps_floor = PL_tmps_ix, but FREETMPS
  already has the same effect.
  The PL_savestack entries are kept until LEAVE.
  
  In this round of tests there is no clear difference below 1<<12.
  Details are at http://fleur.hio.jp/~hio/p5p/report3.html
  
--- perl-5.8.x.orig/pp_ctl.c    2005-04-22 23:29:48.000000000 +0900
+++ perl-5.8.x-f-0/pp_ctl.c     2005-09-04 02:24:55.000000000 +0900
@@ -158,6 +158,8 @@
     }
     RETURN;
 }
+#define HIOS_HACK_FREETMPS_IN_SGE 1
+#define ITERS_BEFORE_FREETMPS_IN_SGE (1<<0)
 
 PP(pp_substcont)
 {
@@ -188,6 +190,12 @@
            if (!(cx->sb_rxtainted & 2) && SvTAINTED(TOPs))
                cx->sb_rxtainted |= 2;
            sv_catsv(dstr, POPs);
+#ifdef HIOS_HACK_FREETMPS_IN_SGE
+           if( (cx->sb_iters % ITERS_BEFORE_FREETMPS_IN_SGE)==0 ) {
+               /* prevent excess tmp stack */
+               FREETMPS;
+           }
+#endif
 
            /* Are we done */
            if (cx->sb_once || !CALLREGEXEC(aTHX_ rx, s, cx->sb_strend, orig,

END_OF_PATCH

[scope.h]
#define SAVETMPS save_int((int*)&PL_tmps_floor), PL_tmps_floor = PL_tmps_ix
#define SSPUSHINT(i) (PL_savestack[PL_savestack_ix++].any_i32 = (I32)(i))
[scope.c]
Perl_save_int(pTHX_ int *intp)
{
    SSCHECK(3);
    SSPUSHINT(*intp);
    SSPUSHPTR(intp);
    SSPUSHINT(SAVEt_INT);
}

-- 
YAMASHINA Hio [EMAIL PROTECTED]





Re: s///ge; consumes PL_tmps_stack in its loop

2005-09-02 Thread YAMASHINA Hio

From: Dan Kogai [EMAIL PROTECTED]
Subject: Re: s///ge; consumes PL_tmps_stack in its loop
 
 On Sep 01, 2005, at 22:21 , Rafael Garcia-Suarez wrote:
  +if( (cx->sb_iters&0xffff)==0 ) {
 
 
  OK, so if I understand correctly, you're doing that every 65536th  
  loop ?
  Just trying to understand your patch a bit more.
 
 I too wondered if 65536 was the optimal value so I benchmarked  
 (result below)
 Looks like the optimal value is 1024, not 65536.  Sounds natural  
 since on most platforms the page size is 4k, or sizeof(pointer)*1024.
  
  I also benchmarked again,
  with ITERS_BEFORE_FREETMPS_IN_SGE from 1<<0 to 1<<31.
  
  iters    memory     time
  1<< 0    26.33 MB   5.5850 s
  1<< 1    26.33 MB   5.6777 s
  1<< 2    26.33 MB   5.6570 s
  1<< 4    26.33 MB   5.6020 s
  1<< 9    26.35 MB   5.6070 s
  1<<10    26.38 MB   5.6340 s
  1<<13    26.97 MB   5.6653 s
  1<<18    46.30 MB   5.8337 s
  
  XEON 3.2G ht x86_64, 1G mem, SUSE Linux 9.3.
  detail is at http://fleur.hio.jp/~hio/p5p/.
  1<<0 is the same as "always", because the optimizer removes `if( (iter%1)==0 )'.
  
  
  By this result, the best way is simply to invoke FREETMPS every time.
  
  
  
-- 
YAMASHINA Hio [EMAIL PROTECTED]




Re: s///ge; consumes PL_tmps_stack in its loop

2005-09-01 Thread Rafael Garcia-Suarez
YAMASHINA Hio wrote:
 Hi.
 
 An s///ge; that performs a large number of substitutions consumes
 PL_tmps_stack in its loop.
 
 This occurs when the REPLACEMENT (right) part contains statements (e.g. s//$x;$x/ge;).
 
 The patch follows:
 
 diff -urN perl-5.8.7.orig/pp_ctl.c perl-5.8.7/pp_ctl.c
 --- perl-5.8.7.orig/pp_ctl.c    2005-04-22 23:12:38.000000000 +0900
 +++ perl-5.8.7/pp_ctl.c 2005-08-30 10:55:05.000000000 +0900
 @@ -188,6 +188,11 @@
   if (!(cx->sb_rxtainted & 2) && SvTAINTED(TOPs))
   cx->sb_rxtainted |= 2;
   sv_catsv(dstr, POPs);
 + if( (cx->sb_iters&0xffff)==0 ) {

OK, so if I understand correctly, you're doing that every 65536th loop ?
Just trying to understand your patch a bit more.

 + /* shrink tmps stack */
 + FREETMPS;
 + SAVETMPS;
 + }
 
   /* Are we done */
   if (cx->sb_once || !CALLREGEXEC(aTHX_ rx, s, cx->sb_strend, orig,


Re: s///ge; consumes PL_tmps_stack in its loop

2005-09-01 Thread Dan Kogai

RGS,

On Sep 01, 2005, at 22:21 , Rafael Garcia-Suarez wrote:

+if( (cx->sb_iters&0xffff)==0 ) {



OK, so if I understand correctly, you're doing that every 65536th  
loop ?

Just trying to understand your patch a bit more.


I too wondered whether 65536 was the optimal value, so I benchmarked it
(results below).
It looks like the optimal value is 1024, not 65536.  That sounds natural,
since on most platforms the page size is 4k, or sizeof(pointer)*1024.


The modified patch (against maintperl), the benchmark script, and its
results are right after the signature.


Dan the (Perl5 Porter|Friend of Hers)

--- perl-5.8.x/pp_ctl.c Fri Apr 22 23:29:48 2005
+++ perl-5.8.x.d/pp_ctl.c   Fri Sep  2 06:29:36 2005
@@ -159,6 +159,9 @@
      RETURN;
 }
+#define HIOS_HACK_FREETMPS_IN_SGE 1
+#define ITERS_BEFORE_FREETMPS_IN_SGE 1024
+
 PP(pp_substcont)
 {
      dSP;
@@ -189,6 +192,13 @@
                cx->sb_rxtainted |= 2;
            sv_catsv(dstr, POPs);
+#ifdef HIOS_HACK_FREETMPS_IN_SGE
+           if( (cx->sb_iters % ITERS_BEFORE_FREETMPS_IN_SGE) == 0 ) {
+               /* shrink tmps stack */
+               FREETMPS;
+               SAVETMPS;
+           }
+#endif
            /* Are we done */
            if (cx->sb_once || !CALLREGEXEC(aTHX_ rx, s, cx->sb_strend, orig,
                                     s == m, cx->sb_targ, NULL,

__END_OF_PATCH__


# benchmark script -- modified so it runs on BSD-ish platforms
use strict;
use Time::HiRes qw/time gettimeofday tv_interval/;

my $t = [ gettimeofday() ];
my $i = 0;
my $s = "." x 1_000_000;
printf "length: %d\n", length($s);
my $started = time();
# report memory use every 100_000 substitutions
$s=~ s{ . }
      { my $x="."; ++$i % 100_000 or &ps; $x }gex;
printf "Total: %f seconds\n", time()-$started;

# 0      1   2    3      4     5
# USER   PID %CPU %MEM   VSZ   RSS  TT  STAT STARTED  TIME COMMAND

sub ps{
    my ($vst, $rss);
    for my $ps (split /\n/, `ps awwux`){
        my @ps = split /\s+/, $ps;
        next if $ps[1] != $$;           # keep only this process
        ($vst, $rss) = @ps[4,5];        # VSZ and RSS columns
    }
    my $tt = $t;
    $t=[ gettimeofday ];
    printf "i=%d, interval=%f, vst=%d, rss=%d\n",
        $i, tv_interval($tt,$t), $vst, $rss;
}
__END__

# Benchmark Result on FreeBSD 5.4-STABLE i386, 2GB RAM, Dual Xeon 2.66GHz

# Got similar results on Mac OS X v10.4.2
# hack turned off
length: 1000000
i=100000, interval=0.335474, vst=9584, rss=9084
i=200000, interval=0.421420, vst=15064, rss=14572
i=300000, interval=0.535432, vst=20292, rss=19808
i=400000, interval=0.629692, vst=25204, rss=24720
i=500000, interval=0.723880, vst=30156, rss=29676
i=600000, interval=0.978336, vst=32688, rss=32216
i=700000, interval=0.972722, vst=36660, rss=36188
i=800000, interval=1.044227, vst=47692, rss=47232
i=900000, interval=1.154285, vst=48768, rss=48312
i=1000000, interval=1.204723, vst=58036, rss=57588
Total: 8.211410 seconds

#define ITERS_BEFORE_FREETMPS_IN_SGE 8
length: 1000000
i=100000, interval=0.309985, vst=5920, rss=5396
i=200000, interval=0.426167, vst=6504, rss=5892
i=300000, interval=0.328450, vst=7560, rss=6888
i=400000, interval=0.319155, vst=9132, rss=8232
i=500000, interval=0.322011, vst=9132, rss=8524
i=600000, interval=0.334513, vst=11488, rss=10396
i=700000, interval=0.333703, vst=11488, rss=10688
i=800000, interval=0.344401, vst=11488, rss=10980
i=900000, interval=0.353685, vst=15028, rss=13636
i=1000000, interval=0.356945, vst=15028, rss=13928
Total: 3.426557 seconds

#define ITERS_BEFORE_FREETMPS_IN_SGE 256
length: 1000000
i=100000, interval=0.263990, vst=5272, rss=4772
i=200000, interval=0.288537, vst=5316, rss=4820
i=300000, interval=0.298329, vst=5708, rss=5212
i=400000, interval=0.306359, vst=5848, rss=5352
i=500000, interval=0.316509, vst=6136, rss=5640
i=600000, interval=0.321965, vst=6328, rss=5832
i=700000, interval=0.330730, vst=5940, rss=5440
i=800000, interval=0.372460, vst=6820, rss=6324
i=900000, interval=0.443334, vst=7012, rss=6516
i=1000000, interval=0.374078, vst=6376, rss=5828
Total: 3.311563 seconds

#define ITERS_BEFORE_FREETMPS_IN_SGE 1024
length: 1000000
i=100000, interval=0.262606, vst=5276, rss=4776
i=200000, interval=0.287707, vst=5504, rss=5004
i=300000, interval=0.296727, vst=5412, rss=4912
i=400000, interval=0.305621, vst=5508, rss=5008
i=500000, interval=0.312880, vst=6096, rss=5596
i=600000, interval=0.318939, vst=6244, rss=5744
i=700000, interval=0.327908, vst=6340, rss=5840
i=800000, interval=0.337890, vst=6716, rss=6220
i=900000, interval=0.345272, vst=6908, rss=6412
i=1000000, interval=0.351104, vst=6132, rss=5632
Total: 3.141935 seconds

#define ITERS_BEFORE_FREETMPS_IN_SGE 4096
length: 1000000
i=100000, interval=0.264656, vst=5412, rss=4912
i=200000, interval=0.288359, vst=5632, rss=5132
i=300000, interval=0.298809, vst=5540, rss=5040
i=400000, interval=0.347351, vst=5636, rss=5136
i=500000, interval=0.421626, vst=6224, rss=5724
i=600000, interval=0.368024, vst=6416, rss=5920
i=700000, interval=0.328497, vst=6608, rss=6112
i=800000, interval=0.337559, vst=6036, rss=5532
i=900000, interval=0.345616, vst=6132, rss=5628
i=1000000, interval=0.353223, 

s///ge; consumes PL_tmps_stack in its loop

2005-08-30 Thread YAMASHINA Hio
Hi.

An s///ge; that performs a large number of substitutions consumes
PL_tmps_stack in its loop.

This occurs when the REPLACEMENT (right) part contains statements (e.g. s//$x;$x/ge;).

The patch follows:

diff -urN perl-5.8.7.orig/pp_ctl.c perl-5.8.7/pp_ctl.c
--- perl-5.8.7.orig/pp_ctl.c    2005-04-22 23:12:38.000000000 +0900
+++ perl-5.8.7/pp_ctl.c 2005-08-30 10:55:05.000000000 +0900
@@ -188,6 +188,11 @@
            if (!(cx->sb_rxtainted & 2) && SvTAINTED(TOPs))
                cx->sb_rxtainted |= 2;
            sv_catsv(dstr, POPs);
+           if( (cx->sb_iters&0xffff)==0 ) {
+               /* shrink tmps stack */
+               FREETMPS;
+               SAVETMPS;
+           }
 
            /* Are we done */
            if (cx->sb_once || !CALLREGEXEC(aTHX_ rx, s, cx->sb_strend, orig,

make test also passes.

Sample code follows:

$ time ./perl -Ilib -MTime::HiRes=gettimeofday,tv_interval -le '
   my$t=[gettimeofday];my$i=0;&s;my$s="."x5_000_000;
   print "length: ".length($s);&s;
   $s=~s/./my$x=".";++$i%1000000 or &s;$x/ge;
   &s;
   sub s{
     system("grep VmSize /proc/$$/status");
     my$tt=$t;$t=[gettimeofday];
     print "i=$i, interval=".tv_interval($tt,$t)
   }'

Results with the original perl:

VmSize:    22,024 kB i=0, interval=0.018394
VmSize:   101,408 kB i=1000000, interval=1.159893
VmSize:   180,948 kB i=2000000, interval=1.224288
VmSize:   260,356 kB i=3000000, interval=1.241251
VmSize:   339,764 kB i=4000000, interval=1.237665
VmSize:   419,304 kB i=5000000, interval=1.23893
VmSize:   414,420 kB i=5000000, interval=0.60233

real    0m9.858s
user    0m7.952s
sys     0m1.898s

Results with the patched perl:

VmSize:    22,020 kB i=0, interval=0.018664
VmSize:    28,192 kB i=1000000, interval=1.131531
VmSize:    29,168 kB i=2000000, interval=1.145311
VmSize:    30,144 kB i=3000000, interval=1.143441
VmSize:    31,120 kB i=4000000, interval=1.151553
VmSize:    32,096 kB i=5000000, interval=1.152435
VmSize:    27,212 kB i=5000000, interval=0.007292

real    0m5.774s
user    0m4.679s
sys     0m1.083s


The result of each evaluation is pushed onto the stack as a mortal.
The stack entry is removed as soon as it is concatenated
into the substitution result.
But the mortal stays alive in PL_tmps_stack, and all of
them are released only at the end of the s///ge; statement
(by pp_nextstate).

Trivial code such as s//1/ge; does not trigger this problem.
It needs at least s//1;1/ge;.

./perl -Ilib -MO=Terse -le '$_="."x5_000_000; s/./$x/ge;'
PMOP (0x6119a0) subst
    LOGOP (0x62db80) substcont
        UNOP (0x62f2c0) null
            LISTOP (0x611cb0) scope
                OP (0x6305f0) null [174]
                UNOP (0x611b60) null [15]
                    SVOP (0x611c70) gvsv  GV (0x628430) *x

./perl -Ilib -MO=Terse -le '$_="."x5_000_000; s/./$x;$x/ge;'
PMOP (0x6119a0) subst
    LOGOP (0x62db80) substcont
        UNOP (0x62fa90) null
            LISTOP (0x611cb0) leave
                OP (0x61db50) enter
                COP (0x6305f0) nextstate
                UNOP (0x611b60) null [15]
                    SVOP (0x611c70) gvsv  GV (0x628430) *x
                COP (0x630660) nextstate
                UNOP (0x62db00) null [15]
                    SVOP (0x62f2c0) gvsv  GV (0x628430) *x


./perl -le '$_="."x5_000_000;s/./1;1/ge;';
This code shows a 0.5% time loss, while memory use drops from 450M to 25M.
But s/./$x;$x/ge; shows a 15% speedup.
Running many s///ge on small strings seems to cost no extra time.


Regards.

-- 
YAMASHINA Hio [EMAIL PROTECTED]