This Week on perl5-porters - 20-26 February 2006
Nicholas Clark awarded a TPF grant to work on perl5 -- a new "maint"
snapshot available -- work on saving memory continues.
Topics of Interest
"DBI" fails to build on "blead"
Andreas Koenig noticed that "[EMAIL PROTECTED]" breaks "DBI", and thought
that it might be a good idea to have the smoke testers smoke test a
good handful of CPAN modules on a regular basis.
Nicholas Clark pointed out that DBI does lots of naughty things under
the covers. He was loath to back out a change that enabled a
non-trivial amount of memory savings, and hoped that DBI could be
patched instead. He also suggested that if anyone could see to
smoking a good chunk of CPAN, it would be most appreciated.
Tim Bunce cheerfully admitted his guilt and patched DBI to Do The
Right Thing. Version 1.51 should be out soon. As it says in the
comments:
/* If we are calling an XSUB we jump directly to its C code and
* bypass perl_call_sv(), pp_entersub() etc. This is fast.
27244 breaks DBI, DBI breaks 27244
http://xrl.us/j9ou
How to save 3Kb
Through an extraordinary piece of luck, Nicholas Clark happened to
chance upon a simple technique to save 2720 bytes on the final perl
binary with no changes to any C files, simply by rearranging the "SV"
flag bits. He wondered if this was because the x86 architecture has
shorter instructions for 16-bit immediate constants (as opposed to
longer instructions for 32-bit constants).
Andy Dougherty recalled that Larry Wall developed 5.000 mainly on
"sparc", and that this issue has probably never appeared on anybody's
radar. Andy too noted a significant reduction on current "sparc"
hardware.
Nick Ing-Simmons suggested that the bits were probably allocated on a
least significant bit/first come, first served basis.
The joys of CISC
http://xrl.us/j9ov
"study" and tied variables
Nicholas Clark noticed that "study" will study any old thing you care
to feed it, and realised that if you gave it a tied variable, it might
be possible to pull the rug out from underneath it, simply because
what "study" saw and what the regular expression saw were not the same
things.
If this were the case, then
study $tied_var;
$tied_var =~ /$pattern/;
might fail, but remove the "study" line and it starts to work.
Nicholas was having difficulty, however, in coming up with a test case
to prove or disprove this theory. Rick Delaney responded with a nice
short snippet of code to show that Nicholas was right to be
suspicious. Andy Lester picked it up to transform it into a unit test.
Rick added a bit more code to show that not only is "study"'s
behaviour stranger than we imagine, it is stranger than we can
imagine.
Studying ties
http://xrl.us/j9ow
Andy's test, applied by Rafael as #27271
http://xrl.us/j9ox
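To make the worry about "study" and ties concrete, here is a minimal
sketch of a tied scalar that hands back a different value on every
read. The "Flipper" class and its values are hypothetical and are not
taken from Rick's or Andy's code; the point is only that two
consecutive readers of the same tied variable need not see the same
string.

```perl
use strict;
use warnings;

# Hypothetical tied-scalar class: every read cycles to the next value.
package Flipper;

sub TIESCALAR {
    my ($class, @values) = @_;
    return bless { values => [@values], reads => 0 }, $class;
}

sub FETCH {
    my ($self) = @_;
    my $values = $self->{values};
    my $index  = $self->{reads}++ % scalar @$values;
    return $values->[$index];
}

sub STORE { }    # writes are ignored in this sketch

package main;

tie my $tied_var, 'Flipper', 'apple', 'banana';

my $first_read  = $tied_var;    # what a first reader (such as "study") might see
my $second_read = $tied_var;    # what a later reader (such as the match) might see

print "first: $first_read, second: $second_read\n";
```

Anything that inspects the variable once and then assumes a later read
returns the same data is standing on that rug.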
TPF awards a grant to Nicholas Clark
You may have heard the news, but The Perl Foundation has awarded a
grant to Nicholas Clark to work on various parts of Perl 5 for the
next three months.
Nicholas divided the matter into three parts:
* Items for "blead"
* Items for "blead" that could make it back to "maint"
* Items for "blead" that should make it back to "maint"
The third item above is a large list of goodies, so let's hope that
Nicholas is successful.
I know it's in the boilerplate of this summary, but it's worth saying
again: if you find these summaries useful, consider making a donation
to the Perl Foundation. If it helps people like Nicholas, Rafael or
Robin Houston work on Perl full-time for a while then only good things
can come of it.
The details from the foundation
http://news.perlfoundation.org/2006/02/2006_q1_grant_votes.html
Contribute! Contribute!
https://donate.perlfoundation.org/index.pl?node=Contribution%20Info
Note: the original message to p5p had an excessive number of repeated
punctuation characters in the subject line, which led some poor soul's
brain-dead mail reader to classify it as spam. If you missed this item
on the mailing list, you may have to fish it out of your Junk folder.
Make perl Fast
http://xrl.us/j9oy
Part of the thread went sideways and started talking about spam.
Nonetheless, it was still an interesting discussion.
Off-topic discussion about spam
http://xrl.us/j9oz
Detecting unrealised constant subroutines
Last year, Nicholas Clark rewrote the way constant subroutines are
implemented, which led to significant memory savings. Adam Kennedy
realised that this may be interfering with an idiom used in his code
to test whether a function (and thus a "constant") is defined:
if (defined *{$glob}{CODE}) { ... } # it is defined
Nicholas showed that that code fragment may have problems even as far
back as 5.005. Rafael suggested that a much better technique was to
use "exists" instead, such as
if (exists &{ref($obj).'::'.$method}) { ... } # no, it exists
Constantly learning new things
http://xrl.us/j9o2
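As a rough illustration of Rafael's suggestion, the sketch below
compares "exists" and "defined" on subroutine names looked up by
string. The "Shape" package and the two helper functions are
hypothetical, invented for this example; it shows how the two tests
differ for a constant, a forward declaration and a name that was
never declared.

```perl
use strict;
use warnings;

package Shape;              # hypothetical package for illustration
use constant SIDES => 4;    # a constant subroutine
sub area;                   # forward declaration: declared, no body yet

package main;

# Hypothetical helper: is a subroutine of that name declared at all?
sub sub_exists {
    my ($package, $name) = @_;
    no strict 'refs';
    my $found = exists &{"${package}::${name}"};
    return $found ? 1 : 0;
}

# Hypothetical helper: does it also have a body that can be called?
sub sub_defined {
    my ($package, $name) = @_;
    no strict 'refs';
    my $found = defined &{"${package}::${name}"};
    return $found ? 1 : 0;
}

for my $name (qw(SIDES area missing)) {
    printf "%-8s exists=%d defined=%d\n",
        $name, sub_exists('Shape', $name), sub_defined('Shape', $name);
}
```

Neither test needs to poke inside the glob, which is what makes this
approach less sensitive to how constants are stored.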
Unwanted vivification
Steve Peters committed a new test as change #27287, to note that the
following program should not produce any output:
my %h;
foreach (@h{a, b}) {
# do nothing
}
print "$_\n" for keys %h;
Except that it does (as noted in bug #2166). Nicholas observed that
hash slices also showed the same behaviour when used as subroutine
arguments. Nick Ing-Simmons admitted to having used that feature.
It's not a bug!
http://xrl.us/j9o3
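For comparison, this small sketch sets the aliasing "foreach" next to
a plain copy of the same slice. The variable names are made up for the
example; the second half is an ordinary rvalue slice, which is the
behaviour one might have expected from the loop as well.

```perl
use strict;
use warnings;

my %aliased;
foreach my $value (@aliased{ 'a', 'b' }) {
    # do nothing: merely aliasing the slice is enough
}
my @after_foreach = sort keys %aliased;

my %copied;
my @values     = @copied{ 'a', 'b' };    # plain rvalue slice
my @after_copy = sort keys %copied;

print "after foreach: @after_foreach\n";
print "after copy:    @after_copy\n";
```

The loop creates both keys because "foreach" aliases its elements and
therefore treats the slice as something that may be assigned to; the
copy leaves the hash empty.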
Shrinking the "body_details" table
Nicholas committed change #27290 to declare a field as "U8" instead of
"size_t". Jan Dubois wanted to know whether the cost of an unaligned
"U8" fetch would outweigh what would appear to be a very modest
reduction in size.
Nicholas wasn't sure what the impact would be on "x86" but noted that
there was no impact on "ARM", and wondered whether (and how) he should
attempt to instrument cache use.
Do not cross the (cache) line
http://xrl.us/j9o4
A "maint" snapshot ("[EMAIL PROTECTED]")
With a couple of minor improvements to 5.8.8, Nicholas decided it was
time for a maintenance snapshot.
Smaller *and* faster
http://xrl.us/j9o5
Dealing with "op_next" and "op_sibling"
A couple of years ago, Jim Cromie made the observation that in the
process of turning Perl code into opcodes, there are two fields in
BASEOP, "op_next" and "op_sibling", and yet only one is in use at any
given time. A back of the envelope calculation showed Jim that it
might be possible to reduce the amount of memory required to store
opcodes by 20 percent.
Jim took another look at the idea this week, this time with a patch to
implement the idea as a proof of concept, and requested comments.
This time, rather than doing away with "op_sibling", he just wanted to
push it away somewhere else, so that the resulting blob of opcodes would
be "denser" and thus more of them would fit in a cache line, which
could lead to a measurable increase in run-time performance.
A run-time option may be made available to drop it altogether, which
would reduce the memory footprint of the op-coded program (on the
understanding that one shouldn't expect the "B::" modules and the like
to continue to work afterwards).
Nicholas Clark did some arithmetic on pointers eliminated and needed,
and concluded that the result would in most cases end up as a
net loss.
All the same, the conversation turned to the question of
optimisations, and to the issue of trying to gauge how much work it
would take to make the peep-hole optimiser truly pluggable. One could
then switch over to a more powerful optimiser that spent more time in
rewriting the op-code tree, in order to improve the run-time
performance of long-running programs.
A hare-brained scheme
http://xrl.us/j9o6
Patches of Interest
Sloppy "stat()" on Windows
Following on from the thread kicked off last week concerning trailing
slashes on directory names and their effects on the result of "stat"
operations in Win32, Jan Dubois delivered a patch to improve the
performance of "stat"ing, at the cost of potential problems with hard
linked files on NTFS partitions. Applied by Rafael as change #27283.
Jan's preliminary benchmarks showed a significant increase in speed.
It turns out that Perl has had the ability to create hard links under
Win32 for quite some time, the proviso being that the account from
which the process is run must have Backup/Restore privileges.
From last week
http://xrl.us/j9o7
Fast and sloppy
http://xrl.us/j9o8
"B", "CGI" and "ExtUtils::MM_Unix"
Joshua ben Jore rolled up a composite patch to tidy some warnings that
were cropping up in the test suite under "blead". Steve Peters tried
the patch but received a "you planned 55 tests but ran 2 extra"
warning. After Joshua delivered an updated patch, Steve applied it
with a couple of further tweaks.
http://xrl.us/j9o9
Joshua had another patch for B.pm and realised to his horror that he
had perltidied the source, so had to go back to a fresh copy and redo
the patch. He asked if people minded if he tidied the code in the "B"
namespace as he worked on it.
Rafael said that he didn't mind, so long as formatting changes and
code changes were not mixed up in the same patch (in other words, send
in a patch that does a whitespace reformatting, and then send in a
second patch that changes the code).
http://xrl.us/j9pa
Making "SDBM_File" work with "-Duse64bitall" on Darwin
Dominic Dunlop supplied a patch to clear up the test failures relating
to "SDBM_File" when compiling with "-Duse64bitall" on Darwin 8.x (Mac
OS X 10.4.x). Patch applied without fuss by Rafael as change #27250.
http://xrl.us/j9pb
Insecure Dependency in t/op/utftaint
Dominic was also having trouble with $ENV{CDPATH} after change #27236;
the test was dying with a taint error, so he rewrote the code to
resolve the failure but was worried that it might have broken
something that the code was actually supposed to check. In any event,
Rafael applied the patch as change #27248.
http://xrl.us/j9pc
Fixing op/magic failures on "cygwin" after 1.5.19-4
Yitzchak Scott-Thoennes sent in a patch to deal with changes in $^X in
recent snapshots of "cygwin". Steve Peters applied the patch, but
H.Merijn Brand wondered whether the "s/\.exe//" in the patch should be
anchored with a "$".
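H.Merijn's point can be seen with a contrived path. The path below is
hypothetical (it is not from the patch or from any real cygwin
installation) and is chosen so that ".exe" also appears in a directory
name.

```perl
use strict;
use warnings;

# Hypothetical path in which ".exe" also appears in a directory name.
my $path = '/opt/tools.exe.d/perl.exe';

(my $unanchored = $path) =~ s/\.exe//;     # removes the first ".exe" found
(my $anchored   = $path) =~ s/\.exe$//;    # removes ".exe" only at the end

print "unanchored: $unanchored\n";
print "anchored:   $anchored\n";
```

Without the anchor the substitution damages the directory name and
leaves the trailing ".exe" in place.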
Unused context warnings, revisited
Andy Lester sent in a new and improved version of his patch to get rid
of warnings of unused context. Better still, in unthreaded perls, the
"PERL_UNUSED_CONTEXT" vanishes to... nothing. Steve Peters liked the
patch so much, he applied it as #27300.
http://xrl.us/j9pd
Avoiding a "valgrind" error
Jarkko posted another patch that applied equally well to both "maint"
and "blead". The problem was this:
$_ = 'a';
s/a//e;
print eval '$&';
And he finally has a patch to solve the leak. Applied by Rafael as
#27270. Nicholas managed to cook up a snippet that snuck past Jarkko's
patch. Jarkko was pretty disgusted.
Basically, a lot of work has gone into trying to optimise unnecessary
copying of $&, but perhaps it's time to throw it all away and just
copy the string, all the time, everywhere.
The root of all evil
http://xrl.us/j9pe
Andy makes "SvREFCNT_inc" go faster
Andy Lester rewrote the "SvREFCNT_inc" macro that is used whenever the
reference count of an "sv" needs to be incremented. He noticed a tiny
improvement as a result, but was a bit surprised that the difference
wasn't greater. Andy's looking for people to take "blead" for a spin,
before and after change #27334 to see if anyone else can measure an
improvement.
Faster! Faster!
http://xrl.us/j9pf
He also consted gv.c and gv.h, most of which was applied.
http://xrl.us/j9pg
And replaced uses of constructs such as "(char*)NULL" by plain old
"NULL". Mark Jason Dominus warned that "NULL" needs to be cast when
passed as an argument to a variadic (varying number of arguments) or
unprototyped function. Andy cooked up a fresh patch.
http://xrl.us/j9ph
Zsban Ambrus noted a few discrepancies in perlutil.pod and provided a
patch to bring reality in-line with the documentation (or possibly the
other way around). Not (yet) applied.
http://xrl.us/j9pi
Watching smoke signals
Smoke [5.8.8] 27170 FAIL(F) MSWin32 WinXP/.Net SP2 (x86/2 cpu)
One of Steve Hay's smoke machines started emitting black smoke. He
tried to track the problem down with Nicholas Clark but nothing
conclusive was found. Finally things seemed to settle down after the
machine was rebooted. Nicholas wondered if that step could be added
automatically between "make" and "make test".
http://xrl.us/j9pj
Smoke [5.9.4] 27265 FAIL(F) MSWin32 WinXP/.Net SP2 (x86/2 cpu)
Steve Hay had another run produce copious gouts of black smoke, *and*
leak lots of disgusting fluids on the floor. Nicholas wondered what he
had done to merit that. Steve analysed the problem and pin-pointed
change #27249 as the reason. Nicholas thought it was some kind of
alignment problem.
Worse, for all the tests that failed, they ran just fine under the
debugger. After recompiling everything with "/Zp4", Steve noticed that
the culprit was in fact #27248.
http://xrl.us/j9pk
New and old bugs from RT
Unusual "tell" results with ":crlf" layer and multibyte input (#38587)
Alex Davies posted a short test script that demonstrated what he
thought was a bug, but has apparently been Warnocked.
Slow Unicode regular expressions (#38595)
This bug report received a few comments this week. Sadahiro Tomoyuki
explained what was eating all the cycles and offered a 4 character
patch ("&& i", if you're curious) to be applied somewhere in the
bowels of mg.c.
Rafael was a bit worried by the implications of what Tomoyuki was
saying and countered with a one-line patch to utf8.c, and Tomoyuki came
back with two ways in which Rafael's patch could be broken.
Nick Ing-Simmons consulted his store of Perl arcana and added another
data point to the picture. In some places of the code, it's easier to
indicate to the called function that it has to determine the length of
a buffer itself, rather than determining the length in the calling
function. Tcl and C++'s STL both use this trick.
But with all that said, I'm not sure that Unicode regexps are back up
to speed.
http://xrl.us/j9pm
"Storable" 2.15 "freeze"/"thaw" corrupts "qr//" on 5.8.8 (#38605)
David James noticed that if you take a "qr//" regexp, "freeze" and
then "thaw" it, you get garbage, or at the very least, something that
certainly won't match what the initial regexp did.
He noted that "Regexp::Storable" on CPAN fixes the problem, but until
you have a problem, you may not realise that "Regexp::Storable" is the
solution, and thus it would be better if that functionality were
available directly in "Storable". Yves Orton suggested that using
"Data::Dump::Streamer" might be more successful (which I incorrectly
named "...Dumper..." instead of "...Dump..." in last week's summary.
Shame on me).
http://xrl.us/j9pn
"Data::Dumper" dump core in 5.8.6, fixed by 5.8.7 (#38612)
Jarkko Hietaniemi noted that a shortish snippet of code that used to
cause "Data::Dumper" to dump core is now fixed in 5.8.7. I think this
latter point was sufficient for people to not bother to find out what
fixed it.
It works now
http://xrl.us/j9po
"lc", "uc" and "substr" misbehave with UTF-8 (#38619)
"benizi" filed a remarkably horrible bug using simple
"substr(lc($var),0)" constructs when "_utf8_on" (from "Encode") was
used. Andreas Koenig traced it back to a change (#18353) integrated by
Jarkko in December 2002 to cache the differences between lengths in
characters and lengths in bytes (since UTF-8 is a method of storing
variable-width characters -- some take 1 byte, some take 2 or more).
Functions like "substr" and "index" stand to benefit from this
information, since it means they don't have to scan from the beginning
of the string to find the offset that interests them, they can dip
into the cache and find a much closer start position from which to
start counting. (Summariser's note: at least that's the way I
understand it. Corrections welcome if in fact I am propagating an
incorrect meme).
And as the check-in comment from Jarkko said: "Code this hairy is
bound to have hairy trolls hiding under it." It took more than three
years to shake the troll out into the open. Sadahiro Tomoyuki, Unicode
bug squasher *extraordinaire*, immediately saw what was wrong, and
proposed a patch that was gratefully applied (since it meant one less
thing to tackle during his TPF grant) by Nicholas as change #27329.
http://xrl.us/j9pp
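The character-versus-byte distinction behind this bug can be sketched
in a few lines. The helper "byte_offset_of_char" is hypothetical and
deliberately naive: it finds a byte offset the slow way, by encoding
everything before the character index and counting, which is the kind
of scan from the start of the string that a position cache is meant
to avoid. It says nothing about how the cache inside perl is actually
implemented.

```perl
use strict;
use warnings;

# 12 characters; three of them need more than one byte in UTF-8.
my $text = "na\x{ef}ve \x{263A} caf\x{e9}";

# Hypothetical helper: find the byte offset of a character index the slow
# way, by encoding everything that precedes it and counting the bytes.
sub byte_offset_of_char {
    my ($string, $char_index) = @_;
    my $prefix = substr($string, 0, $char_index);
    utf8::encode($prefix);
    return length $prefix;
}

my $char_length = length $text;
my $byte_length = byte_offset_of_char($text, $char_length);
my $smiley_end  = byte_offset_of_char($text, 7);

print "characters: $char_length, bytes: $byte_length\n";
print "the first 7 characters occupy $smiley_end bytes\n";
```

Here 12 characters occupy 16 bytes, and the first 7 characters occupy
10 of them, so a character index cannot be used as a byte offset.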
The true semantics of "/$pat/o" (#38625)
Ulrich Windl wanted to know what happens with "/$pat/o" when $pat has
two different values in two different lexical scopes. Dave Mitchell
set him straight and Hugo van der Sanden filled in the missing blanks.
http://xrl.us/j9pq
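As background to the question, the basic effect of "/o" can be shown
with one match operator that is reached twice with different pattern
strings. The function names are invented for this sketch, and it
covers only the simple case of a single subroutine, not the
two-lexical-scopes case that Ulrich asked about.

```perl
use strict;
use warnings;

sub matches_sticky {
    my ($pattern, $string) = @_;
    return $string =~ /$pattern/o ? 1 : 0;    # compiled on first use, then reused
}

sub matches_fresh {
    my ($pattern, $string) = @_;
    return $string =~ /$pattern/ ? 1 : 0;     # follows $pattern on every call
}

my @sticky = (matches_sticky('foo', 'foo'), matches_sticky('bar', 'bar'));
my @fresh  = (matches_fresh('foo',  'foo'), matches_fresh('bar',  'bar'));

print "with /o:    @sticky\n";
print "without /o: @fresh\n";
```

The second "sticky" call fails because the operator is still matching
against the pattern it compiled the first time around.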
"tied" variables don't work with ".= <>" (#38631)
Nicholas discovered a bug with "tied" variables in 5.8: you cannot do
"$tied .= <FH>". Well, nothing is stopping you, but then again it
won't work as expected. Nicholas reasoned that it was due to a change in
5.8 using the "rcatline" op, and some sort of deficit of appropriate
magic.
Andreas Koenig tracked the problem down to change #11634.
http://xrl.us/j9pr
Brendan O'Dea forwarded a bug from the Debian tracking system. It
turns out that something like "*a= $a = *b; $a = 42;" causes
spectacular fireworks to occur, all the way to 5.005. Abigail
discovered that it segfaults as far back as 5.000.
John E. Malmberg noted that this is now caught semi-gracefully by an
assertion in "blead". Nicholas started to understand what was going
wrong as the summary went to press.
Don't Do That Then
http://xrl.us/j9ps
Perl5 Bug Summary
1547 open items
http://xrl.us/j9pt
Overview
http://rt.perl.org/rt3/NoAuth/perl5/Overview.html
In Brief
Help Michael tear himself away from his World of Warcraft addiction,
and make "blead" better at the same time. It's a win-win situation.
Schwern must pay
http://xrl.us/j9pu
Nicholas thought about merging "hasargs" and "lval" within the "struct
block_sub" structure that takes care of the business of subroutine
contexts. He then realised that savings would only kick in after
nesting 50 or so calls, and decided it was not worth the effort.
http://xrl.us/j9pv
Nicholas lined up lvalue functions in the cross-hairs. There's an ugly
bit of code in "op.c" to deal with returning temporaries from lvalue
subroutines. If things were done differently, it would allow "PVGV"s
to be laid out differently, which would in turn resolve a certain
number of special cases in the code.
http://xrl.us/j9pw
Joshua ben Jore wanted to know if there was a way of distinguishing
between methods and function calls under the debugger. Nick
Ing-Simmons said there was *no* difference between the two. On
Perlmonks, Adrian Howard had noted that "Devel::Caller" used to be
able to do this, but its test suite fails on perls more recent than
5.8.4.
http://xrl.us/j9px
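One way to see why the two are hard to tell apart is to look at what
the called subroutine receives. In this sketch (the "Greeter" class is
hypothetical) the same sub is called once as a class method and once
as a plain function with the class name passed by hand. It only
illustrates the argument list; it makes no claim about what the
debugger or "Devel::Caller" can or cannot recover.

```perl
use strict;
use warnings;

package Greeter;    # hypothetical class

sub describe_call {
    my @arguments = @_;
    return join ',', @arguments;
}

package main;

my $as_method   = Greeter->describe_call('hello');
my $as_function = Greeter::describe_call('Greeter', 'hello');

print "method:   $as_method\n";
print "function: $as_function\n";
```

Both calls arrive with identical arguments, so nothing in @_ records
which syntax the caller used.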
Dave Mitchell made it back safe and sound from India. When asked by
Nicholas whether he enjoyed himself, he replied that it was a bit of
a curate's egg.
http://xrl.us/j9py
Long time readers of World Wide Words may recall the meaning, but if
not:
http://www.worldwidewords.org/qa/qa-cur1.htm
(WWW is a weekly mailing list devoted to, well, words. I recommend it
highly).
Adam Kennedy started to wonder about the implications of the new
constant subroutine optimisations in terms of "mod_perl".
Specifically, he wanted to know whether it was worthwhile to flesh
them out forcibly in the parent process, to allow a greater sharing of
VM pages between the kids. According to Nicholas, it would only be a
problem with "mod_perl" children that "eval" or "require" a lot of
code on the fly, which will kill page sharing in any event.
http://xrl.us/j9pz
Seungho Han had a couple of questions about using the "B::" modules,
and Joshua ben Jore provided all the necessary information.
http://xrl.us/j9p2
Linda W spotted an inconsistency in the output of "-V" and wondered if
it was a bug or a feature. No takers.
http://xrl.us/j9p3
Nicholas Clark showed a short snippet of code involving typeglobs and
some really weird behaviour you don't want to know about. It has come
about as a result of fairly aggressive refactorings of the codebase
aimed at reducing the amount of memory required to store a scalar. I
think he also managed to explain why.
Undesirable emergent "typeglob" behaviour
http://xrl.us/j9p4
Salvador Fandino discovered that the debugger chokes if you try to
dump a "tied" "glob" that lacks a "FILENO" method, and patched it to
handle things more gracefully. Applied by Rafael as change #27342.
http://xrl.us/j9p5
John L. Allen was trying to get "IO::Tty" to build on AIX, and was
having trouble with the constant "TIOCSCTTY", which, it turns out, was
being created in XS C code, rather than Perl, which may have something
to do with the problem.
http://xrl.us/j9p6
Anno Siegel added a couple of checks to the test suite to exercise the
stringification of hash keys in t/op/hashassign.t.
About this summary
This summary was written by David Landgren.
Last fortnight's summary gathered a few replies, mainly about the
ActiveState/NTFS hard links issue, and also Joshua Juran's work on Mac
OS/Lamp.
Stephen McCamant also provided some useful information concerning the
historical divergence of the code behind "&=" *versus* "|=" and "^=".
http://xrl.us/j9p7
Information concerning bugs referenced in this summary (as #nnnnn) may
be viewed at http://rt.perl.org/rt3/Ticket/Display.html?id=nnnnn
Information concerning patches to "maint" or "blead" referenced in
this summary (as #nnnnn) may be viewed at
http://public.activestate.com/cgi-bin/perlbrowse?patch=nnnnn
If you want a bookmarklet approach to viewing bugs and change reports,
there are a couple of bookmarklets that you might find useful on my
page of Perl stuff:
http://www.landgren.net/perl/
Weekly summaries are published on http://use.perl.org/ and posted on a
mailing list, (subscription: [EMAIL PROTECTED]). The
archive is at http://dev.perl.org/perl5/list-summaries/. Corrections
and comments are welcome.
If you found this summary useful or enjoyable, please consider
contributing to the Perl Foundation to help support the development of
Perl.
--
"It's overkill of course, but you can never have too much overkill."