This Fortnight on perl5-porters - 12-25 December 2005
Perl 5 turns 18 and is given some lovely birthday presents, such as
Perl 6's switch, smart match and say. Constants get smaller, faster
and cheaper. So with all that and more, it's probably time for 5.9.3
to escape from the lab.
Patching "sprintf", all the time, everywhere
The "sprintf" patch effort proceeded smoothly this week. Andy
Dougherty phoned in to give the patching of Solaris 8 a clean bill of
health. But Nicholas Clark wanted to get the other segmentation
violations nailed down as well, and suggested a new set of patches.
Which didn't work on Windows, and so the set was refined.
Dominic Dunlop reported perls 5.8.0 through 5.8.7 on Mac OS/X was
looking good, albeit with minor unrelated warnings. Steve Hay reported
success on Windows for 5.8.0, .2, .3 and .7. Nicholas published the
patches on CPAN.
http://xrl.us/jbwc
Gisle Aas modified the way "sprintf" obtains indexes and widths from
the format string to check for overflows. As a result numbers larger
than "IV_MAX+1" (already ridiculously large on 32 bit hardware) can no
longer be used directly to specify a width, but as a) such values are
probably not used in practice and b) there are still indirect means of
specifying Very Large Widths (such as %*d), it's not much of a loss.
http://xrl.us/jbwd
More on funny file names
Last week, Craig Berry pointed out an annoyance with "Pod::Simple":
its test suite contains directories with names such as
lib/Pod/Simple/t/test^lib. This is a hassle on VMS, because circumflex
is not a legal character for a filename, and he hoped that the
directories could be renamed by changing the circumflex to an
underscore.
This week, John E. Malmberg pointed out other problems of filename
portability. Nicholas Clark reported that the directories had been
renamed to something sensible, but was now struggling with trying to
get Perforce to delete the now-empty
"troublesomely^named^directories". Rafael came to the rescue with the
correct incantation.
http://xrl.us/jbwe
"strict 'refs'" doesn't apply inside "defined"
Nicholas Clark noticed that the following snippet produced no error:
use strict;
$a = "Not documented"; print defined @$a;
# prints nothing, no error message
$a = "Not documented"; print defined %$a;
# prints nothing, no error message
... which is certainly a bug, and was worried that production code
might be relying on this misbehaviour. Since applying "defined" to
arrays or hashes is vaguely deprecated, Rafael Garcia-Suarez thought
that it wasn't very important, and probably some odd side-effect of
the optimiser. Robin Houston came to the heart of the problem by
showing that that "defined()" doesn't autovivify stash entries when
used via a symbolic reference:
% perl -le '$n="z"; %$n; print(exists $::{$n} ? "yes" : "no")'
yes
% perl -le '$n="z"; defined%$n; print(exists $::{$n} ? "yes" : "no")'
no
... and also showed where in the source this was happening.
http://xrl.us/jbwf
Rafael Garcia-Suarez noticed that all the internal datatypes exhibit
similar behaviour, but when he patched the necessary functions to fix
it up an odd assortment of tests began to fail. One of the more icky
cases was "DBM_Filter", with the snippet:
C<if (!defined %{"${class}::"})>
Rafael want to at least fix up "defined($$foo)", but Robin was
doubtful, thinking that there's probably code Out There that relies on
this behaviour. And we got back to the comment in "Perl_ck_defined()"
that has a warning comment about not breaking "Tk" (This issue has
been hovering in the fringes of a number of threads in the past few
months).
Nicholas then posted a particularly brilliant analysis of the problem.
It's well worth reading if you want to get a feel for what goes on in
perl's guts
http://xrl.us/jbwg
It comes down to the fact that people are not so much trying to see
whether hashes (or. more precisely, stashes -- symbol tables hashes)
are defined *per se*, but rather they are trying to determine what
packages have been loaded below the current package (that is, "A::B"
wants to know if "A::B::C" or "A::B::D" are loaded. In the current
state of affairs, just *thinking* about "A::B::C" makes it spring into
life. So Rafael applied change #26370 to tweak the lexer to make it
all work again.
http://xrl.us/jbwh
"sprintf" of version objects
Gisle Aas filed bug #37897 regarding how "sprintf" deals with version
objects, as part of the fallout from the "webmin"/"syslog" debacle.
John Peacock was at a bit of a loss as to where to start looking, and
said that "sv_vcatpvfn()" was an example of overloaded operation gone
wrong. After mulling over the problem for a while, John came up with a
simpler approach that Gisle liked. So John went ahead an implemented
it. Gisle came back with a couple of suggestions to tighten it up.
H.Merijn Brand came up with a good test case:
# all on one line
($_=sprintf"%vx\n",$^V)=~s^\d\d.3\d.2^Just \
Anoth^;s$.3..2$r P$;s;.3.;rl Hacker,;;print
At the end of the thread, John reminded us about far more than we ever
wanted to know about version objects.
http://xrl.us/jbwi
"Pod::Man" creates empty man pages
Randal L. Schwartz wrote in to say that "podlators-2.0" broke an
interface assumption (a parser object could not be reused) which
caused a number of things (heavy lifters like "pod2man",
"ExtUtils::MakeMaker:MM" and "Module::Build") to fail. Russ Allbery
said that "pod2man" was fixed in the "podlators" distribution, but the
main reason is "Pod::Simple" objects are single use once only. The new
maintainer is interested in resolving the issue, but the solution is
not simple.
Ken Williams was ready to patch "Module::Build", but pointed out that
if someone was using "M::B" or "EU::MM" and after upgrading "Pod::Man"
noticed that their man pages stopped working, then their first reflex
would be to see about seeing what was wrong with "Pod::Man". It might
not be obvious that the correct fix would be to upgrade "M::B" or
"EU::MM". So at some point in time "Pod::Man" does need to do The
Right Thing.
http://xrl.us/jbwj
Calling perl's guts simultaneously from many threads
Every once in a while, someone pops into p5p with a very (on the
surface) simple question that turns out to be fiendishly complex. Mark
Glines won this week's award for the best question.
It turns out that Mark has written a Perl XS wrapper for the Fuse
project.
http://fuse.sf.net/
Fuse, as I understand it from Perl's perspective, is that you just
"use Fuse" and you create a filesystem whose entries can be mapped to
Perl routines. The Perl interface, until now, has been a
single-threaded implementation (mainly because it's been around since
5.6.1). Mark recently looked at 5.8's thread support and decided it
looked much better, and set about investigating how to make a
multiple-threaded version of Fuse. He had a number of questions.
Nick Ing-Simmons dived in with a number of detailed responses to
Mark's questions, explaining how perl's threads behave. In fact, with
a bit of help from Nick and Yitzchak Scott-Thoennes, Mark was well on
his way to getting it all work quite well.
http://xrl.us/jbwk
The Ponie Pump(?:king|queen) chair is up for grabs
Two and a half years ago, Fotango began sponsoring the Perl 6
development effort, in the shape of Ponie, or an implementation of
Perl 5 on Parrot. Nicholas worked on the project during his stay at
Fotango, but he's now moved on, and Ponie is looking for a new
manager. If you're interested, drop Nick and/or Jesse a line at
"[EMAIL PROTECTED]".
http://xrl.us/jbwm
Eliminating null macros.
Nicholas wondered whether it would be worth reducing the use of macros
by changing things like "Nullsv" to "(SV*)0", the reason being that
it's three things that people have to remember, and three things that
newcomers have to learn.
Andy Lester thought it might be better to replace the "Null[a-z]+"
variants with "NULL". Jan Dubois remembered a case where it does
matter: when using "vararg"-style functions, H.Merijn pointed out that
the explicit casts gives the compiler something bare "NULL"s don't:
hints about alignment. Specifically, on some architectures (ok, HP)
the following is safe:
long l; *((char *)&l) = 'a';
but
char a[160]; *((long *)a) = 100L;
may result in phase-of-the-moon errors. Nicholas applied a couple of
experimental patches to see what it would look like.
http://xrl.us/jbwn
PeekMessage() call in win32\win32.c win32_async_check fixed
Following on from last week's thread about "alarm()" on Win32, Jan
checked in some code to address the problems. Rafael had a bit of
trouble with the associated test suite, and it was all sorted out in
the end
http://xrl.us/jbwo
Change 26165 broke ext/threads/t/stress_re.t test on Win32
Steve Hay observed that somewhere between changes 26150 and 26185,
changes to sv.c caused the above test file to fail. After doing a
binary search among those change, he pinned the problem on change
26165, but couldn't see what in the patch was causing the problem.
Nicholas and Steve went over the parts of 26165 and began to apply
them individually, looking for mistakes. It turned out to be a
trailing comma at the end of a very heavily-casted subtraction.
After a lot of patient untangling of the initial patch, Nicholas got
it all sorted out. A fascinating case of debugging by mail. All fixed
up in 26378.
http://xrl.us/jbwp
Reducing ithreads overhead
Brendan O'Dea told of how Debian now builds a threaded perl by
default, and one of the biggest complaints is the slowdown that that
induces, and worse, segfaults. A couple of ideas where discussed, but
no firm conclusions were made.
http://xrl.us/jbwq
Smart match is done
Robin Houston wrapped up his work on the new smart match operator, the
"~~" operator from Perl 6. The patch was over 100 kilobytes. Nicholas
carefully applied two parts of it, that corrected a couple of spelling
errors in comments. which tickled Robin no end.
After having looked more carefully at the patch, Nicholas decomposed
into six discrete components. Rafael was a bit dubious about the "use
feature" approach to implementing smart match (and switch, and say)...
all of which are provided by this patch. He thought it a pity that a
pragma was required to get at recent coolness added to the language,
but backwards compatibility being what it was, that it was the best
way forward.
Yves Orton ironed a few kinks out the patch to get it running cleanly
under threads and Windows.
http://xrl.us/jbwr
Robin later followed up with another version of the patch
incorporating Yves's changes and a tweaked version of smart match
semantics (subject to more possible semantic changes when Larry and/or
Damian touch base). Applied by Rafael as change 26400 and a tweak in
26416.
http://xrl.us/jbws
Slimming down "constant"s
Nicholas Clark noted that, despite optimisations and inlining,
constants take up a lot of memory:
$ perl -MDevel::Size=size -le 'sub c () {3}; print size(\&c)'
434
... and wondered whether there was a viable way of reducing their
size. One approach would be to store the scalar value directly, but
that would require some collusion between the optimiser and the
"pp_entersub" opcode. Over seven years ago, Ilya Zakharevich produced
a patch to bring the cost of a subroutine declaration from 500 bytes
down to 100 (change 965), one possibly non-documented side effect of
which is:
$ perl -lwe 'sub foo {}; sub bar; sub baz ($); \
print "($_)" foreach $::{foo}, $::{bar}, $::{baz}'
(*main::foo)
(-1)
($)
Nicholas wondered whether it would be worth replacing this mechanism
by another whereby something else than a typeglob could appear in
symbol tables and could be used for storing constants. For modules
sych as "Socket" and "Fcntl" that implement constants internally, it
would become possible to generate true inlineable constants at require
time, rather than at run time.
The main stumbling block is that as a constant is secretly prototyped
as "()" behind the scenes, the trick is figure out how to distinguish
between a true prototyped routine and a constant value.
After some more legwork, Nicholas discovered that "pp_entersub" can be
fed just about any old thing, which would therefore require a
non-trivial amount of cooperation from higher-level ops. And the worst
of it is that as a general rule, most constant subs are never actually
called in any particular program.
The new scheme that Nicholas came up with is quite clever. Now, when a
reference to a constant value is stored in the symbol table, like
$foo::{bar} = \3;
It stays that way (occupying only as much space as a reference) until
it gets called upon. At that point, it springs into "sub foo::bar ()
{3}" and can be inlined as needed. What is more, any constant-building
mechanism that looks like this get the optimisation for free. But
wait! there's more! The first time Nicholas took the new code for a
test drive, he picked up an error (a missing semi-colon in
"Tie::File").
Benchmarking the new technique revealed that 1% less memory is
consumed when Fcntl is "require"d and a massive 12% less when "use"d.
And all its constants are now inlined. The "POSIX" shows even better
improvements (which will put to rest the meme about writing "use POSIX
()" to avoid importing the universe). Bonus!
Even better still, the memory saved is that much less memory needed to
be cloned when threads are created. A truly momentous change.
http://xrl.us/jbwt
Later on, Nicholas figured that it should be possible to teach
"Devel::Peek" how to dump out the value of constant subroutines, and
committed a patch (26404) to do just that. Dave Mitchell liked the
output.
http://xrl.us/jbwu
"//g" loops infinitely on tainted data
Steve Peters revived an old bug (#8262) where Kirrelly "Skud" Roberts
noticed that the follow snippet went into an infinite loop when
tainting is active
while($ENV{LANG} =~ m/(.)/g ) {
print "saw '$1'\n";
}
Sadahiro Tomoyuki noticed that it occurs with certain, but not all,
tainted values. For instance $ARGV[0] loops, but its copy "$copy =
shift" does not. Mike Guy reasoned that bad magic was causing the
trouble, no doubt some interaction between magic and taint causing
"pos()" to be reset.
This last clue was sufficient for the light-bulb to go off over Dave
Mitchell's head, and create a patch that fixes the problem.
http://xrl.us/jbwv
Perl is 18 years old
Dave Mitchell noted that Perl is now old enough to drink hard liquor,
vote and join the army. Randal Schwartz remarked that that's a long
time to be printing "Hello, world". Joshua Juran recommended that
Randal purchase a book called, um, "Learning Perl", currently in its
third edition. "brian d foy" suggested that Joshua buy a new copy,
since it's now in its fourth edition.
http://xrl.us/jbww
Plugins for Lint
Joshua ben Jore sent in a patch against blead to teach "B::Lint" to
accept plugins. The particular scratch he wanted to itch was the
ability to check for unchecked system calls and non-constant sprintf
formats. Rafael wondered whether Joshua planned to backport the
current "B::Lint" checks as plugins in the new regime. Joshua wasn't
convinced of the value in doing so, to which Hugo van der Sanden
pointed out that it would provide a good set of examples to show how
future plugin authors could write their own. But as the plugin
mechanism has a certain overhead, the cost would be too great to
convert everything over.
http://xrl.us/jbwx
Making the "sort" pragma lexically scoped
Robin Houston thought that the "use sort" pragma (as in "use sort
'_quicksort'", "use sort _'mergesort'") should be lexically scoped.
And having a couple of moments to spare after having written "switch"
and other goodies, whipped up a patch to do just that. As a result he
had to rewrite part of the test suite which was relying on the
previous behaviour, but that no-one minded that.
Nicholas did raise a concern about pragmata namespaces, which Robin
addressed in a followup patch. All applied by Rafael.
http://xrl.us/jbwy
Robin also supplied a patch to test for "%^H" (the hints hash) being
propagated into "eval".
http://xrl.us/jbwz
Is "perlivp" too fussy?
Gisle Aas doesn't like the way "perlivp" (Perl Installation
Verification Procedure) moans about incorrectly installed "*.ph"
files, which isn't really a problem under ActiveState Perl, and
supplied a patch to silence these warnings.
Rafael noted that "perlivp" doesn't play nicely in this day and age of
package-managed operating systems, and that Mandriva doesn't bother
distributing it. Gisle thought that another solution would be to get
rid of it altogether. Michael Cummings (who does Perl on Gentoo) voted
for either removing it, or at least updating it so that it at least
prints out a description of each test as it runs.
http://xrl.us/jbw2
Should "err" be a feature?
Up until now, "err" has been implemented as a weak keyword, which in a
nutshell means that it operates as a keyword only if a subroutine of
the same name has not been declared previously. Now that Robin has
introduced the "feature" pragma, he wondered whether it would not make
more sense to recast "err" as a feature.
In case you haven't been following Perl 6 too closely, "err" is to
"//" (defined or) as "and" is to "&&". Rafael quite rightly pointed
out that since "//" isn't a feature, (because it doesn't clash with
anything apart from an obscure regexp construct) it seems odd to
require "err" to be featured. The crux of the matter is the trade-off
between backwards compatibility and using new, ah, features.
Also proposed was the idea of "use feature ':5.9.3'", "use feature
':5.10'" and "use feature ':all'" as ways of getting everything new
that the interpreter has to offer, with possibly various degrees of
pain inflicted on getting your code to compile.
Sébastien Asperghis-Tramoni fired up his own private gonzui search
tool, and discovered that 69 distributions on CPAN currently define
"sub err".
Hot on the heels was the need for a command line switch that would
have the same effect, in order to "err", "say" or smart match in a
one-liner. Since the best candidate ("-f") has recently been used for
another purpose, Robin posted the list of unused command-line switches
b E g G H j J k K L N o O q Q r R y Y z Z
... and wondered which one would be the best for this purpose. Abigail
suggested -6 only not really. More interesting was Abigail's idea to
just enable the whole lot when "-e" is used, since in a one-liner it
would be highly doubtful that "err" ever needs to be diambiguated, or
else "-e" uses all features, and "-E" uses none.
Then it started to get really silly, with Tels suggesting to use æ (ae
ligature).
http://xrl.us/jbw3
In a followup post, Robin produced a patch that does the reverse of
Abigail's sensible idea: that is, "-E" is like "-e" with the addition
of all features.
Time to release 5.9.3?
After the discussion featuring "err", Rafael thought that with the
addition of all these new shiny, erm, features, it must be time to
roll out a new version of bleadperl to give it some wider exposure.
Some things that need to be done are: get feedback from Larry and
Damian on smart match, better (read: more) tests for modules like
"DBI" and "Tk" and implementing "feature 'err'".
Robin promised to wrap up the documentation for the work he's done so
far, on the understanding that semantics may change from here to 5.10,
and Rafael said that it might be nice to try for 1/1/2006 as a release
date. Robin checked in some documentation for switch and smart match a
few days later.
http://xrl.us/jbw4
A long discussion followed revolving around "err", and whether it
could be possible to promote it to a full keyword. Rafael disliked the
idea, because it would introduce source-level incompatibilities,
Gisle proposed cutting the Gordian knot by just simply removing the
"err" keyword altogether. H.Merijn said that this would remove a
certain amount of orthogonality from "||"/"or" and "&&"/"and". Yves
thought that "err" was a pretty dumb name, and put forward "dor" as an
alternative. Randy W. Sims concurred, and thought that "default" would
be better than "err". But as it comes from Perl 6, it appears we don't
have that luxury.
Yves's objection is that "err" sounds too much like error, as in
Danger! Danger!, when it really means "I half expect this to not work,
but that's ok. Another valid argument was that the "err" object in
Visual Basic is the way you throw errors, so VB refugees are going to
have a hard time unlearning it in Perl.
To summarise: just about everyone thinks that "err" sucks, but no-one
can think of anything better.
The summariser thinks that since the purpose of the operator is to let
you try out something, and if that doesn't work, get something else,
then it should be called the Jagger-Richards operator, because you
can't always get what you want, but you might find you get what you
need.
Rafael wondered whether, with Robin's implementation of switch in the
interpreter, whether it would be possible to fast-track the
deprecation of "Switch.pm". Robin pointed out that the semantics of
his switch are not quite the same as Damians source filter switch. In
the end, though, since 21 CPAN modules (again thanks to a search from
Sébastien's) already use Switch, it'll stay.
http://xrl.us/jbw5
Adding feature bundles and featuring "err"
After having done six Impossible Things before breakfast, Robin
finished up with a patch to implement feature bundles, as discussed
above.
http://xrl.us/jbw6
Should occurences of "I32" be replaced with "IV" or "STRLEN"?
After having started to look at the array index issue, Jan Dubois
noted that there are many other cases where signed/unsigned types are
used inconsistently.
Worse, in the case of 64-bit builds, some temporary copies of things
get shoved into hard-coded 32-bit variables, which admits the
possibility of nasty wrap-around errors.
Rafael wondered what impact the proposed change would have on the
recompilation of XS modules, and other checks that might cause
slowdowns.
http://xrl.us/jbw7
New Modules
Ken Williams uploaded a beta "PathTools" (3.14_02). This incorporates
the "stop at first directory in PATH" fix that was giving Nick
Ing-Simmons grief a few weeks ago. If all goes well, a 3.15 will be
out shortly.
Perl5 Bug Summary
Steve Peters closed out at least two bugs that I was paying attention
to, 7567 and 23907. And 23357 and 27319 as well.
Robert Spier noted that in the past seven weeks, all bugs have been
replied to. 1511 open tickets as of December 19.
http://xrl.us/jbw8
The list
http://rt.perl.org/rt3/NoAuth/perl5/Overview.html
In Brief
Sorting within a sort does not set $a, $b was filed as bug #36430 back
in June 2005. Robin (Houston, I suppose, the RT interface isn't clear
on the matter) poked at it in late October. Steve Peters applied a
change to fix it, only to realise that it had already been fixed by
change #25953.
http://xrl.us/jbw9
Jarkko Hietaniemi followed up on his "valgrind" findings from last
month and October with a short snippet that demonstrates ampersandy
leakage:
s/a//e;
print eval '$&';
http://xrl.us/jbxa
A problem using "local" with "threads::shared" (bug 37671) inched
forward a bit closer to resolution when Steve Peters realised that a
more restrictive approach was needed when dealing with magic and
Yitzchak agreed with the idea.
http://xrl.us/jbxb
Jim Cromie refined his work on "arenaroot"s.
http://xrl.us/jbxc
Thanks to last week's summary, Andy Lester realised that he had let a
message from Jim slip, and answered Jim's questions.
http://xrl.us/jbxd
Vadim Konovalov noted that "s{}{goto uncool;uncool:;}e" produces the
error "Can't find label uncool at -e line 1", but that "s{}{do{goto
uncool if 1;uncool:;}}e" (wrapping the inner statement in a "do"
block) works. After a moment of astonished silence, Chip Salzenberg
said that this was perhaps the most evil thing he'd ever seen.
http://xrl.us/jbxe
Rafael discovered that the "INSTALLSCRIPT" doesn't follow changes to
INSTALLDIR and that it's probably a bug. No-one complained about
possible backwards compatibility breakage, but for the time being
things are staying as they are.
http://xrl.us/jbxf
Andy Lester announced the "Sys::Syslog" and "sprintf" fixes on the
Perl foundation blog:
http://xrl.us/jbxg
http://news.perlfoundation.org/2005/12/updated_perl_modules_alleviate.html
Jerry D. Hedden posted bug #37946 concerning "refaddr"s of
"threads::shared" variables. Rather than staying constant, it
flip-flops between two addresses. Nicholas Clark explained why it was
so. Dave Mitchell scuffed his toe, looking for tuits.
http://xrl.us/jbxh
See also his other bug, #37919
http://xrl.us/jbxi
Yves Orton supplied a patch to fix the script embedded in patchlevel.h
in order to make it work on Windows.
http://xrl.us/jbxj
Andrew Savige started tracking down perl memory leaks with "valgrind"
and noticed a leak with constant subs with prototypes and asked for
some tips on how to proceed. Andrew, meet Jarkko. Jarkko, meet Andrew.
http://xrl.us/jbxk
Nicholas looked at how "Perl_yylex" tokenises version strings and
couldn't see how a certain piece of code should be reached. Robin
Houston supplied a one-liner to show how it could, and Nicholas
thanked him for the insight, noting that there now remained only
3000-odd lines of similarly incomprehensible code.
http://xrl.us/jbxm
About this summary
This summary was written by David Landgren. Sorry about the delay,
Santa ate all my tuits. No doubt Adriano will do a much better job
next week. This summary took me far too long to write, and in the
interest of matrimonial harmony, hasn't had the care it deserves
applied, so the quotient of typos, wordos and other oddities will
probably be higher than usual.
Information concerning bugs referenced in this summary (as #nnnnn) may
be viewed at http://rt.perl.org/rt3/Ticket/Display.html?id=nnnnn
Information concerning patches to maint or blead referenced in this
summary (as #nnnnn) may be viewed at
http://public.activestate.com/cgi-bin/perlbrowse?patch=nnnnn
If you want a bookmarklet approach to viewing bugs and change reports,
there are a couple of bookmarklets that you might find useful on my
page of Perl stuff:
http://www.landgren.net/perl/
Weekly summaries are published on http://use.perl.org/ and posted on a
mailing list, (subscription: [EMAIL PROTECTED]). The
archive is at http://dev.perl.org/perl5/list-summaries/. Corrections
and comments are welcome.
If you found this summary useful or enjoyable, please consider
contributing to the Perl Foundation to help support the development of
Perl.