This Week on perl5-porters - 5-11 June 2006
Perl has always been very good about builds. Lets me spend my time
fighting other software issues. -- Alan Olsen
Topics of Interest
Using "#ifdef" inside a macro
Discussion continued on this issue affecting mg.c. The essence of the
problem was a simple issue of the character used for separating paths
(such as "/" on Unix). Andy Dougherty suggested a way to solve the
problem upstream, at configure time. John E. Malmberg, ever the
devil's advocate, pointed out that this would be unlikely to fly on
VMS, since the path separator could be "|", ":" or perhaps
something else again, depending on which shell perl was being run
from.
http://xrl.us/nf26
Failing 0.9% of the time
Jerry D. Hedden traced a random thread test failure down to a
problem that was a variant on the birthday party paradox: put
twenty-three people in a room, and there's a better than even chance
that two people share the same birthday. In a similar vein, the tests
would fail if a random number was generated twice.
The original issue was that all threads of a program would generate
the same sequence of random numbers, which is contrary to many
people's ideas of randomness. This has since been fixed, albeit with
the above side effect in the test suite. Jerry applied a quick fix to
bring the statistical likelihood of bogus failures from 0.9% down to
0.003% by allowing one duplicate to occur.
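The arithmetic behind this kind of failure rate can be sketched in a
few lines of C. This is only an illustration of the birthday
calculation, not the code from the test suite: the function name
collision_probability and its parameters are made up for the example,
and the summary does not say how many random numbers the tests draw
or from how large a range, so no attempt is made to reproduce the
0.9% figure.

```c
#include <assert.h>

/* Hypothetical helper: probability that at least two of `draws`
 * independent, uniformly distributed values chosen from `outcomes`
 * equally likely possibilities are identical. */
static double collision_probability(int draws, double outcomes)
{
    assert(draws >= 0 && outcomes >= 1.0);

    /* Probability that every draw so far differs from all earlier ones. */
    double all_distinct = 1.0;
    for (int i = 0; i < draws; i++) {
        all_distinct *= (outcomes - i) / outcomes;
        if (all_distinct <= 0.0)
            return 1.0;   /* more draws than outcomes: a repeat is certain */
    }
    return 1.0 - all_distinct;
}
```

With 365 possible birthdays, collision_probability(23, 365.0) comes
out just above one half and collision_probability(22, 365.0) just
below it, which is where the "better than even chance" comes from.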
Paul Johnson thought that accepting two or three duplicates would
ensure that the testing showed that the fix continued to work, while
pushing the threshold of failure down into the noise. Jerry thought of
another approach that would be even better still, but considered that
this current fix was good enough. Rafael Garcia-Suarez applied the
patch in any event.
Heads or tails
http://xrl.us/nf27
UTF-8 testing black smoke
A recent change (#28528 - abolishing "cop_io") has caused lots of
black smoke to issue forth from the smoke boxes. Nicholas Clark
cleaned up a number of problems, but he wasn't sure what the best way
was to correct a couple of the remaining failures.
Sadahiro Tomoyuki suggested one way. The problem is still unsolved. It
boils down to a question that anyone with a bit of Perl knowledge
could solve: how to tell the test harness that two different outputs
can be valid, expected output.
Over to you, dear reader
http://xrl.us/nf28
Compiling "mod_perl" on Suse Linux 10.1
Torsten Foertsch was having great difficulty dealing with the
"#define"s and "#ifdef"s in perl.h and was unable to compile
"mod_perl". The problem was figuring out what macros expanded into,
and whether one could typecast the macros and have it work as
expected. Apparently not, because on some platforms, the macro expands
into C code that looks like "if(1) ...".
http://xrl.us/nf29
"dor" *versus* "err"
Rafael has been using the new defined-or operator in "blead", and
noticed that he kept writing "use feature 'dor'" instead of "use
feature 'err'". After a while he grew tired of this, and added "dor"
as an alias for "err".
It's not dorky
http://xrl.us/nf3a
Action queues as an action item
David Nicol wanted to know whether his people were interested in
pursuing his idea of a queue-based mechanism for dealing with signal
management and possibly other stuff.
Nicholas ruled out any changes to the signal code in the maintenance
branch. He also remarked that there might be a race condition between
one thread receiving a signal, and another thread draining the queue,
since, when a signal is received, there is not enough context to
determine which interpreter should handle it. And it would be very
hard to solve this in a portable manner.
David maintained that none of the problems were insurmountable.
http://xrl.us/nf3b
Losing signals
Nicholas took a closer look at the existing signals code, and
identified what he thought was the source of the allegations of lost
signals: a section of code lacking a mutex.
That's me in the spotlight
http://xrl.us/nf3d
Is Perl's "bsearch" bug free?
Dan Kogai wanted to know whether Perl was subject to the binary search
bug found by Joshua Bloch, who works at Google. The problem is that a
binary search works in part by taking the mid-point of a high and low
value, where the values are indices into the table being searched.
This has usually been performed until now by adding the two values
together and dividing by two. There is a problem if the two values
overflow the maximum integer size, which happens when you have about a
billion elements in an array. This is becoming more and more frequent
as machines become deeper and faster.
Andy Dougherty confirmed that perl's Quicksort implementation as it
stands does have this problem. The fix is probably, as Joshua
suggests, to take the difference of the two values, divide that by
two, and add that to the low value. Nonetheless, a lot of thought
should be put into exploring the boundary cases (odd and even values,
odd and even array sizes and so on) to make sure nothing breaks.
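As a rough sketch of the midpoint fix described above, and not the
code in perl's own source, the following C fragment computes the
mid-point from the difference of the two indices. The names
safe_midpoint and find_index are invented for the example, and the
32-bit index type is an assumption chosen to make the overflow
boundary concrete.

```c
#include <assert.h>
#include <stdint.h>

/* The naive form, (low + high) / 2, overflows once low + high
 * exceeds INT32_MAX. Taking the difference first keeps every
 * intermediate value within range when 0 <= low <= high. */
static int32_t safe_midpoint(int32_t low, int32_t high)
{
    assert(0 <= low && low <= high);
    return low + (high - low) / 2;
}

/* Binary search over a sorted table of ints.
 * Returns the index of key, or -1 if it is not present. */
static int32_t find_index(const int *table, int32_t count, int key)
{
    int32_t low = 0;
    int32_t high = count - 1;

    while (low <= high) {
        int32_t mid = safe_midpoint(low, high);
        if (table[mid] < key)
            low = mid + 1;
        else if (table[mid] > key)
            high = mid - 1;
        else
            return mid;
    }
    return -1;
}
```

The boundary cases worth checking are the ones listed above: odd and
even index values, odd and even table sizes, an empty table, and
indices close to the maximum value of the index type.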
Jan Dubois reminded people that even on 64-bit perls, the "av_*" API
uses "I32" for indices. But changing that causes a huge ripple through
the code base...
Sooner or later
http://xrl.us/nf3e
Joshua Bloch's blog post
http://xrl.us/ne99
"dlopen()" not found
David Wheeler posted an innocuous call for help to get Perl compiled
on a 64-bit Red Hat platform. It turned out that "Configure" lacked
the necessary smarts to figure out what library paths should be used.
Then followed a rather long discussion about Red Hat "spec" files, the
evils of 32- and 64-bit applications residing on the same box, the
extent to which Linux distributors backported security patches to
older Perls and so on.
Looking for libs in all the wrong places
http://xrl.us/nf3f
The range operator ("..") *versus* Unicode
Dan Kogai ruefully discovered that ".." doesn't work with Unicode. For
instance, "('a'..'z')" produces all the letters of the (ASCII)
alphabet, but the same cannot be said of "("\N{GREEK CAPITAL LETTER
ALPHA}" .. "\N{GREEK CAPITAL LETTER OMEGA}")".
Even more surprising, though, was the fact that a simple work-around,
by way of a "map {chr}", was all that was needed to make things work.
Yitzchak Scott-Thoennes connected this issue to the definition of the
magic auto-increment operator and the fact that it is not geared to
dealing with this new-fangled Unicode stuff.
$s = 'a';
$s++; # now 'b'
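To see why the magic is tied to ASCII, here is a small C model of the
rule as documented in perlop: the string must match
/^[a-zA-Z]*[0-9]*$/, and each character is incremented within its own
range, with carry. This is a sketch for illustration only, not perl's
implementation; the function name magic_increment and its calling
convention are invented here, and where perl would fall back to a
numeric increment this model simply reports that the string does not
qualify.

```c
#include <assert.h>
#include <string.h>

static int is_ascii_letter(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* Increment the string in place. `capacity` is the size of the
 * buffer, which needs room for one extra character plus the NUL.
 * Returns 1 if the string qualified for the magical increment,
 * 0 if it did not (in which case it is left untouched). */
static int magic_increment(char *s, size_t capacity)
{
    size_t len = strlen(s);
    assert(len + 2 <= capacity);

    /* Only non-empty strings of ASCII letters followed by ASCII
     * digits qualify. */
    size_t i = 0;
    while (i < len && is_ascii_letter(s[i]))
        i++;
    while (i < len && s[i] >= '0' && s[i] <= '9')
        i++;
    if (len == 0 || i != len)
        return 0;

    /* Work from the right, wrapping within each range and carrying. */
    for (size_t pos = len; pos-- > 0; ) {
        char c = s[pos];
        if (c == 'z')
            s[pos] = 'a';
        else if (c == 'Z')
            s[pos] = 'A';
        else if (c == '9')
            s[pos] = '0';
        else {
            s[pos] = (char)(c + 1);
            return 1;             /* no carry left to propagate */
        }
    }

    /* Carry out of the leftmost character: grow the string by one. */
    char first = s[0];            /* now 'a', 'A' or '0' */
    memmove(s + 1, s, len + 1);
    s[0] = (first == '0') ? '1' : first;
    return 1;
}
```

Under this model "Az" becomes "Ba" and "zz" becomes "aaa", while the
bytes of a UTF-8 encoded Greek letter fail the pattern check and never
reach the carry logic at all.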
Dan was unhappily willing to accept that one could not create Unicode
ranges, in which case the documentation should be amended to spell
this out clearly. On the other hand, he was even more unhappy about
the fact that in regular expressions and transliterations, Unicode
ranges do work.
Rafael entertained the idea of prolonging the magic to allow:
$s = "\N{omega}";
$s++; # now "\N{alpha}\N{alpha}"
but thought that the results might prove to be entertaining in all
sorts of unforeseen ways.
Sadahiro Tomoyuki reminded people to be very careful when dealing with
Greek letters, such as final sigma between rho and sigma, and that
stigma or digamma aren't treated as following epsilon. Or the
opposite, I'm not sure.
It's all greek to me
http://xrl.us/nf3g
Patches of Interest
A better Aho-Corasick, and lots of benchmarks
Yves Orton sent in a new, improved patch for adding the Aho-Corasick
pattern matching algorithm to the regular expression engine. He also
bundled a perl_bench.pl program that he had developed to benchmark
multiple Perl versions.
Yves thought that some of the observed slowdown in 5.9.4 might be due
to Dave Mitchell's work in eliminating recursion from the engine.
Nonetheless, he was optimistic about the situation, saying that there
is a lot of room for improvements, and was targeting being at least as
fast as 5.6.1.
In a followup, he rejigged the code and had the performance running
faster than Python (not that we're competing or anything).
Yves develops on Windows, using MSVC as his main compiler, and as
such, unwittingly used some MSVC-isms that gave "gcc" and other
compilers indigestion. Other porters came to the rescue and
straightened it all out.
/n[iiiiii]ce/
http://xrl.us/nf3h
Andy Lester tossed in some cleanups to regexec.c and regcomp.c. Yves
took them onboard, along with an observation from Nicholas that showed
that the patch disagreed with threads, and produced a new version.
http://xrl.us/nf3i
Yves also tidied up and shipped out a more current version of a
pluggable 5.10 regexp engine for 5.8.1.
Overhaul
http://xrl.us/nf3j
Onwards, "const"ing soldier
Andy Lester delivered a patch for toke.c that added a range of
assorted goodnesses.
http://xrl.us/nf3k
and more accumulated cleanups in files too numerous (well, four) to
mention,
http://xrl.us/nf3m
which prompted Nicholas to make a few remarks about "int"s and pointer
sizes and so forth.
http://xrl.us/nf3n
Moving right along, Andy added some home-grown "const"ing to dump.c,
http://xrl.us/nf3o
and silenced a couple of "signed"/"unsigned" warnings in toke.c and
op.c. Not applied, apparently.
http://xrl.us/nf3p
Andy wrapped up with a tidying of "sv_dup" in sv.c. It looked big and
scary, but Rafael fearlessly applied it anyway.
http://xrl.us/nf3q
Tidying "PL_cshname" and "PL_sh_path"
Jan Dubois tweaked the code to deal with the problem of installing a
Perl distribution on a system that has only "/sbin/sh" and not
"/bin/sh", and refactored the code that deals with "exec" failures,
regardless of the shell ("sh" or "csh") used.
Hide in your shell
http://xrl.us/nf3r
Watching the smoke signals
A number of smokes elicited a certain amount of comment:
Smoke [5.9.4] 28368 FAIL(XF) linux 2.6.15-23-386 [debian] (i686/1 cpu)
http://xrl.us/nf3s
Smoke [5.9.4] 28372 FAIL(F) linux 2.6.15-23-386 [debian] (i686/1 cpu)
http://xrl.us/nf3t
Smoke [5.8.8] 28233 FAIL(F) MSWin32 WinXP/.Net SP2 (x86/2 cpu)
http://xrl.us/nf3u
And time is not what I have on my side to go any further.
New and old bugs from RT
"perlop": mention why "print !!0" doesn't (#33765)
Steve Peters and Dave Mitchell kicked the tyres on this bug, deciding
that what was needed was a paragraph that explained what values
boolean operations return.
A call to quills
http://xrl.us/nf3v
More leaks à la "eval "sub { \$foo = 22 "" (#37231)
Nicholas Clark dared to defy Dave Mitchell with yet another way of
producing an "eval" leak. Dave went medieval on him, and then just as
it was getting exciting, vanished into the big room with the blue
ceiling. While Dave wasn't looking, Nicholas shook out a few more
weird cases when warnings are made fatal.
King leer
http://xrl.us/nf3w
"defined"-ness of substrings disappear over repeated calls (#39247)
Rafael had been paying close attention to what Sadahiro Tomoyuki had
said about bugs and patches in "blead" and "maint" that dealt with
substrings losing their "defined"-ness, for he reverted the change
that a) broke this language feature and b) was no longer necessary
anyway.
Same as it ever was
http://xrl.us/nf3x
Regexp match fails on empty pattern (#39339)
This innocuous bug report attracted a surprising amount of comment, on
the subtlety of "/ /", "//", '' and ' ', and how these all do slightly
different things to "split", on how to deal with warnings,
"usesitecustomize" and more besides. In the end, Ronald Fischer asked
for the report to be closed, and that was the end of that.
http://xrl.us/nf3y
"sort" with custom subname and prototype "($$)" segfaults intermittently
(#39358)
This appeared to be a curious bug on the surface, dealing as it did
with doubly-indirected arrays. H.Merijn Brand noted that this no
longer dumped core in "blead", instead contenting itself to bail out
with a cryptic "Bizarre copy of UNKNOWN in aelem".
Graham Barr was the first to realise that the cause was an in-place
sort being performed: assigning the results to another array made it
work. Salvador Fandiño then correctly identified the optimisation that
was causing things to go wrong: you cannot refer to the contents of
the array within the comparator during an in-place sort.
Dave Mitchell saw how to fix it, and said he would do so when he had a
few spare moments.
Out of sorts
http://xrl.us/nf3z
Bug in toke.c (eval in subst) (#39365)
B. Carter discovered that "s//#/e" does slightly naughty things in
5.8.0, and apparently it still behaves the same way in "blead".
http://xrl.us/nf32
Perl5 Bug Summary
7 open, 7 closed, stable at 1488
http://xrl.us/nf33
Pick a winner
http://rt.perl.org/rt3/NoAuth/perl5/Overview.html
New Core Modules
* Test-Harness version 2.62 was released by Andy Lester.
http://xrl.us/nf34
* Version 0.64 of "version" was released by John Peacock.
http://xrl.us/nf35
* Sys-Syslog version 0.15 was released by Sébastien Aperghis-Tramoni.
http://xrl.us/nf36
In Brief
Rafael made "IPC::Open3" call "POSIX::_exit" upon "exec" failures.
(Bug #39252).
http://xrl.us/nf37
The thread concerning Perl "make" failures on HP-UX went on for far
longer than anyone expected. (Bug #39269)
http://xrl.us/nf38
Lee Goddard was having trouble installing "LWP" and was directed to
the appropriate forum. (Bug #39334).
http://xrl.us/nf39
Jarkko Hietaniemi declaimed that in regcomp.c, thou shalt not index
with "char" (since that can be "signed" or "unsigned"). The thread
degenerated into reminiscences of ye olde computers I have known and
loved, before veering into "alt.kinky.bizarre".
http://xrl.us/nf4a
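For readers wondering what the commandment about indexing with "char"
guards against, here is a minimal C sketch. It is not taken from
regcomp.c; the table and function names (seen, mark_seen, was_seen)
are hypothetical. On platforms where plain "char" is signed, a byte
above 127 becomes a negative index, so the value is cast to
"unsigned char" before it is used as a subscript.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* One slot per possible byte value. */
static unsigned char seen[UCHAR_MAX + 1];

/* Record every byte that occurs in the string. Writing seen[*s]
 * directly would index with a negative number for bytes such as
 * 0xE9 wherever plain char is signed. */
static void mark_seen(const char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++)
        seen[(unsigned char)*s] = 1;
}

/* Report whether the byte has been recorded. */
static int was_seen(char c)
{
    return seen[(unsigned char)c];
}
```

The cast costs nothing at run time and behaves identically whether the
compiler treats plain "char" as signed or unsigned.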
Russ Allbery had not quite finished his quest to deal with quotes,
both single and double, escaped or not, in "C<>" sections and
elsewhere.
The great escape
http://xrl.us/nf4b
Rafael suggested that the question about "Pod::Usage"'s usage of $0
could be side-stepped, choosing efficiency over accuracy.
A sly escape
http://xrl.us/nf4c
Jerry D. Hedden tidied up "threads" a bit more, dealing with HP-UX
10.20's non-standard "pthread_attr_getstacksize" and dealing with the
absence of "threads::shared" during testing.
http://xrl.us/nf4d
Joshua ben Jore wondered if op/stat.t failures on an NFS-mounted
directory could be attributed to slight clock differences between the
local and remote machines. Rafael recalled having encountered similar
issues in the past.
This is the hour
http://xrl.us/nf4e
Steve Peters taught "Configure" that "icc" is not "gcc".
The drama unfolds
http://xrl.us/nf4f
Peter Scott noticed an odd limitation in charnames.pm that caused
named escapes in "eval" to fail. Rafael removed it in "blead".
http://xrl.us/nf4g
Yitzchak Scott-Thoennes fixed "blead" so that exhausting "<>" in
"BEGIN" no longer causes an "ARGVOUT" used only once warning.
http://xrl.us/nf4h
Mohammad Yaseen continued his quest on building stuff on IBM big iron.
He now has a 5.8.7 up and running, but was having trouble getting the
test suite for "OS390-Stdio-0.008" to run cleanly. Robert Zielazinski,
who has years of experience with MVS, pointed to a couple of gotchas
that Yaseen should be aware of.
Man Versus System
http://xrl.us/nf4i
He also learnt all about %SVf in perl, what it does, and why you would
want to.
http://xrl.us/nf4j
Anno Siegel delivered a new iteration of support for inside-out
classes. Abigail queried a point of nomenclature.
Making a hash of it
http://xrl.us/nf4k
Nicholas Clark showed how to get "Math-Pari" compiling again under
"blead".
http://xrl.us/nf4m
Ferdinand Bolhar-Nordenkampf offered a number of suggestions for
enhancements that the porters duly considered.
http://xrl.us/nf4n
Jarkko suggested turning the "PL_perlio_fd_refcnt" into an "HV". Dave
Mitchell thought that things were already too complicated.
http://xrl.us/nf4o
About this summary
This overdue summary was written by David Landgren.
Last week's summary, here:
http://xrl.us/nf4p
was carefully read by Dr. Ruud, who was kind enough to point to a
webbed version of a "c.l.p.m" thread discussing Unicode matters that I
mentioned I was not able to access, where it appears that "It seems
that utf8 extends the core perl parser in some interesting ways."
Sadahiro Tomoyuki explained what was going on, and Dr. Ruud started to
look through the other "encoding" bugs to see if they were
manifestations of a related cause. Tomoyuki pointed out yet another
bizarre Unicode inconsistency, dealing with Unicode-encoded variable
names.
If you want a bookmarklet approach to viewing bugs and change reports,
there are a couple of bookmarklets that you might find useful on my
page of Perl stuff:
http://www.landgren.net/perl/
Weekly summaries are published on http://use.perl.org/ and posted on a
mailing list (subscription: [EMAIL PROTECTED]). The
archive is at http://dev.perl.org/perl5/list-summaries/. Corrections
and comments are welcome.
If you found this summary useful, please consider contributing to the
Perl Foundation to help support the development of Perl.
--
"It's overkill of course, but you can never have too much overkill."