from:"Tom Christiansen"

Re: When do named subs bind to their variables? (Re: Questionable scope of state variables ([perl #113930] Lexical subs))

2012-07-07 Thread Tom Christiansen

Father Chrysostomos via RT perlbug-comm...@perl.org wrote
   on Sat, 07 Jul 2012 17:44:46 PDT: 

 I’m forwarding this to the Perl 6 language list, to see if I can find
 an answer there.

I do have an answer from Damian, which I will enclose below, and a 
Rakudo result for you.

 [This conversation is about how lexical subs should be implemented in
 Perl 5.  What Perl 6 does may help in determining how to iron out the
 edge cases.]

[...]

 This question might be more appropriate:  In this example, which @a
 does the bar subroutine see (in Perl 6)?

 sub foo {
     my @a = (1,2,3);
     my sub bar { say @a };
     @a := [4,5,6];
     bar();
 }

The answer to your immediate question is that if you call foo(), 
it prints out 456 under Rakudo.

Following is Damian's answer to my question, shared with permission.

--tom

From:  Damian Conway dam...@conway.org
To:Tom Christiansen tchr...@perl.com
CC:Larry Wall la...@wall.org
Date:  Sun, 08 Jul 2012 07:17:19 +1000
Delivery-Date: Sat, 07 Jul 2012 15:19:09
Subject:   Re: my subs and state vars
In-Reply-To:   22255.1341691089@chthon

 It looks like perl5 may be close to having my subs, but a puzzle
 has emerged about how in some circumstances to treat state
 variables  within those.  [I'm pretty sure that perl6 has thought
 this through thoroughly, but [I] am personally unfamiliar with the
 outcome of said contemplations.]

 I bet you aren't, though.  Any ideas or clues?

The right thing to do (and what Rakudo actually does) is to treat
lexical subs as lexically scoped *instances* of the specified sub
within the current surrounding block.

That is: a lexical sub is like a my var, in that you get a new one
each time the surrounding block is executed. Rather than like an our
variable, where you get a new lexically scoped alias to the same package
scoped variable.

By that reasoning, state vars inside a my sub must belong to each
instance of the sub, just as state vars inside anonymous subs belong to
each instance of the anonymous sub.

Another way of thinking about what Perl 6 does is that:

my sub foo { whatever() }

is just syntactic sugar for:

my &foo := sub { whatever() }

That is: create a lexically scoped Code object and alias it at run-time
to an anonymous subroutine. So the rules for state variables inside
lexical subs *must* be the same as the rules for state variables inside
anonymous subs, since they're actually just two ways of creating the
same thing.
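
For perl5 readers, the same per-instance behaviour can already be seen with anonymous subs. What follows is only a sketch of that analogy, not the lexical-sub implementation under discussion; make_counter is a made-up name used for illustration.

```perl
use strict;
use warnings;
use feature qw(say state);

# make_counter is a hypothetical helper, named here for illustration only.
sub make_counter {
    my ($n) = @_;
    # Each evaluation of this sub { ... } expression yields a new closure,
    # and each closure gets its own copy of the state variable.
    return sub {
        state $count = $n;
        return $count--;
    };
}

my $first  = make_counter(3);
my $second = make_counter(10);

say $first->();     # 3
say $first->();     # 2
say $second->();    # 10, unaffected by the calls to $first
say $first->();     # 1
```

Moving the state declaration out of the anonymous sub and into make_counter itself would instead give one count shared by every closure, which is the distinction the two Perl 6 examples draw.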

With this approach, in Perl 6 it's easy to specify exactly what you want:

sub recount_from ($n) {

    my sub counter {
        state $count = $n;   # Each instance of counter has its own count
        say $count--;
        die if $count == 0;
    }

    while prompt "recount $n " {
        counter;
    }
}

vs:

sub first_count_down_from ($n) {

    state $count = $n;   # All instances of counter share a common count

    my sub counter {
        say $count--;
        die if $count == 0;
    }

    while prompt "first count $n " {
        counter;
    }
}

Feel free to forward the above to anyone who might find it useful.

Damian

Re: When do named subs bind to their variables? (Re: Questionable scope of state variables ([perl #113930] Lexical subs))

2012-07-07 Thread Tom Christiansen

Father Chrysostomos via RT perlbug-comm...@perl.org wrote
   on Sat, 07 Jul 2012 18:54:15 PDT: 


Thank you.  So the bar sub seems to be closing over the name @a (the
container/variable slot/pad entry/whatever), rather than the actual
array itself.

Since I don't  have it installed, could you tell me what this does?

All three of those say the same thing:

123
456

--tom

Re: Underscores v Hyphens (Was: [perl6/specs] a7cfe0: [S32] backtraces overhaul)

2011-08-24 Thread Tom Christiansen

Darren Duncan dar...@darrenduncan.net wrote on Wed, 24 Aug 2011 11:18:20 PDT:

 Smylers wrote:
 Could we have underscores and hyphens mean the same thing? That is, Perl
 6 always interprets illo-figut and illo_figut as being the same
 identifier (both for its own identifiers and those minted in programs),
 with programmers able to use either separator on a whim?

 I oppose this.  Underscores and hyphens should remain distinct.

 That would seem to be the most human-friendly approach.

 I disagree.  More human friendly is if it looks different in any way then it
 is different.  (I am not also saying that same-looking things are equal, given
 Unicode's redundancy.)

Your mentioning of Unicode is poignant.  In Unicode properties, you are not
supposed to have to worry about these things.  For example, from UTS#18:

Note: Because it is recommended that the property syntax be lenient
  as to spaces, casing, hyphens and underbars, any of the
  following should be equivalent: \p{Lu}, \p{lu}, \p{uppercase
  letter}, \p{Uppercase Letter}, \p{Uppercase_Letter}, and
  \p{uppercaseletter}


Similarly, since this applies to property names as well as to property
values, these are all the same:

\p{GC  =Lu}
\p{gc  =Lu}
\p{General Category=Lu}
\p{General_Category=Lu}
\p{general_category=Lu}
\p{general-category=Lu}
\p{GENERAL-CATEGORY=Lu}
\p{generalcategory =Lu}
\p{GENERALCATEGORY =Lu}

I'll let you permute the RHS on your own. :)
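
A quick way to see the loose matching at work in perl5 is to compile several spellings and check that they agree. This is a sketch only; count_matches is a name invented here, and the list sticks to spellings that differ in case, underscores, and short versus long names.

```perl
use strict;
use warnings;

# Several spellings of the same property under loose matching.
my @spellings = (
    qr/\p{Lu}/,
    qr/\p{lu}/,
    qr/\p{Uppercase_Letter}/,
    qr/\p{uppercase_letter}/,
    qr/\p{UppercaseLetter}/,
    qr/\p{General_Category=Lu}/,
    qr/\p{gc=lu}/,
);

# count_matches is a hypothetical helper: how many spellings match $char?
sub count_matches {
    my ($char) = @_;
    return scalar grep { $char =~ $_ } @spellings;
}

# Either every spelling matches or none does.
printf "%-8s %d of %d\n", $_->[0], count_matches($_->[1]), scalar @spellings
    for [ "A", "A" ], [ "a", "a" ], [ "U+0391", "\x{0391}" ];
```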

However, I use the opposite of that sort of loose matching of identifiers
in my own code.  For example, when I make a named character alias, I always
use lowercase so that it looks different from an official one.

use charnames ":full", ":alias" => {
    e_acute  => "LATIN SMALL LETTER E WITH ACUTE",
    ae       => "LATIN SMALL LETTER AE",
    smcap_ae => "LATIN LETTER SMALL CAPITAL AE",  # this is a lowercase letter
    AE       => "LATIN CAPITAL LETTER AE",
    oe       => "LATIN SMALL LIGATURE OE",
    smcap_oe => "LATIN LETTER SMALL CAPITAL OE",  # this is a lowercase letter
    OE       => "LATIN CAPITAL LIGATURE OE",
};

I don't make E_ACUTE and eacute also work there.  However, there is a
new :loose that does do that, but I suspect I shan't use it, since I use
both ae and AE differently in existing code.

--tom

UCA and NFC/NFD issues in pattern matching

2011-02-23 Thread Tom Christiansen

I have two points.  First, this excerpt from Synopsis 5:

The :m (or :ignoremark) modifier scopes exactly like :ignorecase except
that it ignores marks (accents and such) instead of case. It is equivalent
to taking each grapheme (in both target and pattern), converting both to
NFD (maximally decomposed) and then comparing the two base characters
(Unicode non-mark characters) while ignoring any trailing mark characters.
The mark characters are ignored only for the purpose of determining the
truth of the assertion; the actual text matched includes all ignored
characters, including any that follow the final base character.

The :mm (or :samemark) variant may be used on a substitution to change the
substituted string to the same mark/accent pattern as the matched string.
Mark info is carried across on a character by character basis. If the right
string is longer than the left one, the remaining characters are
substituted without any modification. (Note that NFD/NFC distinctions are
usually immaterial, since Perl encapsulates that in grapheme mode.) Under
:sigspace the preceding rules are applied word by word.  In perl5, one must
manually run two matches on all data.

First: I notice that ignoring marks (and such) and ignoring case are both
effects of the Unicode Collation Algorithm at different strengths.  What
about simply allowing folks to specify which of the four (or more, I guess)
levels of UCA equivalence/folding they want?
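
In perl5 terms, the idea can be sketched with Unicode::Collate, whose constructor takes a level parameter. This is an illustration of comparing at a chosen strength, not a proposal for the perl6 syntax; same_at is a hypothetical helper name.

```perl
use strict;
use warnings;
use Unicode::Collate;

# One collator per UCA strength: 1 = base letters only,
# 2 = base letters and marks, 3 = base letters, marks, and case.
my %collator = map { $_ => Unicode::Collate->new(level => $_) } 1 .. 3;

# same_at is a hypothetical helper: are $x and $y equal at this strength?
sub same_at {
    my ($level, $x, $y) = @_;
    return $collator{$level}->eq($x, $y) ? 1 : 0;
}

my $plain  = "resume";
my $marked = "re\x{301}sume\x{301}";    # e + COMBINING ACUTE ACCENT, twice
my $upper  = "RESUME";

for my $level (1 .. 3) {
    printf "level %d: marks ignored=%d case ignored=%d\n",
        $level,
        same_at($level, $plain, $marked),
        same_at($level, $plain, $upper);
}
```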

Second: I'm not altogether reassured by the parenned bit about NFD/NFC
being immaterial.  That's because I've been pretty annoyed lately in perl5
at having to manually run *everything* through a double match every time,
and I can't avoid it by prenormalizing.  I'm just hoping that perl6 will
handle this better.

It's usually like this:

NFD($data) =~ $pattern
NFC($data) =~ $pattern

Or if you know your data is NFD:

$data  =~ $pattern
NFC($data) =~ $pattern

Or if you know your data is NFC:

NFD($data) =~ $pattern
$data  =~ $pattern

That's because even if your data is in a known state with respect to
normalization, if your pattern admits both NFD and NFC forms, which it
would if read in from a file etc, then you have to run them both.

For example, suppose you read a pattern whose characters are specified
indirectly/symbolically:

$pattern = q(\xE9); # LATIN SMALL LETTER E WITH ACUTE

or 

$pattern = q(e\x{301}); # e + COMBINING ACUTE ACCENT

It would be ok if those were literal characters, because you
could just NFD the patterns and be done.  But they're not.  So
in order for


$data =~ $pattern

to work properly with both, you really have to do a guaranteed
double-convert/match each time.  This is rather unfortunate, to put it
mildly.  What you really want is a pattern compile flag that imposes
canonical matching, and does this correctly even when faced with named
characters, etc.
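
Until such a flag exists, the double match can at least be wrapped up once. A minimal sketch, assuming the pattern arrives already compiled; matches_canonically is an invented name, not part of any module.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);

# matches_canonically is a hypothetical helper. It tries the pattern
# against both normalization forms of the data, as described above.
sub matches_canonically {
    my ($data, $pattern) = @_;
    return (NFD($data) =~ $pattern || NFC($data) =~ $pattern) ? 1 : 0;
}

my $composed   = qr/\xE9/;        # LATIN SMALL LETTER E WITH ACUTE
my $decomposed = qr/e\x{301}/;    # e + COMBINING ACUTE ACCENT

# Both patterns find their target whichever form the data arrived in.
for my $data ("r\xE9sum\xE9", "re\x{301}sume\x{301}") {
    printf "%d %d\n",
        matches_canonically($data, $composed),
        matches_canonically($data, $decomposed);
}
```

This only covers patterns that are themselves wholly NFC or wholly NFD; a pattern mixing the two forms would still slip through, which is why a real canonical-matching flag is wanted.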

My read of S05 suggests that this will not be an issue.  I do wonder
what happens when you want to match just the combining part.  Does
that fail in grapheme mode?  It shouldn't: you *can* have standalones.
But then we're back to partial matches in the middle of things, which
is something that plagues us with full Unicode case-folding.  This is
the 

"\N{LATIN SMALL LIGATURE FFI}" =~ /(f)(f)/i

problem, amongst others.  Seems that you are going to get into the
same dilemma if you allow matching partial graphemes in grapheme mode.

Hm.

--tom

Perl6 regexes and UTS#18

2011-02-05 Thread Tom Christiansen

Has anybody specifically looked at how Perl6 regexes might map to
the various requirements of UTS#18, Unicode Regular Expressions?

http://unicode.org/reports/tr18/

I ask because to my inexperienced eye, quite a few perl6isms are
*much* better at this than what obtains in perl5, and so I wondered
whether this was by conscious intent and design.  Is/Was it?

I'm also curious whether there are active plans to address the
tr18 requirements in perl6 regexes.  It would be a wonderful
feather in perl6's cap to be able to legitimately claim Level 2
or even Level 3 compliance, since besides perl5, only ICU right
now manages even Level 1, with everybody else *very* far behind.

TR18 specifies three levels of support (Basic, Extended, and Tailored),
with each having specific, reasonably well-defined requirements:

  =Level 1: Basic Unicode Support
   RL1.1    Hex Notation
   RL1.2    Properties
   RL1.2a   Compatibility Properties
   RL1.3    Subtraction and Intersection
   RL1.4    Simple Word Boundaries
   RL1.5    Simple Loose Matches
   RL1.6    Line Boundaries
   RL1.7    Supplementary Code Points

  =Level 2: Extended Unicode Support
   RL2.1    Canonical Equivalents
   RL2.2    Default Grapheme Clusters
   RL2.3    Default Word Boundaries
   RL2.4    Default Loose Matches
   RL2.5    Name Properties
   RL2.6    Wildcard Properties

  =Level 3: Tailored Unicode Support
   RL3.1    Tailored Punctuation
   RL3.2    Tailored Grapheme Clusters
   RL3.3    Tailored Word Boundaries
   RL3.4    Tailored Loose Matches
   RL3.5    Tailored Ranges
   RL3.6    Context Matching
   RL3.7    Incremental Matches
 ( RL3.8    Unicode Set Sharing )
   RL3.9    Possible Match Sets
   RL3.10   Folded Matching
   RL3.11   Submatchers

thanks,

--tom

Re: Unicode Categories

2010-11-10 Thread Tom Christiansen

Patrick wrote at 12:15pm CST on Wednesday, 10 November 2010:

 Sorry if this is the wrong forum. I was wondering if there was a way to
 specify unicode
 categories <http://www.fileformat.info/info/unicode/category/index.htm> in
 a regular expression (and hence a grammar), or if there would be any
 consideration for adding support for that (requiring some kind of special
 syntax).

 Unicode categories are done using assertion syntax with "is" followed by
 the category name.  Thus <isLu> (uppercase letter), <isNd> (decimal digit), 
 <isZs> (space separator), etc.

 This even works in Rakudo today:

     $ ./perl6
     > say 'abcdEFG' ~~ / <isLu> /
     E

 They can also be combined, as in <+isLu+isLt>  (uppercase+titlecase).
 The relevant section of the spec is in Synopsis 5; search for "Unicode
 properties are always available with a prefix".
 
 Hope this helps!

Actually, that quote from Synopsis raises more questions than it answers.

Below I've annotated the three output groups with (letters):

% uniprops -a A
U+0041 ‹A› \N{ LATIN CAPITAL LETTER A }:
 (A)\w \pL \p{LC} \p{L_} \p{L} \p{Lu}
 (B)AHex ASCII_Hex_Digit All Any Alnum Alpha Alphabetic ASCII Assigned
Cased Cased_Letter LC Changes_When_Casefolded CWCF
Changes_When_Casemapped CWCM Changes_When_Lowercased CWL
Changes_When_NFKC_Casefolded CWKCF Lu L Gr_Base Grapheme_Base Graph
GrBase Hex XDigit Hex_Digit ID_Continue IDC ID_Start IDS Letter L_
Latin Latn Uppercase_Letter PerlWord PosixAlnum PosixAlpha
PosixGraph PosixPrint PosixUpper Print Upper Uppercase Word
XID_Continue XIDC XID_Start XIDS
 (C)Age:1.1 Block=Basic_Latin Bidi_Class:L Bidi_Class=Left_To_Right
Bidi_Class:Left_To_Right Bc=L Block:ASCII Block:Basic_Latin
Blk=ASCII Canonical_Combining_Class:0
Canonical_Combining_Class=Not_Reordered
Canonical_Combining_Class:Not_Reordered Ccc=NR
Canonical_Combining_Class:NR Decomposition_Type:None Dt=None
East_Asian_Width:Na East_Asian_Width=Narrow East_Asian_Width:Narrow
Ea=Na Grapheme_Cluster_Break:Other GCB=XX Grapheme_Cluster_Break:XX
Grapheme_Cluster_Break=Other Hangul_Syllable_Type:NA
Hangul_Syllable_Type=Not_Applicable
Hangul_Syllable_Type:Not_Applicable Hst=NA
Joining_Group:No_Joining_Group Jg=NoJoiningGroup
Joining_Type:Non_Joining Jt=U Joining_Type:U
Joining_Type=Non_Joining Script=Latin Line_Break:AL
Line_Break=Alphabetic Line_Break:Alphabetic Lb=AL Numeric_Type:None
Nt=None Numeric_Value:NaN Nv=NaN Present_In:1.1 Age=1.1 In=1.1
Present_In:2.0 In=2.0 Present_In:2.1 In=2.1 Present_In:3.0 In=3.0
Present_In:3.1 In=3.1 Present_In:3.2 In=3.2 Present_In:4.0 In=4.0
Present_In:4.1 In=4.1 Present_In:5.0 In=5.0 Present_In:5.1 In=5.1
Present_In:5.2 In=5.2 Script:Latin Sc=Latn Script:Latn
Sentence_Break:UP Sentence_Break=Upper Sentence_Break:Upper SB=UP
Word_Break:ALetter WB=LE Word_Break:LE Word_Break=ALetter

What that means is that the B properties are properties from 
the *General* category.  They may all be referred to as \p{X} 
or \p{IsX}, \p{General_Category=X} or \p{General_Category:X}, 
and \p{GC=X} or \p{GC:X}.

I have a feeling that your synopsis quote is referring to 
type B properties alone.  It is not talking about type C properties, 
which must also be accounted for.

--tom

Re: Unicode Categories

2010-11-10 Thread Tom Christiansen

Patrick wrote:

:  * Almost. E.g. <isL> would be nice to have as well.
:
: Those exist also:
:
:  $ ./perl6
:  > say 'abCD34' ~~ / <isL> /
:  a
:  > say 'abCD34' ~~ / <isN> /
:  3
:  

They may exist, but I'm not certain it's a good idea to encourage
the Is_XXX approach on *anything* except Script=XXX properties.  

They certainly don't work on everything, you know.

Also, I can't for the life of me see why one would ever write <isL> when
Letter is so much more obvious; similarly, for <isN> over Number.  
Just because you can do so, doesn't mean you necessarily should.

http://unicode.org/reports/tr18/#Categories

The recommended names for UCD properties and property values are in
PropertyAliases.txt [Prop] and PropertyValueAliases.txt [PropValue].
There are both abbreviated names and longer, more descriptive names.

It is strongly recommended that both names be recognized, and that
loose matching of property names be used, whereby the case
distinctions, whitespace, hyphens, and underbar are ignored.

Furthermore, be aware that the Number property is *NOT* the same
as the Decimal_Number property.  In perl5, if one wants [0-9], then
one expresses it exactly that way, since that's a lot shorter than
writing (?=\p{ASCII})\p{Nd}, where Nd can also be Decimal_Number.

Again, please note that Number is far broader than even Decimal_Number,
which is itself almost certainly broader than you're thinking.

Here's a trio of little programs specifically designed to help scout
out Unicode characters and their properties.  They work best on 5.12+,
but should be ok on 5.10, too.

--tom


unitrio.tar.gz
Description: application/tar

Perl6 and accents

2010-05-17 Thread Tom Christiansen

Exegesis 5 @ http://dev.perl.org/perl6/doc/design/exe/E05.html reads:

  # Perl 6
  / <<alpha> - [A-Za-z]>+ /   # All alphabetics except A-Z or a-z
                              # (i.e. the accented alphabetics)

[Update: Would now need to be <+alpha - [A..Za..z]> to avoid ambiguity
with Texas quotes, and because we want to reserve whitespace as the first
character inside the angles for other uses.]

Explicit character classes were deliberately made a little less convenient
in Perl 6, because they're generally a bad idea in a Unicode world. For
example, the [A-Za-z] character class in the above examples won't even
match standard alphabetic Latin-1 characters like 'Ã', 'é', 'ø', let alone
alphabetic characters from code-sets such as Cyrillic, Hiragana, Ogham,
Cherokee, or Klingon.

First off, that "i.e. the accented alphabetics" phrasing is quite incorrect!  
Code like /[^\P{Alpha}A-Za-z]/ matches not just things like

00C1 LATIN CAPITAL LETTER A WITH ACUTE
00C7 LATIN CAPITAL LETTER C WITH CEDILLA
00C8 LATIN CAPITAL LETTER E WITH GRAVE
00E5 LATIN SMALL LETTER A WITH RING ABOVE
00F1 LATIN SMALL LETTER N WITH TILDE

but also of course:

00AA FEMININE ORDINAL INDICATOR
00B5 MICRO SIGN
00BA MASCULINE ORDINAL INDICATOR
00C6 LATIN CAPITAL LETTER AE
00D0 LATIN CAPITAL LETTER ETH
00DE LATIN CAPITAL LETTER THORN
00DF LATIN SMALL LETTER SHARP S
00E6 LATIN SMALL LETTER AE
00F0 LATIN SMALL LETTER ETH
01A6 LATIN LETTER YR
01BA LATIN SMALL LETTER EZH WITH TAIL
01BC LATIN CAPITAL LETTER TONE FIVE
01BF LATIN LETTER WYNN
02C7 CARON
0391 GREEK CAPITAL LETTER ALPHA
0410 CYRILLIC CAPITAL LETTER A

and many, many more.

I'm also disappointed to see perl6 spreading the notion that accent
is somehow a valid synonym for 

diacritical marking 
diacritic marking 
diacritic mark
diacritic 
mark

It's not.  Accent is not a synonym for any of those.  Not all marks are
accents, and not all accents are marks.

I believe what is meant by "accent" is NFD($char) =~ /\pM/.  Fine: then
say "with diacritics", not "with accents".

Also, there are many combining characters that aren't accents by any
stretch of the term, such as 20E3 COMBINING ENCLOSING KEYCAP, to name just one.
Only three code points have official names that include ACCENT, and even
these are dubious.
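
The NFD test mentioned above can be wrapped as a one-liner. This is a sketch; has_mark is an invented name, and it shows that the test catches marks in general, keycap included, rather than accents as such.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# has_mark is a hypothetical helper: does the character, once canonically
# decomposed, contain any code point with the Mark property?
sub has_mark {
    my ($char) = @_;
    return NFD($char) =~ /\pM/ ? 1 : 0;
}

for my $cp (0x65, 0xE9, 0xF1, 0xE6, 0x20E3) {
    printf "U+%04X has_mark=%d\n", $cp, has_mark(chr $cp);
}
```

Note that 00E6 LATIN SMALL LETTER AE comes out markless: it has no canonical decomposition, so it is a letter in its own right as far as NFD is concerned.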

Finally, I note also that people use the Alpha property too loosely.  Note
the caron and such above.  One probably wants the LC property instead.

--tom

use charnames ();
use Unicode::Normalize;
for $cp ( 1 .. 0x ) {
    $orig  = chr($cp);
    $canon = NFD($orig);  # NFKD gives diff results
    ## if ($orig =~ /[^\P{Alpha}A-Za-z]/) {
    if ($orig =~ /\p{LC}/ && $canon !~ /^[A-Za-z]/) {
        printf("%c %04X %s\n", $cp, $cp, charnames::viacode($cp));
    }
}

Re: Amazing Perl 6

2009-05-29 Thread Tom Christiansen

· Quoth Larry:

˸ So let’s not make the mistake of thinking something
˸ longer is always less confusing or more official.

⋮ I already have too much problem with people thinking the
⋮ efficiency of a perl construct is related to its length.

So you’re saying the Law of Parsimony has its uses… a̲n̲d̲ abuses? ☻

--tom

-- 
ENTIA 
  NON · SVNT 
   M V L T I P L I C A N D A 
PRÆTER 
 N̳E̳C̳E̳S̳S̳I̳T̳A̳T̳E̳M̳

Re: Files, Directories, Resources, Operating Systems

2008-11-27 Thread Tom Christiansen

In-Reply-To: Message from Mark Overmeer [EMAIL PROTECTED] 
   of Thu, 27 Nov 2008 08:23:50 +0100. [EMAIL PROTECTED] 

* Tom Christiansen ([EMAIL PROTECTED]) [081126 23:55]:

 On Wed, 26 Nov 2008 11:18:01 PST.--or, for backwards compatibility,
 at 7:18:01 p.m. hora Romae on a.d. VI Kal. Dec. MMDCCLXI AUC,
 Larry Wall [EMAIL PROTECTED] wrote:

 SUMMARY: I've been looking into this sort of thing lately (see p5p),
  and there may not even *be* **a** right answer.  The reasons
  why take us into an area we've traditionally avoided.

 What a long message...

It *was*?  That was approaching a medium in my epistolary (and RFC) world,
the one unrelated to PostIt notes.  I can therefore see you've never been
FMTEYEWTK'd, and thus also to all outward appearances, we've not made each
other's acquaintance.  I'm tchrist; pleased to meet you.

Read the http://www.unicode.org/reports/tr10/ treatise, as I have repeatedly 
done, and you will quickly reassess your length calls.  This is not
necessarily a good thing.  Neal Stephenson can do the same, and of
far lesser utility.

--tom

Re: Files, Directories, Resources, Operating Systems

2008-11-27 Thread Tom Christiansen

In-Reply-To: Message from Darren Duncan [EMAIL PROTECTED] 
   of Wed, 26 Nov 2008 19:34:09 PST. [EMAIL PROTECTED] 

 Tom Christiansen wrote:

  I believe database folks have been doing the same with character data, but
  I'm not up-to-date on the DB world, so maybe we have some metainfo about
  the locale to draw on there.  Tim?

 AFAIK, modern databases are all strongly typed at least to the point
 that the values you store in and fetch from them are each explicitly
 character data or binary data or numbers or what-have-you; and so,
 when you are dealing with a DBMS in terms of character data, it is
 explicitly specified somewhere (either locally for the data or
 globally/hardcoded for the DBMS) that each value of character data
 belongs to a particular character repertoire and text encoding, and so
 the DBMS knows what encoding etc the character data is in, or at least
 it treats it consistently based on what the user said it was when it
 input the data.

Oh, good then.  That's what I'd heard was happening, but wasn't sure since
I've steered clear of such beasties since before it was true.

I wish our filesystems worked that way.  But Andrew said something to me
last week about Ken and Dennis writing quite pointedly that while you
*could* use the f/s as a database, that you *shouldn't*.  I didn't know
the reference he was thinking of, so just nodded pensively (=thoughtfully).

  There is ABSOLUTELY NO WAY I've found to tell whether these utf-8
  strings should test equal, and when, nor how to order them, without
  knowing the locale:
  
  RESUME,
  Resume
  resume
  Resum\x{e9}
  r\x{E9}sum\x{E9}
  r\x{E9}sume\x{301}
  Re\x{301}sume\x{301}

  Case insensitively, in Spanish they should be identical in all regards.
  In French, they should be identical but for ties, in which case you
  work your way right to left on the diacriticals.

 This leads me to talk about my main point about sensitivity etc.

 I believe that the most important issues here, those having to do with
 identity, can be discussed and solved without unduly worrying about
 matters of collation;

It's funny you should say that, as I could nearly swear that I just showed
that identity cannot be determined in the examples above without knowing
about locales.  To wit, while all of those sort somewhat differently, even
case-insensitively, no matter whether you're thinking of a French or a
Spanish ordering (and what is English's, anyway?), you have a more
fundamental = vs != scenario which is entirely locale-dependent.

If I can make a RESUME file, ought I be able to make a distinct
r\x{E9}sum\x{E9} or re\x{301}sume\x{301} file in a case-ignorant
filesystem? There is no good answer, because we might think it
reasonable to

lc(strip_marks($old_fn)) eq lc(strip_marks($new_fn))
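
A locale-blind version of that comparison might look like the sketch below. Both strip_marks and same_name are names assumed for illustration, and "mark" here means whatever NFD plus \pM says it is, which is exactly the simplification at issue.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# strip_marks is a hypothetical helper: decompose canonically, then drop
# every code point with the Mark property.
sub strip_marks {
    my ($string) = @_;
    my $decomposed = NFD($string);
    $decomposed =~ s/\pM+//g;
    return $decomposed;
}

# same_name is likewise hypothetical: the naive filename identity test.
sub same_name {
    my ($old_fn, $new_fn) = @_;
    return lc(strip_marks($old_fn)) eq lc(strip_marks($new_fn)) ? 1 : 0;
}

print same_name("RESUME", "r\x{E9}sum\x{E9}"),     "\n";
print same_name("Resume", "Re\x{301}sume\x{301}"), "\n";
print same_name("aelene", "\x{E6}lene"),           "\n";
```

The first two pairs collapse to the same name and the third does not, with no regard for what any particular language thinks of the matter.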

The problem of what is or is not a mark varies by locale:

*  Castilian doesn't think ~ is a mark; Portuguese does, and 
   so if you strip marks, you in Castilian count as the same
   two letters that it deems distinct, but in Portuguese, you
   incur no lasting harm.

*  Catalan doesn't think ¸ is a mark; French does, and so if you strip
   marks, you in Catalan count as the same two letters that it deems
   distinct, but in French or Portuguese, you incur no lasting harm.

*  Modern English (usually) decomposes æ into a+e, but OE/AS and
   Icelandic do not.

*  Moreover, Icelandic deems é and e to be completely
   different letters altogether.  If you strip marks, you 
   count as the same letters that that language does not.
   Similarly with ö, which is at the end of their alphabet,
   (like ø in some), and nowhere near o or ó.  BTW, those
   are three separate letters, not variants.

*  And in OE/AS you could have a long mark on an asc (say ash for the
   atomic *letter* æ).  If split into a and e and stripped of marks, it
   wouldn't make any sense at all.

Case in point: Ælene Frisch, whom many of you doubtless know, insists her
name be spelt as I have written it.  She does not want Aelene Frisch, for
she considers her forename to have 5 letters in it, not 6.  But Unicode
doesn't give us a title case version of that (did AS?), suggesting it a
ligature not a digraph.  

But if we have a file called ÆLENE, may we assume it the same in a case-
insensitive sense to both aelene and ælene?

I can only go on code-points, because I don't want to deal with ß and SS
and Ss.  Case-folding file systems are just begging for trouble, and I just
don't know what to do.  Think of the 3 Greek sigmata.

 identity is a lot more important than collation, as well as a
 precondition for collation, and collation is a lot more difficult and can
 be put off.

I agree with everything save "and can be put off".  I would like
you to be right.  I should truly wish to be mistaken.  And I don't know
what we have for prior (cough) art.

 respect to dealing with a file system, generally

Re: Smooth numeric upgrades?

2008-10-20 Thread Tom Christiansen

On Mon, 06 Oct 2008 at wee small hour of 02:20:22 EDT 
you, Michael G Schwern [EMAIL PROTECTED], wrote:

 Darren Duncan wrote:

 [2] Num should have an optional limit on the number of
 decimal places it remembers, like NUMERIC in SQL, but
 that's a simple truncation.

 I disagree.

 Any numeric operations that would return an irrational number
 in the general case, such as sqrt() and sin(), and the user
 desires the result to be truncated to an exact rational number
 rather than as a symbolic number, then those operators should
 have an extra argument that specifies rounding, eg to an exact
 multiple of 1/1000.

 That seems like scattering a lot of redundant extra arguments
 around.  The nice thing about doing it as part of the type is
 you just specify it once.

 But instead of truncating data in the type, maybe what I want
 is to leave the full accuracy inside and instead override
 string/numification to display only 2 decimal places.

This is currently something of an annoyance with Math::Complex.
It needs a way to specify epsilon.

If you ask for both sqrt()s of 4, you get

(2, -2+2.44929359829471e-16i)

in Cartesian but in Polar:

( [2,0], [2,pi] )

Is the problem that it's working in Polar and the conversion to
Cartesian is off by a wee bit?  I would really like to get
Cartesian answers of (2, -2), not that -2e-16i silliness.

If you ask for both roots of -4, you get

Cartesian:
( 1.22464679914735e-16+2i, -3.67394039744206e-16-2i )
Polar:
( [2,pi/2], [2,-1pi/2] );

But I'd like a Cartesian return of (2i, -2i).  
And a Polar return of ([2,pi/2],[2,-pi/2]).
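
Absent such a setting, one workaround is to snap near-zero components after the fact. A minimal sketch; snap_to_zero is a hypothetical helper rather than part of Math::Complex, and the default epsilon of 1e-12 is an arbitrary choice.

```perl
use strict;
use warnings;
use Math::Complex;

# snap_to_zero is a hypothetical helper, not a Math::Complex function.
# It zeroes any Cartesian component whose magnitude is below $epsilon.
sub snap_to_zero {
    my ($z, $epsilon) = @_;
    $epsilon = 1e-12 unless defined $epsilon;    # arbitrary default
    my ($re, $im) = (Re($z), Im($z));
    $re = 0 if abs($re) < $epsilon;
    $im = 0 if abs($im) < $epsilon;
    return cplx($re, $im);
}

# Both square roots of -4, with the e-16 noise removed.
my @roots = map { snap_to_zero($_) } root(cplx(-4), 2);
print join(", ", @roots), "\n";
```

This only hides the noise on output; the values inside any longer computation still carry the rounding error.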

It's worse still with the 10 roots of 2**10:

The 10 roots of 1024 are:
CRTSN:  1: 2
POLAR:  1: [2,0]
CRTSN:  2: 1.61803398874989+1.17557050458495i
POLAR:  2: [2,pi/5]
CRTSN:  3: 0.618033988749895+1.90211303259031i
POLAR:  3: [2,2pi/5]
CRTSN:  4: -0.618033988749895+1.90211303259031i
POLAR:  4: [2,3pi/5]
CRTSN:  5: -1.61803398874989+1.17557050458495i
POLAR:  5: [2,4pi/5]
CRTSN:  6: -2+2.44929359829471e-16i
POLAR:  6: [2,pi]
CRTSN:  7: -1.61803398874989-1.17557050458495i
POLAR:  7: [2,-4pi/5]
CRTSN:  8: -0.618033988749895-1.90211303259031i
POLAR:  8: [2,-3pi/5]
CRTSN:  9: 0.618033988749894-1.90211303259031i
POLAR:  9: [2,-2pi/5]
CRTSN: 10: 1.61803398874989-1.17557050458495i
POLAR: 10: [2,-1pi/5]

The 10 roots of -1024 are:
CRTSN:  1: 1.90211303259031+0.618033988749895i
POLAR:  1: [2,0.314159265358979]
CRTSN:  2: 1.17557050458495+1.61803398874989i
POLAR:  2: [2,0.942477796076938]
CRTSN:  3: 1.22464679914735e-16+2i
POLAR:  3: [2,pi/2]
CRTSN:  4: -1.17557050458495+1.61803398874989i
POLAR:  4: [2,2.19911485751286]
CRTSN:  5: -1.90211303259031+0.618033988749895i
POLAR:  5: [2,2.82743338823081]
CRTSN:  6: -1.90211303259031-0.618033988749895i
POLAR:  6: [2,-2.82743338823081]
CRTSN:  7: -1.17557050458495-1.61803398874989i
POLAR:  7: [2,-2.19911485751286]
CRTSN:  8: -3.67394039744206e-16-2i
POLAR:  8: [2,-1pi/2]
CRTSN:  9: 1.17557050458495-1.6180339887499i
POLAR:  9: [2,-0.942477796076938]
CRTSN: 10: 1.90211303259031-0.618033988749895i
POLAR: 10: [2,-0.31415926535898]

 Note, a generic numeric rounding operator would also take the
 exact multiple of argument rather than a number of digits
 argument, except when that operator is simply rounding to an
 integer, in which case no such argument is applicable.

 Note, for extra determinism and flexibility, any operation
 rounding/truncating to a rational would also take an optional
 argument specifying the rounding method, eg so users can
 choose between the likes of half-up, to-even, to-zero, etc.
 Then Perl can easily copy any semantics a user desires,
 including when code is ported from other languages and wants
 to maintain exact semantics.

 Yes, this is very important for currency operations.
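
As a concrete reading of that proposal, here is a perl5 sketch of rounding to an exact multiple with a selectable method. The function name and the method strings are assumptions taken from the wording above, and it works on ordinary floats, so it is exact only where the inputs are.

```perl
use strict;
use warnings;
use POSIX qw(floor ceil);

# round_to_multiple is a hypothetical helper. $unit is the "exact multiple
# of" argument; $method is one of 'half-up', 'to-even', or 'to-zero'.
sub round_to_multiple {
    my ($x, $unit, $method) = @_;
    my $q = $x / $unit;
    my $rounded;
    if ($method eq 'to-zero') {
        $rounded = $q < 0 ? ceil($q) : floor($q);
    }
    elsif ($method eq 'half-up') {
        $rounded = floor($q + 0.5);
    }
    elsif ($method eq 'to-even') {
        my $below = floor($q);
        my $frac  = $q - $below;
        if    ($frac < 0.5) { $rounded = $below }
        elsif ($frac > 0.5) { $rounded = $below + 1 }
        else                { $rounded = $below % 2 == 0 ? $below : $below + 1 }
    }
    else {
        die "unknown rounding method: $method\n";
    }
    return $rounded * $unit;
}

print round_to_multiple(2.5,   1,    'half-up'), "\n";    # 3
print round_to_multiple(2.5,   1,    'to-even'), "\n";    # 2
print round_to_multiple(0.375, 0.25, 'half-up'), "\n";    # 0.5
```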

 Now, as I see it, if Num has any purpose apart from Rat,
 it would be like a whatever numeric type or effectively a
 union of the Int|Rat|that-symbolic-number-type|etc types, for
 people that just want to accept numbers from somewhere and
 don't care about the exact semantics.  The actual underlying
 type used in any given situation would determine the exact
 semantics.  So Int and Rat would be exact and unlimited
 precision, and maybe Symbolic or IRat or something would be
 the symbolic number type, also with exact precision
 components.

 That sounds right.  It's the whatever can conceivably be
 called a number type.

I think you might be surprised by what some people conceive 
of by numbers. :-(

--tom

#!/usr/bin/perl

use strict;
use warnings;

use Math::Complex;

my $STYLE = "NORMAL";
# my $STYLE = "HACKED";

unless (@ARGV) {
    die "usage: $0 number rootcount\n";
} 

my ($number, $rootcount) = @ARGV;

$number = cplx($number);  

die "$0: $number poor for rooting\n" if !$number;
die $0: $rootcount should be

Re: Smooth numeric upgrades?

2008-10-05 Thread Tom Christiansen

In-Reply-To: Message from Nicholas Clark [EMAIL PROTECTED] 
   of Sun, 05 Oct 2008 22:13:14 BST. [EMAIL PROTECTED] 

 Studiously ignoring that request to nail down promotion and demotion, I'm
 going to jump straight to implementation, and ask:

 If one has floating point in the mix [and however much one uses rationals,
 and has the parser store all decimal string constants as rationals, floating
 point enters the mix as soon as someone wants to use transcendental functions
 such as sin(), exp() or sqrt()], I can't see how any implementation that wants
 to preserve infinite precision for as long as possible is going to work,
 apart from
 
storing every value as a thunk that holds the sequence of operations that
were used to compute the value, and defer calculation for as long as
possible. (And possibly as a sop to efficiency, cache the floating point
outcome of evaluating the thunk if it gets called)

Nicholas Clark

My dear Nicholas, 

You mentioned sin(), exp(), and sqrt() as being transcendental functions,
but this is not true!  Perhaps you meant something more in the way of their
being--um, irrational.  But far be it from me to risk using so loaded a
word in reference to anyone's hypothetical intentions or rationale! :-)

While all transcendentals are indeed irrationals, the opposite relationship
does *not* apply.  It's all really rather simple, provided you look at it 
as a brief, binary decision-tree.

=> All reals are also one of either rational or irrational:

 +  Rational numbers are those expressible as the RATIO of I/J, 
where I is any integer and J any non-zero integer. 

 -  Irrationals are all other reals *EXCEPT* the rationals.

=> All irrationals are also one of either algebraic or transcendental:

 +  Algebraic numbers are solutions to polynomial equations of a single 
variable and integer coefficients.  When you solve for x in the 
polynomial equation 3*x**2 - 15 == 0, you get an algebraic number.

 -  Transcendentals are all other irrationals *EXCEPT* the algebraics.

Thinking of the sine function and its inverse, I notice that
sin(pi/2) == 1 and asin(1) is pi/2.  Pi is *the* most famous 
of transcendental numbers, and sin() is a transcendental function.

Thinking of the exponential function and its inverse, I notice that
exp(1) == e and log(e) == 1.  And e, Euler's number, is likely the 
#2 most famous transcendental, and exp() is a transcendental function.

However, we come now to a problem.  

If you solved the simple equation I presented above as one whose solution
was by definition *not* a transcendental but rather an algebraic number,
you may have noticed that solution is 5**(1/2), better known as sqrt(5).
So that makes sqrt(5) an algebraic number, and sqrt() is an algebraic
function, which means therefore that it is *not* a transcendental one.

Q.E.D. :-)

Ok, I was teasing a little.  But I'd now like to politely and sincerely
inquire into your assertion that floating point need inevitably enter the
picture just to determine sin(x), exp(x), or sqrt(x).

Your last one, sqrt(), isn't hard at all.  Though I no longer recall the
algorithm, there exists one for solving square roots by hand that is only a
little more complicated than that of solving long division by hand.  Like
division, it is an iterative process, somewhat tedious but quite well-defined
and easily implemented even with pen and paper.  Perhaps that has something to do
with sqrt() being an algebraic function. :-) j/k
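
The by-hand method alluded to above can be sketched as follows.  This is
only an illustration in Python, not code from the thread; the function name
sqrt_digits and its interface are assumptions made for the example.  It
works two digits at a time, much as long division works one digit at a
time, and it never touches floating point.

```python
def sqrt_digits(n: int, places: int) -> str:
    """Longhand square root of a non-negative integer, using integers only.

    Returns the root truncated (not rounded) to `places` decimal places.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = str(n)
    if len(digits) % 2:
        digits = "0" + digits
    # Pair up the digits, then append one "00" pair per requested decimal place.
    pairs = [int(digits[i:i + 2]) for i in range(0, len(digits), 2)]
    pairs += [0] * places

    root = 0
    remainder = 0
    for pair in pairs:
        remainder = remainder * 100 + pair
        # Find the largest digit d with (20*root + d) * d <= remainder.
        d = 9
        while (20 * root + d) * d > remainder:
            d -= 1
        remainder -= (20 * root + d) * d
        root = root * 10 + d

    text = str(root).rjust(places + 1, "0")
    return text if places == 0 else text[:-places] + "." + text[-places:]


print(sqrt_digits(5, 10))
```

Asking for more places only means running the loop for more steps, so the
precision is limited by patience rather than by the width of a float.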

As for the two transcendental functions, this does ask for more work.  But
it's not as though we do not understand them, nor how to derive them at
need from first principles!  They aren't magic black-ball functions with
secret look-up tables that when poked with a given input, return some
arbitrary answer.

We *know* how to *do* these!  

Sure, many and probably most solutions, at least for the transcendentals, do
involve power series, and usually Taylor Series.  But this only means that
you get to joyfully sum up an infinite sequence of figures receding into
infinity (And beyond! quoth Buzz), but where each figure in said series 
tends to be a reasonably simple and straightforward computation.

For example, each term in the Taylor Series for exp(x) is simply x**N / N!,
and the final answer is the sum of all such terms for N going from 0 to infinity.  
Its series is therefore 

 x**0 / 0! # er: that's just 1, of course :-)
  +  x**1 / 1! 
  +  x**2 / 2! 
  +  x**3 / 3! 
  +  x**4 / 4! 
  + 
  + + + + + + + + + + ad infinitum.

For sin(x), it's a bit harder, but not much: the series is a convergent one
of alternating sign, running N from 0 to infinity and producing a series 
that looks like this:

 (x**1 / 1!)# er: that's just x, of course :-)
   - (x**3 / 3!) 
   + (x**5 / 5!) 
   - (x**7 / 7!) 
   + (x**9 / 9!) 
   -
   + - + - + - + - + ad infinitum.
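
The two series above can be summed term by term with exact rationals, so
that no floating point enters until someone asks for a decimal
approximation.  What follows is a minimal sketch in Python using its
standard fractions module; the names exp_partial and sin_partial are
assumptions for illustration, and a partial sum is of course only an
approximation to the infinite series.

```python
from fractions import Fraction


def exp_partial(x: Fraction, terms: int) -> Fraction:
    """Sum of x**N / N! for N = 0 .. terms-1, kept as an exact rational."""
    total = Fraction(0)
    term = Fraction(1)                 # x**0 / 0!
    for n in range(terms):
        total += term
        term = term * x / (n + 1)      # next term: multiply by x / (N+1)
    return total


def sin_partial(x: Fraction, terms: int) -> Fraction:
    """Sum of the first `terms` terms of x - x**3/3! + x**5/5! - ..."""
    total = Fraction(0)
    term = Fraction(x)                 # x**1 / 1!
    for k in range(terms):
        total += term
        # Next odd power with the opposite sign.
        term = -term * x * x / ((2 * k + 2) * (2 * k + 3))
    return total


print(exp_partial(Fraction(1), 5))     # 1 + 1 + 1/2 + 1/6 + 1/24
print(sin_partial(Fraction(1), 3))     # 1 - 1/6 + 1/120
```

Each partial sum is itself a rational number, which is why this approach
sits comfortably beside rational arithmetic: the only decision left is how
many terms to take.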

Each term in the sin(x) series is still a comparatively easy one, 
reading much better on paper than on the computer with

Re: Smooth numeric upgrades?

2008-10-04 Thread Tom Christiansen

In-Reply-To: Message from Michael G Schwern [EMAIL PROTECTED]
   of Sat, 04 Oct 2008 02:06:18 EDT. [EMAIL PROTECTED]

 Larry Wall wrote:
 The status of numeric upgrades in Perl 6 is fine.  It's rakudo that
 doesn't do so well.  :)

 As another datapoint:

 $  pugs -e 'say 2**40'
 1099511627776
 $ pugs -e 'say 2**50'
 1125899906842624
 $ pugs -e 'say 2**1100'
 1358298529049385849277351428359266778603493846931744549748519669727813SNIP

 That's good [1] to hear, thanks.

 I don't think of Int as a type that automatically upgrades.  I think
 of it as an arbitrarily large integer that the implementation can in
 some cases choose to optimize to a smaller or faster representation,

 Oh don't worry, I do.  I just got so flustered when I saw Rakudo do the
 same thing that Perl 5 does I was worried this got lost somewhere along
 the line.

 [1] We need a polite way to say less bad.

!
ah
fab
yay
good
cool
ayup
d'oh!
helps
tasty
yummy
smooth
cheers
better
yippee!
soothes
pleases
niftier
relieves
mediates
inspires
mollifies
mitigates
clarifies
my mistake
oh, right!
great!
delightful
not to worry
oh be joyful!
'tain't so bad
calms my qualms
less sub-optimal
cheers my spirit
soothes my nerves
dispells my doubts
heartens my resolve
cushions the cudgel
dismisses my dismay
drives out the dread
comforts me to learn
inspires me with hope
restores my confidence
gladdens my good humor
brightens my rainy day
alleviates my concerns
mollifies my misgivings
alleviates my confusion
puts down the false alarm
perks/plucks up my courage
pacifies my preoccupations
banishes my paranoia-demons
felicitates my facilitation
facilitates my felicitation
assuages my misapprehensions
shows I was worrying too much
encourages me; is encouraging
warms the cockles of my heart
offers hope for a better world
trounces my tetchy trepidations
dispells my misplaced anxieties
sure puts a spiffier shine on it 
makes molehills out of mountains
eases up on my nerves a fair bit
not nearly so gnarly as I'd feared
'tis not too late to seek a newer world
way better than I'd half-begun to suspect
patches the potholes in my crumbling wetware
softens the imagined blow that wasn't even there to start with
serenades such sweet sonnets as to nullify nervous nellies' natterings

Re: Allowing '-' in identifiers: what's the motivation?

2008-08-11 Thread Tom Christiansen

I'm still somewhat ambivalent about this, myself.  My previous
experience with hyphens in identifiers is chiefly in languages that
don't generally have algebraic expressions, e.g. LISP, XML, so it will
take some getting used to in Perl.  But at least in Perl's case the
subtraction conflict is mitigated by the fact that many subtraction
expressions will involve sigils;  $x-$y can't possibly be a single
identifier.

People use nonadic functions (nonary operators? where non = 0, not 9)
without parens, and get themselves into trouble for it.

% perl -E 'say time-time'
0

% perl -E 'say time-do{sleep 3; time}'
-3

% perl -E 'say time +5'
1218475824

% perl -E 'say time -5'
1218475817

% perl -E 'say time(-5)'
syntax error at -e line 1, near "(-"
Execution of -e aborted due to compilation errors.
Exit 19

--tom

Re: Exegesis 7: Fill Justification

2004-03-01 Thread Tom Christiansen

On Tue, Mar 02, 2004 at 10:01:11AM +1100, Damian Conway wrote:
: That's a *very* interesting idea. What do people think?

I think anyone who does full justification without proportional
spacing and hyphenation is severely lacking in empathy for the reader.
Ragged right is much easier on the eyes--speaking as someone who had
their seventh eye operation today.

At least aesthetically, yes, it sure does look better ragged.  I do
wonder why that is, though.  Could it be that the unevenness of the
inserted fixed-width spacing looks rough?  Or is it maybe because with
long lines, one's eye might get lost, being slower to tell one line
from the next?  That's certainly a reason for having shorter columns.

In a message of mine to p5p of 4-Nov-2003 [EMAIL PROTECTED],
I showed (but did not mention) how this sort of thing can be done without
inserting any spurious spaces whatsoever, even in a long paragraph:

 Well, no.  Mark answered so quickly after I did, and covered so much of it
 so succinctly, that I backed off again.  It seems to me that he and I have
 both for a long time yearned for a perliotut; I don't believe either of us
 has ever fleshed out more than an outline, though.  IO is a subject that's
 not always easy to figure out how to get the best handle on (ENOPUN).  For
 one thing, it's steeped in Unix lore and tradition, and it requires either
 knowing or else teaching quite a bit of C programming that would otherwise
 be completely irrelevant to Perl.  For example, when you see someone lseek
 zero bytes from the current position in Perl, you know they're remembering
 the ANSI C requirement of a seek falling between switching from reading to
 writing or vice versa.  As always, you're subject to all the silly bugs in
 your libc runtime system and in your kernel; for example, we tried to have
 all buffers flushed before a fork() to avoid duplicate output in the child
 by calling fflush(0) from C, the intent being to flush data still there in
 stdio buffers.  Unfortunately, on some platforms, you'll accidentally toss
 not just pending output, but also pending input.  Thus the case where read
 on STDIN was called with 2 against asdf\n, you'd still have the df\n yet
 to read get completely trounced.  This is incorrect behaviour, at least as
 far as the goal of flushing pending output buffers before forking.  Sadly,
 there really are a zillion little things like this, and these are just the
 exceptions, not the core functionality that you'd like to teach people for
 learning IO.  Blocking and buffering are tricky; did you remember that the
 output commands can also block?  Think about sending something down a pipe
 where the reader on the other end is slow or busy.  That's why with select
 you also have a slot for output handles you want to know whether are ready
 for IO.  It just goes on and on.  It would be easier to hand out copies of
 Stevens than to write perliotut, but that's too embarrassing and annoying.

However, I fear this isn't really readily automated; sorry to interrupt. :-)

--tom

PS: Ok, maybe one *could* do it, but that would still require a whole lot of
PhD-ish NLP work, and surely Damian's too engaged now for the diversion.

Re: == vs. eq

2003-04-05 Thread Tom Christiansen

When you write:

(1..Inf) equal (0..Inf)

I'd like Perl to consider that false rather than having a blank look
on its face for a long time.  

The price of that consideration would be to give the Mathematicians
blank looks on *their* faces for a very long time instead.  Certainly,
they'll be quick to tell you there are just as many whole numbers
as naturals.  So they won't know what you mean by equal up there.

The Engineers might also be perplexed if you make that false, but
for rather different reasons.  I think that you will have to define
your equal operator in a way contrary to their respective
expectations, because both have ways of thinking that could quite
reasonably lead them to expect the answer to the expression above
to come out to be false, not true.

Practically speaking, I'm not sure how--even whether--you *could*
define it.  One is tempted to attempt something like saying that
operator equal is true of a *lazy*, *infinite* list if and only
if elements 0..N of both lists were each themselves equal, and
then only for blessedly finite values of N.

But where then do you stop?  If N is 1, then (1..Inf) and (1..Inf:2)
are ok.  If N is 2, meaning to check both the first pair from each
list and also the second one, they aren't.  However, if N is 2,
(1..Inf) and (1, 2..Inf:2) are certainly ok.

In fact, that definition seems trivial to break.  Given a known N
steps of comparison, the lazy lists (1..Inf) and (1..N-1, (N..Inf:2))
would both test equal in the first N positions and differ in position
N+1.  Therefore, we can always break any operator that tests the
first N positions of both lazy lists, and thus that definition would
be wrong.
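
The counterexample above can be sketched with lazy iterators.  This is an
illustration in Python rather than Perl 6, and the helper names
(first_n_equal, naturals, spoiler) are assumptions made for the example.
The range notation (N..Inf:2) is modelled here as "start at N, step by 2".

```python
from itertools import chain, count, islice


def first_n_equal(a, b, n):
    """True if the first n items of the two iterables agree."""
    return list(islice(a, n)) == list(islice(b, n))


def naturals():
    """(1..Inf): 1, 2, 3, ..."""
    return count(1)


def spoiler(n):
    """(1..N-1, (N..Inf:2)): matches naturals() for n items, then diverges."""
    return chain(range(1, n), count(n, 2))


for n in (1, 5, 1000):
    agree = first_n_equal(naturals(), spoiler(n), n)
    still_agree = first_n_equal(naturals(), spoiler(n), n + 1)
    print(n, agree, still_agree)
```

Whatever N the operator settles on, spoiler(N) passes the N-item test and
fails the (N+1)-item one, which is the whole objection.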

The reason Mr Engineer might expect false would be if he thought you were
eventually testing against Inf.  Due to his experience in numerical
programming, he sees NaN and Inf having certain behaviors that no pure
mathematician would even countenance.  On a system whose system nummifier
knows things like Inf and NaN already, you see this happening even today.
Witness real Perl:

% perl -e 'printf "%g\n", NaN'
nan
% perl -e 'printf "%g\n",  1 + NaN'
nan
% perl -e 'printf "%g\n", 42 * NaN'
nan
% perl -e 'printf "%g\n", Nan == NaN'
0
% perl -e 'printf "%g\n", 1+Nan == NaN'
0
% perl -le 'printf "%g\n", Inf'
inf
% perl -le 'printf "%g\n", 1+Inf'
inf
% perl -le 'printf "%g\n", 2+Inf'
inf
% perl -e 'printf "%g\n", Inf == Inf'
1
% perl -e 'printf "%g\n", Inf == -Inf'
0
% perl -e 'printf "%g\n", 1+Inf == Inf'
1
% perl -e 'printf "%g\n", Inf + Inf' 
inf
% perl -e 'printf "%g\n", Inf * Inf'
inf
% perl -e 'printf "%g\n", Inf / Inf'
nan
nan
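
For comparison, the same IEEE behaviours can be checked in a language whose
floats are IEEE doubles and whose "nan" and "inf" strings are parsed by the
language itself rather than by the platform's libc.  The following is a
small Python sketch offered as an assumption-laden illustration, not as
anything from the original thread; the dictionary name checks is made up
for the example.

```python
import math

nan = float("nan")
inf = float("inf")

# Each entry records what IEEE 754 arithmetic reports for the expression.
checks = {
    "nan == nan": nan == nan,
    "1 + nan is nan": math.isnan(1 + nan),
    "42 * nan is nan": math.isnan(42 * nan),
    "inf == inf": inf == inf,
    "inf == -inf": inf == -inf,
    "1 + inf == inf": 1 + inf == inf,
    "inf + inf == inf": inf + inf == inf,
    "inf * inf == inf": inf * inf == inf,
    "inf / inf is nan": math.isnan(inf / inf),
}

for label, result in checks.items():
    print(f"{label}: {result}")
```

The one surprise for newcomers is usually the first line: a NaN compares
unequal to everything, itself included.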

=begin ASIDE

Yes, it's platform dependent what you'll get:

mach1%  perl -le 'printf "On $^O, NaN == NaN is %g\n", Nan == NaN'
On openbsd, NaN == NaN is 1

mach2% perl -le 'printf "On $^O, NaN == NaN is %g\n", Nan == NaN'
On linux, NaN == NaN is 0

I believe that's because the libc on openbsd doesn't nummify string NaN
to any special IEEE float, whereas the redhate one did.

I am truly hoping that on Perl6, comparing apples with greed will mean
that you're testing NaN with itself, that testing NaN and *anything*
including another NaN with == will get you into trouble no matter what your
platform, and that that trouble will be the same irrespective of platform.

=end ASIDE

In other words, if you treat Inf as any particular number (which Mr
Mathematician stridently yet somewhat ineffectually reminds you that you are
*not* allowed to do!), then you may get peculiar results.  Heck, you could
probably then get Mr Engineer to agree that the lazy lists (1..Inf) and
(0..Inf) are the same in the *last* N positions for all values of N, and
since you could just select N to be equal (ahem) to the length (ahem**Inf)
of your list, they must be equal. :-)

Mr Mathematician, purist that he is, has of course long ago thrown up his
hands in disgust, contempt, or both, and stormed out of the room.  To him,
most of those Perl examples given above were utter nonsense: how can you
say 1+Inf?  It bothers him to talk about Inf+1, and 1..Inf will be
problematic, too, since to say 1..Inf is also to say there must exist some
number whose successor is Inf.  And of course, there isn't.  Which is why
Inf is not a valid operand for numerical questions in Mr Mathematician's
platonically purist world of ideas.  But practical Mr Engineer has defined
his own Inf in which you can do limited otherwise apparently numerical
operations, because it was *practical* for him to do so.  He had work to
do, and needed some new rules.

While Mr Mathematician won't put up with comparing numbers and infinities,
he's quite comfortable with comparing *infinities* themselves.  He's
comfortable with infinite sets, and he's comfortable with infinite series,
too, which is what these lists seem to be.  I'm not sure that his
experience with infinite series will help us much here, because you see, 
those

Re: == vs. eq

2003-04-05 Thread Tom Christiansen

You can define it very easily:  two lists are equal if the ith element of 
one list is equal to the ith element of the other list, for all valid 
indices i.

The problem is that you've slipped subtly from a well-known creature, like
1..10, a finite set of ten distinct integers, to quite a different sort
of beast entirely, 1..Inf, which while notationally similar to the first,
does not share some very fundamental properties.  For example, it no longer
has an integral membership count, that is, a length.  This is problematic
if one is not quite careful.

As for whether you can *evaluate* this test in bounded time, that depends. 
Computers are incapable of storing truly infinite lists, so the lists will 
have finite internal representations which you can compare.

Is it possible that finite internal representations will differ in
internal representation yet produce identical series?  It seems to me that
some meta-analysis would be required if this is possible.  If it is not
possible, then that means that every distinct series has a distinct
internal representation.  Certainly this is not true lexically: I can
use many variant lexical representations to produce the same infinite
series.  For example:

(1 .. X, X+1 .. Inf)
(1 .. Y, Y+1 .. Inf)

Those define identical lists, for any natural numbers X and Y, even as 
compile-time constants.  However, save for the special case of X==Y, I do
not expect their internal representation to be the same.  

As for two dynamically generated infinite lists (which you can't easily 
compare, for example if they're based on external input)... it will either 
return false in finite time, or spend infinite time on determining they're 
indeed equal.

I suppose you could classify

(1 .. X, X+1 .. Inf)
(1 .. Y, Y+1 .. Inf)

as dynamically generated infinite lists, but again, given constants X and
Y, they really needn't be.  Even a run-time thing like the list (1 .. $X)
shouldn't actually need to spend infinite time on determining its
equivalence to (1 .. $Y).  However, there are more interesting
possibilities: generic iterator functions that you repeatedly call and
which produce successors that aren't generally recognizable.  Remember the
old flipflop, as in

if ( 1 .. /^$/ )  {  }
if ( /foo/ .. /bar/ ) {  }
if ( f() .. g() ) {  } 

You could, I think, have an infinite list that was really some fancy
interface to a dynamic iterator of some sort.  I know it's interesting,
but whether this would be sufficiently useful to justify its complexity is
rather less obvious.  But if you did have such a list, where stepping 
down it implicitly called some sort of -ITER method or whatnot, then 
for those I could see the intractability of finite evaluation, since it's
perfectly conceivable that it wouldn't terminate.  Another pitfall is
non-reproducibility; think about readline() as an iterator on a stream 
object whose underlying file descriptor is not seekable.  But I'm not sure
that any of the sorts of lists we've been talking about have to have
that problem.  But I don't know whether we can be clever enough to step
around infinite evaluation through some sort of higher-level analysis.

A clever compiler could move things around.  Maybe it could change 

for ($i = 1; $i <= 10_000; $i++)  

into

for $i ( 1 .. 10_000 ) 

and then perhaps take advantage of that construct's lazy evaluation.
Given general purpose lazy evaluation, you could start doing things 
like thinking of 

for ($i = 1; $i <= fn(); $i++)  
for $i ( 1 .. fn() ) 

and making instead a list or array whose members are 

( 1 .. fn() )

However, do you evaluate fn() only once or repeatedly?

Hm.

If it were repeatedly, then I do see what you mean by dynamically
generated infinite lists.

In other words, if you treat Inf as any particular number (which Mr
Mathematician stridently yet somewhat ineffectually reminds you that you are
*not* allowed to do!), then you may get peculiar results.

There is no problem with doing that, as long as you define what you want 
it to do.

Well, sure, you could let Inf = MAXINT + 1 for example, and then define
things as you want them to act, but that doesn't mean that this resulting
Inf is either what people think of as a Number nor what they think of as
Infinity.  See below on IEEE, which found it very useful to do something of
the like.

Remember, most of mathematics is just an invention of humans :)

I believe we are indeed trying to define what we want it to do, no?

So sure, you can create a new infinite set by conjoining some new elements
to an existing one.  That's what all the numeric sets are, pretty much.
Do be careful that the result has consistent properties, though.

(crap about testing first/last N elements)

testing the first/last N elements is not the same as testing the whole list

for all N  :)

Mr Mathematician, purist that he is, has of course long ago thrown up his
hands in disgust, contempt, or both, and stormed out of the room

Re: == vs. eq

2003-04-05 Thread Tom Christiansen

Unless I'm very wrong, there are more whole numbers than natural 
numbers. An induction should prove that there are twice as many.

We're probably having a language and/or terminology collision.  By natural
numbers, I mean the positive integers.  By whole numbers, I mean the
natural numbers plus the number zero.   Since both sets have infinite
members, each has just as many members as the other has.  It just *looks*
like the whole numbers have one more.  But they don't, you know, because 
Inf+1 == Inf, as IEEE shows us in their seminal treatise on How to Lie
With Computers under IEEE Floating Point.

It's not really relevant to figuring out how to evaluate equality testing
on unbounded lists in Perl, but I think that your inductive proof would
lead you to conclude the opposite of what you're thinking.  You can pick a
first member of both sets.  Then you can pick a second member of both sets.
Then a third, then a fourth, and so and so forth for all cardinal numbers.
Even though your list of pairings one from each set itself stretches to
infinity (not that that means it actually stops somewhere, of course, as
though infinity were a place; I mean it just stretches ever upwards without
bound), then I think induction will convince you that in the resulting
pair-list, there are no missed members from either set.  So we are
comfortable saying that there are just as many of one as the other; well,
*I* am comfortable saying that, at least, and I hope you are, too.  :-)
It's initially a bit disturbing, though, when you realize that this
necessarily leads to saying there are just as many multiples of two as
there are of, oh, eight.  Maybe that's why Cantor died mad. :-)
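
The pairing argument above can be sketched by zipping two unbounded
counters together and looking at as many pairs as one has patience for.
This is a Python illustration only; the helper name pair_up is an
assumption, and printing a finite prefix obviously demonstrates the pairing
rule rather than proving anything about the infinite sets.

```python
from itertools import count, islice


def pair_up(first, second, how_many):
    """Return the first `how_many` pairs, one member drawn from each series."""
    return list(islice(zip(first, second), how_many))


# Natural numbers (1, 2, 3, ...) against whole numbers (0, 1, 2, ...).
print(pair_up(count(1), count(0), 5))

# Multiples of two against multiples of eight.
print(pair_up(count(2, 2), count(8, 8), 5))
```

In the first pairing every natural n lands on the whole number n - 1, and
in the second every multiple of two, t, lands on 4 * t.  Neither rule ever
runs out of partners on either side, and that is all "just as many" means
here.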

--tom

Re: == vs. eq

2003-04-05 Thread Tom Christiansen

The IEEE-float-style infinities are quite sufficient for most purposes

One thing I agree is that writing  1..Inf  is a *bit* sloppy since the 
range operator  n..m  normally produces the numbers i for which 
n <= i <= m  while  n..Inf  gives  n <= i < Inf

but I can live with it

I could sure save myself a lot of typing by reading ahead to message N+1
before answering message N. :-)

--tom

PS:  For all N.  :-):-)

Re: Barewords and subscripts

2002-01-26 Thread Tom Christiansen


Maybe there will be a Perl 6 rule forcing the keys to be quoted, but it
won't be because of the no barewords rule.  If there were such a rule, I
presume you'd also apply it to the LHS of =>?

There is another way to resolve the ambiguity of foo meaning either
foo or foo() depending on current subroutine visibility.  This
would also extend then to issue of $hash{foo} meaning either
$hash{foo()} or $hash{foo}.  Just use parens.  

Oh, I know, I know.  I can already hear the mass reaction now: Oh,
horrors! cry the masses from every timezone.  But let's think about
it anyway.

Perl's historical optionality of explicit parentheses to delimit a
function's argument list is, like its similar optionality of explicit
quotation marks, a source of ambiguity.  And while ambiguity can
be a source of flexibility, expressibility, and convenience, it can
also have a darker side that would be better relegated to obfuscated
programming contests than to production-calibre code.

In my experience, many programmers would prefer that all functions
(perhaps restricted to only those of no arguments to appease
hysterical cetaceans?) mandatorily take (even empty) parens.  Thus,
shift() in lieu of shift, no matter whether it's as a hash subscript
or the left-hand operand of the comma arrow, or whether it's floating
around free, outside of any such autoquoting construct.

Since this matter has now been mentioned, I would like to suggest that 
there lurk other related and perhaps even more important ramifications to 
the current optionality of parentheses than the one concerning strings.

Witness:

% perl -MO=Deparse,-p -e 'push @foo, reverse +1, -2'
push(@foo, reverse(1, -2));

% perl -MO=Deparse,-p -e 'push @foo, rand +1, -2'
push(@foo, rand(1), -2);

% perl -MO=Deparse,-p -e 'push @foo, time +1, -2'
push(@foo, (time + 1), -2);

[ Gr.  That should read time(). ]

% perl -MO=Deparse,-p -e 'push @foo, fred +1, -2'
push(@foo, ('fred' + 1), -2);

Do you see what I'm talking about?  The reader unfamiliar with the
particular context coercion templates of the functions used in code
like

use SpangleFrob;
frob @foo, spangle +1, -2;

can have no earthly idea how that will even *parse*.  This situation
seems at best, unfortunate.

I'm sure that if it were somehow possible to require proper placement of all
those parens, even with something like the hypothetical and wholly optional
use strict 'parens', that this would raise the hackles of many a current
Perl programmer.  But perhaps this owes more to the fact that those folks
do not have to explain or justify this particular--well, let's be charitable
and merely call it an issue--to those whom it befuddles or annoys than it
owes to any legitimate convenience or desirable functionality.  When you get
to see non-wizards repeatedly stumble on these ambiguities on a regular basis,
this whole situation can quickly become a source of frustration, embarrassment,
or both.  Whether this scenario inspires apologetics or apoplectics is not
consistently predictable.

However, if one were simply *able* to write something like

use SpangleFrob;
use strict 'parens';  # subsumed within a blanket use strict

frob(@foo, spangle(1, -2));
frob(@foo, spangle(1), -2);
frob(@foo, spangle() + 1, -2);

then, without even inflicting grievous harm on compile-time checking of
arguments, one could at least always readily discern which arguments went
where--which hardly seems an undesirable goal, now does it?

Nevertheless, even that wouldn't help in being able to know whether
that's really  meaning

frob(@foo, 
frob(   \@foo, 
frob( scalar @foo, 

It all would depend upon the existence of coercion templates such
as frob(@...), frob(\@...), and frob($...).  Sadly, there's no
B::Deparse switch to tell you under which scenario you're operating,
but that's all probably best left for a semi-separate discussion
(if at all).

The devil's advocate might suggest that not knowing which of the
three treatments of @foo silently occurred in the frobbing function
call--which they with some credibility assert a desirable goal--goes
hand in glove with not knowing whether spangle is here acting as a
list-op, as a un(ary)-op, or as a non(e)-op.  But considering that
such devils need no help in their advocacy, I shan't bother to do
so myself.  :-)

--tom

Re: Perl 5's non-greedy matching can be TOO greedy!

2000-12-15 Thread Tom Christiansen


 More generally, it seems to me that you're hung up on the description 
 of "*?" as "shortest possible match".  That's an ambiguous 

Yup, that's a bit confusing.  It's really "start matching as soon as
possible, and stop matching as soon as possible".  (The usual greedy
one is, of course, "keep matching as long as possible".)  The initial
invariant part, "start as soon as possible", is the de facto and de
jure (at least POSIX 1003.2, but probably also Single Unix)
definition, and therefore rather non-negotiable.

It's like people who write /^.*fred/ instead of /.*fred/.  They
are forgetting something critical: where the Engine starts the search.
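
The "start as soon as possible" rule can be seen in a small sketch.  It is
written in Python, whose re module is a backtracking engine with the same
leftmost-first behaviour for this pattern; the sample string and variable
names are assumptions invented for the illustration.

```python
import re

text = "xx BEGIN a BEGIN b END"

# Non-greedy: starts at the FIRST "BEGIN", then stops at the first "END"
# reachable from there.  It does not hunt for the shortest span overall.
lazy = re.search(r"BEGIN.*?END", text)

# The shortest span overall would begin at the second "BEGIN".
shortest_overall = "BEGIN b END"

print(lazy.group())
print(lazy.group() == shortest_overall)
```

The match comes back as "BEGIN a BEGIN b END", because the engine commits
to the earliest starting position before the minimal quantifier gets any
say in where to stop.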

--tom

Re: Perl 5's non-greedy matching can be TOO greedy!

2000-12-15 Thread Tom Christiansen


Have you thought it through NOW, on a purely semantic level (in isolation
from implementation issues and historical precedent), 

I've said it before, and I'll say it again: you keep using 
the word "semantic", but I do not think you know what that word means.

--tom

Re: RFC 357 (v1) Perl should use XML for documentation instead of POD

2000-10-04 Thread Tom Christiansen


POD, presumably.  Or maybe son-of-POD; it would be nice to have better
support for tables and lists.

We did this for the camel.   Which, I remind the world, was 
written in pod.

''tom

Re: RFC 357 (v1) Perl should use XML for documentation instead of POD

2000-10-02 Thread Tom Christiansen


No-one ever did suggest adding « and » to the list of matched delimiters
that q() etc support, did they? :-)

I did.

Does Unicode define bracket pairings for character sets? ducks

$ grep ^Prop /usr/local/lib/perl5/5.6.0/unicode/Props.txt

does not seem very helpful, but this may not be much of a proof.

--tom

Re: RFC 357 (v1) Perl should use XML for documentation instead of POD

2000-10-02 Thread Tom Christiansen


- Done right, it could be easier to write and maintain

Strongly disagree.

- Why make people learn pod, when everyone's learning XML?

Because it is simple.  It is supposed to be simple.
It is not supposed to do what you want to do.
In fact, it is supposed to NOT DO what you want to do.

- Pod can be translated into XML and vice versa

Then do that.

- Standard elements could be defined and utilized with the
  same or greater ease than pod for build and configuration.

/pod
  NameModule::Name/Name
  Version0.01/Version
  Synopsisshort description/Synopsis
  Description
name=head1 long description/name
section
  name=head2 heading/name
  list type="ordered" symbol="1"
itemfoo/item
  /list
  Type in some text here...
/section
  /Description
  AuthorEliott P. Squibb/Author
  MaintainerJoe Blogg/Author
  Bugsnone/Bugs
  CopyrightDistributed under same terms as Perl/Copyright
  section
namedefine your own section/name
blab here
  /section
/pod

That is an excellent description of why THIS IS COMPLETE 
MADNESS.  

--TOM

Re: RFC 325 (v1) POD and comments handling in perl

2000-09-29 Thread Tom Christiansen


It really is not feasible to relax the pod
requirement that pod directives begin with
an equals to allow them to begin with a 
pound sign as well, for to do so would expose
an untold number of programs to unpredictable 
effects.  I also don't really see any advantage.

And yes, I'm sure I'm days behind.  I have no
choice.

Visit our website at http://www.ubswarburg.com

This message contains confidential information and is intended only 
for the individual named.  If you are not the named addressee you 
should not disseminate, distribute or copy this e-mail.  Please 
notify the sender immediately by e-mail if you have received this 
e-mail by mistake and delete this e-mail from your system.

E-mail transmission cannot be guaranteed to be secure or error-free 
as information could be intercepted, corrupted, lost, destroyed, 
arrive late or incomplete, or contain viruses.  The sender therefore 
does not accept liability for any errors or omissions in the contents 
of this message which arise as a result of e-mail transmission.  If 
verification is required please request a hard-copy version.  This 
message is provided for informational purposes and should not be 
construed as a solicitation or offer to buy or sell any securities or 
related financial instruments.

Term::ReadKey inclusion

2000-09-29 Thread Tom Christiansen


It is unreasonably complicated to do single-character
input in a portable fashion.  We should therefore
include the Term::ReadKey module in the standard
distribution.


Re: RFC 308 (v1) Ban Perl hooks into regexes

2000-09-28 Thread Tom Christiansen


I consider recursive regexps very useful:

 $a = qr{ (?> [^()]+ ) | \( (??{ $a }) \) };

Yes, they're "useful", but darned tricky sometimes, and in
ways other than simple regex-related stuff.  For example,
consider what happens if you do

my $regex = qr{ (?> [^()]+ ) | \( (??{ $regex }) \) };

That doesn't work due to differing scopings on either side
of the assignment.  And clearly a non-regex approach could
be more legible for recursive parsing.
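
One way around the scoping problem, as a sketch only: declare the
variable in one statement and assign to it in the next, so that the
name inside (??{ ... }) is the same lexical.  The name $balanced and
the sample strings are made up for illustration, and this assumes a
perl recent enough that regex code blocks close over lexicals properly.

```perl
use strict;
use warnings;

my $balanced;                        # declared first, so the name is in scope below
$balanced = qr{
    (?: (?> [^()]+ )                 # a run of non-parenthesis characters
      | \( (??{ $balanced }) \)      # or a parenthesized group, matched recursively
    )*
}x;

for my $text ('a(b(c)d)e', 'a(b', '(()())') {
    my $verdict = $text =~ /\A $balanced \z/x ? 'balanced' : 'not balanced';
    printf "%-12s %s\n", $text, $verdict;
}
```

The first and third strings should come out balanced and the second
not.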

--tom

Re: RFC 143 (v2) Case ignoring eq and cmp operators

2000-09-28 Thread Tom Christiansen


This RFC still has silly language that discounts what
has been said before.  

1) It calls
uc($a) eq uc($b)
"ugly", despite their being completely intuitive and legible
to even the uninitiated.

2) It then proposes "eq/i" without the least blush, despite
   how incredibly ugly and non-intuitive and, if I may,
   syntactically perverse such a notion is.
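
For the record, a small sketch of the existing idiom, for both
equality and ordering.  The sample strings are made up for
illustration.

```perl
use strict;
use warnings;

my ($first, $second) = ('Perl', 'PERL');    # sample strings, made up here

# Case-insensitive equality: fold both sides, then compare as usual.
my $same = uc($first) eq uc($second) ? 1 : 0;
print "same\n" if $same;

# Case-insensitive ordering works the same way with cmp.
my @sorted = sort { uc($a) cmp uc($b) } qw(banana Apple cherry);
print "@sorted\n";
```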

--tom

my and local

2000-09-28 Thread Tom Christiansen


As we sneak under the wire here, I'm hoping someone
has posted an RFC that alters the meaning of my/local.
It's very hard to explain as is.  my is fine, but local
should be changed to something like "temporary" (yes, that
is supposed to be annoying to type) or "dynamic".
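
A minimal sketch of the difference that is so hard to explain.  The
variable and sub names are hypothetical.  my creates a new variable
visible only in its block, while local temporarily changes a package
variable for everything called from that block.

```perl
use strict;
use warnings;

our $mode = 'global';

sub current_mode { return $mode }     # always reads the package variable

sub try_my {
    my $mode = 'lexical';             # a brand-new variable, private to this block
    return current_mode();            # so current_mode() still sees 'global'
}

sub try_local {
    local $mode = 'dynamic';          # temporary value for the package variable
    return current_mode();            # seen by everything called from here
}

print try_my(), "\n";                 # global
print try_local(), "\n";              # dynamic
print current_mode(), "\n";           # global again: the old value is restored
```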

Re: RFC 277 (v1) Eliminate unquoted barewords from Perl entirely

2000-09-28 Thread Tom Christiansen


Try thinking of it this way: it's only a bareword if 
it would make use strict whinge at you.  Thus, the
constructs you cited are all non-uses of barewords,
such as in use Foo or require Foo or Foo = 1, or
even $x{Foo}.  And I have proposed (nonRFC) that
Foo->bar() also be not a bareword.  Yes, I know 
strict doesn't carp about it, but that could be Foo().
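
A rough way to see where use strict draws that line today, as a
sketch.  Each snippet is compiled on its own with a string eval so
that a compile-time complaint lands in $@.  The labels and snippets
are made up for illustration.

```perl
use strict;
use warnings;

my @snippets = (
    [ 'hash subscript' => q{ my %x = (Foo => 1); $x{Foo} }   ],
    [ 'left of =>'     => q{ my @pair = (Foo => 1); 1 }      ],
    [ 'plain string'   => q{ my $s = Foo; 1 }                ],
);

my %verdict;
for my $case (@snippets) {
    my ($label, $code) = @$case;
    # A true result means the snippet compiled and ran under strict.
    my $ok = eval "use strict; $code";
    $verdict{$label} = $ok ? 'accepted' : 'rejected';
    printf "%-15s %s\n", $label, $verdict{$label};
}
```

Only the last snippet should be rejected, with the familiar
"Bareword ... not allowed" message left in $@.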

Murdering @ISA considered cruel and unusual

2000-09-28 Thread Tom Christiansen


I strongly agree with the opinion that we should try and get away from
special variables and switches in favor of functions and pragmas.
Witness 'use base' instead of '@ISA', 'use warnings', and so on.

Huh?  Why???  Perl's use of @ISA is beautiful.  It's an example
of code reuse, because we don't need no stinking syntax!

use base is, or can be, pretty silly -- think pseudohashes, 
just for one.

The general sentiment you espouse obviously has a line beyond
which you don't intend to cross.   The question is where
that line lies.
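
To make the point about @ISA concrete, here is a minimal sketch with
hypothetical class names.  Inheritance is nothing more than an
ordinary array assignment, with no extra syntax involved.

```perl
use strict;
use warnings;

package Animal;
sub new   { my ($class, %args) = @_; return bless {%args}, $class }
sub speak { my $self = shift; return $self->{name} . ' makes a noise' }

package Dog;
our @ISA = ('Animal');                # inheritance is just an array
sub speak { my $self = shift; return $self->SUPER::speak() . ' (woof)' }

package main;
my $dog = Dog->new(name => 'Rex');    # new() is found through @ISA
print $dog->speak, "\n";
print "a Dog is an Animal\n" if $dog->isa('Animal');
```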

--tom, who knows that it's hard to read his mail, but it's
   even harder to write it

Re: RFC 277 (v1) Eliminate unquoted barewords from Perl entirely

2000-09-28 Thread Tom Christiansen


So what's left?

print STDERR "Foo";

We have a proposal to turn STDERR into $STDERR, and it looks likely it'll go
through.

It is?  I certainly hope not.  It makes as much sense to 
do that as to force a dollar sign on subroutines.

   sub $foo { ... }

or 

   sub 'foo' { ... }

Heck, maybe everyone should be forced to write

   *foo = sub { ... };


$time = time;
print;

If use strict 'subs' is in effect you're guaranteed these are subroutine
calls, or compile-time errors.  If it isn't you get a nice little warning. 
Perhaps the stringification should be removed entirely, and the syntax
always be a subroutine call.

Eek, that's what I want to kill.  I want you to HAVE to 
write that as

$time = time();

with the parens.  The lack of parens is the root of MANY
an evil in perl.
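
One illustration of the sort of evil meant here, as a sketch with a
hypothetical user-defined sub.  Without a prototype and without
parens, the sub parses as a list operator and swallows whatever
follows it.

```perl
use strict;
use warnings;

sub one { return 1 }              # hypothetical sub, no prototype

# Without parens this parses as one(+2), not one() + 2.
my $without = one + 2;
my $with    = one() + 2;

print "without parens: $without\n";   # 1
print "with parens:    $with\n";      # 3
```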

--tom

Re: RFC 48 (v4) Replace localtime() and gmtime() with date() and utcdate()

2000-09-28 Thread Tom Christiansen


Certainly numbers should never be "zero-padded"!

Re: RFC 263 (v1) Add null() keyword and fundamental data type

2000-09-27 Thread Tom Christiansen


This is screaming mad.  I will become perl6's greatest detractor and
anti-campaigner if this nullcrap happens.  And I will never shut up
about it, either.  Mark my words.

Re: Perl6Storm: Intent to RFC #0101

2000-09-27 Thread Tom Christiansen


./sun4-solaris/POSIX.pm:sub isatty {
./sun4-solaris/B/Deparse.pm:sub is_scope {
./sun4-solaris/B/Deparse.pm:sub is_state {
./sun4-solaris/B/Deparse.pm:sub is_miniwhile { # check for one-line loop (`foo() while $y--')
./sun4-solaris/B/Deparse.pm:sub is_scalar {
./sun4-solaris/B/Deparse.pm:sub is_subscriptable {
./CGI.pm:sub isindex {
./CPAN.pm:sub is_reachable {
./CPAN.pm:sub isa_perl {
./Pod/Select.pm:sub is_selected {
./ExtUtils/Embed.pm:sub is_cmd { $0 eq '-e' }
./ExtUtils/Embed.pm:sub is_perl_object {

Re: Perl6Storm: Intent to RFC #0101

2000-09-27 Thread Tom Christiansen


You suggested:

 file($file, 'w');  # is it writeable?

That's really insane.  The goal was to produce code that's legible. 
That is hardly better.  It's much worse than is_writable or writable
or whatnot.  Just use -w if that's what you want.

--tom

Re: RFC 259 (v2) Builtins : Make use of hashref context for garrulous builtins

2000-09-27 Thread Tom Christiansen


 grep -l Class::Struct */*.pm
Class/Struct.pm
File/stat.pm
Net/hostent.pm
Net/netent.pm
Net/protoent.pm
Net/servent.pm
Time/gmtime.pm
Time/localtime.pm
Time/tm.pm
User/grent.pm
User/pwent.pm

Please check those out for precedent and practice.
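
As one example of that precedent, File::stat replaces the long list
returned by stat() with an object that has named accessors.  This is
a sketch only.  The temporary file is there just to have something to
stat.

```perl
use strict;
use warnings;
use File::stat;                        # stat() now returns an object
use File::Temp qw(tempfile);

my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "hello";
close($fh) or die "close $path: $!";

my $st = stat($path) or die "stat $path: $!";
printf "size=%d mode=%04o\n", $st->size, $st->mode & 07777;
```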

Re: RFC 290 Remove -X

2000-09-27 Thread Tom Christiansen


One doesn't remove useful and intuitive syntax
just because Mr Bill never put it into MS-BASIC!

I merely passingly suggested that there be a 
use English style alias for these.  They are, however,
wholly natural to millions of people, and should not
be harassed.  (NB: 10 million Linux weenies alone)
Still, twould be nice to have -rw and -rx and stuff, too. :-)
BTW, -s(FH)/2 is still wickedly broken.

Better Security support (was: RFC 290 (v1) Remove -X)

2000-09-27 Thread Tom Christiansen


The -wd syntax (writeable directory) is nicer than file($file, "wd").
But anyway, there's hardly anything wrong with -w && -d.  Don't
understand the complaint.
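
A sketch of the two-test form as it can be written today.  The
special filehandle _ reuses the stat buffer from the previous test,
so the path is only stat'ed once.  The scratch directory is made up
for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);       # scratch directory for the demo

my $writable_dir = (-d $dir && -w _) ? 1 : 0;
my $missing      = (-d "$dir/no-such-entry" && -w _) ? 1 : 0;

print "writable directory: $writable_dir\n";
print "missing path:       $missing\n";
```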

One thing I would really like to see is better security support.  Look
at the Camel-III's security chapter, File::Temp, and the is_safe
stuff I've done lately.

Re: RFC 303 (v1) Keep C<use less>, but make it work.

2000-09-27 Thread Tom Christiansen


Don't change "use less" to "use optimize".  We don't
need to ruin the cuteness.

RFC 307 (v1) PRAYER - what gets said when you C<bless> something

2000-09-27 Thread Tom Christiansen


Goodness, no, don't call it "PRAYER".   The blessing
is one of corporate approval, not ecclesiastical deprecationem.
Please don't piss people off.

Re: RFC 263 (v1) Add null() keyword and fundamental data type

2000-09-21 Thread Tom Christiansen


No, in that wonderfully consistent Perl documentation, it's "undef" not SNIP
is only used to refer to (as you pointed out in another post)

   the null string
   the null character
   the null list

Those use null as an adjective.  This RFC proposes an addition to Perl tSNIP

   the null

This uses null as a noun, and it has a different meaning than undef.

A null is a null byte, or a null character.  Period.  You are
completely out of your mind if you expect to co-opt an extant term
for this screwed up notion of yours.  I place my faith in Larry 
not to fuck up the language with your insanity.

--tom

Re: RFC 263 (v1) Add null() keyword and fundamental data type

2000-09-21 Thread Tom Christiansen


 In Perl, this is the null string:""
 In Perl, this is the null character: "\0"
 In Perl, this is the null list:  ()

In RFC 263, this is the null:  null

That's a different word for a different concept.  No conflict, if you
learn the way the RFC speaks.

Wrong.  Just plain wrong.

 It's a shame you don't like it, but this is the way we speak.

What's this we and you business?  I'm a perl user too.

Who can't speak.

 If you wish to make sense of the documentation, you must learn
 its language.

The documentation isn't all that consistent about everything, either.
Perhaps you, personally, are more so, and if so, perhaps you should help
rewrite the documentation to make it as perfectly consistent as yourself.

Thank you very much, but I just did that.  It's called Camel-3.

I allowed that you might want to call it the null string, and I'm allowed
to read "null string" and think "empty string", and I'm just as right as
you are.  You must not have a cohesive argument to make, if you resort to
insults in an attempt to make points.

You haven't heard insults.  Here are insults: you are a stupid idiot.
And I am incredibly glad that within hours, I'm about to spend three solid 
weeks away from such a fucked up blathering fool.

--tom

Re: RFC 263 (v1) Add null() keyword and fundamental data type

2000-09-21 Thread Tom Christiansen


 By your "reasoning", we can just add infinitely more things that
 take twice a few pages to explain.

You took that to an illogical extreme conclusion.  Clearly you can't add
everything to the language.  However, it is clear by the set of currently
submitted RFCs that more people think suggesting additions to Perl is a better
use of their time than suggesting subtractions.

Bullshit.   

If it takes several pages just to introduce your new, fucked-up
notion of "false", you've done something wrong.  If it then again
takes several pages just to introduce how your new, fucked-up notion
of "false" is different from the existing ones, you've done something
wrong.

Guess what?

You've done something wrong.

 Perl is already too hard.

So make it easier.  Where are your RFCs to remove things?

They're right here in my edit buffer.  I will simply explain them
to Larry directly.  You won't even get the chance to waste my time.

Fortunately, I have every reason to believe that Larry will reject
your idiotic notion of false.  Something that grew out of a cancerous
complexity in an obscure niche of programming has no business
burdening users with its incredibly lame-ass naming and confusing
behavior.

--tom

Re: Beefier prototypes (was Re: Multiple for loop variables)

2000-09-21 Thread Tom Christiansen


Could the prototype people please report whether Tim Bunce's issues with 
prototypes have been intentionally/adequately addressed?

I'm not a prototype person (in fact RFC 128 makes it a hanging offence
to use that confusing word in connection with parameter lists! ;-)
Could someone please recapitulate Tim's issues?

The long story is here:

http://www.perl.com/pub/language/misc/bunce.html

The short story includes details that involve how to permit

sub fn($$$)

to work with

fn(@foo)

where @foo==3, which won't be known till runtime.
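
A sketch of the problem as it stands in Perl 5.  The sub body is made
up for illustration.  With a ($$$) prototype, passing the array
directly is rejected at compile time, because @foo is put in scalar
context and counts as a single argument.

```perl
use strict;
use warnings;

sub fn($$$) { return join '-', @_ }

my @foo = (1, 2, 3);

my $spelled_out = fn($foo[0], $foo[1], $foo[2]);   # satisfies the prototype
my $ampersand   = &fn(@foo);                       # the & form skips the check

# Compiled separately so the compile-time error lands in $@.
my $direct = eval q{ fn(@foo) };
my $error  = $@;

print "spelled out: $spelled_out\n";
print "with &:      $ampersand\n";
print "direct call: ", (defined $direct ? $direct : "rejected: $error");
```

The first two calls should both give 1-2-3.  The direct call should
fail with a "Not enough arguments" error.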

--tom

Re: RFC 263 (v1) Add null() keyword and fundamental data type

2000-09-21 Thread Tom Christiansen


 Russ:

 About the only piece of code of mine that this would affect are places
 where I use ++ on an undef value, and that's not a bad thing to avoid as a
 matter of style anyway (usually I'm just setting a flag and = 1 would work
 just as well; either that, or it's easy enough to explicitly initialize
 the counter to 0).

 Philip:

Depends. While it is possible to initialise counters in the canonical
"have I seen this before" situation, it's more convenient the way it is at
the moment:

$seen{$word}++;

looks, to me, nicer than

$seen{$word} = (exists $seen{$word}) ? 1 : $seen{$word} + 1;

er, flip that.

or

if(defined($seen{$word})) { $seen{$word}++ } else { $seen{$word} = 1 }

or similar.

In general, if you can get away with a simpler expression, it's better.
For example, 

if ($foo && is_whatnot($foo)) 

is inferior to 

if ($foo) 

just as

if (!$foo && !is_whatnot($foo)) 

is inferior to 

unless ($foo)

"Inferior by what metric?" you ask?  Complexity.  

Larry wrote (in Camel-3) that

...the autoincrement will never warn that you're using undefined values,
because autoincrement is an accepted way to define undefined values.
^^^

So I think you're safe there.
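
For what it's worth, a tiny sketch of the idiom running under
warnings.  The word list is made up.  No entry is initialized and no
"uninitialized value" warning should appear.

```perl
use strict;
use warnings;

my %seen;
$seen{$_}++ for qw(the cat saw the other cat);   # undef quietly becomes 1

print "$_: $seen{$_}\n" for sort keys %seen;
```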

He also wrote:

The C<||> can be used to set defaults despite its origins as a
Boolean operator, since Perl returns the first true value.  Perl
programmers manifest a cavalier attitude with respect to truth,
since the line above would break if, for instance, you tried
to specify a quantity of 0.  But as long as you never want to
set either C<$quality> or C<$quantity> to a false value, the
idiom works great.  There's no point in getting all superstitious
and throwing in calls to C<defined> and C<exists> all over the
place.  You just have to understand what it's doing.  As long
as it won't accidentally be false, you're fine.

Simple true and simple false are best if your goal is simplicity.
Sometimes you need more than that.  So you write functions.  Or,
if you're into the quirks of using strange magic of occasionally
dubious charm, then through operationally overloaded objects.
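
A sketch of the trade-off in that passage, using hypothetical
function names.  The || default is the simple form and misbehaves
only for a legitimate false value such as 0.  The defined() form is
what you write when you need more than simple truth.

```perl
use strict;
use warnings;

# Simple truth: any false value, including 0, gets the default.
sub quantity_or_default {
    my ($quantity) = @_;
    return $quantity || 1;
}

# Definedness: only a missing value gets the default.
sub quantity_if_defined {
    my ($quantity) = @_;
    return defined($quantity) ? $quantity : 1;
}

for my $input (5, 0, undef) {
    my $label = defined $input ? $input : 'undef';
    printf "%-5s  ||: %d  defined: %d\n",
        $label, quantity_or_default($input), quantity_if_defined($input);
}
```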

--tom

Re: RFC 263 (v1) Add null() keyword and fundamental data type

2000-09-21 Thread Tom Christiansen


How can you convince anyone if you say you would not use it?  For any feature
enhancement to perl, unless there is a strong case for how it makes
the language easier and better, it is just not going to happen.

It's not as though Tim Bunce has been hollering for this, which is a 
bad sign.

--tom

PERL6STORM - tchrist's brainstorm list for perl6

2000-09-21 Thread Tom Christiansen

t people don't get surprised.

=item perl6storm #0053

Make <DIRH> call readdir() just as <FILEH> calls readline().

=item perl6storm #0054

Add dup() and dup2() style stuff to give legible ways of
handling <&FH and <&=FH.

=item perl6storm #0055

Make it clean and easy to push input and output stream filters as
alternatives to forkopen.  For example:

push_filter STDOUT, { s/^/READY/ }

is like calling program with

prog | perl -pe 's/^/READY/'

or fancy forkopen tricks.  Allow ways to look at filters
on stack.

=item perl6storm #0056

Remove the complicated "extended" regex features (?...) Or rewrite
them in a non-regex spec-y way.  (?{...}) and (??{...}) come to
mind.

=item perl6storm #0057

ADD MORE TOOLS  Code devel and analysis tools.  Maybe PPT, too.

BTW: I can't make "perlman fred" be accessible as "perl -man fred"
but I want to.  That's because the an.pm pragma won't be loaded
without a real -e or script. So "fred" must exist as a path, for
all values of "fred".  That's annoying.

=item perl6storm #0060

formats and html don't mix nicely (not wysiwyg).   Neither do any
hidden chars, like ESC-blah.  Can we do more for generating simple
clean transparent xml?

=item perl6storm #0061

Make CPAN.pm not hate me.

=item perl6storm #0062

Can there be an anal mode that detects anything that might xU
(raise exception as unimplemented on some plats?)

=item perl6storm #0063

Core the portopath manippers.  Too slow.  Fix their stupid
names:  catfile sucks.  it doesn't `cat file`.

=item perl6storm #0064

Do something about microsoft's CRLF abomination.

=item perl6storm #0065

Make indirect objectable built-ins overloadable/overrideable/inheritable
by object type.

=item perl6storm #0066

Allow next/last/redo in do{} per C.

=item perl6storm #0067

Where's ferror()?   Can we raise exceptions on them?

use io_errors;  # wrong name

or maybe

STDOUT->raise_on_error

=item perl6storm #0070

Make tacit fclose stdout detect failure.  

END { close STDOUT || die "close STDOUT: $!" }

But was there some horrible gotcha with this?

=item perl6storm #0071

How do you prototype split?  print?

=item perl6storm #0072

"Fix" the $ prototype.  No coerce on @ or %.  Fix for
lists.  fn($$) should permit fn(foo()) to mean fn((foo())[0,1]),
which is damned annoying to write.  likewise fn(@foo[0,1]),
which freaks.

=item perl6storm #0073

kill bareword strings entirely.

=item perl6storm #0074

make all the built-ins take perl style interfaces, not C ones.
eg: notice how larry changed openlog from C to Perl.  having
to write "O_blah | O_blah" hurts.

=item perl6storm #0075

Make a way for regex switches not to be single lettered.

re_match( EXPR, REGEX, FLAGS )

$gotit = 
re_match($line = readline(), 
 qr/^foo.*bar/, 
 REG_ICASE | REG_NEWLINE)

But now we're back to ugly O_ or'ing.  See regcomp(3).

=item perl6storm #0076

Allow ASCII characters to be specified symbolically.
Too retro?  chr(NUL), chr(SOH), chr(STX).  Or are those
already chars, and one would do ord(SOH) instead?  Module
would suffice.

=item perl6storm #0077

make open(FH, "|cmd|") just work -- call open2 etc.

=item perl6storm #0100

add python and java sections to perltrap.

=item perl6storm #0101

Just like the "use english" pragma (the modern not-yet-written
version of "use English" module), make something for legible
fileops.

is_readable(file) is really -r(file)

note that these are hard to write now due to -s(FH)/2 style
parsing bugs and prototype issues on handles vs paths.

=item perl6storm #0102

Make "my sub" work.

Make nested subs work.

=item perl6storm #0103

Finally implement the less pragma.

use less 'memory';

etc.   Right now, you can say silly things.

use less 'sillyiness';

What about use more?  Or is that just no less

use less 'magic';
no  more 'magic';

=item perl6storm #0104

Look at the deep magic seen in some of the examples in Camel-3's
OO and tie chapters and in perltootc.  Consider what to canonize
into a simpler-to-get-at mechanism, just as plum engendered much
in perl5.

=item perl6storm #0105

Learn to count in decimal.

=back

=head1 BUGS

None.  These are features.

=head1 AUTHOR

Tom Christiansen
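
Regarding item #0101 in the list above: until something like that
exists, the legible names can be approximated with plain wrapper
subs.  This is a sketch only.  The names is_readable, is_writable and
is_directory are hypothetical and are not part of any core module.
The wrappers take paths only, which sidesteps the handle-versus-path
prototype issue that the item mentions.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical legible aliases for the -X file tests (paths only).
sub is_readable  { my ($path) = @_; return -r $path ? 1 : 0 }
sub is_writable  { my ($path) = @_; return -w $path ? 1 : 0 }
sub is_directory { my ($path) = @_; return -d $path ? 1 : 0 }

my ($fh, $path) = tempfile(UNLINK => 1);
close($fh) or die "close $path: $!";

printf "readable=%d writable=%d directory=%d\n",
    is_readable($path), is_writable($path), is_directory($path);
```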

Re: PERL6STORM - tchrist's brainstorm list for perl6

2000-09-21 Thread Tom Christiansen


=item perl6storm #0106

Safe "signals"!  (not syssigs,really)

Re: RFC 263 (v1) Add null() keyword and fundamental data type

2000-09-21 Thread Tom Christiansen


Now, that's not accurate either.  "NUL" is simply a normalized form of "null",
because all the ASCII special characters have three upper-case letter names.
There is no doubt that the ASCII guys meant "null" by this.

All other matters aside, kindly consider this simple one: If ever
you thought homophones were bad, imagine then how to rely upon
nothing more than mere case distinction, an ancillary artifact of
our system of writing, in two distinct terms whose usages are not
radically different from each other but whose meanings most certainly
are, is an endeavour virtually guaranteed to be frequently misheard
and thus misconstrued when those terms are used in spoken discourse--as
they inevitably shall be.

--tom

Re: \z vs \Z vs $

2000-09-20 Thread Tom Christiansen


 "TC" == Tom Christiansen [EMAIL PROTECTED] writes:

 Could you explain what the problem is?

TC /$/ does not only match at the end of the string.
TC It also matches one character fewer.  This makes
TC code like $path =~ /etc$/ "wrong".

Sorry, I'm missing it.

I know.  

On your "longest match", you are committing the classic error of thinking
greed more important than eagerness.  It's not.

This is unrelated to /m.

Go back and read all the insanities we (mostly gbacon and yours
truly) went through to fix the 5.6 release's modules.  People coded
them *WRONG*.  Wrong means incorrect behaviour.  Sometimes this
even leads to security foo.

BOTTOM LINE: You cannot use /foo$/ to say "does the string end in `foo'?".
You can't do that.  You can't even use /s to fix it.  It doesn't fix it.

This is an annoying gotcha.  Larry once said that he wished he had made  \Z
do what \z now does.  One would like $ to (be able to) mean "ONLY AT END OF
STRING".
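
A small sketch of the gotcha, assuming a made-up path value that carries a trailing newline (the variable names are only for illustration):

```perl
use strict;
use warnings;

# A string that "ends in etc" only if you ignore the trailing newline.
my $path = "/etc\n";

# $ and \Z both match before a final newline; /s does not change that.
my $dollar   = ($path =~ /etc$/)  ? 1 : 0;
my $dollar_s = ($path =~ /etc$/s) ? 1 : 0;
my $big_z    = ($path =~ /etc\Z/) ? 1 : 0;

# Only \z insists on the very end of the string.
my $little_z = ($path =~ /etc\z/) ? 1 : 0;

print "/etc\$/   matched: $dollar\n";
print "/etc\$/s  matched: $dollar_s\n";
print "/etc\\Z/  matched: $big_z\n";
print "/etc\\z/  matched: $little_z\n";
```

Only the \z form reports 0 here, which is why it is the one to reach for when the question is "does the string end in this?".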

--tom

EXAMPLE 1:

--- /usr/local/lib/perl5/5.00554/File/Basename.pm   Mon Jan  4 13:00:53 1999
+++ /usr/local/lib/perl5/5.6.0/File/Basename.pm Sun Mar 12 22:24:29 2000
@@ -37,10 +37,10 @@
 "VMS", "MSDOS", "MacOS", "AmigaOS" or "MSWin32", the file specification 
 syntax of that operating system is used in future calls to 
 fileparse(), basename(), and dirname().  If it contains none of
-these substrings, UNIX syntax is used.  This pattern matching is
+these substrings, Unix syntax is used.  This pattern matching is
 case-insensitive.  If you've selected VMS syntax, and the file
 specification you pass to one of these routines contains a "/",
-they assume you are using UNIX emulation and apply the UNIX syntax
+they assume you are using Unix emulation and apply the Unix syntax
 rules instead, for that function call only.
 
 If the argument passed to it contains one of the substrings "VMS",
@@ -73,7 +73,7 @@
 
 =head1 EXAMPLES
 
-Using UNIX file syntax:
+Using Unix file syntax:
 
 ($base,$path,$type) = fileparse('/virgil/aeneid/draft.book7',
'\.book\d+');
@@ -102,7 +102,7 @@
 The basename() routine returns the first element of the list produced
 by calling fileparse() with the same arguments, except that it always
 quotes metacharacters in the given suffixes.  It is provided for
-programmer compatibility with the UNIX shell command basename(1).
+programmer compatibility with the Unix shell command basename(1).
 
 =item C<dirname>
 
@@ -111,8 +111,8 @@
 second element of the list produced by calling fileparse() with the same
 input file specification.  (Under VMS, if there is no directory information
 in the input file specification, then the current default device and
-directory are returned.)  When using UNIX or MSDOS syntax, the return
-value conforms to the behavior of the UNIX shell command dirname(1).  This
+directory are returned.)  When using Unix or MSDOS syntax, the return
+value conforms to the behavior of the Unix shell command dirname(1).  This
 is usually the same as the behavior of fileparse(), but differs in some
 cases.  For example, for the input file specification F<lib/>, fileparse()
 considers the directory name to be F<lib/>, while dirname() considers the
@@ -124,12 +124,22 @@
 
 
 ## use strict;
-use re 'taint';
+# A bit of juggling to insure that C<use re 'taint';> always works, since
+# File::Basename is used during the Perl build, when the re extension may
+# not be available.
+BEGIN {
+  unless (eval { require re; })
+{ eval ' sub re::import { $^H |= 0x0010; } ' }
+  import re 'taint';
+}
+
+
 
+use 5.005_64;
+our(@ISA, @EXPORT, $VERSION, $Fileparse_fstype, $Fileparse_igncase);
 require Exporter;
 @ISA = qw(Exporter);
 @EXPORT = qw(fileparse fileparse_set_fstype basename dirname);
-use vars qw($VERSION $Fileparse_fstype $Fileparse_igncase);
 $VERSION = "2.6";
 
 
@@ -162,23 +172,23 @@
   if ($fstype =~ /^VMS/i) {
 if ($fullname =~ m#/#) { $fstype = '' }  # We're doing Unix emulation
 else {
-  ($dirpath,$basename) = ($fullname =~ /^(.*[:\]])?(.*)/);
+  ($dirpath,$basename) = ($fullname =~ /^(.*[:\]])?(.*)/s);
   $dirpath ||= '';  # should always be defined
 }
   }
   if ($fstype =~ /^MS(DOS|Win32)/i) {
-($dirpath,$basename) = ($fullname =~ /^((?:.*[:\\\/])?)(.*)/);
-$dirpath .= '.\\' unless $dirpath =~ /[\\\/]$/;
+($dirpath,$basename) = ($fullname =~ /^((?:.*[:\\\/])?)(.*)/s);
+$dirpath .= '.\\' unless $dirpath =~ /[\\\/]\z/;
   }
-  elsif ($fstype =~ /^MacOS/i) {
-($dirpath,$basename) = ($fullname =~ /^(.*:)?(.*)/);
+  elsif ($fstype =~ /^MacOS/si) {
+($dirpath,$basename) = ($fullname =~ /^(.*:)?(.*)/s);
   }
   elsif ($fstype =~ /^AmigaOS/i) {
-($dirpath,$basename) = ($fullname =~ /(.*[:\/])?(.*)/);
+($dirpath,$basename) = ($fullname =~ /(.*[:\/])?(.*)/s);
 $dirpath = './' unless $dirpath;
   }
   e

Re: \z vs \Z vs $

2000-09-20 Thread Tom Christiansen


That was my second thought. I kinda like it, because //s would have two
effects:

 + let . match a newline too (current)

 + let /$/ NOT accept a trailing newline (new)

Don't forget /s's other meaning.

--tom

Re: RFC 212 (v1) Make length(@array) work

2000-09-20 Thread Tom Christiansen


What I said was: making length(@array) "work" would be catering to
novice people *coming from C*. We shouldn't. Not that much. In Perl, a
string is not an array.

I'm pretty sure it's not just the people coming from C who expect this.

This all points to the bug^H^H^Hdubious feature which is the sub($)
context template as applied to named arrays and hashes.  Requiring
an explicit conversion would help a lot.  Or so it seems.

--tom

Re: RFC 153 (v2) New pragma 'autoload' to load functions and modules on-demand

2000-09-20 Thread Tom Christiansen


This will make programs highly nonportable.  You can't easily know what 
modules they really need.

--tom

Re: RFC 12 (v2) variable usage warnings

2000-09-20 Thread Tom Christiansen


And what about $$x?

Dang, are we back to this incredible confusion about what it is to be
defined in Perl?

undef $a;

That is now UNINITIALIZED.  So is this:

$a = undef;

You have initialized it to undef.  There is no reasonable difference.
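
A minimal sketch of that equivalence, with throwaway variable names chosen only for the illustration:

```perl
use strict;
use warnings;

my $p = "first";
my $q = "second";

undef $p;      # the undef operator applied to a variable
$q = undef;    # assignment of the undefined value

# Both variables now fail defined() in exactly the same way.
my $p_defined = defined($p) ? 1 : 0;
my $q_defined = defined($q) ? 1 : 0;

print "p defined: $p_defined, q defined: $q_defined\n";
```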

Solution:

Remove all references from the language to defined and undef.
People just aren't smart enough to understand them.  Change
defined() to read has_a_valid_initialized_scalar_value().  Change
undef() to "operator_to_uninitialize_a_variable".  Tough luck
on the chumps who can't type well.  They pay for their brothers'
idiocy.

repeat until blue:

  INITIALIZED ==   DEFINED
UNINITIALIZED == UNDEFINED


--tom

Re: RFC 263 (v1) Add null() keyword and fundamental data type

2000-09-20 Thread Tom Christiansen


Nathan Wiger wrote:
 
 ...a "use tristate" pragma which obeys blocks

bka "lexically scoped".  If I'm not mistaken, pragmas *are* lexically scoped.

They *can* be.  They needn't be.

--tom

Re: RFC 263 (v1) Add null() keyword and fundamental data type

2000-09-20 Thread Tom Christiansen


The semantics for NULL is different, read the SQL standard.  

Perl has no business contaminating itself with SQL.

--tom

Re: RFC 263 (v1) Add null() keyword and fundamental data type

2000-09-20 Thread Tom Christiansen


Unlike undef, which gets assigned to uninitialized variables, NULL is only
used by choice.  So you only need deal with NULL when there is the
possibility that it needs to be handled in some special way, and might exist
as a value in the expression being handled.

This can be done without being in the language.  Return a ref to a
blessed object whose stringification or numification method raises
an exception.  
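
One way this might look, as a rough sketch only: the class name My::Null and its messages are invented for the example, and the overload pragma is assumed as the mechanism.

```perl
use strict;
use warnings;

{
    package My::Null;    # hypothetical class name, not a real module
    use overload
        '""' => sub { die "null value used as a string\n" },
        '0+' => sub { die "null value used as a number\n" },
        fallback => 1;

    sub new { my $class = shift; return bless {}, $class }
}

my $null = My::Null->new;

# The object is an ordinary defined reference...
my $is_defined = defined($null) ? 1 : 0;

# ...but using it as a string or as a number raises an exception.
my $string_ok = eval { my $s = "value: $null"; 1 } ? 1 : 0;
my $number_ok = eval { my $n = $null + 1;      1 } ? 1 : 0;

print "defined=$is_defined string_ok=$string_ok number_ok=$number_ok\n";
```

Code that never touches such a value pays nothing for it, which is the argument for keeping it in a module.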

The novice need not use NULL until he is an expert, or is dealing with
databases.  As an expert, it is not hard to understand the difference, and if
dealing with databases, there is a definite need to understand the
difference.

I completely disbelieve.  Changing the fundamental nature of what
a VALUE is in Perl is hardly something you can hide.  The amount
of pain people seem to go through already understanding this stupid
spectre out of database hell is sufficient to run in terror.

--tom

Re: RFC 263 (v1) Add null() keyword and fundamental data type

2000-09-20 Thread Tom Christiansen


 no strict;
 $a = undef;
 $b = null;

Perl already has a null string: "".

--tom

Re: RFC 263 (v1) Add null() keyword and fundamental data type

2000-09-20 Thread Tom Christiansen


Perl has *one* out-of-band value.  It doesn't need more.  That
doesn't mean that perhaps some rare sorts of programming might not
benefit from fancy weirdnesses.  That's what modules are for.
You don't need to complicate the general language to get what
you want.  Don't make others pay for your problems.

1) all otherwise uninitialized variables are set to undef

Wrong.  You cannot say that an aggregate is undef.  Scalar
variables--not all variables, just scalar variables alone--hold the
uninitialized value, henceforth known as the antiïinitialized value,
if they were last initialized to the antiïinitialized value, or if
they haven't been initialized at all--in which case, I suppose, you
*might* choose to call it _a_n_t_eïinitialized instead of antiïinitialized,
but then you'll get people wanting to split those up again.

2) under "use strict", use of undef in expressions is diagnosed with a warning

Wrong.  You are thinking, perhaps, of `use warnings', not `use strict'.
In particular, 

use warnings qw'uninitialized';

3) undef is coerced to 0 in numeric expressions, false in boolean expressions,
and the empty string in string expressions.

I'm not happy with your use of "coerce".  There's no mutation.  It simply
*is* those things.  It's not quite kosher to claim that undef gets "coerced"
to false in Boolean expressions.  The antiïinitialized value *is* a false
value.  The only false number is 0, and therefore the antiïinitialized 
numeric value is 0.  Yes, we have two false strings--lamentably--but since
we need a canonical one (eg the result of 1 == 2), we choose "".

You also forgot this:

4) The antiïinitialized value is autovivified to a true value when
that value is (legally) used lvaluably.  

Notice also this:

% perl -le 'use warnings; $a = 1 == 2; print $a->[1] ? "good" : "bad"'
bad

% perl -le 'use strict;   $a = 1 == 2; print $a->[1] ? "good" : "bad"'
Can't use string ("") as an ARRAY ref while "strict refs" in use at -e line 1.
Exit 255

--tom

Re: RFC 263 (v1) Add null() keyword and fundamental data type

2000-09-20 Thread Tom Christiansen


$a = null;
$b = ($a == 42);
print defined($b)? "defined" : "not defined";

would print "not defined", maybe?

In a sane world of real (non-oo-sneaky) perl, the "==" operator returns 
only 1 or "".  Both are defined.
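
A quick check of that claim, as a sketch with illustrative variable names:

```perl
use strict;
use warnings;

my $false = (1 == 2);
my $true  = (1 == 1);

# The false result is defined, is the empty string when used as a string,
# and is 0 when used as a number.
my $false_defined  = defined($false) ? 1 : 0;
my $false_is_empty = ($false eq "")  ? 1 : 0;
my $false_is_zero  = ($false == 0)   ? 1 : 0;

# The true result is plain 1.
my $true_is_one = ($true == 1) ? 1 : 0;

print "defined=$false_defined empty=$false_is_empty zero=$false_is_zero one=$true_is_one\n";
```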

--tom

Re: RFC 263 (v1) Add null() keyword and fundamental data type

2000-09-20 Thread Tom Christiansen


It only takes a few pages, and a few truth tables to explain NULL.
It should only take a few pages and a few examples, to explain the
difference between undef and null.

Ah, so the cost of this is twice a few pages of explanation, plus truth 
tables and examples?  Are you mad?

I can think of no better proof that this is the Wrong Thing than 
your very own words.  Thank you.

---tom

Re: RFC 263 (v1) Add null() keyword and fundamental data type

2000-09-20 Thread Tom Christiansen


* Tom Christiansen ([EMAIL PROTECTED]) [21 Sep 2000 05:49]:
  no strict;
  $a = undef;
  $b = null;

 Perl already has a null string: "".

Looks more like a string of no length than a null string.

Well, it's not.  That's a null string.  You're thinking of "\0", 
a true value in Perl.

Here are the canonical definitions:

NULL STRING:
A string containing no characters, not to be confused with
a string containing a null character, which has a positive
length.

NULL CHARACTER:
A character with the ASCII value of zero.  It's used by C
and some Unix syscalls to terminate strings, but Perl allows
strings to contain a null.

NULL LIST:
A list value with zero elements, represented in Perl by ().
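
The three definitions can be told apart in a few lines; this is only a sketch, and the variable names are made up for it:

```perl
use strict;
use warnings;

my $null_string = "";
my $null_char   = "\0";

# Counting idiom: assigning to an empty list in scalar context
# yields the number of elements on the right-hand side.
my $null_list_count = () = ();

my $len_string   = length $null_string;    # no characters at all
my $len_char     = length $null_char;      # one character, value zero
my $char_is_true = $null_char ? 1 : 0;     # "\0" is neither "" nor "0"

print "len_string=$len_string len_char=$len_char ",
      "char_is_true=$char_is_true list_count=$null_list_count\n";
```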

--tom

Re: RFC 85 (v2) All perl generated errors should have a unique identifier

2000-09-20 Thread Tom Christiansen


 "TC" == Tom Christiansen [EMAIL PROTECTED] writes:

 Currently many programs handle error returns by examining the text of
 the error returned in $@. This makes changes in the text of the error
 message, an issue for the backwards compatibility police.

TC eval {  fn() };
TC if ($@ == EYOURWHATHURTS) {  } 
TC sub fn { die "blindlesnot" }

I don't understand what you are trying to say.

I'm saying that you can't know what to check for, because you don't
know who generated the exception.  Can you use your fancy constants?

And what is "core"?  Compiler?  Interpreter?  Utilities?  Pragmata?
Modules?

Citing IBM as a reference is enough to drive a lot of us away screaming.

Try errno.h or sysexits.h  Notice how much nicer this is.  Few
values, but usable in varied places.

--tom

Re: RFC 263 (v1) Add null() keyword and fundamental data type

2000-09-20 Thread Tom Christiansen


That's not much different than the cost of undef, so I fear it proves
nothing, universally.

YOU OVERQUOTEDsen wrote:

YOU OVERQUOTEDkes a few pages, and a few truth tables to explain NULL.
YOU OVERQUOTEDonly take a few pages and a few examples, to explain the
YOU OVERQUOTED between undef and null.
YOU OVERQUOTED
YOU OVERQUOTEDcost of this is twice a few pages of explanation, plus truth
YOU OVERQUOTEDexamples?  Are you mad?
YOU OVERQUOTED
YOU OVERQUOTED of no better proof that this is the Wrong Thing than
YOU OVERQUOTEDwn words.  Thank you.
YOU OVERQUOTED
YOU OVERQUOTED
YOU OVERQUOTED
YOU OVERQUOTED
YOU OVERQUOTED
YOU OVERQUOTED
YOU OVERQUOTEDe on the right track,
YOU OVERQUOTEDn over if you just sit there.
YOU OVERQUOTED  -- Will Rogers
YOU OVERQUOTED
YOU OVERQUOTEDFree Internet Access and Email__
YOU OVERQUOTED.netzero.net/download/index.html

By your "reasoning", we can just add infinitely more things that
take twice a few pages to explain.

Perl is already too hard.

--tom

Re: RFC 263 (v1) Add null() keyword and fundamental data type

2000-09-20 Thread Tom Christiansen


 For example, assuming this code:

$name = undef;
print "Hello world!" if ($name eq undef);

So don't do that.  Use C<defined $name> if you want to ask that question.

That's why I want to change the names of these things.  The current
situation invites errors such as seen previously.  

Actually, one almost wants a warning on "=undef", too.  Well, some uses.  

--tom

Re: RFC 263 (v1) Add null() keyword and fundamental data type

2000-09-20 Thread Tom Christiansen


Tom Christiansen wrote:

  no strict;
  $a = undef;
  $b = null;

 Perl already has a null string: "".

That's an empty string.  In any case, if you really want to call it a
null string, that's fine, just a little more likely to be
misinterpreted.  

In Perl, this is the null string:""
In Perl, this is the null character: "\0"
In Perl, this is the null list:  ()

It's a shame you don't like it, but this is the way we speak.
If you wish to make sense of the documentation, you must learn
its language.

--tom

Re: RFC 263 (v1) Add null() keyword and fundamental data type

2000-09-20 Thread Tom Christiansen


 I'm not happy with your use of "coerce".  There's no mutation.  It simply
 *is* those things.

Fine.  So, in particular, it _isn't_ null.

Of course it's null.  That's why it has length zero.  Stop speaking
SQL at me.  I'm speaking Perl.

 4) The antiïinitialized value is autovivified to a true value when
 that value is (legally) used lvaluably.

If, by "true value" in the above, you mean a value other than undef whicSNIP
interpreted as boolean false, then I think I understand what you said.  SNIP
enough to have said it, which is why I used coerce.

No, I mean this:

undef $a;
@$a = ();
if ($a) { . } # always true

It's the lvaluable deref that autoinitializes.
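
A runnable sketch of the same point under strict, contrasting the lvalue dereference with a plain rvalue one (the variable names are illustrative):

```perl
use strict;
use warnings;

my $aref;                    # starts out undefined
@$aref = ();                 # lvalue dereference autovivifies an array ref
my $kind  = ref $aref;       # now a reference to an (empty) array
my $truth = $aref ? 1 : 0;   # and references are always true

# By contrast, a plain rvalue dereference of undef is an error under strict.
my $other;
my $rvalue_ok = eval { my @copy = @$other; 1 } ? 1 : 0;

print "kind=$kind truth=$truth rvalue_ok=$rvalue_ok\n";
```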

--tom

Re: RFC 12 (v2) variable usage warnings

2000-09-20 Thread Tom Christiansen


But that doesn't even matter that much here; I'm saying that if the
compiler can definitely determine that you are using an uninitialized
variable, it should warn. 

...

$x is a global. The compiler cannot detect all possible assignments to
or accesses of globals, so it never warns about them.

If you inserted my $x at the top of that code, it would most likely
produce the "possible use" warning. Or not; this is a simple enough case
that it might be able to infer the right answer.

I am certainly not saying that the "possible use" warning should be
enabled by default. But please, argue over that one separately from the
others. It's the most likely to annoy.

 Or:
 foo();
 print $x;
 
 Generate a warning, or not?  Which one? Remember, foo() may initialize $x.

Same thing. If $x is lexical, it gives a definite warning. If $x is a
global, it says nothing. You're right; I need to point this out in the
RFC.

Careful:

sub ouch {
my $x;
my $fn = sub { $x++ };
register($fn);
print $x;
} 
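
To make the hazard concrete, here is a sketch with a hypothetical register() that happens to call its callback straight away. Nothing in ouch() assigns to $x directly, yet it is initialized by the time it is read:

```perl
use strict;
use warnings;

# Hypothetical register(): here it calls the callback at once, but it could
# just as well store it for later, and the compiler cannot tell which.
sub register { my ($fn) = @_; $fn->(); return }

sub ouch {
    my $x;
    my $fn = sub { $x++ };
    register($fn);
    return $x;    # set by the closure, not by any visible assignment
}

my $result = ouch();
print "ouch() returned $result\n";
```

A static "definitely uninitialized" warning on the read of $x would be wrong for this version of register().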

--tom

Re: RFC 12 (v3) variable usage warnings

2000-09-20 Thread Tom Christiansen


Which is silly, because you shouldn't have to say '$x = $x = 3' when you
mean '$x = 3'.  Just because there's a real reason behind it doesn't make it
any less silly.

I'd like to see where this can happen.  Sounds like someone forgot to
declare something:

our $x;
$x = 2;

--tom

Re: RFC 12 (v2) variable usage warnings

2000-09-20 Thread Tom Christiansen


Anything else? Any opinion on whether eval "" should do what it does
now, and be invisible for the purposes of this analysis; or if it should
be assumed to instead both use and initialize all visible variables? The
former produces more spurious warnings, the latter misses many errors.

You have to assume eval STRING can do anything.

--tom

Re: RFC 12 (v3) variable usage warnings

2000-09-20 Thread Tom Christiansen


It happens when I don't bother to declare something. My company has
several dozen machines with an 'our'-less perl, and 'use vars qw($x)' is
a pain. As is $My::Package::Name::x.

Far, far easier to fix behavioral problems than to hack Perl.

--tom

Re: RFC 85 (v2) All perl generated errors should have a unique identifier

2000-09-20 Thread Tom Christiansen


Ok, so you want message catalogues, and not solely on Perl but anything
in the distribution.  You should say that.

--tom

Re: RFC 12 (v2) variable usage warnings

2000-09-20 Thread Tom Christiansen


Tom Christiansen wrote:
 
 Anything else? Any opinion on whether eval "" should do what it does
 now, and be invisible for the purposes of this analysis; or if it should
 be assumed to instead both use and initialize all visible variables? The
 former produces more spurious warnings, the latter misses many errors.
 
 You have to assume eval STRING can do anything.
 
 --tom

"have to"? Perl5 doesn't.

You mean "perl".

% perl -we '$x = 3; $v = "x"; eval "\$$v++"'
Name "main::x" used only once: possible typo at -e line 1.

Non sequitur.  And no, I don't have time.

Re: RFC 12 (v3) variable usage warnings

2000-09-20 Thread Tom Christiansen


Tom Christiansen wrote:
 
 It happens when I don't bother to declare something. My company has
 several dozen machines with an 'our'-less perl, and 'use vars qw($x)' is
 a pain. As is $My::Package::Name::x.
 
 Far, far easier to fix behavioral problems than to hack Perl.
 
 --tom

Not sure what you mean, since this RFC _adds_ a warning in this case. In
fact, with the proposed change, my trick to avoid punishment for my
misbehavior would no longer work.

The point is that if

$x = 3;

elicits a warning...

that you should declare the variable properly, of course.

--tom

Re: RFC 263 (v1) Add null() keyword and fundamental data type

2000-09-20 Thread Tom Christiansen


But I see code in the XML modules that check defined (@array)

They're buggy and wrong. 

--tom

Re: RFC 12 (v2) variable usage warnings

2000-09-20 Thread Tom Christiansen


Have a nice day.  And thanks for all the fish.

\z vs \Z vs $

2000-09-19 Thread Tom Christiansen


What can be done to make $ work "better", so we don't have to
make people use /foo\z/ to mean /foo$/?  They'll keep writing
the $ for things that probably oughtn't abide optional newlines.

Remember that /$/ really means /(?=\n?\z)/. And likewise with \Z.

--tom

Re: RFC 259 (v1) Builtins : Make use of hashref context for garrulous builtins

2000-09-19 Thread Tom Christiansen


It's hard to remember the sequence of values that the following
builtins return:

stat/lstat
caller
localtime/gmtime
get*

and though it's easy to look them up, it's a pain to look them up
Every Single Time.

Moreover, code like this is far from self-documenting:

use File::stat;

if ((stat $filename)[7]  1000) {...}

if ((lstat $filename)[10]  time()-1000) {...}

use Time::localtime;

if ((localtime(time))[3]  5) {...}

use User::pwent;

if ($usage  (getpwent)[4]) {...}

use Net::hostent;
@host{qw(name aliases addrtype length addrs)} = gethostbyname $name;

Don't have one for that.
warn "Problem at " . join(":", @{[caller(0)]}[3,1,2]) . "\n";

It is proposed that, when one of these subroutines is called in the new
HASHREF context (RFC 21), it should return a reference to a hash of values,
with standardized keys. For example:

Which is what the modules listed above do.  And more.
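
For instance, a sketch using two of those modules against the current directory (the choice of "." is just so the example has something to stat):

```perl
use strict;
use warnings;
use File::stat;        # overrides stat/lstat to return objects
use Time::localtime;   # overrides localtime/gmtime likewise

my $st = stat(".") or die "can't stat current directory: $!";
my $size  = $st->size;     # instead of (stat ".")[7]
my $ctime = $st->ctime;    # instead of (stat ".")[10]

my $tm   = localtime();
my $mday = $tm->mday;      # instead of (localtime)[3]

print "size=$size ctime=$ctime mday=$mday\n";
```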

--tom

Re: RFC 258 (v1) Distinguish packed binary data from printable strings

2000-09-19 Thread Tom Christiansen


On 19 Sep 2000, Perl6 RFC Librarian wrote:

 Distinguish packed binary data from printable strings

What defines a "printable" string?  What if I'm working in an environment
that can "print" bytes that yours can't?  Specifically I'm wondering how
this proposal handles Unicode.

Perl should fly far and fast from starting down the bumpy road where
that data is strongly typed in the mythical and deceptive text-vs-binary
sense, for that path is one littered with the frustrations of a
legion of programmers stretching from the plodding mainframe operating
systems of our grandfathers to the toybox legacies of our less
fortunate brethren of today.  Heed the wisdom of the Unix and the
song of the C: Let your data be simply data, homogenously clean.
To do otherwise is to suffer unending inanities and combinatoric
misconnects of noncongruent types.

--tom

Re: RFC 255 (v2) Fix iteration of nested hashes

2000-09-19 Thread Tom Christiansen


This RFC proposes that the internal cursor iterated by the C<each> function 
be stored in the pad of the block containing the C<each>, rather than
being stored within the hash being iterated.

Then how do you specify which iterator is to be reset when you wish 
to do that?  Currently, you do this by specifying the hash.  If the
iterator is no longer affiliated with the hash, but the opcode node,
then what are you going to do?  
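
For reference, a sketch of how the reset works in Perl 5 today, where the iterator belongs to the hash and calling keys on that hash rewinds it (the hash contents are made up):

```perl
use strict;
use warnings;

my %h = (a => 1, b => 2, c => 3);

my ($first)  = each %h;    # advance the iterator stored in %h
my ($second) = each %h;    # a different key: the iterator has moved on

keys %h;                   # void-context keys resets that hash's iterator

my ($again)  = each %h;    # back at the first key

print "first=$first second=$second again=$again\n";
```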

--tom

Re: RFC 258 (v1) Distinguish packed binary data from printable strings

2000-09-19 Thread Tom Christiansen


Perhaps what you're truly looking for is a generalized tainting mechanism.

--tom

Re: RFC 255 (v2) Fix iteration of nested hashes

2000-09-19 Thread Tom Christiansen


Just to note: in version 2 of the RFC, it's associated with the pad of
the block in which the C<each> appears.

then what are you going to do?  

The short answer is that there is no "manual" reset of iterators.

I am concerned about that.

sub fn(\%) {
my $href = shift;
while (my($k,$v) = each %$href) {
return if something's funny;
} 
} 

Now, imagine you call

fn(%foo);
fn(%bar);

and there's a premature exit.  Isn't the second fn() going to not
only be at the wrong spot, but still worse, at the wrong hash?

Or do you plan for all block exits to clear all their iterators?

What happens then in this code:

for my $hr (\(%foo, %bar, %glarch)) {
push @first_keys, scalar each %$hr;
} 

There's no block exit there.

--tom

Re: RFC 258 (v1) Distinguish packed binary data from printable strings

2000-09-19 Thread Tom Christiansen


Tim Conrow wrote:
 
 Tom Christiansen wrote:
  Perhaps what you're truly looking for is a generalized tainting mechanism.
 
 Sounds cool, but I have only the vaguest idea what you (may) mean. Pointers?
 RFCs? Examples? Hints?

Sorry for the clutter, but I didn't want to come off too clueless. I know what
tainting is, I just don't know what you mean by generalized tainting. If it's
been discussed before I'd love to see a pointer to the thread.

You want to have more properties that work like tainting does: a
per-SV attribute that is enabled or disabled by particular sorts
of expressions, sometimes dependent upon the previous presence or
absence of that property, other times, not so.

--tom

Re: RFC 76 (v2) Builtin: reduce

2000-09-19 Thread Tom Christiansen


   $sum = reduce {$_[0]+$_[1]} 0, @numbers || die "Chaos!!";

Note with the || that way, it'll die immediately if @numbers is empty,
even before destroying the universe.

Yes, but why are you passing the size of the array in there?
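
A sketch of what that || does to the argument list, next to a version that checks the array first (the numbers are arbitrary):

```perl
use strict;
use warnings;

my @numbers = (3, 4, 5);

# || puts its left operand in scalar context, so the array yields its size.
my @with_oror = (0, @numbers || die "Chaos!!");

# Checking the array on its own leaves the list untouched.
@numbers or die "Chaos!!";
my @checked_first = (0, @numbers);

print "with ||:       @with_oror\n";
print "checked first: @checked_first\n";
```

The first list comes out as 0 and 3, the element count, rather than 0 followed by the three numbers.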

--tom

Re: RFC 76 (v2) Builtin: reduce

2000-09-19 Thread Tom Christiansen


Why not just check @numbers?

--tom

Re: RFC 76 (v2) Builtin: reduce

2000-09-19 Thread Tom Christiansen


Following Glenn's lead, I'm in the process of RFC'ing a new null()
keyword and value 

As though one were not already drowning in a surfeit of subtly
dissimilar false values.

--tom

Re: RFC 76 (v2) Builtin: reduce

2000-09-19 Thread Tom Christiansen


Ummm...Maybe I'm missing something, but how does reduce() know the
difference between

$sum = reduce ^_+^_, 0, @values;

unshift @values, 0;
$sum = reduce ^_+^_, @values;

You know, I really find it much more legible to consistently write
these sorts of thing with braces around their code block, just as

@x = map { $_ * 3 } 4, 5;

is infinitely better than 

@x = map $_ * 3, 4, 5;

--tom

Re: RFC 12 (v2) variable usage warnings

2000-09-19 Thread Tom Christiansen


The warning for the use of an unassigned variable should be "use of
uninitialized variable C<$x>". 

The problem with that idea, now as before, is that this check happens 
where Perl is looking at a value, not a variable.  Even were it possible
to arduously modify Perl to handle explicitly named simple variables,
there's much more to consider.

if ( fx() == fy() ) { }

For one.
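
A sketch of that case, with hypothetical stand-ins for fx() and fy(). The warning is captured so it can be inspected, and there is no variable for it to name:

```perl
use strict;
use warnings;

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Hypothetical stand-ins for fx() and fy(); fx() returns no value at all.
sub fx { return }
sub fy { return 0 }

# The undefined value comes from a function return, not from a variable.
my $equal = (fx() == fy()) ? 1 : 0;

print "equal=$equal\n";
print "warning: $_" for @warnings;
```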

--tom

Re: RFC 85 (v2) All perl generated errors should have a unique identifier

2000-09-19 Thread Tom Christiansen


Currently many programs handle error returns by examining the text of
the error returned in $@. This makes changes in the text of the error
message, an issue for the backwards compatibility police.

eval {  fn() };

if ($@ == EYOURWHATHURTS) {  } 

sub fn { die "blindlesnot" }


--tom

Re: RFC 263 (v1) Add null() keyword and fundamental data type

2000-09-19 Thread Tom Christiansen


Currently, Perl has the concept of C<undef>, which means that a value is
not defined. One thing it lacks, however, is the concept of C<null>,
which means that a value is known to be unknown or not applicable. These
are two separate concepts.

No, they aren't.

--tom

Re: RFC - Interpolation of method calls

2000-09-18 Thread Tom Christiansen


I doubt anyone's arguing that they're not function calls. What I find
"surprising" is that Perl doesn't DWIM here. It doesn't encourage data
encapsulation or try to make it easy:

   my $weather = new Schwern::Example;
   print "Today's weather will be $weather->{temp} degrees and sunny.";
   print "And tomorrow we'll be expecting ", $weather->forecast;

You are wicked and wrong to have broken inside and peeked at the
implementation and then relied upon it.

If method calls interpolated, this would be easier. Instead, it
encourages you to provide direct hash access to your data since it's
much easier to use that way.

I find myself wanting to say:

   print "Thanks, $cgi->param('name') for your order!";
   print "It matched" if /$config->get_expression/;

Oh joy: now Perl has nested quotes.  I *hate* nested quotes.
They're terrible.  See the shell for how icky this is.

Rather than:

   print "Thanks, " . $cgi->param('name') . " for your order";

What's the big deal?  How does it hurt you to do that?  And why
are you catting it instead of simply passing a list?

--tom

Re: RFC - Interpolation of method calls

2000-09-18 Thread Tom Christiansen


As Nate pointed out:  print "$hash->{'f'.'oo'}" already works fine and
the world spins on.

That is no argument for promoting illegibility.

--tom

Re: RFC 252 (v1) Interpolation of subroutines

2000-09-18 Thread Tom Christiansen


Subroutines calls should interpolate in double-quoted strings and similar
contexts.

print "Sunset today is at sunset($date)";

interpolates to:

print 'Sunset today is at '.sunset($date);

Huh?  And what if it's a built-in?  What if it's not quite a built-in,
but an import?  What if you don't *know* whether it's a built-in?

I cannot but wonder what kind of childhood abuse leads programmers
to expect that double quotes shouldn't count for squat anymore.
If you don't like 'em this much, you should quit using them.

--tom

Re: RFC 252 (v1) Interpolation of subroutines

2000-09-18 Thread Tom Christiansen


Surely the next request will be to make anything that works outside
of quotes work inside of them, completely erasing the useful visual
distinction.  Why should operators, after all, be any different
from functions?

print "I have Fooey->fright($n) frobbles.\n";
print "I have snaggle($n) frobbles.\n";
print "I have abs($n) frobbles.\n";
print "I have $x+$y frobbles.\n";

What's the use of quotes these days anyway?

--tom

Re: Beefier prototypes (was Re: Multiple for loop variables)

2000-09-18 Thread Tom Christiansen



[This somewhat elderly draft was found lying about an edit
 buffer, but I do not believe it was ever sent yet.]

Now, the possibility to either pass individual scalars to a sub, or an
array, (or several arrays, or a mixture of arrays and scalars) and Perl
treating them as equivalent, that is pretty much the most important
feature of Perl. IMO. Perl would not be Perl without it.

Well, "most important" is an interestingly strong way of phrasing it.

But how to deal with variadicity in an intuitive fashion is hard.
You seem to have to sacrifice compile-time knowledge, or else
programmer-convenience.  Tim Bunce had some ideas on this once.

I still almost always end up first using no protos and then employing
extensive run-time comparisons, such as this sequence might illustrate:

# assumes: use Carp; use Scalar::Util qw(reftype);
confess "need args" unless @_;

confess "need even args" unless @_ % 2 == 0;

confess "keys mustn't be refs" 
    if grep { ref($_) } @_[map { 2*$_ } 0 .. int($#_/2)];

confess "values must be hashrefs" 
    if grep { reftype($_) ne 'HASH' } @_[map { 1+2*$_ } 0 .. int($#_/2)];

confess "values must be Frobulants" 
    if grep { !$_->isa("Frobulant") } @_[map { 1+2*$_ } 0 .. int($#_/2)];

I should like to see the context coercer née prototype that satisfies
criteria such as these.  Yes, I cannot imagine that Damian doesn't already have
a syntax for such :-) but what about compile-time versus run-time issues?

Could the prototype people please report whether Tim Bunce's issues with 
prototypes have been intentionally/adequately addressed?


--tom

Re: RFC 244 (v1) Method calls should not suffer from the action on a distance

2000-09-18 Thread Tom Christiansen


   foo->bar($baz, $coon)
 should be made synonymous with
   foo->bar $baz, $coon
 
 I can see no ambiguity in this call, but it not always works with Perl5.

Arrow invocation does not a listop make.  Only indirect object invocation
style does that.

print STDOUT $foo, $bar, $glarch;

is a list op.

STDOUT->print $foo, $bar, $glarch;

is not, and, in fact, is a syntax error.  You *must* use parens for
the arrow invocation's arguments.  You *may* use them with I/O style.
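
Both legal forms side by side, as a sketch: it writes to an in-memory filehandle rather than STDOUT so the result can be checked, and it loads IO::Handle for the print method.

```perl
use strict;
use warnings;
use IO::Handle;    # provides the print method on filehandles

# Write to an in-memory filehandle so the result can be inspected.
my $buffer = "";
open(my $fh, ">", \$buffer) or die "can't open in-memory handle: $!";

print $fh "foo", "bar";          # list operator: parens are optional
$fh->print("glarch", "\n");      # method call: parens are required

close($fh) or die "can't close: $!";
print $buffer;
```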

--tom

Re: $a in @b (RFC 199)

2000-09-18 Thread Tom Christiansen

From: Tom Christiansen [mailto:[EMAIL PROTECTED]]
 From: Jarkko Hietaniemi

 I find this urge to push exceptions everywhere quite sad.

 Rather. 

 Languages that have forgotten or dismissed error returns, turning
 instead to exceptions for everything in an effort to make the code
 "safer", tend in fact to produce code that is tedious and annoying.

There seems to be some general consensus that some people would like to be
able to short-circuit functions like grep. Do you see no need for the code
block equivalent of C<next>/C<last>/C<redo>?

What, you mean like 

Loop controls don't work in an C<if> or C<unless>, either, since
those aren't loops.  But you can always introduce an extra set
of braces to give yourself a bare block, which I<does> count
as a loop.

if (/pattern/) {{
last if /alpha/;
last if /beta/;
last if /gamma/;
# do something here only if still in if()
}}

--tom
