Re: Perl 6 Summary for week ending 20020728
pdcawley [EMAIL PROTECTED] writes:

> Bugger, I used L<questionnaire|...> and pod2text broke it.
> http:[EMAIL PROTECTED]/msg10797.html

perlpodspec sez you can't use L<...|...> with a URL, and I'm guessing that I just didn't look at that case when writing the parsing code in pod2text because of that.

-- Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/
Re: library assumptions
Melvin Smith [EMAIL PROTECTED] writes:

> I would expect that should be fine, stdarg is one of the 4 headers that
> are guaranteed by ANSI C89 even on a free standing environment (read
> embedded targets, etc.) It's integral to C, and if you don't have it, I
> suppose the question would be why we should port to it.

Basically, whether you can use stdarg.h is directly tied to whether you want to support K&R compilers. If you want to support K&R, you have to allow for the possibility of varargs.h instead. If you are willing to require an ANSI C compiler (which I believe was the decision already made), stdarg.h is safe.
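As a concrete illustration of the ANSI side of that choice, here is a minimal stdarg.h sketch (the function and its behavior are hypothetical, not from Parrot); under K&R you would include varargs.h and use the va_alist/va_dcl declaration style instead:

```c
#include <stdarg.h>

/* Hypothetical example of ANSI C89 stdarg usage.  Under K&R with
   varargs.h, the declaration would be count_true(va_alist) va_dcl
   and va_start would take only the va_list, not the parameter name. */
static int count_true(int n, ...)
{
    va_list ap;
    int seen = 0;

    va_start(ap, n);        /* ANSI form anchors on the last named arg */
    while (n-- > 0)
        if (va_arg(ap, int))
            seen++;
    va_end(ap);
    return seen;
}
```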
Re: Misc portability cleanups
Brent Dax [EMAIL PROTECTED] writes:

> Ouch. They actually expect you to be able to do anything useful without
> the other headers? It might actually be easier to just implement the
> headers ourselves on platforms that don't have them...

The provisions for free-standing implementations in the C standard are primarily aimed at people using C as a portable assembler for writing embedded applications. Viewed in that light, it makes sense that your microwave may not have stdio. :) The idea is to define a subset of C that can still be called C and that doesn't include all of the extra stuff that's unnecessary when all you're ever doing is reading and writing bytes from special memory locations.

I think it's a real mistake for Microsoft to not provide a hosted implementation for something like WinCE. But that's just random complaining; unfortunately, one can't solve portability problems by expressing one's opinion that the decisions various platforms made were stupid. :)
Re: Misc portability cleanups
Melvin Smith [EMAIL PROTECTED] writes:

> 5- Other misc includes that should be wrapped in ifdefs are:
> sys/types.h, sys/stat.h, fcntl.h (btw parrot.h includes fcntl.h twice,
> once inside an ifdef and then by default).

What platform doesn't have sys/types.h? It's one of the few headers that I've *never* seen wrapped in an ifdef, even in code meant to compile on extremely strange systems.
Re: typedefs
Brent Dax [EMAIL PROTECTED] writes:

> Parrot_Foo for external names, FOO for internal names, struct
> parrot_foo_t for struct names.

POSIX reserves all types ending in _t. I'm not sure that extends to struct tags, but it may still be better to use _s or something else instead to avoid potential problems.
Re: typedefs
Brent Dax [EMAIL PROTECTED] writes:

> Russ Allbery:
> # POSIX reserves all types ending in _t. I'm not sure that extends to
> # struct tags, but it may still be better to use _s or something else
> # instead to avoid potential problems.
>
> My understanding is that it only reserves types that start with 'int'
> or 'uint' and end with '_t'. You might wanna check that, though...

I believe that's the C standard. I thought POSIX went farther and reserved everything. Interestingly, however, it looks like the just-released current version of POSIX has backed away from that and instead only reserves certain prefixes depending on the header files included, and reserves posix_*, POSIX_*, and _POSIX_* everywhere.

> Besides, what's the probability it'll be a problem if we prefix all
> struct names with 'parrot_'?

Very low.
Re: on parrot strings
Hong Zhang [EMAIL PROTECTED] writes:

> I disagree. The difference between 'e' and 'e`' is similar to 'c' and
> 'C'.

No, it's not. In many languages, an accented character is a completely different letter. It's alphabetized separately, it's pronounced differently, and there are many words that differ only in the presence of an accent. Changing the capitalization of C does not change the word. Adding or removing an accent does.

> The Unicode compatibility equivalence has similar effect too, such as
> half width letter and full width letter.

You'll find that the Unicode compatibility equivalence does nothing as ill-conceived as unifying e and e', for the very good reason that that would be a horrible mistake.
Re: on parrot strings
Bryan C Warnock [EMAIL PROTECTED] writes:

> On Monday 21 January 2002 16:43, Russ Allbery wrote:
>> Changing the capitalization of C does not change the word.
> Er, most of the time.

No, pretty much all of the time. There are differences between proper nouns and common nouns, but those are differences routinely quashed as a typesetting decision; if you write both proper nouns and common nouns in all caps as part of a headline, the lack of distinction is not considered a misspelling. Similarly, if you capitalize a common noun because it occurs at the beginning of a sentence, that doesn't transform its meaning.

Whereas adding or removing an accent is always considered a misspelling, at least in some languages. It's like adding or removing random letters from the word. re'sume' and resume are two different words. It so happens that in English re'sume' is a variant spelling for one meaning of resume. I don't believe that regexes should try to automatically pick up variant spellings. Should the regex /aerie/ match /eyrie/? That makes as much sense as a search for /resume/ matching /re'sume'/.
Re: brief RANT (on warnings)
Dan Sugalski [EMAIL PROTECTED] writes:

> So, I'm turning off the unused parameter warning for now to shut the
> .ops file compiles up. After that point, all submitted patches must
> generate no more warnings than were currently being generated, and all
> submitted patches must *not* generate warnings in the areas they patch.

The warnings about unused variables quickly become useful again if you're willing to tag things with __attribute__((__unused__)) (generally wrapped in a convenient macro). I use:

    /* __attribute__ is available in gcc 2.5 and later, but only in gcc
       2.7 and later can you use the __format__ form of the attributes,
       which is what we use (to avoid confusion with other macros). */
    #ifndef __attribute__
    # if __GNUC__ < 2 || (__GNUC__ == 2 && __GNUC_MINOR__ < 7)
    #  define __attribute__(spec)   /* empty */
    # endif
    #endif

    /* Used for unused parameters to silence gcc warnings. */
    #define UNUSED __attribute__((__unused__))

and then writing things like:

    int foo(int bar UNUSED)

actually serves to add additional documentation as well as shutting up warnings.
Re: recent win32 build errors
Dan Sugalski [EMAIL PROTECTED] writes:

> This is jogging my memory some. Jarkko passed on his gcc switch list
> from hell to me a while back--let me dig it out and add them in. This
> is *not* going to be pretty for the next few days...

Here are some notes on what I've managed to live with:

    ## Warnings to use with gcc.  Default to including all of the
    ## generally useful warnings unless there's something that makes them
    ## unsuitable.  In particular, the following warnings are *not*
    ## included:
    ##
    ##   -ansi              Requires messing with feature test macros.
    ##   -Wconversion       Too much unsigned to signed noise.
    ##   -Wredundant-decls  Too much noise from system headers.
    ##   -Wtraditional      We assume ANSI C, so these aren't interesting.
    ##   -Wundef            Too much noise from system macros.
    ##
    ## Some may be worth looking at again once a released version of gcc
    ## doesn't warn on system headers.  The warnings below are in the same
    ## order as they're listed in the gcc manual.  We suppress warnings
    ## for long long because of lib/snprintf.c; all uses of long long
    ## should be hidden behind #ifdef HAVE_LONG_LONG.

    WARNINGS = -pedantic -Wall -W -Wshadow -Wpointer-arith \
               -Wbad-function-cast -Wcast-qual -Wcast-align \
               -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes \
               -Wnested-externs -Wno-long-long

The comment is a little dated, as I believe gcc 3.0 no longer warns on system headers, so -Wredundant-decls possibly could get pulled back in. -Wundef is a style thing.
Re: Windows compile problems
Dan Sugalski [EMAIL PROTECTED] writes:

> At 08:59 AM 10/25/2001 -0400, Andy Dougherty wrote:
>> Then we probably should change Parrot's name of BOOL. I'd suggest
>> Bool_t, modeled after perl5's Size_t (and similar types).
> Sounds like a good idea.

IIRC, all types ending in _t are reserved by POSIX and may be used without warning in later versions of the standard. (This comes up not infrequently in some of the groups I read, but I unfortunately don't have a copy of POSIX to check for myself and be sure.)
Re: Revamping the build system
Robert Spier [EMAIL PROTECTED] writes:

> On Tue, 2001-10-23 at 20:52, Russ Allbery wrote:
>> Dan Sugalski [EMAIL PROTECTED] writes:
>>> Once we build miniparrot, then *everything* can be done in perl.
>>> Having hacked auto* stuff, I think that'd be a good thing. (autoconf
>>> and friends are unmitigated evil hacks--people just don't realize how
>>> nasty they are because they never need to look inside)
>> I've looked inside a lot, and I definitely do not agree. But maybe
>> you've not seen autoconf 2.50 and later?
>
> Russ- Could you expand on this? 2.50 seems to be at least 80% the same
> as the previous versions, with very similar m4 syntax, some new macros
> added, some old macros removed, some old bugs fixed, some new bugs
> added.

I'm not sure what there is to expand on. I've looked at 2.50, and it definitely doesn't look like an unmitigated evil hack to me. It looks like a collection of tests for various standard things that packages need to know to compile, put together about as well as I can imagine doing that for the huge variety of tests one has to deal with.

I haven't worked with metaconfig instead, but I have to say that I find it way easier to deal with autoconf than to deal with metaconfig. (I know this is heresy in the Perl community. *grin*) I've maintained the autoconf configuration for a reasonably large package (INN), but not one that requires portability to Windows -- at the same time, last time I checked, Configure doesn't really deal with portability to non-Unix systems either, being a shell script itself. Perl seemed to just bypass it in favor of pre-generated results. But I could be behind on the state of the art.

The shell script it generates is butt-ugly, but that's the price of insane portability. I'm not as fond of automake or libtool, but libtool at least lives up to what it says it does, and takes care of a bunch of portability issues that are rather obscure and difficult to deal with.
Re: Revamping the build system
Russ Allbery [EMAIL PROTECTED] writes:

> I'm not sure what there is to expand on. I've looked at 2.50, and it
> definitely doesn't look like an unmitigated evil hack to me. It looks
> like a collection of tests for various standard things that packages
> need to know to compile, put together about as well as I can imagine
> doing that for the huge variety of tests one has to deal with. I
> haven't worked with metaconfig instead, but I have to say that I find
> it way easier to deal with autoconf than to deal with metaconfig.

That was horribly unclear. What I meant to say was that I find it way easier to deal with autoconf output than metaconfig output. (As part of my day job, I maintain a site-wide installation of hundreds of packages here at Stanford.) Perl at least does have a non-interactive way of running configure, making it about as good as an autoconf configure script. Other packages that use metaconfig, like elm and trn, are absolutely obnoxious to compile.
Re: Revamping the build system
Dan Sugalski [EMAIL PROTECTED] writes:

> Making the build system (past the initial bootstrap of microparrot) all
> perl would make building modules easier on systems without a build
> system of their own (like, say, the Mac, as I found trying to install
> Coy and Quantum::Superposition on the 5.6.1 alpha the other night... :)
> and it'll let us skip some of the more awkward bits of make.

I can certainly see the features of that approach. It just seems like quite a lot of work.
Re: Revamping the build system
Brent Dax [EMAIL PROTECTED] writes:

> What about little inline things?
>
>     AUTO_OP sleep(i|ic) {
>     #ifdef WIN32
>         Sleep($1*1000);
>     #else
>         sleep($1);
>     #endif
>     }

This reminds me. gcc is slowly switching over to writing code like that as:

    if (WIN32) {
        Sleep($1*1000);
    } else {
        sleep($1);
    }

or the equivalent thereof instead of using #ifdef. If you make sure that the values are defined to be 0 or 1 rather than just defined or not defined, it's possible to write code like that instead. This has the significant advantage that the compiler will continue to syntax-check the code that isn't active on the build platform, making it much less likely that one will get syntax errors in the code not active on the platform of the person doing the patching. The dead-code-elimination optimization phase of any decent compiler should dump the dead paths entirely.

It may not be possible to use this in cases where the not-taken branch may refer to functions that won't be prototyped on all platforms, depending on the compiler, but there are at least some places where this technique can be used, and it's worth watching out for. (In the case above, I'd probably instead define a sleep function on WIN32 that calls Sleep so that the platform differences are in a separate file, but there are other examples of things like this that are better suited to other techniques.)
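A minimal sketch of the 0-or-1 macro convention described above (the macro and function names here are hypothetical, not Parrot's):

```c
/* Hypothetical platform macro: always defined, always 0 or 1, so it can
   be used in a plain `if` and both branches stay syntax-checked. */
#if defined(_WIN32)
# define PLATFORM_WIN32 1
#else
# define PLATFORM_WIN32 0
#endif

/* Both branches are parsed on every platform; a decent optimizer then
   removes the dead branch entirely. */
static const char *sleep_function_name(void)
{
    if (PLATFORM_WIN32)
        return "Sleep";     /* Win32: Sleep() takes milliseconds */
    else
        return "sleep";     /* POSIX: sleep() takes seconds */
}
```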
Re: Revamping the build system
Simon Cozens [EMAIL PROTECTED] writes:

> On Tue, Oct 23, 2001 at 09:05:33AM -0400, Andy Dougherty wrote:
>> While imperfect and Unix-centric, we can (and should!) learn a lot
>> from auto{conf,make} and metaconfig.
> *nod*. I just had a look around, and most of the other languages are
> using autoconf. But then, most of the other languages don't run on
> upwards of 70 platforms. :(

I wonder how serious we need to be about keeping that goal. autoconf and libtool give you basically every version of Unix, plus Windows under Cygwin, pretty much for free. That's a nice start. The other platforms, like MacOS, Windows with MS compilers, and so forth, probably will need a separate build system, but that's not really new. Tcl just has autoconf plus separate build systems for those platforms where autoconf won't run.

There's a lot to be said for not re-inventing the wheel. Taking a good look at the facilities for dynamic loading provided by libtool before rolling our own again may also be a good idea; it's designed to support dynamically loadable modules.
Re: Revamping the build system
Dan Sugalski [EMAIL PROTECTED] writes:

> Once we build miniparrot, then *everything* can be done in perl. Having
> hacked auto* stuff, I think that'd be a good thing. (autoconf and
> friends are unmitigated evil hacks--people just don't realize how nasty
> they are because they never need to look inside)

I've looked inside a lot, and I definitely do not agree. But maybe you've not seen autoconf 2.50 and later?
Re: bytecode and sizeof(IV)
Simon Cozens [EMAIL PROTECTED] writes:

> Yep, and the latest pedantic patch doesn't help. Also, I'm seeing this,
> which is weird:
>
>     ld -ldbm -ldb -lm -liconv -o test_prog global_setup.o interpreter.o
>     parrot.o register.o basic_opcodes.o memory.o bytecode.o string.o
>     strnative.o test_main.o

Definitely bugs in Configure there; cc has to be used as the linker or -lc isn't added (and possibly some of the other crt.o files too), and libraries have to be after all the object files.
Re: Bytecode safety
Gibbs Tanton - tgibbs [EMAIL PROTECTED] writes:

> I would vote no. HOWEVER, I would think that the user should have the
> option to turn on checking for malformed bytecode (i.e. Safe mode). In
> the default case, I think the bytecode should be assumed well formed
> and no extra checking be performed.

Something akin to gcc's --enable-checking strikes me as a really good idea.
Re: More character matching bits
Dan Sugalski [EMAIL PROTECTED] writes:

> Should perl's regexes and other character comparison bits have an
> option to consider different characters for the same thing as identical
> beasts? I'm thinking in particular of the Katakana/Hiragana bits of
> Japanese, but other languages may have the same concepts.

I think canonicalization gets you that if that's what you want. I definitely think that Perl should be able to do all of NFD, NFC, NFKD, and NFKC canonicalization. NFC will collapse most different characters for the same thing to a single character and get rid of most of the compatibility characters for you. NFKC will go further and do stuff like getting rid of superscripts and the like.
Re: More character matching bits
Dan Sugalski [EMAIL PROTECTED] writes:

> At 01:05 PM 6/11/2001 -0700, Russ Allbery wrote:
>> Dan Sugalski [EMAIL PROTECTED] writes:
>>> Should perl's regexes and other character comparison bits have an
>>> option to consider different characters for the same thing as
>>> identical beasts? I'm thinking in particular of the Katakana/Hiragana
>>> bits of Japanese, but other languages may have the same concepts.
>> I think canonicalization gets you that if that's what you want.
>
> I don't think canonicalization should do this. (I really hope not) This
> isn't really a canonicalization matter--words written with one character
> set aren't (AFAIK) the same as words written with the other, and which
> alphabet you use matters. (Which sort of argues against being able to do
> this, I suppose...)

I guess I don't know what the definition of "the same thing" you're using here is.

>> I definitely think that Perl should be able to do all of NFD, NFC,
>> NFKD, and NFKC canonicalization.
> C & D at least. KC & KD are doable as well, though I'm not sure when
> you'd want them.

USEFOR is looking at requiring NFKC canonicalization for newsgroup names, to get rid of some of the odder stuff, so Usenet code will potentially need it. I believe the DNS folks are also looking at it for IDN.
Re: Should we care much about this Unicode-ish criticism?
Dan Sugalski [EMAIL PROTECTED] writes:

> At 05:20 PM 6/7/2001 +, Nick Ing-Simmons wrote:
>> One reason perl5.7.1+'s Encode does not do asian encodings yet is that
>> the tables I have found so far (Mainly Unicode 3.0 based) are lossy.
> Joy. Hopefully by the time we're done there'll be a full
> implementation. This makes me even more determined to support
> non-ASCII, non-Unicode encodings in the core if we want to handle
> non-western text.

Incidentally, one of the places where the largest amount of work that I'm aware of in this area has been done is the iconv support in current versions of glibc. That includes (in current CVS) something pretty close to full bidirectional mappings between a huge variety of local character sets and Unicode. It may be worth looking at their code, although unfortunately it can't be incorporated directly into Perl. They may have already dealt with the issues of lossiness or lack thereof. As I recall from reading mailing list traffic, one of the major things recently added was a variety of tests in the glibc test suite to ensure that round-trip conversions through Unicode are lossless where possible.

The other advantage of looking at glibc's approach is that they get tons of bug reports about obscure things and conventions for using particular characters that aren't obvious from the specifications.
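To give a flavor of that glibc interface, here is a minimal sketch of converting a Latin-1 byte to UTF-8 through POSIX iconv(3); the charset names are as glibc spells them, and error handling is abbreviated:

```c
#include <iconv.h>

/* Convert `inlen` bytes in charset `from_cs` to UTF-8, returning the
   number of output bytes written or (size_t) -1 on error. */
static size_t to_utf8(const char *from_cs, char *in, size_t inlen,
                      char *out, size_t outlen)
{
    iconv_t cd = iconv_open("UTF-8", from_cs);
    char *inp = in, *outp = out;
    size_t inleft = inlen, outleft = outlen, status;

    if (cd == (iconv_t) -1)
        return (size_t) -1;
    status = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    return status == (size_t) -1 ? (size_t) -1 : outlen - outleft;
}
```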
Re: Should we care much about this Unicode-ish criticism?
Nicholas Clark [EMAIL PROTECTED] writes:

> What happens if unicode supported uppercase and lowercase numbers? [I
> had a dig about, and it doesn't seem to mention lowercase or uppercase
> digits. Are they just a typography distinction, and hence not enough to
> be worthy of codepoints?]

Damned if I know; I didn't know there even was such a thing. Uppercase vs. lowercase for letters is more than a typographic distinction for many languages; there are words in English, for example, with a different meaning depending on whether they're capitalized (since capitalization indicates a proper noun). If there is some similar distinction of meaning for numbers in some language, I suppose that Unicode may add such a thing; to date, there doesn't appear to be any concept of uppercase or lowercase for anything but letters.
Re: Should we care much about this Unicode-ish criticism?
Dan Sugalski [EMAIL PROTECTED] writes:

> At 12:40 PM 6/5/2001 -0700, Russ Allbery wrote:
>> (As an aside, UTF-8 also is not an X-byte encoding; UTF-8 is a
>> variable byte encoding, with each character taking up anywhere from
>> one to six bytes in the encoded form depending on where in Unicode the
>> character falls.)
> Have they changed that again? Last I checked, UTF-8 was capped at 4
> bytes, but that's in the Unicode 3.0 standard.

Yes, it changed with Unicode 3.1 when they started allocating characters from higher planes. Far and away the best reference for UTF-8 that I've found is RFC 2279. It's much more concise and readable than the version in the Unicode standard, and is more aimed at implementors and practical considerations.
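The one-to-six-byte scheme is easy to see in code. This is a from-scratch sketch of the RFC 2279 encoder, not Perl's or Parrot's actual implementation:

```c
/* Sketch of RFC 2279 UTF-8 encoding: write `cp` (up to 31 bits) into
   `buf` (at least 6 bytes) and return the number of bytes used, 1-6. */
static int utf8_encode(unsigned long cp, unsigned char *buf)
{
    /* Leading-byte prefix for each encoded length (index by length). */
    static const unsigned char first[7] =
        { 0, 0, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC };
    int len, i;

    if (cp < 0x80) {
        buf[0] = (unsigned char) cp;    /* ASCII passes through */
        return 1;
    }
    if      (cp < 0x800)     len = 2;
    else if (cp < 0x10000)   len = 3;
    else if (cp < 0x200000)  len = 4;
    else if (cp < 0x4000000) len = 5;
    else                     len = 6;
    for (i = len - 1; i > 0; i--) {     /* trailing bytes: 10xxxxxx */
        buf[i] = (unsigned char) (0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    buf[0] = (unsigned char) (first[len] | cp);
    return len;
}
```

Note that even the maximum 31-bit value only produces a leading byte of 0xFD, which is the guarantee about 0xFE and 0xFF discussed later in this thread.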
Re: Should we care much about this Unicode-ish criticism?
Bryan C Warnock [EMAIL PROTECTED] writes:

> Some additional stuff to ponder over, and maybe Unicode addresses these
> - I haven't been able to read *all* the Unicode stuff yet. (And, yes,
> Simon, you will see me in class.) Some languages don't have upper or
> lower case. Are tests and translations on caseless characters true or
> false? (Or undefined?)

Caseless characters should be guaranteed unchanged by conversion to upper or lower case, IMO. Case is a normative property of characters in Unicode, so case mappings should actually be pretty well-defined. Note that there are actually three cases in Unicode (upper, lower, and title case), since there are some characters that require the third distinction (stuff like Dz is generally used as an example).

> Should the same Unicode character, when used in two different
> languages, be string equivalent?

The way to start solving this whole problem is probably through normalization; Unicode defines two separate normalizations, one of which collapses more similar characters than the other. One is designed to preserve formatting information while the other loses formatting information. (The best example of how they differ is that one leaves the ffi ligature alone and the other breaks it down into three separate characters.) Perl should allow programmers to choose their preferred normalization schemes or none at all. (There are really four normalization schemes; in two of them, you leave things fully decomposed, and in the other two you recompose characters as much as possible.)
Re: Should we care much about this Unicode-ish criticism?
Simon Cozens [EMAIL PROTECTED] writes:

> On Tue, Jun 05, 2001 at 03:27:03PM -0700, Russ Allbery wrote:
>> Caseless characters should be guaranteed unchanged by conversion to
>> upper or lower case, IMO.
> I think Bryan's asking more about \p{IsUpper} than uc().

Ahh... well, Unicode classifies them for us, yes? Lowercase, Uppercase, Titlecase, and Other, IIRC. So a caseless character wouldn't show up in either IsLower or IsUpper.
Re: Should we care much about this Unicode-ish criticism?
NeonEdge [EMAIL PROTECTED] writes:

> This is evident in the Musical Symbols and even Byzantine Musical
> Symbols. Are these character sets more important than the actual
> language character sets being denied to the other countries? Are
> musical and mathematical symbols even a language at all?

At the same time as 246 Byzantine Musical Symbols and 219 Musical Symbols were added, 43,253 Asian-language ideographs were added. I fail to see the problem. Musical and mathematical symbols are certainly used more frequently than ancient Han ideographs that have been obsolete for 2,000 years, and it's not as if the ideographs are having major difficulties being added to Unicode either.

If the author of the original paper referred to here thinks there are still significant characters missing from Unicode, he should stop whining about it and put together a researched proposal. That's what the Byzantine music researchers did, and as a result their characters have now been added. This is how standardization works. You have to actually go do the work; you can't just complain and expect someone else to do it for you.

In the meantime, the normally encountered working character set of modern Asian languages has been in Unicode from the beginning, and currently the older and rarer characters, and the characters used these days only in proper names, are being backfilled at a rate of tens of thousands per Unicode revision. How this can then be described as ignoring Asian languages boggles me beyond words. There are a lot of characters. It takes time. Rome wasn't built in a day.

> It seems to me that Unicode, in its present form, although a valiant
> attempt, is just a 'better' ascii, and not a complete solution.

It seems to me that you haven't bothered to go look at what Unicode is actually doing.
Re: Should we care much about this Unicode-ish criticism?
Larry Wall [EMAIL PROTECTED] writes:

> Doesn't really matter where they install the artificial cap, because
> for philosophical reasons Perl is gonna support larger values anyway.
> It's just that 4 bytes of UTF-8 happens to be large enough to represent
> anything UTF-16 can represent with surrogates. So they refuse to
> believe in anything longer than 4 bytes, even though the representation
> can be extended much further. (Perl 5 extends it all the way to 64-bit
> values, represented in 13 bytes!)

That's probably unnecessary; I really don't expect them to ever use all 31 bytes that the IETF-standardized version of UTF-8 supports.

> I don't know if Perl will have a utf16 that is distinguished from
> UTF-16.

I wouldn't bother spending any time on UTF-16 beyond basic support for converting away from it. It combines the worst of both worlds, and I don't expect it to be used much now that they've buried the idea of keeping Unicode to 16 bits.
Re: Should we care much about this Unicode-ish criticism?
Russ Allbery [EMAIL PROTECTED] writes:

> That's probably unnecessary; I really don't expect them to ever use all
> 31 bytes that the IETF-standardized version of UTF-8 supports.

31 bits, rather. *sigh*

But given that, modulo some debate over CJKV, we're getting into *really* obscure stuff already at only 94,140 characters, I'm guessing that there would have to be some really major and fundamental changes in written human communication before more than two billion characters are used. Which doesn't mean ruling out the possibility of ever expanding, since one should always leave that option open, but expending coding effort on it isn't worth it. Particularly since extending UTF-8 to more than 31 bits requires breaking some of the guarantees that UTF-8 makes, unless I'm missing how you're encoding the first byte so as not to give it a value of 0xFE.
Re: Should we care much about this Unicode-ish criticism?
Larry Wall [EMAIL PROTECTED] writes:

> Russ Allbery writes:
>> Particularly since extending UTF-8 to more than 31 bits requires
>> breaking some of the guarantees that UTF-8 makes, unless I'm missing
>> how you're encoding the first byte so as not to give it a value of
>> 0xFE.
>
> The UTF-16 BOMs, 0xFEFF and 0xFFFE, both turn out to be illegal UTF-8
> in any case, so it doesn't much matter, assuming BOMs are used on
> UTF-16 that has to be auto-distinguished from UTF-8. (Doing any kind of
> auto-recognition on 16-bit data without BOMs is problematic in any
> case.)

Yeah, but one of the guarantees of UTF-8 is:

    -  The octet values FE and FF never appear.

I can see that this property may not be that important, but it makes me feel like things that don't have this property aren't really UTF-8.
Re: PDD 2nd go: Conventions and Guidelines for Perl Source Code
Larry Wall [EMAIL PROTECTED] writes:

> Dan Sugalski writes:
>> 1) The indentation should be all tabs or all spaces. No mix, it's a
>> pain.
> This will devolve into an editor war, and I don't think it's a real
> issue.

I'm not positive that it will. I can provide the magic incantations that work for both emacs and vim, and most other editors have similar customization. In fact, the problem is largely caused by emacs (where it's easiest to fix), since emacs in a default configuration tends to gratuitously change spaces into tabs.

One thing that may be seriously worth considering is actually putting a local variables block into every file that sets up the proper settings for the most commonly used editors. Both emacs and vim support such a thing, and it's about five lines of boilerplate that can be put at the end of the file and will then automatically make those editors just Do The Right Thing.
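For concreteness, such a block might look like this at the bottom of a C source file; the specific settings shown are illustrative only, not a proposed Parrot style:

```c
/*
 * Hypothetical per-file editor settings.  emacs reads the Local
 * Variables section (it must sit near the end of the file), and vim
 * reads the modeline.
 *
 * Local variables:
 * c-basic-offset: 4
 * indent-tabs-mode: nil
 * End:
 *
 * vim: expandtab shiftwidth=4
 */
```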
Re: Markup wars (was Re: Proposal for groups)
Bennett Todd [EMAIL PROTECTED] writes:

> My own personal favourite for archival format would be to stick with
> POD until and unless we can cons up something even Plainer than POD.
> I've got this dream that someday we'll be able to take something ---
> perhaps based on Damian's Text::Autoformat --- and use it to parse
> purely plain ASCII text, formatted nicely for screen display, with no
> markup at all, and garnish it with markup allowing it to be
> automatically translated into nice sexy HTML, or SGML according to
> various other DTDs, or XML, or POD, or the man or mandoc troff macros,
> or LaTeX, or whatever.

I've fiddled with this before and can do text to HTML; the rest is just a question of picking different backends and shouldn't be *too* hard. All the heuristics for parsing text are inherently fragile, but if you follow a standard text formatting style, it works reasonably well.
Re: SvPV*
Dan Sugalski [EMAIL PROTECTED] writes:

> More often vice versa.

INN embeds perl, for example, and uses it for spam detection. When it builds scalars for perl to use, it uses the copy of the article already in memory to avoid copies. (Given the volume of news and the size of some news articles, this can save a lot.) You wouldn't want perl messing with it in that case, since the string memory really isn't perl's to manage. INN marks such "windowed" scalars as read-only, which I think only makes sense for that situation. I guess I could think of cases where you might want to do in-place modifications without changing the allocation, but that sounds a lot iffier.
Re: RFC 334 (v1) I'm {STILL} trying to understand this...
Dan Sugalski [EMAIL PROTECTED] writes: C's vararg handling sucks in many sublime and profound ways. It does, though, work. If we declare in advance that all C-visible perl functions have an official parameter list of (...), then we can make it work. The calling program would just fetch function pointers from us somehow, and do the call in. Can't. ISO C requires that all variadic functions take at least one named parameter. The best you can do is something like (void *, ...). -- Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/
Re: RFC 301 (v1) Cache byte-compiled programs and modules
Chaim Frenkel [EMAIL PROTECTED] writes: "RA" == Russ Allbery [EMAIL PROTECTED] writes: RA This will be completely impossible to implement in some installation RA environments, such as AFS or read-only remote NFS mounts. I really RA don't like software that tries to play dynamic compilation tricks; RA please just compile at installation time and then leave it alone. This isn't really a problem. I'm sorry, but yes, it is. Purify does this already. And Purify is a pain in the neck to install in AFS. It's one of the hardest software packages that we have to support because of the cache directory. We tried to maintain it like it's supposed to work, and it failed miserably. We finally had to give up completely and wrap Purify to redirect the cache directory into /tmp, losing the entire benefit of the cache directory, because anything else was just completely unworkable. I cannot emphasize too much how bad of an idea this is. It sounds great if you've got a single machine with a single local read-write file system and you trust all your users with a world-writeable directory; in *any* other situation, it's a recipe for severe annoyance, if not disaster. -- Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/
Re: RFC 313 (v1) Perl 6 should support I18N and L10N
Dan Sugalski [EMAIL PROTECTED] writes: The issues I was thinking of have come up mainly from the Freeciv lists, and concern ordering of substituted output and suchlike things. Does it handle the case where in english you see: error %s in file %s but in some other language the two substitutions are swapped? POSIX printf can actually handle this. You change the above to: file %2$s has error %1$s (I think that's the right syntax). Most libcs already handle this properly; it's just a matter of the translators knowing that they can do this and avoiding printf constructs that make this too hard to do. -- Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/
Re: RFC 313 (v1) Perl 6 should support I18N and L10N
Bart Lateur [EMAIL PROTECTED] writes: Eh? Are you saying that Perl's error message should be adapted to the language of the computer user? Yes. Most major free software packages already do this. I don't like that. You can always not set the environment variables. How would Perl decide on what language to use? Some environment variable? Yes, there's a POSIX standard for this. And what about programmer supplied error messages? Should the programmer supply lots of language versions as well? That's an interesting problem; I think there definitely should be mechanisms available to do this if so wished. -- Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/
libcrypt and the crypt builtin
This message on perl5-porters brought an idea to mind: Paul Marquess [EMAIL PROTECTED] writes: From: Marc Lehmann [mailto:[EMAIL PROTECTED]] Actually, the perl binary links against waay too many libraries. As a rule of thumb, the perl binary is linked against all libraries any extensions could need. At least on ELF platforms, this is unnecessary bloat and slows down loading noticeably (at least for me). My perl (on linux) is linked against: libdl libm libc libcrypt Err, I think the perl binary might just need the first three. The last is needed for the crypt built-in. Given that crypt isn't the most widely used builtin in the world and since we're proposing migration of some things to modules anyway, how about moving crypt? It won't save much at all in the way of code size, but it *would* mean that Perl would link against one fewer library on some platforms, which is frequently a plus in load time. Don't have time at the moment to write an RFC, so if someone else wants to, be my guest. -- Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/
Re: RFC 155 - Remove geometric functions from core
Andy Dougherty [EMAIL PROTECTED] writes: I'm sure the glibc folks indeed work very hard at this and are largely successful. I also know, however, that over the past couple of years or so, I've had to recompile nearly all of my applications on several occasions when I've upgraded glibc. Other times, glibc upgrades have gone without a hitch. It's probably my fault and probably somewhere deep in my personal library I'm incorrectly fiddling with stdio internals or something, but I just wanted to offer a counter data point that doing this sort of thing robustly is, indeed, very hard. It may not be your fault... my understanding is that glibc 2.0 really didn't do things right, and that glibc 2.1 did break some binary compatibility to fix some serious bugs. It's probably only fair to start holding glibc to this standard from 2.2 and up. Perl *should* have a *much* easier task than glibc, given that our interface is positively tiny compared to the entire C library. -- Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/
Re: Removing stuff to shareable thingies
Dan Sugalski [EMAIL PROTECTED] writes: It's not unreasonable to expect this sort of feature to possibly be used for more esoteric extensions to the perl core or commonly and heavily used extensions. I wouldn't, for example, want to always load in DBD::Oracle or a full complex math library (complete with FFT and gaussian filters) every time I was ripping through a text file. If the feature exists as part of the design from the start, it puts certain requirements for the lexer/parser and core interpreter that will make modularizing things a necessity and thus functional for those situations where it is reasonable to do it. Yes, this part I agree with... it's pretty close to our current dynamic module system, though, isn't it? Or is it the on-demand part specifically that would be new? Shared libraries opened at run-time make sense to me for things that make a large and noticeable thud (such as DBD::Oracle with all the accompanying Oracle shared libraries and whatnot) or things that are distributed independently (which is where we pick up a whole bunch of stuff probably not large enough to warrant a shared library by itself except that it just makes things infinitely more convenient). -- Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/
Re: RFC 155 - Remove geometric functions from core
David L Nicol [EMAIL PROTECTED] writes: This is what I was talking about when I suggested the language maintain a big list of all the addresses of each function, and after the function gets loaded or compiled it is added to the big list, and after this stage the placeholder in the op can be replaced with a longjump. Since the shared segments live at different addresses in different processes (or should I have stayed awake through that lecture) I'm not sure I'm completely following what you're arguing for here, but be careful not to go too far down the road of duplicating what the dynamic loader already knows how to do. There be dragons; that stuff is seriously baroque. You really don't want to reimplement it. I'd love to see Perl aggressively take advantage of new capabilities in dynamic loaders, though. Among other things, I'll point out that symbol versioning is the way that things like libc manage to be backward compatible while still changing things, and we should probably seriously consider using symbol versioning in a shared libperl.so as a means to provide that much desired and extremely difficult to implement stable API for modules and the XS-equivalent. -- Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/
Re: RFC 155 - Remove geometric functions from core
Dan Sugalski [EMAIL PROTECTED] writes: On 29 Aug 2000, Russ Allbery wrote: I'd love to see Perl aggressively take advantage of new capabilities in dynamic loaders, though. Among other things, I'll point out that symbol versioning is the way that things like libc manage to be backward compatible while still changing things, and we should probably seriously consider using symbol versioning in a shared libperl.so as a means to provide that much desired and extremely difficult to implement stable API for modules and the XS-equivalent. This is where my lack of strange Unix knowledge comes to the fore. Is this really a problem? It seems to me to be a standard sort of thing to be dealing with. (OTOH, my platform of choice has 20-year-old executables as part of its test suite and a strong engineering bent, so I may be coming at things from a different direction than most folks) Well, it depends on what your goals are, basically. For most shared libraries, people don't take the trouble. Basically, no matter how well you design the API up front, if it's at all complex you'll discover that down the road you really want to *change* something, not just add something new (maybe just add a new parameter to a function). At that point, the standard Perl thing up until now to do is to just change it in a major release and require people to relink their modules against the newer version. And relink their applications that embed Perl. Not a big deal, and that's certainly doable. But it's possible to do more than that if you really want to. The glibc folks have decided to commit to nearly full binary compatibility for essentially forever; the theory is that upgrading libc should never break a working application even if the ABI changes. I'm not familiar with the exact details of how symbol versioning works, but as I understand it, this is what it lets you do. 
Both the old and the new symbol are available, and newly built applications use the new one while older applications continue to use the previous symbol. That means that all your older binary modules keep working, and if your applications that embed Perl are linked dynamically, you can even upgrade Perl underneath them without having to rebuild them. I'm not sure it's worth the trouble, but it's something to consider. -- Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/
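A rough sketch of how this looks with GNU ld's symbol versioning; the symbol and version names below are entirely hypothetical, and the mechanism shown is GNU-toolchain-specific (Solaris has an analogous facility with different syntax):

```c
/*
 * libperl.map -- a GNU ld version script, passed to the linker as
 * -Wl,--version-script=libperl.map.  Each node names the symbols
 * that exist at that version:
 *
 *     PERL_6.0 { global: perl_call_sub; local: *; };
 *     PERL_6.1 { global: perl_call_sub; } PERL_6.0;
 *
 * In the C source, .symver binds each implementation to a version
 * node; the double @@ marks the default that newly linked code
 * picks up, while old binaries keep resolving to the old version.
 */
int perl_call_sub_v0(void *interp);            /* legacy ABI        */
int perl_call_sub_v1(void *interp, int flags); /* extended ABI      */

__asm__(".symver perl_call_sub_v0,perl_call_sub@PERL_6.0");
__asm__(".symver perl_call_sub_v1,perl_call_sub@@PERL_6.1");
```

The point is exactly the one made above: both implementations coexist in the same libperl.so, so the function's signature can change without breaking modules built against the older release.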
Re: RFC 155 (v2) Remove mathematic and trigonomic functions from core binary
Stephen P Potter [EMAIL PROTECTED] writes: At this point, should I go ahead and abandon the Math/Trig and/or Sockets ones? I'm still in favor of moving the socket functions into Socket if for no other reason than it may help beat into people's heads that code like: eval 'require "sys/socket.ph"'; eval 'sub SOCK_DGRAM {-f "/vmunix" ? 2 : 1;}' if $@; and $csock = pack('S n a4 x8', 2, 0, $caddr); $ssock = pack('S n a4 x8', 2, $port, $saddr); unless (socket(S,2,SOCK_DGRAM,$UDP_PROTO)) { warn "$0 (socket): $!\n"; close S; return undef; } should be done away with for good. -- Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/
Re: RFC 99 (v1) Maintain internal time in Modified Julian (not epoch)
Tim Jenness [EMAIL PROTECTED] writes: Of course, "seconds since 1970" is only obvious to unix systems programmers. I disagree; I don't think that's been true for a while. It's certainly familiar, if not obvious, to *any* Unix programmer (not just systems programmers), as it's what time() returns, and pretty much any C programmer will have used that at some point or another. It's also so widespread as to be at least somewhat familiar to non-Unix programmers. Anyway, it doesn't matter; it's a lot more widely used than any other epoch, and epochs are completely arbitrary anyway. What's wrong with it? MJD is doable with current perl 32bit doubles. I use it all the time in perl programs and am not suffering from a lack of precision. Day resolution is insufficient for most purposes in all the Perl scripts I've worked on. I practically never need sub-second precision; I almost always need precision better than one day. If we're aiming at replacing time, it has to provide *at least* second precision, at which point I really don't see the advantage of MJD over Unix time. Why change something that works? Is Perl currently using different epochs on different platforms? If so, I can definitely see the wisdom in doing something about *that* and off-loading the system-local time processing into modules (although I can also see the wisdom in leaving well enough alone). But why not go with the most commonly used and most widely analyzed epoch? -- Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/
Re: RFC 99 (v1) Maintain internal time in Modified Julian (not epoch)
Nathan Wiger [EMAIL PROTECTED] writes: Anyway, it doesn't matter; it's a lot more widely used than any other epoch, and epochs are completely arbitrary anyway. What's wrong with it? I think the "What's wrong with it?" part is the wrong approach to this discussion. That's exactly what I disagree with, I think. I don't understand why this would be the wrong approach to the discussion. It seems to me that it follows automatically from "epochs are completely arbitrary anyway." That being said, what we need to say "is it possible UNIX might not be perfect?" (hard to imagine, true... :-). More specifically, "is there something that would work better for putting Perl in Palm pilots, watches, cellphones, Windows and Mac hosts, *plus* everything else it's already in?" How does it make any difference what epoch you use? Why would this make Perl more portable? No, but currently Perl IS forcing Windows, Mac, and BeOS users to understand what the UNIX epoch is. In that case, I don't understand what the difference is between that and forcing those users *plus* Unix users to understand what the MJD epoch is. There's some other advantages to MJD beyond system-independence. But MJD isn't any more system-independent than Unix time. Absolutely nothing about Unix time is specific to Unix; it's just as portable as any other arbitrary epoch. Namely, it allows easy date arithmetic, meaning complex objects are not required to modify dates even down to the nanosecond level. Unix time allows this down to the second level already. If we wanted to allow it down to the nanosecond level through a different interface to return something like TAI64NA or something, that would make sense to me. What doesn't make sense to me is a change of epoch; I just don't see what would be gained. I must be very confused. I don't understand what we gain from MJD dates at all, and the arguments in favor don't make any sense to me; all of the advantages listed apply equally well to the time system we have already. 
-- Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/
Re: RFC 99 (v1) Maintain internal time in Modified Julian (not epoch)
Nathan Wiger [EMAIL PROTECTED] writes: The idea would be twofold: 1. time() would still return UNIX epoch time. However, it would not be in core, and would not be the primary timekeeping method. It would be in Time::Local for compatibility (along with localtime and gmtime). 2. mjdate() would return MJD. It _would_ be in core, and it _would_ be the internal timekeeping method. All of the new date functions would be designed to be based off of it. Here's the significant problem that I have with this: It feels very much like it's putting the cart before the horse. Perl is fundamentally a Unix language (portable Unix, to a degree). Its core user base has always been sysadmins and hackers with a Unix-like mindset, regardless of the platform they're using. As an example, I've written literally hundreds of scripts that use Unix time in one way or another; it has innumerable really nice properties and is compatible with all the other programs written in other languages that I have to interact with. By comparison, who uses MJD? Practically no one. It's a theoretically nice time scale, but outside of the astronomy community, how many people even have any idea what it is? This appears to be a proposal to replace a *very* well-known time base with very well-known and commonly-used properties with a time base that practically no one knows or currently uses just because some of its epoch properties make slightly more sense. Unless I'm missing something fundamental here, this strikes me as a horrible idea. Unix's time representation format has no fundamental problems that aren't simple implementation issues. Negative values represent times before 1970 just fine. The range problem is easily solved by making it a 64-bit value, something that apparently we'd need to do with an MJD-based time anyway. And everyone already knows how it works and often relies on the base being consistent with their other applications. It really doesn't sound like a good idea to change all that. 
-- Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/
Re: RFC 99 (v1) Maintain internal time in Modified Julian (not epoch)
Tim Jenness [EMAIL PROTECTED] writes: On 14 Aug 2000, Russ Allbery wrote: Day resolution is insufficient for most purposes in all the Perl scripts I've worked on. I practically never need sub-second precision; I almost always need precision better than one day. MJD allows fractional days (otherwise it would of course be useless). As I write this the MJD is 51771.20833 Floating point? Or is the proposal to use fixed-point adjusted by some constant multiplier? (Floating point is a bad idea, IMO; it has some nasty arithmetic properties, the main one being that the concept of incrementing by some small amount is somewhat ill-defined.) At some level time() will have to be changed to support fractions of a second and this may break current code that uses time() explicitly rather than passing it straight to localtime() and gmtime(). Agreed. I guess I don't really care what we use for an epoch for our sub-second interface; I just don't see MJD as obviously better or more portable. I'd actually be tentatively in favor of taking *all* of the time stuff and removing it from the core, under the modularity principle, but I don't have a firm enough grasp of where the internals use time to be sure that's a wise idea. -- Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/
Re: RFC 46 (v1) Use features of portable, free compilers
=head1 TITLE Use features of portable, free compilers and libraries [...] =head1 ABSTRACT There is no sane reason why *nix vendors continue to push proprietary compilers and system libraries on their customers when better, free replacements could be had for little effort. Eventually, they will realize this and start porting GNU Libc and Binutils, contributing whatever unique features their current tools have to the GNU versions, and shipping these packages with their systems. Perl should take aggressive advantage of these programs' features in anticipation of eventually not having to support all the other cruft that's out there. It's completely unrealistic to believe that everyone is eventually going to use GNU libc and binutils. It's also completely false that the GNU libraries and tools are better than the vendor counterparts; Sun's compilers and linkers are considerably better than GNU's for SPARC C code, for example. If the glibc versions of utilities are useful and we can provide replacements for systems that don't have them, that's a good tactic and should be considered. Use of all sorts of random non-portable and often ill-conceived features is not. gcc implements all sorts of extensions for all sorts of reasons; some of them are worthwhile to use and some of them are horrid mistakes. Some of them are perfectly usable but shouldn't be used in new code because something else has been standardized (such as variadic macro handling). Similar things apply to glibc. Perl should be portable. Perl should make use of the capabilities of its target system as much as it can while still remaining portable, but Perl is a software package, not a political tool, and shouldn't be used as one. And the GNU packages contain tons of bad, poorly-designed cruft of their own; it's important to not lose sight of the fact that GNU libc *is* just another vendor library with its own nice features and bad mistakes, just like Sun's and HP's and AIX's. 
It has a nice license; that's irrelevant to porting Perl. In short, while some of the ideas in this RFC have merit, I am absolutely 100% opposed to the grand implications and its tone and would consider this approach to be disastrous for Perl. -- Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/
Re: kpathsea
Simon Cozens [EMAIL PROTECTED] writes: Sounds good. Here's a slight modification: perllib.db is a cache; lookups take place as normal, but then any new information is written into the cache. The cache is invalidated every $configurable period. Putting on my sysadmin hat, anything that a program wants to modify during its normal course of operation and isn't a dotfile in the user's home directory is inherently Evil and not to be tolerated if at all possible. Bear in mind that site-wide Perl installations are going to be exported read-only. -- Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/