Re: threads?
I agree that threads are generally a difficult issue to cope with. What is worse, there are a lot of Java developers who tell us that it is not difficult for them, but in the end the software fails on the production system, for example because the load is different than on the test system, causing different threads to be slowed down to different extents. So people who do have difficulties with multithreading still use it a lot, don't admit the difficulties, and the problems might not even appear during testing... That said, I have seen software that heavily uses multithreading and works well.

On the other hand, I think there are certain tasks that need some kind of parallelism, either to make use of parallel CPU infrastructure or to implement patterns that can more easily be expressed with something like multithreading. The approach of running several processes instead of several threads is something that can be considered in some cases, but it comes with a performance price tag that might not be justified in all situations. Maybe the actor model from Scala is worth looking at; at least the Scala guys claim that it solves the issue, but I don't know whether that concept can easily be adapted for Perl 6.

Best regards, Karl
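P.S. Just to make the message-passing idea concrete, here is a minimal sketch using the Channel and start primitives found in current Rakudo; this only illustrates the general pattern, not Scala's actor API, and nothing here is committed Perl 6 design:

    my $jobs = Channel.new;

    # one worker task; it only ever communicates through messages,
    # never through shared mutable state
    my $worker = start {
        for $jobs.list -> $msg {     # .list drains the channel until it is closed
            say "worker got: $msg";
        }
    };

    $jobs.send($_) for 1..5;
    $jobs.close;
    await $worker;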
Re: 1.23 becomes Rat
James Cloos: If so, please use something compatible with IEEE 754 decimal floats, so that they can be used when running on hardware which supports them. Even without such hardware, gcc (at least) has support for software emulation of _Decimal32 and _Decimal64 (and _Decimal128?).

I think there are different ways to go. There are the traditional fixed-byte-length numbers like float32, float64, int16, int32, int64 etc. that are quite common in most programming languages. But especially for integers, where it makes the most sense, it has become popular to have arbitrary-length integers. Something like decimal32 or decimal64 fits into this fixed-byte-length world and is without doubt quite useful for many things. BigDecimal in Java and LongDecimal for Ruby go the other way: they basically combine an arbitrary-length integer with some additional information (the scale) to make it a LongDecimal or BigDecimal, so they are not compatible with IEEE 754, but they allow the integer part to grow to arbitrary size.

Both ways are possible, and usually one would expect to find them in add-on libraries (Perl modules in our context here). Making such a type the default would require integrating it into the language. But there might be some concerns about going in this direction instead of good old float. One side of the story is that float is way better in terms of performance, because it has been implemented in most common hardware for about 20 years now (and decent hardware implementations have been around much longer than that), so the hardware implementations are mature and well optimized. On the other hand, people are used to the behaviour of float and use decimal types where some kind of explicit control is needed, like in finance applications.

Best regards, Karl
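P.S. To make the "arbitrary-length integer plus some additional information" idea concrete, here is a toy sketch in Perl 6; the class name and its methods are invented for illustration and are not any existing module:

    class ToyDecimal {
        has Int $.unscaled;   # arbitrary-precision integer, e.g. 123 with scale 2 is 1.23
        has Int $.scale;      # number of digits after the decimal point

        method Rat() { $!unscaled / 10 ** $!scale }

        # addition aligns the scales, so the precision information is kept
        method add(ToyDecimal $other) {
            my $scale = max $!scale, $other.scale;
            ToyDecimal.new(
                unscaled => $!unscaled * 10 ** ($scale - $!scale)
                          + $other.unscaled * 10 ** ($scale - $other.scale),
                scale    => $scale,
            );
        }
    }

    my $a = ToyDecimal.new(unscaled => 123,  scale => 2);   # 1.23
    my $b = ToyDecimal.new(unscaled => 4567, scale => 3);   # 4.567
    say $a.add($b).Rat;                                      # 5.797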
1.23 becomes Rat (Re: Synopsis 02: Range objects)
Larry Wall wrote: Another note, it's likely that numeric literals such as 1.23 will turn into Rats rather than Nums, at least up to some precision that is pragmatically determined.

Doing these as Rat would avoid a lot of the precision issues that floating point arithmetic has all the time. It actually works perfectly well with addition, because the denominator is always a small power of 10, so that is true for the sum as well. Multiplication might be an issue, because the denominator becomes a large power of 10, but I think that can be handled pretty well, unless the multiplication is repeated to the point that the result uses significant amounts of memory. But as soon as division occurs, these rational numbers tend to develop denominators that are not powers of 10 any more. Combining this with some multiplications and additions may result in huge numerators and denominators that are somewhat expensive to handle. So what would happen after such a long calculation:

- Would the Rats somehow know that they are all derived from Rats that were just used instead of floats because they fall within a pragmatically determined precision? Then the result of * or / could just as pragmatically become a floating point number.
- Would the Rats grow really huge numerators and denominators, making it expensive to work with them?
- Would the first division have to deal with the conversion from Rat to floating point?
- Or should there be a new numeric type similar to Rat that always has a power of 10 as its denominator (like BigDecimal in Java, LongDecimal for Ruby, or decimal in C#)?

Even in this last case, division is not really easy to define, because the exact result cannot generally be expressed with a denominator that is a power of 10. This can be resolved by:

- requiring additional rounding information (so writing something like a.divide(b, 10, ROUND_UP) instead of a/b),
- implicitly finding the number of significant digits by using partial derivatives of f(x,y)=x/y,
- expressing the result as some kind of rational number, or
- expressing the result as some kind of floating point number.

Regards Karl
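P.S. For what it's worth, a few lines showing how literals and Rat arithmetic behave in current Rakudo; this is just an observation about one implementation, not an answer to the design questions above (Rakudo currently resolves the huge-denominator question by falling back to Num once a denominator no longer fits into 64 bits):

    say 1.23.^name;          # Rat -- the literal is a rational, not a float
    say 1.23.nude;           # (123 100) -- numerator and denominator
    say 0.1 + 0.2 == 0.3;    # True -- exact, unlike binary floating point
    say (1.23 / 7).nude;     # (123 700) -- division quickly leaves powers of 10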
Re: Synopsis 02: Range objects
Michael Zedeler wrote: Well... maybe. How do you specify the intended precision, then? If I want the values from 1 to 2 with step size 0.01, I guess that writing 1.00 .. 2.00 won't be sufficient. Trying to work out the step size by looking at the precision of things that are doubles or floats doesn't really sound feasible, since there are a lot of holes in the actual representation, so 1.0001 may become 1.0, yielding very different results.

That is a general problem of floats. We tend to write them in decimal notation, but internally they use a representation which is binary, and it is absolutely not obvious what the precision of 1.0001 might be. There could be a data type like LongDecimal in Ruby or BigDecimal in Java that actually has knowledge of its precision and whose values are fractions with a power of 10 as the denominator. But for floats I would only consider the interval itself to be reasonably clear. Even a step size of 1 comes with problems, because an increment of 1 does not have any effect on floating point numbers like 1.03e300.

Regards, Karl
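P.S. In current Rakudo the step can simply be spelled out with the sequence operator instead of being inferred from the precision of a literal; a small illustration:

    my @vals = 1, 1.01 ... 2;    # step deduced from the first two elements
    say @vals.elems;             # 101
    say @vals[0, 1, *-1];        # (1 1.01 2) -- exact endpoints, since these are Rats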
Unicode support in Emacs
Larry Wall wrote: Well, it's too bad the emacs developers are lagging behind the vim developers in this area, but it might (or might not) have something to do with the fact that certain obnoxious people like me were bugging the vim folks incessantly to get their Unicode story straight for a couple of years before they actually did it. :-)

About 10 years ago I wrote an email to Richard Stallman, who was at that time the maintainer of Emacs, and asked him about Unicode. But at that time he had already thought of his own thing, slightly different from Unicode, maybe slightly smarter... and he wrote me that that would be the way to go. :-( I get the impression that the Unicode support has kind of been put on top of this stuff, and I must admit that the way I am currently using Unicode is to edit the stuff with \ucafe\ubabe-style replacements and run Perl scripts to convert, for example, my private HTML format into WWW HTML. So we should remind the Emacs developers that the Unicode support is at least pretty hard to handle for slightly below average users like me...

I would in particular like to thank Bram Moolenaar for not writing us out of the book of life for all our whining. The Unicode support in vim has been rock solid, and we are grateful.

Maybe I'll start using that one day. ;-)

Sorry for the off-topic post, but I think it has some importance for Perl in the sense that it is good to go the right way and not to wait until Emacs supports it out of the box without 1435 lines of Lisp.

Best regards, Karl

P.S. Don't get me wrong: RMS is a good guy and he has done a lot of useful and good stuff, part of which many of us are using all the time.

P.P.S. I don't think that RMS is the current maintainer of Emacs.
Re: zip
Goplat wrote: I have quite a few fonts; the only one I can find where | is a broken bar is Terminal, a font for DOS programs that uses the cp437 charset, which is incompatible with latin1 (« and » are AE and AF instead of AB and BB) and it doesn't even have a ¦. So, it doesn't seem like a problem.

It is still easy to confuse, but why worry? Larry's suggestion to use ¥ (the JPY sign) looks much better anyway. I think it is always important to remember that it is not only writing Perl 6, but also reading Perl 6 that has to be doable. Too many equivalent ways of writing the same thing mean that the reader has to learn more. I think that Perl is very strong on the writing side: it is relatively easy and efficient to write in Perl, but the reading part is more of a challenge. That 1 and l look so similar is just due to the stupid convention of using fonts for source code that make these two look very similar. But why add another problem with | and ¦, which do look similar at low resolution and size, if the ¥ can do the same thing in a better way? Introducing a z as a second alternative to ¥ might also cost something in terms of learning to read Perl. The infix operators that consist of letters are something that has to be learned very well in order to read Perl sources that have been written by others, so it might be good not to have too many of them.

Best regards, Karl
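P.S. As it turned out, current implementations spell the zip operator with a letter, namely infix Z; just to show how the letter-based spelling reads in practice:

    say (1, 2, 3) Z <a b c>;         # ((1 a) (2 b) (3 c))
    say (1, 2, 3) Z+ (10, 20, 30);   # (11 22 33) -- zip combined with an operator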
broken bar (Re: Some questions about operators.)
Dear All, I think that the broken bar is dangerous. Why? It can be mixed up with the normal bar |. In some fonts it looks the same, and to many people it is not 100% clear which of the two bars is the broken one and which is not. Of course it is possible to avoid this, but that does not solve the problem of reading Perl code that someone else has written. The «» are not such a problem, but I would think that it would still be worth considering avoiding the broken bar. Sorry if this discussion has already been had 1000 times. Best regards, Karl
Re: Funky «vector» operator
Dear All, just for the Emacs users among you: C-x 8 < yields « and C-x 8 > yields ». For the Unix/Linux users it is possible to set up or modify the keyboard layout using xmodmap. Actually there are so many combinations of OS, keyboard layout, tools, editors and Unicode encodings that this could become quite an FAQ... By the way, since the favoured default encoding for Perl 6 source code is utf-8, it is not enough to type something that displays as « or »: your editor has to support utf-8, or you need conversion tools to and from something that your editor supports. Best regards, Karl
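P.S. For anyone wondering where these characters actually show up in Perl 6 source, the hyper ("vector") operators are the obvious example, and current implementations also accept a plain-ASCII spelling:

    say (1, 2, 3) »+« (10, 20, 30);    # (11 22 33)
    say (1, 2, 3) >>+<< (10, 20, 30);  # the same, spelled with 7-bit ASCII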
Re: Latin-1-characters
Dear All, from what has been written by others, there are enough useful encodings other than utf-8, utf-16/UCS-2 and UCS-4 that support efficient storage even for Unicode files whose contents are Greek, Cyrillic, etc. Sorry for the confusion caused by the fact that I was not aware of these.

utf-8 is fine for languages like German, Polish, Norwegian, Spanish, French,... which have ≥ 90% of the text in ASCII-7-bit characters.

Add perl to that list, by the way. I rather strongly suspect that most perl code will consist mostly of 7-bit characters. (Even perl code written by traditional-Chinese speakers -- and I pick on traditional Chinese only because it has a very large character repertoire, one of the reasons there is a simplified variant.)

My experience would be that Perl programs do contain local language and thus local characters, which might be outside of ISO-646-IRV (7-bit ASCII), in string literals and in comments.

By the way, there is (should be) nothing that is encodable in a non-Unicode character set that is not encodable in (any encoding of) Unicode. That's where the "uni" bit comes from. If there is, it means that Unicode is not fulfilling its design goals.

Yes, we can consider any file to be Unicode with some encoding. That is how the Java guys do it, with the restriction that they don't easily let you choose anything other than latin-1 plus \ucafe-style escapes for non-latin-1 characters (or maybe I just didn't bother, because latin-1/ISO-8859-1 works fine for me). IMHO the OS should provide a standard way to specify such a charset as a file attribute, but usually it does not, and it won't in the future, unless the file comes through the network and has a MIME header.

I think the answer is multi-fold. 0) Auto-detect the encoding in the compiler, if a U+FEFF signature, or a #! signature, is found at the beginning of the input. (If there is a FEFF signature, it should get thrown away after it is recognized. It may be possible to recognize on package or module as well, and possibly even on #.)

With FFFE and FEFF this seems obvious. In the case of #! it would not be clear to me whether this defaults to ISO-8859-1 (latin-1) or to utf-8; see HTML vs. XHTML as an example where the default has been changed. (A toy sketch of what such signature detection could look like follows below.)

1) Believe what the underlying FS/OS/transport tells us. (This is likely to be a constant for many OSes, possibly selectable at the compiler's compile time. It's the encoding on the end of the content-type for HTTP and other MIME-based transports.)

I understand that the FS/OS do not really tell us, at least neither for Unix/Linux nor for NT/Windows. Relying on environment variables or locale settings looks dangerous to me, because it breaks programs that worked fine in environment A when you run them elsewhere, or it imposes restrictions on how to set up these environment variables. It could be OK for one-liners run from the command line, like this

ls *.JPG|perl -p -e 's/(.*\.)JPG$/mv $1JPG $1jpg/;' |grep mv |sh

kind of stuff. This would work fine even for shell scripts, because they would have to set the appropriate environment variables for themselves, thus disregarding any user settings. Probably something additional like PERL_DEFAULT_ENCODING would be needed, because otherwise we might get clashes with (other) regular uses of locale settings. In cases where the OS or FS really has the capability to provide an encoding on a per-file basis as a file attribute, or in cases where the file comes from the network with a MIME header, your suggestion should be perfect.
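Here is the toy sketch of recognizing a byte-order-mark signature that I mentioned for 0), written in Perl 6; the sub name is made up, and this is of course not how any actual compiler does it:

    # peek at the first bytes of a source file and guess the encoding from a BOM
    sub guess-encoding(IO::Path $file) {
        my $buf = $file.open(:bin).read(3);
        return 'utf16-le' if $buf.elems >= 2 && $buf[0] == 0xFF && $buf[1] == 0xFE;
        return 'utf16-be' if $buf.elems >= 2 && $buf[0] == 0xFE && $buf[1] == 0xFF;
        return 'utf8'     if $buf.elems >= 3 && $buf[0] == 0xEF && $buf[1] == 0xBB && $buf[2] == 0xBF;
        'utf8';   # no signature found: fall back to some default
    }

    say guess-encoding('script.p6'.IO);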
2) Support a use encoding 'foo', similar to that in recent perl5s: it states the encoding that the file it appears in is written in.

Yes, that looks like the right way to do it. And it eliminates part of the concerns about 1), if it is assumed that this use encoding line is more or less required in every non-trivial Perl source. By the way, this is the encoding of the Perl source code itself; files that are processed by Perl I/O could of course have any encoding.

(The higher-numbered sources of encoding information override the earlier ones.)

Yes, of course. 0) and 2) are obvious, but 1) might need to be dealt with carefully.

Best regards, Karl
Re: Latin-1-characters
Mark J. Reed wrote: Unicode per se doesn't do anything to file sizes; it's all in how you encode it.

Yes. And basically there are these common ways to encode it: utf-8 and utf-16 (or similar variants requiring ≥ 2 bytes per character).

The UTF-8 encoding is not so attractive in locales that make heavy use of characters which require several bytes to encode therein, or relatively little use of characters in the ASCII range;

utf-8 is fine for languages like German, Polish, Norwegian, Spanish, French,... which have ≥ 90% of the text in ASCII-7-bit characters.

but that's why there are other encoding schemes like SCSU which get you Unicode compatibility while not taking up much more space than the locale's native charset.

These make sense for languages like Japanese, Korean, Chinese etc., where you need more than one byte per character anyway. But Russian, Greek, Hebrew, Arabic, Armenian and Georgian would work fine with one byte per character, and both of the kinds of encoding that I can think of make these two bytes per character. So for those languages I see file sizes doubled. Or am I missing something? Anyway, it will be necessary to specify the encoding of Unicode in some way, which could possibly even allow specifying some non-Unicode charsets. IMHO the OS should provide a standard way to specify such a charset as a file attribute, but usually it does not, and it won't in the future, unless the file comes through the network and has a MIME header.

Best regards, Karl
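P.S. To put a number on the doubling, using the encode method as found in current Rakudo (SCSU is not shown; I am not assuming any built-in support for it):

    say 'Ελληνικά'.encode('utf8').bytes;   # 16 -- two bytes per Greek letter in utf-8
    say 'Perl'.encode('utf8').bytes;       #  4 -- ASCII text stays at one byte per character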
Latin-1-characters
And I do think people would rebel at using Latin-1 for that one. I get enough grief for «» :-)

I can imagine that these cause some trouble for people using a charset other than ISO-8859-1 (Latin-1) that works well with 8 bits, like Greek, Arabic, Cyrillic and Hebrew. For these guys Unicode is not so attractive, because it kind of doubles the size of their files, so I would assume that they tend to do a lot of stuff with their koi-8 or with some ISO-8859-x not containing the desired character. For «» it might not be such a problem, because << and >> would work instead. Maybe this issue could (will?) be addressed by declaring the charset in the source and using something like (or better than) \u00AB for stuff that this charset does not have, using a charset conversion to Unicode while parsing the source. This looks somewhat cleaner to me than just pretending a source file written in ISO-8859-7 (Greek) were ISO-8859-1 (Latin-1), relying on the assumption that the two characters we use above 0x80 happen to be in the same positions 0xab and 0xbb. Sorry if that is an old story... Best regards, Karl
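P.S. For the "something like (or better than) \u00AB" part: in current Perl 6 string syntax the corresponding escapes look like this (shown for string literals only; how a source-level charset declaration would interact with them is exactly the open question above):

    say "\x[AB]hello\x[BB]";                                # «hello»
    say "\c[LEFT-POINTING DOUBLE ANGLE QUOTATION MARK]";    # «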