Question about list context for String.chars

2005-04-11 Thread gcomnz
Hi all, 

I'm writing a bunch of examples for perl 6 pleac and it seems rather
natural to expect $string.chars to return a list of unicode chars in
list context, however I can't find anything to confirm that. (The
other alternatives being split and unpack.)

# unpack
@array = unpack("C*", $string);
# split
@array = split /./, $string;
# this too?
@array = $string.split(/./);
# and how about this?
@array = $string.chars;
# and this explicit list context?
@array = $string.chars[];

Thanks,

Marcus


Re: Question about list context for String.chars

2005-04-11 Thread Ingo Blechschmidt
Hi,

gcomnz wrote:
> I'm writing a bunch of examples for perl 6 pleac and it seems rather
> natural to expect $string.chars to return a list of unicode chars in
> list context, however I can't find anything to confirm that. (The
> other alternatives being split and unpack.)

I like that.

If one wanted to have the *number* of chars/graphemes/whatever, one
could still use the cheap unary + operator.

And .keys, .values, .pairs, etc. don't return a plain number, but actual
contents, too (consistency!).


--Ingo

-- 
Linux, the choice of a GNU | Wissen ist Wissen, wo man es findet.  
generation on a dual AMD   | 
Athlon!| 



Whither use English?

2005-04-11 Thread David Vergin
I'm working on docs/S28draft.pod in the pugs project. Consulting perl5's
perlvar.pod, the issue of "use English" comes up. AFAICT from various sources,
little has been said about this.

NOTE:
http://groups-beta.google.com/group/perl.perl6.language/msg/fa241233bcfba024:
"we've already been through the whole C<use English;> thing and how no one uses
it"

What's the word? Will there be something like "use English"?

Regards to all,
David



Re: Whither use English?

2005-04-11 Thread Juerd
David Vergin skribis 2005-04-11  9:44 (-0700):
> What's the word. Will there be something like use English?

Yes, and it's the default :)


Juerd
-- 
http://convolution.nl/maak_juerd_blij.html
http://convolution.nl/make_juerd_happy.html 
http://convolution.nl/gajigu_juerd_n.html


Re: Whither use English?

2005-04-11 Thread Aaron Sherman
On Mon, 2005-04-11 at 14:31, Juerd wrote:
> David Vergin skribis 2005-04-11  9:44 (-0700):
> > What's the word. Will there be something like use English?
>
> Yes, and it's the default :)

Yes, but it will be spelled:

use $*LANG ;-)

Seriously, is there some reason that we would not provide a
Language::Russian and Language::Nihongo? Given Perl 6, it would even
be quite valid for those modules to add aliases for all of the core
functions and keywords, not just global variables.

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: Whither use English?

2005-04-11 Thread Juerd
Aaron Sherman skribis 2005-04-11 14:49 (-0400):
> Yes, but it will be spelled:
>   use $*LANG ;-)
> Seriously, is there some reason that we would not provide a
> Language::Russian and Language::Nihongo? Given Perl 6, it would even
> be quite valid for those modules to add aliases for all of the core
> functions and keywords, not just global variables.

Because providing it leads to its use, and when it gets used, knowing
English is no longer enough.

I have some code that uses Dutch variable names. When I show that code
to people who can't read any Dutch, they have a hard time finding out
what it does and how it works. If even builtin functions become
unfamiliar, this figuring out becomes impossible instead of hard,
without learning the language it's written in.

English sucks in many interesting ways, but at least it's a de facto
standard and documentation will be available in it.

I'm not even sure I like the *possibility* of using non-ascii letters in
identifiers, even.

As a 12-year old, I used several BASIC dialects. One time I found a
Dutch BASIC. It had TOON instead of PRINT, and INVOER instead of
INPUT. Even though these words were in my own language, I found using
them hard just because I was used to something entirely different. 

You could say it only takes some getting used to, but it's easier to get
used to one language than to all languages a grammar exists for.

And even though I knew when I wrote it that it was a mistake, I used
Esperanto identifiers in Lingua::EO::Supersignoj. You can't imagine how
often I've used new instead of nova since I released that. A next
version is going to have English as the primary language, even though I
love Esperanto.

I do think translating *documentation* is a very good idea. But please
let that be an official project, with lots and lots of committers,
because every one-man translation operation eventually dies.


Juerd
-- 
http://convolution.nl/maak_juerd_blij.html
http://convolution.nl/make_juerd_happy.html 
http://convolution.nl/gajigu_juerd_n.html


Re: Whither use English?

2005-04-11 Thread Mark Reed

On 2005-04-11 15:00, Juerd [EMAIL PROTECTED] wrote:
 
> I'm not even sure I like the *possibility* of using non-ascii letters in
> identifiers, even.


I agree that it would be a nightmare if project A used "presu" instead of
"print" everywhere, while project B used "toon", etc.  But non-ASCII identifiers
are a good thing, because there are many places even in the English-speaking
world (even in Ugly America) where people are used to such identifiers.  I
want to be able to use $Å for a variable representing angstroms, to see the
constant Math::Trig::π in trig functions, to declare a sub Σ that does
summations, etc.  And even if those don't come through in email
properly, they make it through CVS/SVN commits and updates just fine. :)





Re: Question about list context for String.chars

2005-04-11 Thread Aaron Sherman
On Mon, 2005-04-11 at 14:12, Ingo Blechschmidt wrote:

> gcomnz wrote:
> > I'm writing a bunch of examples for perl 6 pleac and it seems rather
> > natural to expect $string.chars to return a list of unicode chars in
> > list context, however I can't find anything to confirm that. (The
> > other alternatives being split and unpack.)
>
> I like that.

Same here, though I have to admit that I'm slow on this whole Unicode
thing, so I'm not sure what you mean by "Unicode chars". For example,
are you expecting to get "f", "f", "i" or "ﬃ" back when you say
"ﬃ".chars? More interestingly, what about all of the Arabic ligatures
which someone who speaks that language might reasonably expect to get
back as multiple chars, but they have their own Unicode codepoint
(e.g. "ﳳ", which is U+FCF3 ARABIC LIGATURE SHADDA WITH DAMMA MEDIAL FORM,
from which you might expect to get U+0651 SHADDA and U+064F DAMMA)? Any
Arabic speakers to confirm or deny this behavior of ligatures?

Please be aware, I'm talking about ligatures above, NOT special letters
such as "æ", which are their own letters, and cannot be decomposed into
"a", "e" without losing information.

Given Parrot, what happens when you are presented with a Big5 string
that does not have a strict Unicode equivalent? Does .chars throw an
exception, or does it rely on the string to know how to characterify
itself according to its vtable?

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: Question about list context for String.chars

2005-04-11 Thread gcomnz
I have to say I'm slightly confused too for some languages, especially
for syllabic alphabets. At the same time, I'm pretty clear for CJK,
syllabaries, and alphabets, or at least I hope I'm clear (I guess I'm
about to find out): .chars just returns the right Unicode level for
whatever the string contents require.

"abc".chars would return a b c, which I'm guessing would be byte
size usually.

.chars would return , which can probably be 
expressed with UTF8?

> Aaron wrote:
> Same here, though I have to admit that I'm slow on this whole Unicode
> thing, so I'm not sure what you mean by "Unicode chars". For example,
> are you expecting to get "f", "f", "i" or "ﬃ" back when you say
> "ﬃ".chars? More interestingly, what about all of the Arabic ligatures
> which someone who speaks that language might reasonably expect to get
> back as multiple chars, but they have their own Unicode codepoint
> (e.g. "ﳳ", which is U+FCF3 ARABIC LIGATURE SHADDA WITH DAMMA MEDIAL FORM,
> from which you might expect to get U+0651 SHADDA and U+064F DAMMA)? Any
> Arabic speakers to confirm or deny this behavior of ligatures?

From Apocalypse 5: "Under level 2 Unicode support, a character is
assumed to mean a grapheme, that is, a sequence consisting of a base
character followed by 0 or more combining characters."

Marcus




Re: Whither use English?

2005-04-11 Thread Aaron Sherman
On Mon, 2005-04-11 at 15:00, Juerd wrote:
> Aaron Sherman skribis 2005-04-11 14:49 (-0400):
> > Yes, but it will be spelled:
> >   use $*LANG ;-)
> > Seriously, is there some reason that we would not provide a
> > Language::Russian and Language::Nihongo? Given Perl 6, it would even
> > be quite valid for those modules to add aliases for all of the core
> > functions and keywords, not just global variables.
>
> Because providing it leads to its use, and when it gets used, knowing
> English is no longer enough.

I don't think you can say (as Larry has) that you want to be able to
fully re-define the language from within itself and still impose the
constraint that it can't confuse people who don't know anything about
my module.

You might argue that Language::Dutch should never ship with the core...
that's a valid opinion, but SOMEONE is going to write it. It'd be a kind
of strange form of censorship for CPAN not to accept it. After all,
there's more than one way to say it... isn't there?

> English sucks in many interesting ways, but at least it's a de facto
> standard and documentation will be available in it.

Let's think about this in terms other than someone distributing code to
the masses. What about teaching? If I were going to teach the basic
concepts of programming, I'd like to do so with a language whose
constructs are all native. This is simply practical: having to learn
vocabulary at the same time that you learn a new WAY of communicating
makes it harder. If CPAN had a Language::NYUpperEastSide, then I might
consider using that for my elementary computer class rather than try to
teach everyone real English AND programming in one year ;-)

> I'm not even sure I like the *possibility* of using non-ascii letters in
> identifiers, even.

I think we already have Latin-1 in identifiers... let me check. Yep:

pugs my $ = 1;
undef
pugs $;
1

Let's see about UTF-8

pugs my $ = 1;
undef
pugs $;
1

A-yup!

-- 
Aaron Sherman [EMAIL PROTECTED]
Senior Systems Engineer and Toolsmith
It's the sound of a satellite saying, 'get me down!' -Shriekback




Re: Question about list context for String.chars

2005-04-11 Thread Mark Reed

On 2005-04-11 15:40, gcomnz [EMAIL PROTECTED] wrote:
 

> .chars would return [EMAIL PROTECTED]@, which can probably be expressed
> with UTF8?

The string  is probably represented internally as UTF-8, but that
should have no effect on what .chars returns, which should, indeed, be
[EMAIL PROTECTED], that is, an array whose elements are strings which each
represent one Unicode code point, irrespective of encoding.

I think that, in general, at the level of Perl code, 1 character should be
one code point, and any higher-level support for combining and splitting
should be outside the core, in Unicode::Whatever.
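
As an illustration only (this is not from the thread, and Perl 6's final
behaviour was still undecided at this point), the three levels being
contrasted here, bytes, code points, and graphemes, can be sketched in
Perl 5. The sample string and variable names are assumptions chosen for
the example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Sample string (an assumption for this sketch): "e" followed by
# U+0301 COMBINING ACUTE ACCENT, then the CJK character U+65E5.
my $string = "e\x{301}\x{65E5}";

# Bytes: the length of the UTF-8 encoded form of the string.
my $bytes = length encode('UTF-8', $string);

# Code points: a Perl 5 character string splits into one element per code point.
my @codes = split //, $string;

# Graphemes: \X matches one extended grapheme cluster
# (a base character plus any combining characters).
my @graphs = $string =~ /(\X)/g;

printf "bytes=%d codes=%d graphs=%d\n", $bytes, scalar @codes, scalar @graphs;
```

The same string therefore has a different length at each level, which is
why the choice of what "1 character" means matters for .chars.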






Re: Question about list context for String.chars

2005-04-11 Thread Aaron Sherman
On Mon, 2005-04-11 at 15:40, gcomnz wrote:
> I have to say I'm slightly confused too for some languages,
> especially for syllabic alphabets. At the same time, I'm pretty clear
> for CJK, Syllabaries, and alphabets, or at least I hope I'm clear (I
> guess I'm about to find out), .chars just returns the right unicode
> level for whatever the string contents requires.
>
> abc.chars  would return a b c, which I'm guessing would be
> byte size usually.

Fair enough.

> .chars would return , which can probably be expressed with
> UTF8?

I think you're confusing UTF8 (which can represent ALL Unicode
characters) and the UTF8 subset which consists of one-byte
representations (which happens to overlap with 7-bit ASCII).

> From Apocalypse 5: Under level 2 Unicode support, a character
> is assumed to mean a grapheme, that is, a sequence consisting of a
> base character followed by 0 or more combining characters.
> Marcus

Hmmm... that doesn't answer the ligature question clearly though. That
answers for the case of combining diacritical marks: 

http://en.wikipedia.org/wiki/Combining_diacritical_mark

e.g. A  vs , which is a pre-combined example, but there are (as I
understand it), many valid examples which do not have a pre-combined
representation in Unicode.

But not for ligatures:

http://en.wikipedia.org/wiki/Ligature_%28typography%29

which are, by definition, actually two or more unique characters which
have a special typographical representation when adjacent. So, they are
a single grapheme, but like I said: certain cultures would be shocked by
a .chars that did not decompose their ligatures (and again, I'm mostly
thinking Arabic, so I'd defer to someone who actually spoke Arabic and
knows how they deal with this).




Re: Question about list context for String.chars

2005-04-11 Thread gcomnz
> > abc.chars  would return a b c, which I'm guessing would be
> > byte size usually.
>
> Fair enough.
>
> > .chars would return [EMAIL PROTECTED]@, which can probably be
> > expressed with UTF8?
>
> I think you're confusing UTF8 (which can represent ALL Unicode
> characters) and the UTF8 subset which consists of one-byte
> representations (which happens to overlap with 7-bit ASCII).

Perhaps my confusion is that I thought, perhaps wrongly, that since
.chars returns a count that is appropriate for the given unicode
level, that would mean that if it were able to return a list in list
context then it would be with the right storage size as needed for the
given string contents. For instance, a b c just requires bytes for
each element, while Kanji would require more. I'm leaving very wide
room open here for me really misunderstanding how all this works.

 
> > From Apocalypse 5: Under level 2 Unicode support, a character
> > is assumed to mean a grapheme, that is, a sequence consisting of a
> > base character followed by 0 or more combining characters.
> > Marcus
>
> Hmmm... that doesn't answer the ligature question clearly though. That
> answers for the case of combining diacritical marks:

I read "followed by 0 or more combining characters" to mean that it is
smart enough to combine the vowels in Arabic and other syllabic
alphabets that use special conjuncts. However I'm also not exactly
sure if that's even reasonably possible, or even if it makes sense in
the counting of characters for languages that use those.


Here documents as positional parameters to a function call

2005-04-11 Thread gcomnz
Hey all, more pleac conversion questions:

I can't prove with the docs that a heredoc will continue to work as
positional params to a function call, particularly where it's not the
first param:

die "Couldn't send mail" unless send_mail qq:to/EOTEXT/, $target;
 here doc here ...
EOTEXT

Any comments?

Marcus


Re: Here documents as positional parameters to a function call

2005-04-11 Thread Luke Palmer
gcomnz writes:
> Hey all, more pleac conversion questions:
>
> I can't prove with the docs that a heredoc will continue to work as
> positional params to a function call, particularly where it's not the
> first param:
>
> die "Couldn't send mail" unless send_mail qq:to/EOTEXT/, $target;
>  here doc here ...
> EOTEXT

Here docs work just like in Perl 5 with two differences:  They are
spelled qq:to/END/, q:to/END/, etc. and the ending text can have leading
whitespace, which is stripped off of the text.

Luke
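
Since here docs are said to work as in Perl 5, the Perl 5 form of the
call in question may be a useful reference point. This is only a
minimal sketch: send_mail is a hypothetical stub and the address is a
placeholder, neither of which comes from the thread.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a real mailer: it just reports what it was given.
sub send_mail {
    my ($text, $target) = @_;
    return "To: $target\n$text";
}

my $target = 'nobody@example.com';

# The here doc is one positional argument; further arguments follow on the
# same line, and the here doc body starts on the next line.
my $sent = send_mail(<<"EOTEXT", $target);
here doc here ...
EOTEXT

die "Couldn't send mail" unless $sent;
print $sent;
```

In Perl 5 the here doc marker can sit in any argument position, for
example send_mail($target, <<"EOTEXT"), because the body is always read
from the lines that follow the statement's first line.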


Re: Question about list context for String.chars

2005-04-11 Thread Rod Adams
gcomnz wrote:
> Hi all,
>
> I'm writing a bunch of examples for perl 6 pleac and it seems rather
> natural to expect $string.chars to return a list of unicode chars in
> list context, however I can't find anything to confirm that. (The
> other alternatives being split and unpack.)
>
> # unpack
> @array = unpack("C*", $string);
> # split
> @array = split /./, $string;
> # this too?
> @array = $string.split(/./);
> # and how about this?
> @array = $string.chars;
> # and this explicit list context?
> @array = $string.chars[];
>
> Thanks,
> Marcus
 

Well, in general the word "chars" has come to mean whatever a character
is in the current lexical scope, typically a language-level char.

It had previously been decided that C<.chars>, etc. would return the
length. I'm not about to change that without approval from @Larry.

I don't see any technical problem with saying that C<.chars> returns an
array of those chars, which then gets converted to the length of the
array in scalar context. Creating a list just to get its length can of
course be optimized away.

My main issue is that it's giving two rather different semantics to
the same method name, and leaving it to what amounts to context-based
dispatching. So I don't like this idea as written.
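
For comparison, the context-based dispatch described here has a direct
Perl 5 analogue in wantarray. The following sketch (the name chars_of is
made up for illustration and is not a proposed API) shows one function
answering with a list in list context and a count in scalar context,
which is the dual meaning being objected to.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the list of characters in list context
# and the number of characters in scalar context.
sub chars_of {
    my ($string) = @_;
    my @chars = split //, $string;
    return wantarray ? @chars : scalar @chars;
}

my @list  = chars_of("abc");   # list context: the characters themselves
my $count = chars_of("abc");   # scalar context: how many there are

print "list: @list\n";
print "count: $count\n";
```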

However, I do like the idea of treating a string as an array of chars. I
remember some discussion a while back about making [] on strings do
something useful (but not the same thing as C<substr>), but I forget how
it ended, and my brain is too fried to go hunt it down. But overall I
like that idea. Then you could just say:

   @array = $string[];

Which is a lot prettier than anything you mentioned above, lets us get
rid of the .split:/null/ issue, has better Huffman coding, and lets
.chars have only one meaning.

For reference, what I'm thinking of having [] do is return the chars
specified as a list. This should be lvaluable, so you can hack at
individual chars to your heart's content.

This is different from substr(), since the latter returns a string of
the range of chars, not the individual chars. Consider:

   $a = $b = "All good boys go to heaven.";
   substr($a,9,3) = "girl";
   $b[9..11] = "girl"[];
   say "A: $a";
   say "B: $b";

   A: All good girls go to heaven.
   B: All good girs go to heaven.

-- Rod Adams
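
The difference in the example above can be reproduced in Perl 5 by
making the "array of chars" explicit. This is a sketch under that
assumption, with made-up variable names; it emulates the proposed
$string[] with split and join and is not how Perl 6 would implement it.

```perl
use strict;
use warnings;

my $by_substr = "All good boys go to heaven.";
my $by_chars  = $by_substr;

# substr as an lvalue replaces the 3-char range with a 4-char string,
# so the string grows by one character.
substr($by_substr, 9, 3) = "girl";

# Treating the string as an array of chars: assigning to a 3-element slice
# takes only the first three chars of "girl"; the fourth is dropped.
my @chars = split //, $by_chars;
@chars[9 .. 11] = split //, "girl";
$by_chars = join '', @chars;

print "A: $by_substr\n";
print "B: $by_chars\n";
```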
  


Re: Question about list context for String.chars

2005-04-11 Thread gcomnz
> Rod wrote:
> However, I do like the idea of treating a string as an array of chars. I
> remember some discussion a while back about making [] on strings do
> something useful (but not the same thing as C<substr>), but I forget how
> it ended, and my brain is too fried to go hunt it down. But overall I
> like that idea. Then you could just say:
>
> @array = $string[];

This all sounds nice and simple. My only question then is what about
the instances where you specifically need the array of graphs, codes,
bytes, or whatever? If we can do one, why not all?

I recall that a good point Larry made previously is not to bend over
backward to let C programmers still think like C programmers in Perl
(sorry if my munging didn't get that just right). And to be honest I
only came up with this question for the cookbook (pleac) examples, but
I'm guessing there's some reasonable use for all this stuff outside of
the C-thinking world?


Re: Question about list context for String.chars

2005-04-11 Thread Matt Diephouse
On Apr 12, 2005 12:20 AM, gcomnz [EMAIL PROTECTED] wrote:
> > Rod wrote:
> > However, I do like the idea of treating a string as an array of chars. I
> > remember some discussion a while back about making [] on strings do
> > something useful (but not the same thing as C<substr>), but I forget how
> > it ended, and my brain is too fried to go hunt it down. But overall I
> > like that idea. Then you could just say:
> >
> > @array = $string[];
>
> This all sounds nice and simple. My only question then is what about
> the instances where you specifically need the array of graphs, codes,
> bytes, or whatever? If we can do one, why not all?

That's why C<$string.chars[]> was proposed -- it would be accompanied
by .graphs, .codes, and .bytes. That is all fine and dandy, but I
don't think I should have to think about Unicode if I don't want to.
And if I understand correctly, that means that I want everything to
use chars by default. And C<$string[]> would be a nice shortcut for
that.

-- 
matt diephouse
http://matt.diephouse.com


Re: Question about list context for String.chars

2005-04-11 Thread gcomnz
> > > However, I do like the idea of treating a string as an array of chars. I
> > > remember some discussion a while back about making [] on strings do
> > > something useful (but not the same thing as C<substr>), but I forget how
> > > it ended, and my brain is too fried to go hunt it down. But overall I
> > > like that idea. Then you could just say:
> > >
> > > @array = $string[];
> >
> > This all sounds nice and simple. My only question then is what about
> > the instances where you specifically need the array of graphs, codes,
> > bytes, or whatever? If we can do one, why not all?
>
> That's why C<$string.chars[]> was proposed -- it would be accompanied
> by .graphs, .codes, and .bytes. That is all fine and dandy, but I
> don't think I should have to think about unicode if i don't want to.
> And if I understand correctly, that means that I want everything to
> use chars by default. And C<$string[]> would be a nice shortcut for
> that.

Yes, that's sort of what I was arguing for, in an underhanded way. I
agree that $string[] is a good shorthand for the most common usage
($string.chars[]) too.


Re: Question about list context for String.chars

2005-04-11 Thread Rod Adams
Matt Diephouse wrote:
> On Apr 12, 2005 12:20 AM, gcomnz [EMAIL PROTECTED] wrote:
> > > Rod wrote:
> > > However, I do like the idea of treating a string as an array of chars. I
> > > remember some discussion a while back about making [] on strings do
> > > something useful (but not the same thing as C<substr>), but I forget how
> > > it ended, and my brain is too fried to go hunt it down. But overall I
> > > like that idea. Then you could just say:
> > >
> > >    @array = $string[];
> >
> > This all sounds nice and simple. My only question then is what about
> > the instances where you specifically need the array of graphs, codes,
> > bytes, or whatever? If we can do one, why not all?
>
> That's why C<$string.chars[]> was proposed -- it would be accompanied
> by .graphs, .codes, and .bytes. That is all fine and dandy, but I
> don't think I should have to think about unicode if i don't want to.
> And if I understand correctly, that means that I want everything to
> use chars by default. And C<$string[]> would be a nice shortcut for
> that.
 

I've been meaning to ask what people think about having operators that
temporarily change the current lexical Unicode level for just one 
single expression. I see them as solving all kinds of corner cases.

Unfortunately, I don't have a solid proposal handy, which has kept me 
from posting it. But since there is some interest in this, I'll throw 
the concept out there, and see if anyone else has a good idea what they 
should look like, and exactly how they should work.

-- Rod Adams