On Thu, 19 Jul 2018 20:34:26 +0200, Christian Gollwitzer wrote:
> Am 19.07.2018 um 14:50 schrieb Gregory Ewing:
>> Chris Angelico wrote:
>>> On Thu, Jul 19, 2018 at 4:41 PM, Gregory Ewing
>>> wrote:
>>>
(Google doesn't seem to think so -- it asks me whether I meant
"assist shop".
Am 19.07.2018 um 14:50 schrieb Gregory Ewing:
Chris Angelico wrote:
On Thu, Jul 19, 2018 at 4:41 PM, Gregory Ewing
wrote:
(Google doesn't seem to think so -- it asks me whether
I meant "assist shop". Although it does offer to translate
it into Czech...)
Into or from?? I'm thoroughly
Chris Angelico wrote:
On Thu, Jul 19, 2018 at 4:41 PM, Gregory Ewing
wrote:
(Google doesn't seem to think so -- it asks me whether
I meant "assist shop". Although it does offer to translate
it into Czech...)
Into or from?? I'm thoroughly confused now!
Hard to tell. This is what the link
it's also thoroughly time to give this thread a well deserved rest
RIP
Abdur-Rahmaan Janhangeer
https://github.com/Abdur-rahmaanJ
Into or from?? I'm thoroughly confused now!
>
> ChrisA
> --
> https://mail.python.org/mailman/listinfo/python-list
>
--
On Thu, Jul 19, 2018 at 4:41 PM, Gregory Ewing
wrote:
> Stefan Ram wrote:
>>
>> »assistshop«,
>
>
> Is that a word?
>
> (Google doesn't seem to think so -- it asks me whether
> I meant "assist shop". Although it does offer to translate
> it into Czech...)
>
Into or from?? I'm thoroughly
Stefan Ram wrote:
»assistshop«,
Is that a word?
(Google doesn't seem to think so -- it asks me whether
I meant "assist shop". Although it does offer to translate
it into Czech...)
--
Greg
Stefan Ram wrote:
Gregory Ewing writes:
That's debatable. I've never thought of it that way and I'm
fairly certain I don't pronounce it that way. My tongue does
not do the same thing when I say "ch" as it does when I
say "tsh".
archives ˈɑɚ kɑɪvz (n)
bachelor ˈbæʧ lɚ (n)
machine
MRAB wrote:
"ch" usually represents 2 phonemes, basically the sounds of "t" followed
by "sh";
That's debatable. I've never thought of it that way and I'm
fairly certain I don't pronounce it that way. My tongue does
not do the same thing when I say "ch" as it does when I
say "tsh".
--
Greg
--
On 18-07-18 10:07, Marko Rauhamaa wrote:
>> Sure there were some surprises or gotchas, but the result was still
>> better than doing it in python2 and they were easier to deal with than
>> in python2.
> BTW, in those needs, even Python2 has Unicode strings and unicodedata at
> your disposal.
Antoon Pardon :
> On 17-07-18 14:22, Marko Rauhamaa wrote:
>> If you assume that NFC normalizes every letter to a single codepoint
>> (and carefully use NFC everywhere), you are right. But equally likely
>> you may inadvertently be setting yourself up for a surprise.
>
> You are moving the goal
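
The NFC caveat quoted above can be checked directly. This is a minimal sketch, not code from the thread; the two sample strings are chosen for illustration. It shows one base letter that NFC folds into a single code point and one that it cannot, because Unicode defines no precomposed form for it.

```python
import unicodedata

# "a" + COMBINING MACRON has a precomposed form (U+0101), so NFC composes it.
composed = unicodedata.normalize("NFC", "a\u0304")

# "x" + COMBINING MACRON has no precomposed form, so NFC leaves two code points.
uncomposed = unicodedata.normalize("NFC", "x\u0304")

print(len(composed), len(uncomposed))
```

Code that assumes "one letter, one code point" after NFC would handle the first string and miscount the second.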
On 17-07-18 14:22, Marko Rauhamaa wrote:
> Antoon Pardon :
>
>> On 17-07-18 10:27, Marko Rauhamaa wrote:
>>> Also, Python2's strings do as good a job at delivering codepoints as
>>> Python3.
>> No they don't. The programs that I work on need to be able to treat
>> at least German, French, Dutch
On 17/07/18 19:16, Marko Rauhamaa wrote:
MRAB :
"ch" usually represents 2 phonemes, basically the sounds of "t"
followed by "sh";
Traditionally, that sound is considered a single phoneme:
https://en.wikipedia.org/wiki/Affricate_consonant
Can you hear the difference in these
On 17/07/18 19:16, Marko Rauhamaa wrote:
MRAB :
"ch" usually represents 2 phonemes, basically the sounds of "t"
followed by "sh";
Traditionally, that sound is considered a single phoneme:
https://en.wikipedia.org/wiki/Affricate_consonant
To quote the introduction of that article, "It
MRAB :
> "ch" usually represents 2 phonemes, basically the sounds of "t"
> followed by "sh";
Traditionally, that sound is considered a single phoneme:
https://en.wikipedia.org/wiki/Affricate_consonant
Can you hear the difference in these expressions:
high chairs
height shares
On 2018-07-17 03:25, Tim Chase wrote:
On 2018-07-17 01:08, Steven D'Aprano wrote:
In English, I think most people would prefer to use a different
term for whatever "sh" and "ch" represent than "character".
The term you may be reaching for is "consonant cluster"?
Antoon Pardon :
> On 17-07-18 10:27, Marko Rauhamaa wrote:
>> Also, Python2's strings do as good a job at delivering codepoints as
>> Python3.
>
> No they don't. The programs that I work on need to be able to treat
> at least German, French, Dutch and English text. My experience is that
> in
On 17-07-18 10:27, Marko Rauhamaa wrote:
> Steven D'Aprano :
>> On Mon, 16 Jul 2018 21:48:42 -0400, Richard Damon wrote:
>>> Who says there needs to be one. A good engineer will use the
>>> definition that is most appropriate to the task at hand. Some things
>>> need very solid definitions, and
> On Jul 17, 2018, at 3:44 AM, Steven D'Aprano
> wrote:
>
> On Mon, 16 Jul 2018 21:48:42 -0400, Richard Damon wrote:
>
>>> On Jul 16, 2018, at 9:21 PM, Steven D'Aprano
>>> wrote:
>>>
On Mon, 16 Jul 2018 19:02:36 -0400, Richard Damon wrote:
You are defining a variable/fixed
Chris Angelico :
> On Tue, Jul 17, 2018 at 6:27 PM, Marko Rauhamaa wrote:
>> Of course, UTF-8 doesn't relieve you from Unicode problems. But it has
>> one big advantage: it can usually deal with non-Unicode data without any
>> extra considerations while Python3's strings make you have to take
>>
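
For readers wondering what the "extra considerations" for non-Unicode data look like in Python 3, one mechanism the language provides is the `surrogateescape` error handler. The sketch below is illustrative only; the sample bytes are an assumption, not data from the thread. It shows undecodable bytes surviving a round trip through `str`.

```python
# Latin-1 encoded bytes: 0xE9 on its own is not valid UTF-8.
raw = b"caf\xe9.txt"

# surrogateescape maps each undecodable byte to a lone surrogate (U+DC80..U+DCFF).
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = text.encode("utf-8", errors="surrogateescape")

print(repr(text), restored == raw)
```

The cost is that the handler has to be named at every decode and encode boundary, which is arguably the extra step being complained about.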
Chris Angelico :
> On Tue, Jul 17, 2018 at 7:03 PM, Marko Rauhamaa wrote:
>> What I'd need is for the tty to tell me what column the cursor is
>> visually. Or better yet, the tty would have to tell me where the column
>> would be *after* I emit the next grapheme cluster.
>
> Are you prepared for
On Tue, Jul 17, 2018 at 7:03 PM, Marko Rauhamaa wrote:
> Chris Angelico :
>
>> On Tue, Jul 17, 2018 at 6:27 PM, Marko Rauhamaa wrote:
>>> For me, the issue is where do I produce a line break in my text output?
>>> Currently, I'm just counting codepoints to estimate the width of the
>>> output.
Chris Angelico :
> On Tue, Jul 17, 2018 at 6:27 PM, Marko Rauhamaa wrote:
>> For me, the issue is where do I produce a line break in my text output?
>> Currently, I'm just counting codepoints to estimate the width of the
>> output.
>
> Well, that's just flat out wrong, then. Counting graphemes
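
To make the width problem concrete, here is a rough sketch of a column estimate that goes one step beyond counting code points. The function name `display_width` is hypothetical, and the rules are a deliberate simplification: combining marks count as zero columns and East Asian wide or fullwidth characters count as two. Real terminals can disagree, which is the point being argued in the thread.

```python
import unicodedata

def display_width(text):
    """Estimate terminal columns; a simplification, not a full implementation."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks stack on the previous character
        # Wide ("W") and fullwidth ("F") characters usually occupy two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

print(len("a\u0304e"), display_width("a\u0304e"))
print(len("日本語"), display_width("日本語"))
```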
On Tue, Jul 17, 2018 at 6:27 PM, Marko Rauhamaa wrote:
> It is essential for people to understand that the very same issues that
> plague UTF-8 plague UTF-32 as well. Using UTF in both highlights that
> fact.
What a wonderful nonsense. I suppose that the same issues plague Elon
Musk as plague
On Tue, Jul 17, 2018 at 6:27 PM, Marko Rauhamaa wrote:
>> But of course other people's experience may vary. I'm interested in
>> learning about the library you use to process graphemes in your software.
>
> For me, the issue is where do I produce a line break in my text output?
> Currently, I'm
Steven D'Aprano :
> On Tue, 17 Jul 2018 09:52:13 +0300, Marko Rauhamaa wrote:
>
>> Both Python2 and Python3 provide two forms of string, one containing
>> 8-bit integers and another one containing 21-bit integers.
>
> Why do you insist on making counter-factual statements as facts? Don't
> you
Steven D'Aprano :
> On Mon, 16 Jul 2018 21:48:42 -0400, Richard Damon wrote:
>> Who says there needs to be one. A good engineer will use the
>> definition that is most appropriate to the task at hand. Some things
>> need very solid definitions, and some things don’t.
>
> Then the problem is solved:
On Tue, 17 Jul 2018 10:51:38 +0300, Marko Rauhamaa wrote:
> in which Python3's honor is defended in a good many of the discussions
> in this newsgroup: anger, condescension, ridicule, name-calling.
You call it defending Python 3's honour. I call it responding to people
who insist on spreading
On Tue, 17 Jul 2018 15:20:16 +0900, INADA Naoki wrote (replying to Marko):
> I still don't understand what's your original point. I think UTF-8 vs
> UTF-32 is totally different from Python 2 vs 3.
>
> For example, strings in Rust and Swift (2010s languages!) are *valid*
> UTF-8. There are strong
On Tue, 17 Jul 2018 09:52:13 +0300, Marko Rauhamaa wrote:
> Both Python2 and Python3 provide two forms of string, one containing
> 8-bit integers and another one containing 21-bit integers.
Why do you insist on making counter-factual statements as facts? Don't
you have a Python REPL you can try
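
A check of the kind suggested here might look like the following under Python 3. This is an illustrative sketch, not the code from the original post. Indexing a `bytes` object yields an integer, while indexing a `str` yields another `str`, not an integer.

```python
data = b"abc"
text = "abc"

# bytes behave like a sequence of small integers...
print(type(data[0]).__name__, data[0])

# ...but str elements are themselves length-1 strings.
print(type(text[0]).__name__, text[0])

# The integer code point has to be requested explicitly.
print(ord(text[0]))
```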
On Tue, 17 Jul 2018 08:26:45 +0300, Marko Rauhamaa wrote:
> Steven D'Aprano :
>> On Mon, 16 Jul 2018 22:51:32 +0300, Marko Rauhamaa wrote:
>>> UTF-8 bytes can only represent the first 128 code points of Unicode.
>>
>> This is DailyWTF material. Perhaps you want to rethink your wording and
>>
INADA Naoki :
>> I won't comment on Rust and Swift because I don't know them.
> ...
>> I won't comment on Go, either.
>
> Hmm, are you saying Python 3 is "cult-like" without surveying other
> popular programming languages?
You can talk about Python3 independently of other programming languages.
On Mon, 16 Jul 2018 21:25:20 -0500, Tim Chase wrote:
> On 2018-07-17 01:08, Steven D'Aprano wrote:
>> In English, I think most people would prefer to use a different term
>> for whatever "sh" and "ch" represent than "character".
>
> The term you may be reaching for is "consonant cluster"?
>
>
On Mon, 16 Jul 2018 21:48:42 -0400, Richard Damon wrote:
>> On Jul 16, 2018, at 9:21 PM, Steven D'Aprano
>> wrote:
>>
>>> On Mon, 16 Jul 2018 19:02:36 -0400, Richard Damon wrote:
>>>
>>> You are defining a variable/fixed width codepoint set. Many others
>>> want to deal with CHARACTER sets.
>>
> I won't comment on Rust and Swift because I don't know them.
...
> I won't comment on Go, either.
Hmm, are you saying Python 3 is "cult-like" without surveying other
popular programming languages?
There are many popular languages which separate bytes and unicode
string explicitly and string is not
On 7/16/2018 10:25 PM, Tim Chase wrote:
On 2018-07-17 01:08, Steven D'Aprano wrote:
In English, I think most people would prefer to use a different
term for whatever "sh" and "ch" represent than "character".
The term you may be reaching for is "consonant cluster"?
INADA Naoki :
> On Tue, Jul 17, 2018 at 2:31 PM Marko Rauhamaa wrote:
>> So I hope that by now you have understood my point and been able to
>> decide if you agree with it or not.
>
> I still don't understand what's your original point.
> I think UTF-8 vs UTF-32 is totally different from Python
On 7/16/2018 7:02 PM, Richard Damon wrote:
On Jul 16, 2018, at 3:28 PM, Terry Reedy wrote:
If one is using a broader definition than usual, it is clearer to say so.
This is the core of what I wrote. Do you disagree?
You are defining a variable/fixed width codepoint set.
No, I did
On Tue, Jul 17, 2018 at 2:31 PM Marko Rauhamaa wrote:
>
> Steven D'Aprano :
> > On Mon, 16 Jul 2018 22:51:32 +0300, Marko Rauhamaa wrote:
> >> UTF-8 bytes can only represent the first 128 code points of Unicode.
> >
> > This is DailyWTF material. Perhaps you want to rethink your wording
> > and
Steven D'Aprano :
> On Mon, 16 Jul 2018 22:51:32 +0300, Marko Rauhamaa wrote:
>> UTF-8 bytes can only represent the first 128 code points of Unicode.
>
> This is DailyWTF material. Perhaps you want to rethink your wording
> and maybe even learn a bit more about Unicode and the UTF encodings
>
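
The disputed wording can be tested in a few lines. This is a sketch with an arbitrarily chosen sample character. A single UTF-8 byte can stand alone only for the 128 ASCII code points, but sequences of bytes encode everything else.

```python
ascii_bytes = "A".encode("utf-8")        # one byte, value below 0x80
accented_bytes = "\u00e9".encode("utf-8")  # é needs a two-byte sequence

print(ascii_bytes, accented_bytes)

# Every byte of a multi-byte sequence has its high bit set.
high_bits = [b >= 0x80 for b in accented_bytes]
print(high_bits)

# The sequence decodes back to the original code point.
print(accented_bytes.decode("utf-8") == "\u00e9")
```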
On 2018-07-17 01:21, Steven D'Aprano wrote:
> > This doesn’t mean that UTF-32 is an awful system, just that it
> > isn’t the magical cure that some were hoping for.
>
> Nobody ever claimed it was, except for the people railing that
> since it isn't a magical system we ought to go back to the
On 2018-07-17 01:08, Steven D'Aprano wrote:
> In English, I think most people would prefer to use a different
> term for whatever "sh" and "ch" represent than "character".
The term you may be reaching for is "consonant cluster"?
https://en.wikipedia.org/wiki/Consonant_cluster
-tkc
--
> On Jul 16, 2018, at 9:21 PM, Steven D'Aprano
> wrote:
>
>> On Mon, 16 Jul 2018 19:02:36 -0400, Richard Damon wrote:
>>
>> You are defining a variable/fixed width codepoint set. Many others want
>> to deal with CHARACTER sets.
>
> Good luck coming up with a universal, objective,
On Mon, 16 Jul 2018 22:51:32 +0300, Marko Rauhamaa wrote:
> All UTF-8. No unicode strings.
That just means you are re-implementing the bits of Unicode you care
about (which may be "nothing at all") as UTF-8. If your application is
nothing but middleware squirting bytes from one layer to
On Mon, 16 Jul 2018 15:28:51 -0400, Terry Reedy wrote:
> On 7/16/2018 1:11 PM, Richard Damon wrote:
>
>> Many consider that UTF-32 is a variable-width encoding because of the
>> combining characters. It can take multiple ‘codepoints’ to define what
>> should be a single ‘character’ for display.
On Mon, 16 Jul 2018 19:02:36 -0400, Richard Damon wrote:
> You are defining a variable/fixed width codepoint set. Many others want
> to deal with CHARACTER sets.
Good luck coming up with a universal, objective, language-neutral,
consistent definition for a character.
> This doesn’t mean that
On Tue, 17 Jul 2018 06:15:25 +1000, Chris Angelico wrote:
> On Tue, Jul 17, 2018 at 4:55 AM, Steven D'Aprano
> wrote:
>> There is nothing special about diacritics such that we ought to treat
>> some combinations like "Ch" (two code points = one character) as "fixed
>> width" while others like
> On Jul 16, 2018, at 3:28 PM, Terry Reedy wrote:
>
>> On 7/16/2018 1:11 PM, Richard Damon wrote:
>>
>> Many consider that UTF-32 is a variable-width encoding because of the
>> combining characters. It can take multiple ‘codepoints’ to define what
>> should be a single ‘character’ for
On Tue, Jul 17, 2018 at 7:02 AM, Ethan Furman wrote:
> On 07/16/2018 01:15 PM, Chris Angelico wrote:
>>
>> On Tue, Jul 17, 2018 at 4:55 AM, Steven D'Aprano wrote:
>
>
>>> There is nothing special about diacritics such that we ought to treat
>>> some combinations like "Ch" (two code points = one
Ethan Furman :
> Depends on the language: in Spanish, "ch" is its own letter (at least
> it was when I grew up), so any word containing it should still contain
> it when reversed: "chica" would be "acich".
The Royal Academy broke "ch" and "ll" up into separate letters a decade
or so back. It had
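
The "chica" example can be reproduced by treating the digraphs as single units before reversing. This is a toy sketch: the function name `reverse_spanish` is hypothetical, it handles lowercase "ch" and "ll" only, and it is not a general collation-aware solution.

```python
import re

def reverse_spanish(word):
    """Reverse a word while keeping the digraphs 'ch' and 'll' intact."""
    # Alternation order matters: try the digraphs before any single character.
    units = re.findall(r"ch|ll|.", word, flags=re.DOTALL)
    return "".join(reversed(units))

print(reverse_spanish("chica"))
print(reverse_spanish("llama"))
```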
On Tue, Jul 17, 2018 at 6:54 AM, Marko Rauhamaa wrote:
> Chris Angelico :
>> Challenge: Reverse a string in UTF-8.
>
> Counter-challenge: Reverse a Unicode string:
>
>>>> s = "a\u0304e"
>>>> s
>'āe'
>>>> L = list(s)
>>>> L.reverse()
>>>> "".join(L)
>'ēa'
>
>>
On 07/16/2018 01:15 PM, Chris Angelico wrote:
On Tue, Jul 17, 2018 at 4:55 AM, Steven D'Aprano wrote:
There is nothing special about diacritics such that we ought to treat
some combinations like "Ch" (two code points = one character) as "fixed
width" while others like "â" (two code points =
Chris Angelico :
> Challenge: Reverse a string in UTF-8.
Counter-challenge: Reverse a Unicode string:
>>> s = "a\u0304e"
>>> s
'āe'
>>> L = list(s)
>>> L.reverse()
>>> "".join(L)
'ēa'
> Challenge: Center text in UTF-8.
Counter-challenge: Center a Unicode string:
>>>
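
One way to answer the reversal counter-challenge is to attach each combining mark to its base character before reversing. This is a minimal sketch, not anything posted in the thread. The function name `reverse_clusters` is hypothetical, and it only handles combining marks, which is far short of the full grapheme-cluster rules in Unicode's text segmentation annex.

```python
import unicodedata

def reverse_clusters(s):
    """Reverse s, keeping combining marks attached to their base character."""
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch      # non-zero combining class: glue to the base
        else:
            clusters.append(ch)     # start a new cluster
    return "".join(reversed(clusters))

s = "a\u0304e"
print(reverse_clusters(s))  # the macron stays on the "a"
```

The naive `list(s)` reversal moves the macron onto the "e" because it treats the combining mark as an independent element.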
On Tue, Jul 17, 2018 at 4:55 AM, Steven D'Aprano
wrote:
> There is nothing special about diacritics such that we ought to treat
> some combinations like "Ch" (two code points = one character) as "fixed
> width" while others like "â" (two code points = one character) as
> "variable width".
When
On Tue, Jul 17, 2018 at 5:51 AM, Marko Rauhamaa wrote:
> Steven D'Aprano :
>> Under that standard definition, UTF-8 and UTF-16 are variable-width,
>> and UTF-32 is fixed-width.
>>
>> But I'll accept that UTF-32 is variable-width if Marko accepts that
>> ASCII is too.
>
> If that makes you happy,
On 16/07/18 20:51, Marko Rauhamaa wrote:
I use UTF-8 in my C programs and sense no disadvantage. I have never
felt a need for wchar_t.
That's not a good comparison, though, because wchar_t in C really
doesn't give you much (if any) advantage over rolling your own UTF-8
support, even when
Steven D'Aprano :
> Under that standard definition, UTF-8 and UTF-16 are variable-width,
> and UTF-32 is fixed-width.
>
> But I'll accept that UTF-32 is variable-width if Marko accepts that
> ASCII is too.
If that makes you happy, fine. The point is, UTF-32 has no advantages
over UTF-8. And I'm
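
The standard fixed-width versus variable-width distinction quoted above is about code units per code point, and it can be measured. This is an illustrative sketch: the helper name `widths` and the sample string are assumptions. It encodes each code point separately and reports the byte count per encoding.

```python
def widths(encoding, text):
    """Return the encoded size in bytes of each code point in text."""
    return [len(ch.encode(encoding)) for ch in text]

# ASCII letter, Latin-1 letter, euro sign, and an emoji outside the BMP.
sample = "a\u00e9\u20ac\U0001F600"

utf8 = widths("utf-8", sample)
utf16 = widths("utf-16-le", sample)   # "-le" avoids counting a byte order mark
utf32 = widths("utf-32-le", sample)

print(utf8, utf16, utf32)
```

Only the UTF-32 list is constant. Whether that property matters once combining characters are involved is exactly what the thread disagrees about.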
On 7/16/2018 1:11 PM, Richard Damon wrote:
Many consider that UTF-32 is a variable-width encoding because of the combining
characters. It can take multiple ‘codepoints’ to define what should be a single
‘character’ for display.
I hope you realize that this is not the standard meaning of
On Mon, 16 Jul 2018 14:22:27 -0400, Richard Damon wrote:
[...]
> But I am not talking about those sort of characters or ligatures,
So what? I am.
You don't get to say "only non-standard definitions I approve of count".
There is the industry standard definition of what it means to be a fixed-
On Tue, Jul 17, 2018 at 4:22 AM, Richard Damon wrote:
>
> But I am not talking about those sort of characters or ligatures, but
> ‘characters’ that are built up of a combining diacritical marks (like
> accents) and a base character. Unicode define many code points for the more
> common of
> On Jul 16, 2018, at 1:36 PM, Steven D'Aprano
> wrote:
>
> On Mon, 16 Jul 2018 13:11:23 -0400, Richard Damon wrote:
>
>>> On Jul 16, 2018, at 12:51 PM, Steven D'Aprano
>>> wrote:
>>>
On Mon, 16 Jul 2018 00:28:39 +0300, Marko Rauhamaa wrote:
if your new system used Python3's
On Mon, 16 Jul 2018 13:11:23 -0400, Richard Damon wrote:
>> On Jul 16, 2018, at 12:51 PM, Steven D'Aprano
>> wrote:
>>
>>> On Mon, 16 Jul 2018 00:28:39 +0300, Marko Rauhamaa wrote:
>>>
>>> if your new system used Python3's UTF-32 strings as a foundation, that
>>> would be an equally naïve
> On Jul 16, 2018, at 12:51 PM, Steven D'Aprano
> wrote:
>
>> On Mon, 16 Jul 2018 00:28:39 +0300, Marko Rauhamaa wrote:
>>
>> if your new system used Python3's UTF-32 strings as a foundation, that
>> would be an equally naïve misstep. You'd need to reach a notch higher
>> and use glyphs or
62 matches