Re: [Python-ideas] Proposal for default character representation

2016-10-16 Thread Steven D'Aprano
On Sun, Oct 16, 2016 at 05:02:49PM +0200, Mikhail V wrote:

> In this discussion yes, but layout aspects can also be
> improved, and I suppose the special purpose of a
> language does not always dictate the layout of
> code; it is up to you, who can define that as well.
> And glyphs are not a very narrow aspect; they are
> one of the fundamental aspects. Also, they are
> much harder to develop than good layout, note that.

This discussion is completely and utterly off-topic for this mailing 
list. If you want to discuss changing the world to use your own custom 
character set for all human communication, you should write a blog or a 
book. It is completely off-topic for Python: we're interested in 
improving the Python programming language, not yet another constructed 
language or artificial alphabet:

https://en.wikipedia.org/wiki/Shavian_alphabet

If you're interested in this, there is plenty of prior art. 
See for example: Esperanto, Ido, Volapük, Interlingua, Lojban. But don't 
discuss it here.


-- 
Steve


Re: [Python-ideas] Proposal for default character representation

2016-10-16 Thread Greg Ewing

Mikhail V wrote:

> Those things cannot be easily measured, if at all,


If you can't measure something, you can't be sure
it exists at all.

> In my case I am looking at what I've achieved
> during years of my work on it and indeed there are some
> interesting things there.


Have you *measured* anything, though? Do you have
any feel for how *big* the effects you're talking
about are?


> There must be a *very* solid reason
> for digits+letters against my variant; I wonder what it is.


The reasons only have to be *very* solid if there
are *very* large advantages to the alternative you
propose. My conjecture is that the advantages are
actually extremely *small* by comparison. To refute
that, you would need to provide some evidence to
the contrary.

Here are some reasons in favour of the current
system:

* At the point where most people learn to program,
they are already intimately familiar with reading,
writing and pronouncing letters and digits.

* It makes sense to use 0-9 to represent the first
ten digits, because they have the same numerical
value.

* Using letters for the remaining digits, rather
than punctuation characters, makes sense because
we're already used to thinking of them as a group.

* Using a consecutive sequence of letters makes
sense because we're already familiar with their
ordering.

* In the absence of any strong reason otherwise,
we might as well take them from the beginning of
the alphabet.

Yes, those are all based on "habits", but they're
habits shared by everyone, just like the base 10
that you have a preference for. You would have to
provide some strong evidence that it's worth
disregarding them and using your system instead.

--
Greg


Re: [Python-ideas] Proposal for default character representation

2016-10-16 Thread Mikhail V
On 16 October 2016 at 04:10, Steve Dower  wrote:
>> I posted output with Python 2 and Windows 7.
>> BTW, in Windows 10 'print' won't work in the cmd console at all by default
>> with Unicode, but that's another story; let us not go into that.
>> I think you get my idea right, it is not only about printing.

> FWIW, Python 3.6 should print this in the console just fine. Feel free to
> upgrade whenever you're ready.
>
> Cheers,
> Steve

Thanks, that is good; sure, I'll do that since I need it
right now (a lot of work with Cyrillic data).

Mikhail


Re: [Python-ideas] Proposal for default character representation

2016-10-16 Thread Mikhail V
On 16 October 2016 at 17:16, Todd  wrote:
>Even if you were right that your approach is somehow inherently easier,
>it is flat-out wrong that other approaches lead to "brain impairment".
>On the contrary, it is well-established that challenging
>the brain prevents or at least delays brain impairment.

My phrasing "impairment" is of course somewhat exaggeration.
It cannot be compared to harm due to smoking for example.
However it also known that many people who do
big amount of information processing and intensive reading
are subject to earlier loss of the vision sharpness.
And I feel it myself.
How exactly this happens to the eye itself is not clear for me.
One my supposition is that during the reading there is
very intensive two-directional signalling between eye and
brain. So generally you are correct, the eye is technically
a camera attached to the brain and simply sends pictures
at some frequency to the brain.
But I would tend to think that it is not so simple actually.
You probably have heard sometimes users who claim something like:
"this text hurts my eyes"
For example if you read non-antialiased text and with too
high contrast, you'll notice that something is indeed going wrong
with your eyes.
This can happen probably because the brain starts to signal
the eye control system "something is wrong, stop doing it"
Since your eye cannot do anything with wrong contrast on
your screen and you still need to continue reading, this
happens again and again. This can cause indeed unwanted
processes and overtiredness of muscles inside the eye.
So in case of my examle with Chinese students, who wear
goggles more frequently, this would probaly mean
that they could "recover" if they just stop
reading a lot.

"challenging the brain prevents or at least delays brain"
Yes but I hardly see connection with this case,
I would probably recommend to make some creative
exercises, like drawing or solving puzzles for this purpose.
But if I propose reading books in illegible font than I
would be wrong in any case.


> And it also makes no sense that it would cause visual impairment, either.
> Comparing glyphs is a higher-level task in the brain,
> it has little to do with your eyes.

You forget that with an illegible font or wrong contrast,
for example, you *do* need to concentrate more.
This again causes your eye to try harder to adapt
to the information you see, and to reread, which again
affects your lens and eye movements.
Anyway, how do you think this earlier vision loss
happens then? Would you say I am fantasising?

Mikhail


Re: [Python-ideas] Proposal for default character representation

2016-10-16 Thread Todd
On Thu, Oct 13, 2016 at 1:46 AM, Mikhail V  wrote:

> Practically all this notation does is reduce the time
> before you as a programmer
> develop visual and brain impairments.
>
>
Even if you were right that your approach is somehow inherently easier, it
is flat-out wrong that other approaches lead to "brain impairment".  On the
contrary, it is well-established that challenging the brain prevents or at
least delays brain impairment.

And it also makes no sense that it would cause visual impairment, either.
Comparing glyphs is a higher-level task in the brain, it has little to do
with your eyes.  All your eyes detect are areas of changing contrast, any
set of lines and curves, not even glyphs, is functionally identical at that
level (and even at medium-level brain regions).  The size of the glyphs can
make a difference, but not the number of available ones.  On the contrary,
having more glyphs increases the information density of text, reducing the
amount of reading you have to do to get the same information.
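
A quick way to see the density trade-off described above, using nothing
beyond Python's built-in formatting functions:

# One 32-bit value rendered with progressively larger symbol sets:
# the fewer symbols available, the more of them you have to read.
n = 0xDEADBEEF
print(bin(n))  # 0b11011110101011011011111011101111  (32 digits, 2 symbols)
print(oct(n))  # 0o33653337357                       (11 digits, 8 symbols)
print(n)       # 3735928559                          (10 digits, 10 symbols)
print(hex(n))  # 0xdeadbeef                          (8 digits, 16 symbols)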

Re: [Python-ideas] Proposal for default character representation

2016-10-16 Thread Mikhail V
On 16 October 2016 at 02:58, Greg Ewing  wrote:

>> even if it is assembler or whatever,
>> it can be made readable without much effort.
>
>
> You seem to be focused on a very narrow aspect of
> readability, i.e. fine details of individual character
> glyphs. That's not what we mean when we talk about
> readability of programs.

In this discussion yes, but layout aspects can also be
improved, and I suppose the special purpose of a
language does not always dictate the layout of
code; it is up to you, who can define that as well.
And glyphs are not a very narrow aspect; they are
one of the fundamental aspects. Also, they are
much harder to develop than good layout, note that.

>> That is because that person from the beginning
>> (blindly) follows the convention.
>
> What you seem to be missing is that there are
> *reasons* for those conventions. They were not
> arbitrary choices.
Exactly, and in the case of hex notation I fail to see
how my proposal of using letters instead of
what we have now could have been overlooked at the time
of the decision. There must be a *very* solid reason
for digits+letters against my variant; I wonder what it is.
I hope it is not the mono-width reason.
And basic readability principles are something that
was already clear to people 2000 years ago.

>
> So, if anything, *you're* the one who is "blindly
> following tradition" by wanting to use base 10.
Yes, because when I was a child I learned it
everywhere for everything, and so did others.
As said, I don't defend the usage of base-10,
as you can already note from my posts.

>
>> 2. Better option would be to choose letters and
>>
>> possibly other glyphs to build up a more readable
>> set. E.g. drop "c" letter and leave "e" due to
>> their optical collision, drop some other weak glyphs,
>> like "l" "h". That is of course would raise
>> many further questions, like why you do you drop this
>> glyph and not this and so on so it will surely end up in quarrel.
>
>
> Well, that's the thing. If there were large, objective,
> easily measurable differences between different possible
> sets of glyphs, then there would be no room for such
> arguments.
Those things cannot be easily measured, if at all; it
requires a lot of tests and a huge amount of time, and
you cannot plug a measuring device into the brain to
precisely measure the load. In this case the only choice is to trust
the most experienced people, who show the results which worked
better for them, and to try to implement and compare yourself.
Not that I am saying you have any special reason
to trust me personally.


>
> The fact that you anticipate such arguments suggests
> that any differences are likely to be small, hard
> to measure and/or subjective.
>
>> But I can bravely claim that it is better than *any*
>> hex notation, it just follows from what I have here
>> on paper on my table,
>
>
> I think "on paper" is the important thing here. I
> suspect you are looking at the published results from
> some study or other and greatly overestimating the
> size of the effects compared to other considerations.

If you try to google that particular topic you'll see that there
is zero related published material; there are tons of
papers on readability, but zero concrete proposals
or any attempts to develop something real.
That is the thing. I would look at results if there
were any. In my case I am looking at what I've achieved
during years of my work on it, and indeed there are some
interesting things there.
Not that I am overestimating its role, but it can
indeed really help in many cases, e.g. as in my example
with bitstrings.
Last but not least, I am not a "paper ass" in any case;
I try to keep to experimental work where possible.

Mikhail


Re: [Python-ideas] Proposal for default character representation

2016-10-15 Thread Steve Dower
FWIW, Python 3.6 should print this in the console just fine. Feel free to 
upgrade whenever you're ready.

Cheers,
Steve

-Original Message-
From: "Mikhail V" <mikhail...@gmail.com>
Sent: 10/12/2016 16:07
To: "M.-A. Lemburg" <m...@egenix.com>
Cc: "python-ideas@python.org" <python-ideas@python.org>
Subject: Re: [Python-ideas] Proposal for default character representation

Forgot to reply to all, duping my message...

On 12 October 2016 at 23:48, M.-A. Lemburg <m...@egenix.com> wrote:

> Hmm, in Python3, I get:
>
>>>> s = "абв.txt"
>>>> s
> 'абв.txt'

I posted output with Python 2 and Windows 7.
BTW, in Windows 10 'print' won't work in the cmd console at all by default
with Unicode, but that's another story; let us not go into that.
I think you get my idea right: it is not only about printing.


> The hex notation for \u is a standard also used in many other
> programming languages, it's also easier to parse, so I don't
> think we should change this default.

In programming literature it is used often, but let me point out that
decimal is THE standard and is a much, much better standard
in the sense of readability. And there is no solid reason to use two standards
at the same time.

>
> Take e.g.
>
>>>> s = "\u123456"
>>>> s
> 'ሴ56'
>
> With decimal notation, it's not clear where to end parsing
> the digit notation.

How is it not clear, if the digit count is fixed? It is not very clear to me
what you meant.
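
For reference, this is the fixed-width behaviour in question: a \u escape
always consumes exactly four hex digits, so the parser never has to guess
where the escape ends (Python 3 shown; u'' literals in Python 2 behave the
same way). A variable-length decimal escape would need either a fixed width
or an explicit delimiter to stay unambiguous.

s = "\u123456"         # \u1234 is the escape; "5" and "6" are ordinary characters
print(s)               # ሴ56
print(len(s))          # 3
print(hex(ord(s[0])))  # 0x1234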

Re: [Python-ideas] Proposal for default character representation

2016-10-15 Thread Chris Angelico
On Sun, Oct 16, 2016 at 12:06 AM, Mikhail V  wrote:
> But I can bravely claim that it is better than *any*
> hex notation, it just follows from what I have here
> on paper on my table, namely that it is physically
> impossible to make up highly effective glyph system
> of more than 8 symbols.

You should go and hang out with jmf. Both of you have made bold
assertions that our current system is physically/mathematically
impossible, despite the fact that *it is working*. Neither of you can
cite any actual scientific research to back your claims.

Bye bye.

ChrisA


Re: [Python-ideas] Proposal for default character representation

2016-10-15 Thread Mikhail V
On 14 October 2016 at 11:36, Greg Ewing  wrote:

>but bash wasn't designed for that.
>(The fact that some people use it that way says more
>about their dogged persistence in the face of
>adversity than it does about bash.)

I cannot judge what bash is good for, since I never
tried to learn it. But it does *look* frightening.
The first feeling is: OMG, I must close this and never
see it again.
Also, I can hardly imagine that the special purpose
of some language can ignore readability;
even if it is assembler or whatever,
it can be made readable without much effort.
So I just look for some other solution for the same task,
even if it takes 10 times more code.


> So for that
> person, using decimal would make the code *harder*
> to maintain.
> To a maintainer who doesn't have that familiarity,
> it makes no difference either way.

That is because that person from the beginning
(blindly) follows the convention.
So my intention of course was not
to find out whether the majority does or not,
but rather which one of the two makes
more sense *initially*, just trying to imagine
that we can decide.
To be more precise, if you were to choose
between two options:

1. use hex for the glyph index and use
hex for numbers (e.g. some arbitrary
value like screen coordinates)
2. use decimal for both cases.

I personally choose option 2.
Probably nothing will convince me that option
1 would be better, all the more so since I don't
believe that anything beyond base-8
makes much sense for readable numbers.
I am just a little bit disappointed that others
again and again speak of convention.

>I just
>don't see this as being anywhere near being a
>significant problem.

I didn't mean that; it just slightly
annoys me.

>>In standard ASCII
>>there are enough glyphs that would work way better
>>together,

>Out of curiosity, what glyphs do you have in mind?

If I were to decide, I would look into a few options here:
1. The easy option, which would raise fewer further
questions, is to take the first 16 lowercase letters.
2. The better option would be to choose letters and
possibly other glyphs to build up a more readable
set, e.g. drop the letter "c" and keep "e" due to
their optical collision, and drop some other weak glyphs,
like "l" and "h". That of course would raise
many further questions, like why do you drop this
glyph and not that one, and so on, so it will surely end up in a quarrel.

Here lies another problem: the non-constant width of letters,
but this is more a problem of fonts and rendering,
so it concerns IDE and editor problematics.
But as said, I won't recommend base 16 at all.


>>ұұ-ұ   ---ұ
>>
>>you can downscale the strings, so a 16-bit
>>value would be ~60 pixels wide

> Yes, you can make the characters narrow enough that
> you can take 4 of them in at once, almost as though
> they were a single glyph... at which point you've
> effectively just substituted one set of 16 glyphs

No, no. I didn't mean to shrink them until they melt together.
The structure is still there, only with such a notation
you don't need to keep the glyphs as big as with many-glyph systems.

>for another. Then you'd have to analyse whether the
>*combined* 4-element glyphs were easier to distinguish
>from each other than the ones they replaced. Since
>the new ones are made up of repetitions of just two
>elements, whereas the old ones contain a much more
>varied set of elements, I'd be skeptical about that.

I get your idea, and this is a very good point.
It seems you have experience in such things?
Currently I don't know for sure whether such an approach
is more or less effective than others, and for which cases.
But I can bravely claim that it is better than *any*
hex notation; it just follows from what I have here
on paper on my table, namely that it is physically
impossible to make up a highly effective glyph system
of more than 8 symbols. You want more only if you really
*need* more glyphs.
And skepticism should always be present.

One thing, however, especially interests me: here not
only the differentiation of glyphs comes into play,
but also the positional principle, which helps with comparison
and can be beneficial for specific cases.
So you can clearly see if one
number is two times bigger than another, for example.
And of course, strictly speaking those bit groups are not glyphs;
you can of course call them that, but this is
just rhetoric. So one could also call all English
written words glyphs, but they are not really.
But I get your analogy; this is how the tests
should be made.

>BTW, your choice of ұ because of its "peak readability"
>seems to be a case of taking something out of context.
>The readability of a glyph can only be judged in terms
>of how easy it is to distinguish from other glyphs.

True and false. Each glyph taken by itself has a specific structure,
and on its own it has optical qualities.
This is quite complicated and hard
to describe in words, but anyway, only tests can
tell what is better. In this case it is still 2
glyphs, or better said, one and a half glyphs.

Re: [Python-ideas] Proposal for default character representation

2016-10-15 Thread M.-A. Lemburg
On 14.10.2016 10:26, Serhiy Storchaka wrote:
> On 13.10.16 17:50, Chris Angelico wrote:
>> Solution: Abolish most of the control characters. Let's define a brand
>> new character encoding with no "alphabetical garbage". These
>> characters will be sufficient for everyone:
>>
>> * [2] Formatting characters: space, newline. Everything else can go.
>> * [8] Digits: 01234567
>> * [26] Lower case Latin letters a-z
>> * [2] Vital social media characters: # (now officially called
>> "HASHTAG"), @
>> * [2] Can't-type-URLs-without-them: colon, slash (now called both
>> "SLASH" and "BACKSLASH")
>>
>> That's 40 characters that should cover all the important things anyone
>> does - namely, Twitter, Facebook, and email. We don't need punctuation
>> or capitalization, as they're dying arts and just make you look
>> pretentious.
> 
> https://en.wikipedia.org/wiki/DEC_Radix-50

And then we store Python identifiers in a single 64-bit word,
allow at most 20 chars per identifier and use the remaining
bits for cool things like type information :-)

Not a bad idea, really.
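
As a rough sketch of the packing arithmetic (a hypothetical helper, not
anything RADIX-50 or CPython actually implements): with a 40-symbol
alphabet, 12 characters fit into one 64-bit word, since 40**12 < 2**64.

# Hypothetical RADIX-50-style packing into a single 64-bit integer.
ALPHABET = " \nabcdefghijklmnopqrstuvwxyz01234567#@:/"   # the 40 symbols listed above
assert len(ALPHABET) == 40 and 40**12 < 2**64

def pack(identifier):
    """Pack up to 12 characters drawn from ALPHABET into an integer below 2**64."""
    value = 0
    for ch in identifier:
        value = value * 40 + ALPHABET.index(ch)
    return value

print(pack("identifier"))   # one smallish integer, with bits to spare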

But then again: even microbits support Unicode these days, so
apparently there isn't much need for such memory footprint
optimizations anymore.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Oct 15 2016)
>>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
>>> Python Database Interfaces ...   http://products.egenix.com/
>>> Plone/Zope Database Interfaces ...   http://zope.egenix.com/


::: We implement business ideas - efficiently in both time and costs :::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/
  http://www.malemburg.com/



Re: [Python-ideas] Proposal for default character representation

2016-10-14 Thread Greg Ewing

Steven D'Aprano wrote:
> That's because some sequence of characters
> is being wrongly interpreted as an emoticon by the client software.


The only thing wrong here is that the client software
is trying to interpret the emoticons.

Emoticons are for *humans* to interpret, not software.
Subtlety and cleverness is part of their charm. If you
blatantly replace them with explicit images, you crush
that.

And don't even get me started on *animated* emoji...

--
Greg


Re: [Python-ideas] Proposal for default character representation

2016-10-14 Thread Steven D'Aprano
On Fri, Oct 14, 2016 at 07:56:29AM -0400, Random832 wrote:
> On Fri, Oct 14, 2016, at 01:54, Steven D'Aprano wrote:
> > Good luck with that last one. Even if you could convince the Chinese and 
> > Japanese to swap to ASCII, I'd like to see you pry the emoji out of the 
> > young folk's phones.
> 
> This is actually probably the one part of this proposal that *is*
> feasible. While encoding emoji as a single character each makes sense
> for a culture that already uses thousands of characters; before they
> existed the English-speaking software industry already had several
> competing "standards" emerging for encoding them as sequences of ASCII
> characters.

It really isn't feasible to use emoticons instead of emoji, not if 
you're serious about it. To put it bluntly, emoticons are amateur hour. 
Emoji implemented as dedicated code points are what professionals use. 
Why do you think phone manufacturers are standardising on dedicated code 
points instead of using emoticons?

Anyone who has ever posted (say) source code on IRC, Usenet, email or 
many web forums has probably seen unexpected smileys in the middle of 
their code (false positives). That's because some sequence of characters 
is being wrongly interpreted as an emoticon by the client software. 
The more emoticons you support, the greater the chance this will 
happen. A concrete example: bash code in Pidgin (IRC) will often show 
unwanted smileys.

The quality of applications can vary greatly: once the false emoticon is 
displayed as a graphic, you may not be able to copy the source code 
containing the graphic and paste it into a text editor unchanged.

There are false negatives as well as false positives: if your :-) 
happens to fall on the boundary of a line, and your software breaks the 
sequence with a soft line break, instead of seeing the smiley face you 
expected, you might see a line ending with :- and a new line starting 
with ).

It's hard to use punctuation or brackets around emoticons without 
risking them being misinterpreted as an invalid or different sequence. 

If you are serious about offering smileys, snowmen and piles of poo to 
your users, you are much better off supporting real emoji (dedicated 
Unicode characters) instead of emoticons. It is much easier to support ☺ 
than :-) and you don't need any special software apart from fonts that 
support the emoji you care about.
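
For what it's worth, the dedicated code point mentioned above is an ordinary
character as far as Python is concerned; nothing special is needed beyond a
font that can draw it:

# U+263A WHITE SMILING FACE, written three equivalent ways:
print("\u263a")
print("\N{WHITE SMILING FACE}")
print(chr(0x263A))   # all three lines print ☺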



-- 
Steve

Re: [Python-ideas] Proposal for default character representation

2016-10-14 Thread Chris Angelico
On Fri, Oct 14, 2016 at 8:36 PM, Greg Ewing  wrote:
>> I know people who can read bash scripts
>> fast, but would you claim that bash syntax can be
>> any good compared to Python syntax?
>
>
> For the things that bash was designed to be good for,
> yes, it can. Python wins for anything beyond very
> simple programming, but bash wasn't designed for that.
> (The fact that some people use it that way says more
> about their dogged persistence in the face of
> adversity than it does about bash.)

And any time I look at a large and complex bash script and say "this
needs to be a Python script" or "this would be better done in Pike" or
whatever, I end up missing the convenient syntax of piping one thing
into another. Shell scripting languages are the undisputed kings of
process management.

ChrisA


Re: [Python-ideas] Proposal for default character representation

2016-10-14 Thread Greg Ewing

Mikhail V wrote:


if "\u1230" <= c <= "\u123f":

and:

o = ord (c)
if 100 <= o <= 150:


Note that, if need be, you could also write that as

  if 0x64 <= o <= 0x96:


> So yours is valid code, but for me it's freaky,
> and I surely stick to the second variant.


The thing is, where did you get those numbers from in
the first place?

If you got them in some way that gives them to you
in decimal, such as print(ord(c)), there is nothing
to stop you from writing them as decimal constants
in the code.

But if you got them e.g. by looking up a character
table that gives them to you in hex, you can equally
well put them in as hex constants. So there is no
particular advantage either way.


> You said I can better see in which Unicode page
> I am by looking at the hex ordinal, but I hardly
> need it; I just need to know one integer, namely
> where some range begins, that's it.
> Furthermore this is the code which an average
> programmer would better read and maintain.


To a maintainer who is familiar with the layout of
the unicode code space, the hex representation of
a character is likely to have some meaning, whereas
the decimal representation will not. So for that
person, using decimal would make the code *harder*
to maintain.

To a maintainer who doesn't have that familiarity,
it makes no difference either way.

So your proposal would result in a *decrease* of
maintainability overall.


> if I make a mistake, typo, or want to expand the range
> by some value, I need to do sum and subtract
> operations in my head to progress with my code effectively.
> Is it clear now what I mean by
> conversions back and forth?


Yes, but in my experience the number of times I've
had to do that kind of arithmetic with character codes
is very nearly zero. And when I do, I'm more likely to
get the computer to do it for me than work out the
numbers and then type them in as literals. I just
don't see this as being anywhere near being a
significant problem.


> In standard ASCII
> there are enough glyphs that would work way better
> together,


Out of curiosity, what glyphs do you have in mind?


> ұұ-ұ   ---ұ
>
> you can downscale the strings, so a 16-bit
> value would be ~60 pixels wide


Yes, you can make the characters narrow enough that
you can take 4 of them in at once, almost as though
they were a single glyph... at which point you've
effectively just substituted one set of 16 glyphs
for another. Then you'd have to analyse whether the
*combined* 4-element glyphs were easier to distinguish
from each other than the ones they replaced. Since
the new ones are made up of repetitions of just two
elements, whereas the old ones contain a much more
varied set of elements, I'd be skeptical about that.

BTW, your choice of ұ because of its "peak readability"
seems to be a case of taking something out of context.
The readability of a glyph can only be judged in terms
of how easy it is to distinguish from other glyphs.
Here, the only thing that matters is distinguishing it
from the other symbol, so something like "|" would
perhaps be a better choice.

||-|   ---|


> So if you are more
> than 40 years old (sorry for some familiarity)
> this can be a really strong issue and unfortunately
> hardly changeable.


Sure, being familiar with the current system means that
it would take me some effort to become proficient with
a new one.

What I'm far from convinced of is that I would gain any
benefit from making that effort, or that a fresh person
would be noticeably better off if they learned your new
system instead of the old one.

At this point you're probably going to say "Greg, it's
taken you 40 years to become that proficient in hex.
Someone learning my system would do it much faster!"

Well, no. When I was about 12 I built a computer whose
only I/O devices worked in binary. From the time I first
started toggling programs into it to the time I had the
whole binary/hex conversion table burned into my neurons
was maybe about 1 hour. And I wasn't even *trying* to
memorise it, it just happened.


> It is not about speed, it is about brain load.
> Chinese can read their hieroglyphs fast, but
> the cognition load on the brain is 100 times higher
> than with the current Latin set.


Has that been measured? How?

This one sets off my skepticism alarm too, because
people that read Latin scripts don't read them a
letter at a time -- they recognise whole *words* at
once, or at least large chunks of them. The number of
English words is about the same order of magnitude
as the number of Chinese characters.


> I know people who can read bash scripts
> fast, but would you claim that bash syntax can be
> any good compared to Python syntax?


For the things that bash was designed to be good for,
yes, it can. Python wins for anything beyond very
simple programming, but bash wasn't designed for that.
(The fact that some people use it that way says more
about their dogged persistence in the face of
adversity than it does about bash.)

I don't doubt that some sets of glyphs are easier to
distinguish from each 

Re: [Python-ideas] Proposal for default character representation

2016-10-14 Thread Serhiy Storchaka

On 13.10.16 17:50, Chris Angelico wrote:

Solution: Abolish most of the control characters. Let's define a brand
new character encoding with no "alphabetical garbage". These
characters will be sufficient for everyone:

* [2] Formatting characters: space, newline. Everything else can go.
* [8] Digits: 01234567
* [26] Lower case Latin letters a-z
* [2] Vital social media characters: # (now officially called "HASHTAG"), @
* [2] Can't-type-URLs-without-them: colon, slash (now called both
"SLASH" and "BACKSLASH")

That's 40 characters that should cover all the important things anyone
does - namely, Twitter, Facebook, and email. We don't need punctuation
or capitalization, as they're dying arts and just make you look
pretentious.


https://en.wikipedia.org/wiki/DEC_Radix-50




Re: [Python-ideas] Proposal for default character representation

2016-10-14 Thread Chris Angelico
On Fri, Oct 14, 2016 at 7:18 PM, Cory Benfield  wrote:
> The many glyphs that exist for writing various human languages are not 
> inefficiency to be optimised away. Further, I should note that most places to 
> not legislate about what character sets are acceptable to transcribe their 
> languages. Indeed, plenty of non-romance-language-speakers have found ways to 
> transcribe their languages of choice into the limited 8-bit character sets 
> that the Anglophone world propagated: take a look at Arabish for the best 
> kind of example of this behaviour, where "الجو عامل ايه النهارده فى 
> إسكندرية؟" will get rendered as "el gaw 3amel eh elnaharda f eskendereya?”
>

I've worked with transliterations enough to have built myself a
dedicated translit tool. It's pretty straight-forward to come up with
something you can type on a US-English keyboard (eg "a\o" for "å", and
"d\-" for "đ"), and in some cases, it helps with visual/audio
synchronization, but nobody would ever claim that it's the best way to
represent that language.

https://github.com/Rosuav/LetItTrans/blob/master/25%20languages.srt

> But I think you’re in a tiny minority of people who believe that all 
> languages should be rendered in the same script. I can think of only two 
> reasons to argue for this:
>
> 1. Dealing with lots of scripts is technologically tricky and it would be 
> better if we didn’t bother. This is the anti-Unicode argument, and it’s a 
> weak argument, though it has the advantage of being internally consistent.
> 2. There is some genuine harm caused by learning non-ASCII scripts.

#1 does carry a decent bit of weight, but only if you start with the
assumption that "characters are bytes". If you once shed that
assumption (and the related assumption that "characters are 16-bit
numbers"), the only weight it carries is "right-to-left text is
hard"... and let's face it, that *is* hard, but there are far, far
harder problems in computing.

Oh wait. Naming things. In Hebrew.

That's hard.

ChrisA

Re: [Python-ideas] Proposal for default character representation

2016-10-14 Thread Cory Benfield

> On 14 Oct 2016, at 08:53, Mikhail V  wrote:
> 
> What keeps people from using same characters?
> I will tell you what - it is local law. If you go to school you *have* to
> write in what is prescribed by big daddy. If youre in europe or America, you 
> are
> more lucky. And if you're in China you'll be punished if you
> want some freedom. So like it or not, learn hieroglyphs
> and become visually impaired in age of 18.

So you know, for the future, I think this comment is going to be the one that 
causes most of the people who were left to disengage with this discussion.

The many glyphs that exist for writing various human languages are not 
inefficiency to be optimised away. Further, I should note that most places do 
not legislate about what character sets are acceptable to transcribe their 
languages. Indeed, plenty of non-romance-language-speakers have found ways to 
transcribe their languages of choice into the limited 8-bit character sets that 
the Anglophone world propagated: take a look at Arabish for the best kind of 
example of this behaviour, where "الجو عامل ايه النهارده فى إسكندرية؟" will get 
rendered as "el gaw 3amel eh elnaharda f eskendereya?”

But I think you’re in a tiny minority of people who believe that all languages 
should be rendered in the same script. I can think of only two reasons to argue 
for this:

1. Dealing with lots of scripts is technologically tricky and it would be 
better if we didn’t bother. This is the anti-Unicode argument, and it’s a weak 
argument, though it has the advantage of being internally consistent.
2. There is some genuine harm caused by learning non-ASCII scripts.

Your paragraph suggest that you really believe that learning to write in Kanji 
(logographic system) as opposed to Katakana (alphabetic system with 48 
non-punctuation characters) somehow leads to active harm (your phrase was 
“become visually impaired”). I’m afraid that you’re really going to need to 
provide one hell of a citation for that, because that’s quite an extraordinary 
claim.

Otherwise, I’m afraid I have to say お先に失礼します.

Cory

Re: [Python-ideas] Proposal for default character representation

2016-10-14 Thread Chris Angelico
On Fri, Oct 14, 2016 at 6:53 PM, Mikhail V  wrote:
> On 13 October 2016 at 16:50, Chris Angelico  wrote:
>> On Fri, Oct 14, 2016 at 1:25 AM, Steven D'Aprano  wrote:
>>> On Thu, Oct 13, 2016 at 03:56:59AM +0200, Mikhail V wrote:
>>>> and in long perspective when the world's alphabetical garbage will
>>>> disappear, two digits would be ok.
>>> Talking about "alphabetical garbage" like that makes you seem to be an
>>> ASCII bigot: rude, ignorant, arrogant and rather foolish as well. Even
>>> 7-bit ASCII has more than 100 characters (128).
>
> This is sort of rude. Are you from the Unicode consortium?

No, he's not. He just knows a thing or two.

>> Solution: Abolish most of the control characters. Let's define a brand
>> new character encoding with no "alphabetical garbage". These
>> characters will be sufficient for everyone:
>>
>> * [2] Formatting characters: space, newline. Everything else can go.
>> * [8] Digits: 01234567
>> * [26] Lower case Latin letters a-z
>> * [2] Vital social media characters: # (now officially called "HASHTAG"), @
>> * [2] Can't-type-URLs-without-them: colon, slash (now called both
>> "SLASH" and "BACKSLASH")
>>
>> That's 40 characters that should cover all the important things anyone
>> does - namely, Twitter, Facebook, and email. We don't need punctuation
>> or capitalization, as they're dying arts and just make you look
>> pretentious. I might have missed a few critical characters, but it
>> should be possible to fit it all within 64, which you can then
>> represent using two digits from our newly-restricted set; octal is
>> better than decimal, as it needs less symbols. (Oh, sorry, so that's
>> actually "50" characters, of which "32" are the letters. And we can
>> use up to "100" and still fit within two digits.)
>>
>> Is this the wrong approach, Mikhail?
>
> This is sort of correct approach. We do need punctuation however.
> And one does not need of course to make it too tight.
> So 8-bit units for text is excellent and enough space left for experiments.

... okay. I'm done arguing. Go do some translation work some time.
Here, have a read of some stuff I've written before.

http://rosuav.blogspot.com/2016/09/case-sensitivity-matters.html
http://rosuav.blogspot.com/2015/03/file-systems-case-insensitivity-is.html
http://rosuav.blogspot.com/2014/12/unicode-makes-life-easy.html

>> Perhaps we should go the other
>> way, then, and be *inclusive* of people who speak other languages.
>
> What keeps people from using same characters?
> I will tell you what - it is local law. If you go to school you *have* to
> write in what is prescribed by big daddy. If youre in europe or America, you 
> are
> more lucky. And if you're in China you'll be punished if you
> want some freedom. So like it or not, learn hieroglyphs
> and become visually impaired in age of 18.

Never mind about China and its political problems. All you need to do
is move around Europe for a bit and see how there are more sounds than
can be usefully represented. Turkish has a simple system wherein the
written and spoken forms have direct correspondence, which means they
need to distinguish eight fundamental vowels. How are you going to
spell those? Scandinavian languages make use of letters like "å"
(called "A with ring" in English, but identified by its sound in
Norwegian, same as our letters are - pronounced "Aww" or "Or" or "Au"
or thereabouts). To adequately represent both Turkish and Norwegian in
the same document, you *need* more letters than our 26.

>> Thanks to Unicode's rich collection of characters, we can represent
>> multiple languages in a single document;
>
> Can do it without unicode in 8-bit boundaries with tagged text,
> just need fonts for your language, of course if your
> local charset is less than 256 letters.

No, you can't. Also, you shouldn't. It makes virtually every text
operation impossible: you can't split and rejoin text without tracking
the encodings. Go try to write a text editor under your scheme and see
how hard it is.

> This is how it was before unicode I suppose.
> BTW I don't get it still what such revolutionary
> advantages has unicode compared to tagged text.

It's not tagged. That's the huge advantage.

>> script, but have different characters. Alphabetical garbage, or
>> accurate representations of sounds and words in those languages?
>
> Accurate with some 50 characters is more than enough.

Go build a chat room or something. Invite people to enter their names.
Now make sure you're courteous enough to display those names to
people. Try doing that without Unicode.

I'm done. None of this belongs on python-ideas - it's getting pretty
off-topic even for python-list, and you're talking about modifying
Python 2.7 which is a total non-starter anyway.

ChrisA

Re: [Python-ideas] Proposal for default character representation

2016-10-14 Thread Mikhail V
On 13 October 2016 at 16:50, Chris Angelico  wrote:
> On Fri, Oct 14, 2016 at 1:25 AM, Steven D'Aprano  wrote:
>> On Thu, Oct 13, 2016 at 03:56:59AM +0200, Mikhail V wrote:
>>> and in long perspective when the world's alphabetical garbage will
>>> disappear, two digits would be ok.
>> Talking about "alphabetical garbage" like that makes you seem to be an
>> ASCII bigot: rude, ignorant, arrogant and rather foolish as well. Even
>> 7-bit ASCII has more than 100 characters (128).

This is sort of rude. Are you from the Unicode consortium?

> Solution: Abolish most of the control characters. Let's define a brand
> new character encoding with no "alphabetical garbage". These
> characters will be sufficient for everyone:
>
> * [2] Formatting characters: space, newline. Everything else can go.
> * [8] Digits: 01234567
> * [26] Lower case Latin letters a-z
> * [2] Vital social media characters: # (now officially called "HASHTAG"), @
> * [2] Can't-type-URLs-without-them: colon, slash (now called both
> "SLASH" and "BACKSLASH")
>
> That's 40 characters that should cover all the important things anyone
> does - namely, Twitter, Facebook, and email. We don't need punctuation
> or capitalization, as they're dying arts and just make you look
> pretentious. I might have missed a few critical characters, but it
> should be possible to fit it all within 64, which you can then
> represent using two digits from our newly-restricted set; octal is
> better than decimal, as it needs less symbols. (Oh, sorry, so that's
> actually "50" characters, of which "32" are the letters. And we can
> use up to "100" and still fit within two digits.)
>
> Is this the wrong approach, Mikhail?

This is sort of the correct approach. We do need punctuation, however.
And one does not, of course, need to make it too tight.
So 8-bit units for text are excellent, with enough space left for experiments.

> Perhaps we should go the other
> way, then, and be *inclusive* of people who speak other languages.

What keeps people from using the same characters?
I will tell you what - it is local law. If you go to school you *have* to
write in what is prescribed by big daddy. If you're in Europe or America, you are
more lucky. And if you're in China you'll be punished if you
want some freedom. So like it or not, learn hieroglyphs
and become visually impaired by age 18.

> Thanks to Unicode's rich collection of characters, we can represent
> multiple languages in a single document;

One can do it without Unicode within 8-bit boundaries with tagged text;
you just need fonts for your language, of course, if your
local charset has fewer than 256 letters.

This is how it was before Unicode, I suppose.
BTW, I still don't get what revolutionary
advantages Unicode has compared to tagged text.

> script, but have different characters. Alphabetical garbage, or
> accurate representations of sounds and words in those languages?

Accurate with some 50 characters is more than enough.

Mikhail


Re: [Python-ideas] Proposal for default character representation

2016-10-14 Thread Sjoerd Job Postmus
On Fri, Oct 14, 2016 at 08:05:40AM +0200, Mikhail V wrote:
> Any criticism of it? Besides not following the Unicode consortium.

Besides the other remarks on "tradition", I think this is where a big
problem lies: We should not deviate from a common standard (without
very good cause).

There are cases where a language does good by deviating from the common
standard. There are also cases where it is bad to deviate.

Almost all current programming languages understand unicode, for
instance:

* C: http://en.cppreference.com/w/c/language/escape
* C++: http://en.cppreference.com/w/cpp/language/escape
* JavaScript: 
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Grammar_and_types#Using_special_characters_in_strings

and that were only the first 3 I tried. They all use `\u` followed by 4
hexadecimal digits.
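
Python's own literal syntax follows the same convention (ordinary CPython
behaviour, shown here only for comparison):

s = "caf\u00e9"   # \u followed by exactly four hex digits, as in C, C++ and JavaScript
print(s)          # café
print(ascii(s))   # 'caf\xe9' -- Python's escaped repr also reports code points in hex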

You may not like the current standard. You may think/know/... it to be
suboptimal for human comprehension. However, what you are suggesting is
a very costly change. A change where --- I believe --- Python should not
take the lead, but also should not be afraid to follow if other
programming languages start to change.

I would suggest that this is a change that might be best proposed to the
unicode consortium itself, instead of going to (just) a programming
language.

It'd be interesting to see whether or not you can convince the unicode
consortium that 8 symbols will be enough.


Re: [Python-ideas] Proposal for default character representation

2016-10-14 Thread Mikhail V
On 13 October 2016 at 12:05, Cory Benfield  wrote:
>
> integer & 0x00FFFFFF  # Hex
> integer & 16777215  # Decimal
> integer & 0o77777777  # Octal
> integer & 0b111111111111111111111111  # Binary
>
> The octal representation is infuriating because one octal digit refers to 
> *three* bits

Correct; that makes it not so nice looking and not 8-bit-paradigm friendly.
That does not, however, make it a bad option in general,
and according to my personal suppositions and work on glyph
development, the optimal set is exactly 8 glyphs.

> Decimal is no clearer.

In the same problematic alignment context, yes, correct.

> Binary notation seems like the solution, ...

Agree with you; see my last reply to Greg for more thoughts on bitstrings
and the quaternary approach.

> IIRC there’s some new syntax coming for binary literals
> that would let us represent them as 0b___

Very good. Healthy attitude :)

> less dense and loses clarity for many kinds of unusual bit patterns.

It is not very clear to me what exactly is meant by patterns there.

> Additionally, as the number of bits increases life gets really hard:
> masking out certain bits of a 64-bit number requires

The editing itself of such a bitmask in hex notation makes life hard.
Editing it in binary notation makes life easier.

> a literal that’s at least 66 characters long,

Length is a feature of binary, though it is not a major issue;
see my ideas on it in my reply to Greg.

> Hexadecimal has the clear advantage that each character wholly represents 4 
> bits,

This advantage is brevity, but one needs slightly less brevity to make
it more readable.
So what do you think about base 4?
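
For illustration, a tiny sketch of what such a base-4 rendering could look
like (to_base4 is a hypothetical helper; Python has no built-in quaternary
formatter):

def to_base4(n):
    """Render a non-negative integer in base 4."""
    digits = []
    while True:
        n, r = divmod(n, 4)
        digits.append("0123"[r])
        if n == 0:
            return "".join(reversed(digits))

print(to_base4(0xFFFFFF))   # 333333333333 -- 12 digits, vs. 6 in hex and 24 in binary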

> This is a very long argument to suggest that your
> argument against hexadecimal literals
> (namely, that they use 16 glyphs as opposed
> to the 10 glyphs used in decimal)
> is an argument that is too simple to be correct.

I didn't understand this, sorry :)))
You're welcome to ask more if you're interested in this.

> Different collections of glyphs are clearer in different contexts.
How many different collections, and how many different contexts?

> while the english language requires 26 glyphs plus punctuation.

It does not *require* them, but of course 8 glyphs would not suffice to read
the language effectively, so one finds a way to extend the glyph set.
Roughly speaking, 20 letters is enough, but this is not an exact science.
And it is quite a hard science.

> But I don’t think you’re seriously proposing we should
> swap from writing English using the larger glyph set
> to writing it in decimal representation of ASCII bytes.

I didn't understand this sentence :)

In general I think we agree on many points, thank you for the input!

Cheers,
Mikhail

Re: [Python-ideas] Proposal for default character representation

2016-10-14 Thread Jonathan Goble
On Fri, Oct 14, 2016 at 1:54 AM, Steven D'Aprano  wrote:
>> and:
>>
>> o = ord (c)
>> if 100 <= o <= 150:
>
> Which is clearly not the same thing, and better written as:
>
> if "d" <= c <= "\x96":
> ...

Or, if you really want to use ord(), you can use hex literals:

o = ord(c)
if 0x64 <= o <= 0x96:
...


Re: [Python-ideas] Proposal for default character representation

2016-10-14 Thread Mikhail V
On 13 October 2016 at 10:18, M.-A. Lemburg  wrote:

> I suppose you did not intend everyone to have to write
> \u010 just to get a newline code point to avoid the
> ambiguity.

OK, there are different usage cases.
So in short, without going into detail:
for example, if I need to type a Unicode
string literal in an ASCII editor, I would find such a notation
replacement beneficial for me:

u'\u0430\u0431\u0432.txt'
-->
u"{1072}{1073}{1074}.txt"

Printing could be the same, I suppose.
I use Python 2.7, and printing
with numbers instead of non-ASCII would help me see
where I have non-ASCII chars. But I think the print
behavior must be easily configurable.

Any criticism of it? Besides not following the Unicode consortium.
Also, I would not even mind fixed-width 7-digit decimals, actually.
Ugly, but still better for me than hex.
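
To make the proposed replacement concrete, here is a minimal sketch of such a
renderer (decimal_escape is hypothetical, not an existing Python API, and the
braces would of course clash with str.format in real code):

def decimal_escape(s):
    """Render non-ASCII code points as {decimal} groups, as in the example above."""
    return "".join(ch if ord(ch) < 128 else "{%d}" % ord(ch) for ch in s)

print(decimal_escape(u"\u0430\u0431\u0432.txt"))   # {1072}{1073}{1074}.txt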

Mikhail


Re: [Python-ideas] Proposal for default character representation

2016-10-13 Thread Steven D'Aprano
On Fri, Oct 14, 2016 at 07:21:48AM +0200, Mikhail V wrote:

> I'll explain what I mean with an example.
> This is an example which I would make to
> support my proposal. Compare:
> 
> if "\u1230" <= c <= "\u123f":

For an English-speaker writing that, I'd recommend:

if "\N{ETHIOPIC SYLLABLE SA}" <= c <= "\N{ETHIOPIC SYLLABLE SHWA}":
...

which is a bit verbose, but that's the price you pay for programming 
with text in a language you don't read. If you do read Ethiopian, then 
you can simply write:

if "ሰ" <= c <= "ሿ":
...

which to a literate reader of Ethiopian, is no harder to understand than 
the strange and mysterious rotated and reflected glyphs used by Europeans:

if "d" <= c <= "p":
...

(Why is "double-u" written as vv (w) instead of uu?)



> and:
> 
> o = ord (c)
> if 100 <= o <= 150:

Which is clearly not the same thing, and better written as:

if "d" <= c <= "\x96":
...


> So yours is a valid code but for me its freaky,
> and surely I stick to the second variant.
> You said, I can better see in which unicode page
> I am by looking at hex ordinal, but I hardly
> need it, I just need to know one integer, namely
> where some range begins, that's it.
> Furthermore this is the code which would an average
> programmer better read and maintain.

No, the average programmer is MUCH more skillful than that. Your 
standard for what you consider "average" seems to me to be more like 
"lowest 20%".

[...]
> I feel however like being misunderstood or so.

Trust me, we understand you perfectly. You personally aren't familiar or 
comfortable with hexadecimal, Unicode code points, or programming 
standards which have been in widespread use for at least 35 years, and 
probably more like 50, but rather than accepting that this means you 
have a lot to learn, you think you can convince the rest of the world to 
dumb-down and de-skill to a level that you are comfortable with. And 
that eventually the entire world will standardise on just 100 
characters, which you think is enough for all communication, maths and 
science.

Good luck with that last one. Even if you could convince the Chinese and 
Japanese to swap to ASCII, I'd like to see you pry the emoji out of the 
young folk's phones.



[...]
> It is not about speed, it is about brain load.
> Chinese can read their hieroglyphs fast, but
> the cognition load on the brain is 100 times higher
> than current latin set.

Citation required.



-- 
Steve

Re: [Python-ideas] Proposal for default character representation

2016-10-13 Thread Ned Batchelder
On 10/13/16 2:42 AM, Mikhail V wrote:
> On 13 October 2016 at 08:02, Greg Ewing  wrote:
>> Mikhail V wrote:
>>> Consider unicode table as an array with glyphs.
>>
>> You mean like this one?
>>
>> http://unicode-table.com/en/
>>
>> Unless I've miscounted, that one has the characters
>> arranged in rows of 16, so it would be *harder* to
>> look up a decimal index in it.
>>
>> --
>> Greg
> Nice point finally, I admit, although quite minor. Where
> the data implies such pagings or alignment, the notation
> should be (probably) more binary-oriented.
> But: you claim to see bit patterns in hex numbers? Then I bet you will
> see them much better if you take binary notation (2 symbols) or quaternary
> notation (4 symbols), I guarantee. And if you take consistent glyph set for 
> them
> also you'll see them twice better, also guarantee 100%.
> So not that the decimal is cool,
> but hex sucks (too big alphabet) and _the character set_ used for hex
> optically sucks.
> That is the point.
> On the other hand why would unicode glyph table which is to the
> biggest part a museum of glyphs would be necessarily
> paged in a binary-friendly manner and not in a decimal friendly
> manner? But I am not saying it should or not, its quite irrelevant
> for this particular case I think.

You continue to overlook the fact that Unicode codepoints are
conventionally presented in hexadecimal, including in the page you
linked us to.  This is the convention.  It makes sense to stick to the
convention. 

When I see a numeric representation of a character, there are only two
things I can do with it: look it up in a reference someplace, or glean
some meaning from it directly.  For looking things up, please remember
that all Unicode references use hex numbering. Looking up a character by
decimal numbers is simply more difficult than looking them up by hex
numbers.

For gleaning meaning directly, please keep in mind that Unicode
is fundamentally structured around pages of 256 code points, organized into
planes of 256 pages.  The very structure of how code points are
allocated and grouped is based on a hexadecimal-friendly system.  The
blocks of codepoints are aligned on hexadecimal boundaries:
http://www.fileformat.info/info/unicode/block/index.htm .  When I see
\u0414, I know it is a Cyrillic character because it is in block 04xx.
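
That block structure is easy to confirm from the standard library (plain
Python 3, nothing hypothetical):

import unicodedata

c = "\u0414"
print(unicodedata.name(c))   # CYRILLIC CAPITAL LETTER DE
print(hex(ord(c)))           # 0x414 -- the 04xx block is visible at a glance
print(ord(c))                # 1044  -- the decimal form hides that structure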

It simply doesn't make sense to present Unicode code points in anything
other than hex.

--Ned.




Re: [Python-ideas] Proposal for default character representation

2016-10-13 Thread Thomas Nyberg

On 10/12/2016 07:13 PM, Mikhail V wrote:

On 12 October 2016 at 23:50, Thomas Nyberg  wrote:

Since when was decimal notation "standard"?

Depends on what planet you live on. I live on planet Earth. And you?


If you mean that decimal notation is the standard used for _counting_ by 
people, then yes of course that is standard. But decimal notation 
certainly is not standard in this domain.



opposite. For unicode representations, byte notation seems standard.

How does this make it a good idea?
Consider the unicode table as an array of glyphs.
Now the index of the array is suddenly represented in some
obscure character set. How is this index different from the index of any
array, or any natural number? Think about it...


Hexadecimal notation is hardly "obscure", but yes I understand that 
fewer people understand it than decimal notation. Regardless, byte 
notation seems standard for unicode and unless you can convince the 
unicode community at large to switch, I don't think it makes any sense 
for python to switch. Sometimes it's better to go with the flow even if 
you don't want to.



2. Mixing of two notations (hex and decimal) is a _very_ bad idea,
I hope no need to explain why.


Still not sure which "mixing" you refer to.


Still not sure? These two words in brackets. Mixing those two systems.



There is not mixing for unicode in python; it displays as hexadecimal. 
Decimal is used in other places though. So if by "mixing" you mean 
python should not use the standard notations of subdomains when working 
with those domains, then I would totally disagree. The language used in 
different disciplines is and has always been variable. Until that's no 
longer true it's better to stick with convention than add inconsistency 
which will be much more confusing in the long-term than learning the 
idiosyncrasies of a specific domain (in this case the use of hexadecimal 
in the unicode world).


Cheers,
Thomas
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Proposal for default character representation

2016-10-13 Thread Chris Angelico
On Fri, Oct 14, 2016 at 1:25 AM, Steven D'Aprano  wrote:
> On Thu, Oct 13, 2016 at 03:56:59AM +0200, Mikhail V wrote:
>> and in the long perspective when the world's alphabetical garbage will
>> disappear, two digits would be ok.
> Talking about "alphabetical garbage" like that makes you seem to be an
> ASCII bigot: rude, ignorant, arrogant and rather foolish as well. Even
> 7-bit ASCII has more than 100 characters (128).

Solution: Abolish most of the control characters. Let's define a brand
new character encoding with no "alphabetical garbage". These
characters will be sufficient for everyone:

* [2] Formatting characters: space, newline. Everything else can go.
* [8] Digits: 01234567
* [26] Lower case Latin letters a-z
* [2] Vital social media characters: # (now officially called "HASHTAG"), @
* [2] Can't-type-URLs-without-them: colon, slash (now called both
"SLASH" and "BACKSLASH")

That's 40 characters that should cover all the important things anyone
does - namely, Twitter, Facebook, and email. We don't need punctuation
or capitalization, as they're dying arts and just make you look
pretentious. I might have missed a few critical characters, but it
should be possible to fit it all within 64, which you can then
represent using two digits from our newly-restricted set; octal is
better than decimal, as it needs less symbols. (Oh, sorry, so that's
actually "50" characters, of which "32" are the letters. And we can
use up to "100" and still fit within two digits.)

Is this the wrong approach, Mikhail? Perhaps we should go the other
way, then, and be *inclusive* of people who speak other languages.
Thanks to Unicode's rich collection of characters, we can represent
multiple languages in a single document; see, for instance, how this
uses four languages and three entirely distinct scripts:
http://youtu.be/iydlR_ptLmk Turkish and French both use the Latin
script, but have different characters. Alphabetical garbage, or
accurate representations of sounds and words in those languages?

Python 3 gives the world's languages equal footing. This is a feature,
not a bug. It has consequences, including that arbitrary character
entities could involve up to seven decimal digits or six hex (although
for most practical work, six decimal or five hex will suffice). Those
consequences are a trivial price to pay for uniting the whole
internet, as opposed to having pockets of different languages, like we
had up until the 90s.
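
A quick way to check those digit counts, as a Python 3 sketch:

import sys

top = sys.maxunicode                # highest code point, U+10FFFF
print(hex(top), len("%x" % top))    # 0x10ffff 6  -> six hex digits
print(top, len(str(top)))           # 1114111 7   -> seven decimal digits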

ChrisA
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Proposal for default character representation

2016-10-13 Thread Steven D'Aprano
On Thu, Oct 13, 2016 at 03:56:59AM +0200, Mikhail V wrote:

> > How many decimal digits would you use to denote a single character?
> 
> for text, three decimal digits would be enough for me personally,

Well, if it's enough for you, why would anyone need more?


> and in the long perspective when the world's alphabetical garbage will
> disappear, two digits would be ok.

Are you serious? 

Talking about "alphabetical garbage" like that makes you seem to be an 
ASCII bigot: rude, ignorant, arrogant and rather foolish as well. Even 
7-bit ASCII has more than 100 characters (128).



-- 
Steve
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Proposal for default character representation

2016-10-13 Thread Chris Angelico
On Thu, Oct 13, 2016 at 9:05 PM, Cory Benfield  wrote:
> Binary notation seems like the solution, but note the above case: the only 
> way to work out how many bits are being masked out is to count them, and 
> there can be quite a lot. IIRC there’s some new syntax coming for binary 
> literals that would let us represent them as 0b___, which 
> would help the readability case, but it’s still substantially less dense and 
> loses clarity for many kinds of unusual bit patterns.
>

And if you were to write them like this, you would start to read them
in blocks of four - effectively, treating each underscore-separated
unit as a glyph, despite them being represented with four characters.
Fortunately, just like with Hangul characters, we have a
transformation that combines these multi-character glyphs into single
characters. We call it 'hexadecimal'.
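
A small sketch of that grouping (underscores in numeric literals are the
PEP 515 syntax, available since Python 3.6): each underscore-separated group
of four binary digits is exactly one hex digit.

flags = 0b0101_1010_1111_0000
print(hex(flags))         # 0x5af0
print(flags == 0x5AF0)    # True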

ChrisA
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Proposal for default character representation

2016-10-13 Thread Greg Ewing

Mikhail V wrote:

Eee how would I find if the character lies in a certain range?


>>> c = "\u1235"
>>> if "\u1230" <= c <= "\u123f":
...  print("Boo!")
...
Boo!
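
The same test can also be written with ord() and hex integer literals, if
comparing integers feels more natural than comparing one-character strings:

c = "\u1235"
if 0x1230 <= ord(c) <= 0x123F:
    print("Boo!")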

--
Greg
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Proposal for default character representation

2016-10-13 Thread Greg Ewing

Mikhail V wrote:

Ok, but if I write a string filtering in Python for example then
obviously I use decimal everywhere to compare index ranges, etc.
so what is the use for me of that label? Just redundant
conversions back and forth. 


I'm not sure what you mean by that. If by "index ranges"
you're talking about the numbers you use to index into
the string, they have nothing to do with character codes,
so you can write them in whatever base is most convenient
for you.

If you have occasion to write a literal representing a
character code, there's nothing to stop you writing it
in hex to match the way it's shown in a repr(), or in
published Unicode tables, etc.
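
A short sketch of what that looks like in practice (the file name is just an
example): the index stays a plain decimal integer, while the character-code
literal is written in hex to match repr() and the Unicode charts.

s = "абв.txt"
if s[0] == "\u0430":      # index 0 in decimal, code point in hex
    print("starts with CYRILLIC SMALL LETTER A")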

I don't see a need for any conversions back and forth.

--
Greg
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Proposal for default character representation

2016-10-13 Thread Greg Ewing

Mikhail V wrote:

I am not against base-16 itself in the first place,
but rather against the character set which is simply visually
inconsistent and not readable.


Now you're talking about inventing new characters, or
at least new glyphs for existing ones, and persuading
everyone to use them. That's well beyond the scope of
what Python can achieve!

--
Greg
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Proposal for default character representation

2016-10-13 Thread Greg Ewing

Mikhail V wrote:

Did you see much code written with hex literals?


From /usr/include/sys/fcntl.h:

/*
 * File status flags: these are used by open(2), fcntl(2).
 * They are also used (indirectly) in the kernel file structure f_flags,
 * which is a superset of the open/fcntl flags.  Open flags and f_flags
 * are inter-convertible using OFLAGS(fflags) and FFLAGS(oflags).
 * Open/fcntl flags begin with O_; kernel-internal flags begin with F.
 */
/* open-only flags */
#define O_RDONLY    0x0000  /* open for reading only */
#define O_WRONLY    0x0001  /* open for writing only */
#define O_RDWR      0x0002  /* open for reading and writing */
#define O_ACCMODE   0x0003  /* mask for above modes */

/*
 * Kernel encoding of open mode; separate read and write bits that are
 * independently testable: 1 greater than the above.
 *
 * XXX
 * FREAD and FWRITE are excluded from the #ifdef KERNEL so that TIOCFLUSH,
 * which was documented to use FREAD/FWRITE, continues to work.
 */
#if !defined(_POSIX_C_SOURCE) || defined(_DARWIN_C_SOURCE)
#define FREAD   0x0001
#define FWRITE  0x0002
#endif
#define O_NONBLOCK  0x0004  /* no delay */
#define O_APPEND    0x0008  /* set append mode */
#ifndef O_SYNC  /* allow simultaneous inclusion of  */
#define O_SYNC  0x0080  /* synch I/O file integrity */
#endif
#if !defined(_POSIX_C_SOURCE) || defined(_DARWIN_C_SOURCE)
#define O_SHLOCK    0x0010  /* open with shared file lock */
#define O_EXLOCK    0x0020  /* open with exclusive file lock */
#define O_ASYNC 0x0040  /* signal pgrp when data ready */
#define O_FSYNC O_SYNC  /* source compatibility: do not use */
#define O_NOFOLLOW  0x0100  /* don't follow symlinks */
#endif /* (_POSIX_C_SOURCE && !_DARWIN_C_SOURCE) */
#define O_CREAT 0x0200  /* create if nonexistant */
#define O_TRUNC 0x0400  /* truncate to zero length */
#define O_EXCL  0x0800  /* error if already exists */
#if !defined(_POSIX_C_SOURCE) || defined(_DARWIN_C_SOURCE)
#define O_EVTONLY   0x8000  /* descriptor requested for event 
notifications only */
#endif


#define O_NOCTTY    0x20000 /* don't assign controlling terminal */


#if !defined(_POSIX_C_SOURCE) || defined(_DARWIN_C_SOURCE)
#define O_DIRECTORY 0x100000
#define O_SYMLINK   0x200000    /* allow open of a symlink */
#endif

#ifndef O_DSYNC /* allow simultaneous inclusion of  */
#define O_DSYNC     0x400000    /* synch I/O data integrity */
#endif

--
Greg

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Proposal for default character representation

2016-10-13 Thread M.-A. Lemburg
On 13.10.2016 01:06, Mikhail V wrote:
> On 12 October 2016 at 23:48, M.-A. Lemburg  wrote:
>> The hex notation for \u is a standard also used in many other
>> programming languages, it's also easier to parse, so I don't
>> think we should change this default.
> 
> In programming literature it is used often, but let me point out that
> decimal is THE standard and is a much, much better standard
> in the sense of readability. And there is no solid reason to use 2 standards
> at the same time.

I guess it's a matter of choosing the right standard for the
right purpose. For \u and \U the intention was to be able
to represent a Unicode code point using its standard Unicode ordinal
representation and since the standard uses hex for this, it's
quite natural to use the same here.

>> Take e.g.
>>
> s = "\u123456"
> s
>> 'ሴ56'
>>
>> With decimal notation, it's not clear where to end parsing
>> the digit notation.
> 
> How is it not clear if the digit amount is fixed? It's not very clear what
> you meant.

Unicode code points have ordinals from the range [0, 1114111],
so it's not clear where to stop parsing the decimal representation
and continue to interpret the literal as a regular string, since
I suppose you did not intend everyone to have to write
\u0000010 just to get a newline code point to avoid the
ambiguity.
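
A sketch of how the fixed-width hex escapes avoid that ambiguity in Python 3:
\uXXXX always consumes exactly four hex digits and \UXXXXXXXX exactly eight,
so the trailing "56" below is ordinary text rather than part of the escape.

s = "\u123456"
print(s, len(s))       # ሴ56 3  (U+1234 followed by the characters '5' and '6')
print("\U0001F40D")    # U+1F40D SNAKE, written with the eight-digit escape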

PS: I'm not even talking about the breakage such a change would
cause. This discussion is merely about the pointing out how
things got to be how they are now.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Oct 13 2016)
>>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
>>> Python Database Interfaces ...   http://products.egenix.com/
>>> Plone/Zope Database Interfaces ...   http://zope.egenix.com/


::: We implement business ideas - efficiently in both time and costs :::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/
  http://www.malemburg.com/

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Proposal for default character representation

2016-10-13 Thread Mikhail V
On 13 October 2016 at 08:02, Greg Ewing  wrote:
> Mikhail V wrote:
>>
>> Consider unicode table as an array with glyphs.
>
>
> You mean like this one?
>
> http://unicode-table.com/en/
>
> Unless I've miscounted, that one has the characters
> arranged in rows of 16, so it would be *harder* to
> look up a decimal index in it.
>
> --
> Greg

Nice point finally, I admit, although quite minor. Where
the data implies such pagings or alignment, the notation
should be (probably) more binary-oriented.
But: you claim to see bit patterns in hex numbers? Then I bet you will
see them much better if you take binary notation (2 symbols) or quaternary
notation (4 symbols), I guarantee. And if you take consistent glyph set for them
also you'll see them twice better, also guarantee 100%.
So not that the decimal is cool,
but hex sucks (too big alphabet) and _the character set_ used for hex
optically sucks.
That is the point.
On the other hand, why would the unicode glyph table, which is for the
biggest part a museum of glyphs, necessarily be
paged in a binary-friendly manner and not in a decimal-friendly
manner? But I am not saying it should or not, it's quite irrelevant
for this particular case I think.

Mikhail
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Proposal for default character representation

2016-10-13 Thread Mikhail V
On 13 October 2016 at 04:49, Emanuel Barry <vgr...@live.ca> wrote:
>> From: Mikhail V
>> Sent: Wednesday, October 12, 2016 9:57 PM
>> Subject: Re: [Python-ideas] Proposal for default character representation
>
> Hello, and welcome to Python-ideas, where only a small portion of ideas go
> further, and where most newcomers that wish to improve the language get hit
> by the reality bat! I hope you enjoy your stay :)
Hi, thanks! I enjoy the conversation indeed, never had so much interest
in a discussion actually!

>
>> On 13 October 2016 at 01:50, Chris Angelico <ros...@gmail.com> wrote:
>> > On Thu, Oct 13, 2016 at 10:09 AM, Mikhail V <mikhail...@gmail.com>
>> wrote:
>> >
>> > Way WAY less readable, and I'm comfortable working in both hex and
>> decimal.
>>
>> Please don't mix the readability and personal habit, which previous
>> repliers seem to do as well. Those two things have nothing
>> to do with each other. If you are comfortable with old roman numbering
>> system this does not make it readable.
>> And I am NOT comfortable with hex, as well as most people would
>> be glad to use single notation.
>> But some of them think that they are cool because they know several
>> numbering notations ;) But I bet few can actually understand which is more
>> readable.
>
> I'll turn your argument around: Not being comfortable with hex does not make
> it unreadable; it's a matter of habit (as Brendan pointed out in his
> separate reply).

A matter of habit does not reflect readability, see my last reply to Brendan.
It is quite precise engineering. And readability is kind of serious
stuff, especially
if you decide on a programming career.  Young people underestimate it
and for oldies it is too late when they realize it :) And Python is all about
readability and I like it.

As for your other points, I'll need to read it with fresh head tomorrow,
Of course I don't believe this would all suddenly happen with Python,
or other programming language, it is just an idea anyway. And I do
want to learn more actually. Especially want to see some example
where it would be really beneficial to use hex, either technically
(some low level binary related stuff?) or regarding comprehension, which
is to my knowledge hardly possible.

> - Indexing, and that's completely irrelevant to the topic at hand (also see
> above bullet point).
Eee how would I find if the character lies in a certain range?
With index here I meant its numeric value, I just called it index
for some reason, I don't know why. So it's a table - value and
corresponding glyph.
Just consider the analogy: I make a 3d array, the first index is my value,
and the 2nd and 3rd are image pixels, so simply an image stack. Why on earth would
I use for the 1st index some other literals than decimal. Did you see much
code written
with hex literals? Some low level things probably ...

> - ord() which returns an integer (which can be interpreted in any base!),
Yes, so my idea is to stick to notations other than hex. For low-level
bit manipulation
obviously a two-character notation should be used, so again I fail to
see something...


Mikhail
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Proposal for default character representation

2016-10-12 Thread Greg Ewing

Mikhail V wrote:

And decimal is objectively way more readable than hex standard character set,
regardless of how strong your habits are.


That depends on what you're trying to read from it. I can
look at a hex number and instantly get a mental picture
of the bit pattern it represents. I can't do that with
decimal numbers.

This is the reason hex exists. It's used when the bit
pattern represented by a number is more important to
know than its numerical value. This is the case with
Unicode code points. Their numerical value is irrelevant,
but the bit pattern conveys useful information, such
as which page and plane it belongs to, whether it fits
in 1 or 2 bytes, etc.
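
A minimal sketch of reading that information straight off the hex form (the
code point is just an example):

cp = 0x0414                        # CYRILLIC CAPITAL LETTER DE
plane, page = cp >> 16, (cp >> 8) & 0xFF
print(plane, hex(page))            # 0 0x4      -> plane 0, page 0x04 (Cyrillic)
print(cp <= 0xFF, cp <= 0xFFFF)    # False True -> needs 2 bytes, not 1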

--
Greg
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Proposal for default character representation

2016-10-12 Thread Mikhail V
On 13 October 2016 at 04:18, Brendan Barnwell  wrote:
> On 2016-10-12 18:56, Mikhail V wrote:
>>
>> Please don't mix the readability and personal habit, which previous
>> repliers seem to do as well. Those two things have nothing
>> to do with each other.
>
>
> You keep saying this, but it's quite incorrect.  The usage of
> decimal notation is itself just a convention, and the only reason it's easy
> for you (and for many other people) is because you're used to it.  If you
> had grown up using only hexadecimal or binary, you would find decimal
> awkward.

Exactly, but this is not called "readability" but rather
"acquired ability to read" or simply habit, which does not reflect
the "readability" of the character set itself.

> There is nothing objectively better about base 10 than any other
> place-value numbering system.

Sorry to say, but here you are totally wrong.
Not to treat you personally for your fallacy, that is quite common
among those who are not familiar with the topic, but you
should consider some important points:
---
1. Each taken character set has certain grade of readability
which depends solely on the form of its units (aka glyphs).
2. Linear string representation is superior to anything else (spiral, arc, etc.)
3. There exist glyphs which provide maximal readability,
those are particular glyphs with particular constant form, and
this form is absolutely independent from the encoding subject.
4. According to my personal studies (which does not mean
it must be accepted or blindly believed in, but I have solid experience
in this area and am acting quite successfully in it)
the number of these glyphs is less than 10; namely, I am at 8 glyphs now.
5. The main measured parameter which reflects the
readability (somewhat indirectly however) is the pair-wise
optical collision of each character pair of the set.
This refers somewhat to legibility, or differentiation ability
of glyphs.
---

Less technically, you can understand it better if you think
of your own words
"There is nothing objectively better
about base 10 than any
other place-value numbering system."
If this could ever be true then you could read with characters that
are very similar to each other, or something messy, as well as
with characters which are easily identifiable, collision resistant
and optically consistent. But that is absurd, sorry.

For numbers obviously you don't need so many characters as for
speech encoding, so this means that only those glyphs or even a subset
of them should be used. This means anything more than 8 characters
is quite worthless for reading numbers.
Note that I can't provide here the works currently
so don't ask me for that. Some of them would be probably
available in near future.

Your analogy with speech and signs is not correct because
speech is different but numbers are numbers.
But also for different speech, the same character set must be used,
namely the one with superior optical qualities, readability.


> Saying we should dump hex notation because everyone understands decimal is
> like saying that all signs in Prague should only be printed in English

We should dump hex notation because currently decimal
is simply superior to hex, just like a Mercedes is
superior to a Lada, and secondly, because it is more common
for ALL people, so it is 2:0 for not using such notation.
With that said, I am not against base-16 itself in the first place,
but rather against the character set which is simply visually
inconsistent and not readable.
Someone just took the Arabic digits and added
the first Latin letters to them. It could be forgiven as a schoolboy's
exercise in drawing, but I fail to understand how it can be
accepted as a working notation for a medium supposed
to be human readable.
Practically all this notation does is reduce the time
before you as a programmer
develop visual and brain impairments.

> Just look at the Wikipedia page for Unicode, which says: "Normally a
> Unicode code point is referred to by writing "U+" followed by its
> hexadecimal number."  That's it.

Yeah, that's it. And it sucks, and that it migrated into a coding
standard sucks twice.
If a new syntax/standard is decided, there'll
be only positive sides of using decimal vs hex.
So nobody'll be hurt, this is only the question of
remaking current implementation and is proposed
only as a long-term theoretical improvement.

> it's just
> a label that identifies the character.

Ok, but if I write a string filtering in Python for example then
obviously I use decimal everywhere to compare index ranges, etc.
so what is the use for me of that label? Just redundant
conversions back and forth. Makes me sick actually.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Proposal for default character representation

2016-10-12 Thread Emanuel Barry
> From: Mikhail V
> Sent: Wednesday, October 12, 2016 9:57 PM
> Subject: Re: [Python-ideas] Proposal for default character representation

Hello, and welcome to Python-ideas, where only a small portion of ideas go
further, and where most newcomers that wish to improve the language get hit
by the reality bat! I hope you enjoy your stay :) 

> On 13 October 2016 at 01:50, Chris Angelico <ros...@gmail.com> wrote:
> > On Thu, Oct 13, 2016 at 10:09 AM, Mikhail V <mikhail...@gmail.com>
> wrote:
> >
> > Way WAY less readable, and I'm comfortable working in both hex and
> decimal.
> 
> Please don't mix the readability and personal habit, which previous
> repliers seem to do as well. Those two things have nothing
> to do with each other. If you are comfortable with old roman numbering
> system this does not make it readable.
> And I am NOT comfortable with hex, as well as most people would
> be glad to use single notation.
> But some of them think that they are cool because they know several
> numbering notations ;) But I bet few can actually understand which is more
> readable.

I'll turn your argument around: Not being comfortable with hex does not make
it unreadable; it's a matter of habit (as Brendan pointed out in his
separate reply).

> > You're the one who's non-standard here. Most of the world uses hex for
> > Unicode codepoints.
> No I am not the one, many people find it silly to use different notations
> for the same thing - the index of the element, and they are very right about that.
> I am not silly, I refuse to use it and luckily I can. Also I know that decimal
> is more readable than hex so my choice is supported by the
> understanding and not simply refusing.

Unicode code points are represented using hex notation virtually everywhere
I ever saw it. Your Unicode-code-points-as-decimal website was a new
discovery for me (and, I presume, many others on this list). Since it's
widely used in the world, going against that effectively makes you
non-standard. That doesn't mean it's necessarily a bad thing, but it does
mean that your chances (or anyone's chances) of actually changing that are
equal to zero (and this isn't some gross exaggeration),

> >
> >> PS:
> >> that is rather peculiar, three negative replies already but with no
strong
> >> arguments why it would be bad to stick to decimal only, only some
> >> "others do it so" and "tradition" arguments.
> >
> > "Others do it so" is actually a very strong argument. If all the rest
> > of the world uses + to mean addition, and Python used + to mean
> > subtraction, it doesn't matter how logical that is, it is *wrong*.
> 
> This actually supports my proposal perfectly, if everyone uses decimal
> why suddenly use hex for the same thing - the index of an array. I don't see how
> your analogy contradicts my proposal, it's rather supporting it.

I fail to see your point here. Where is that "everyone uses decimal"? Unless
you stopped talking about representation in strings (which seems likely, as
you're talking about indexing?), everything is represented as hex.

> But I do want that you could abstract yourself from your habit for a while
> and talk about what would be better for the future usage.

I'll be that guy and tell you that you need to step back from your own idea
for a while and consider your proposal and the current state of things. I'll
also take the opportunity to reiterate that there is virtually no chance to
change this behaviour. This doesn't, however, prevent you or anyone from
talking about the topic, either for fun, or for finding other (related or
otherwise) areas of interest that you think might be worth investigating
further. A lot of threads actually branch off in different topics that came
up when discussing, and that are interesting enough to pursue on their own.

> > everyone has to do the conversion from that to 201C.
> 
> Nobody needs to do ANY conversions if we use decimal,
> and as said everything is decimal: numbers, array indexes,
> the ord() function returns decimal, you can imagine more examples
> so it is not only more readable but also more traditional.

You're mixing up more than just one concept here:
- Integer literals; I assume this is what you meant, and you seem to forget
(or maybe you didn't know, in which case here's to learning something new!)
that 0xff is perfectly valid syntax, and stores the integer with the value of
255 in base 10.

- Indexing, and that's completely irrelevant to the topic at hand (also see
above bullet point).

- ord() which returns an integer (which can be interpreted in any base!),
and that's both an argument for and against this proposal; the "against"
side is actually that decimal notation has no defined boundary for when to
stop (and before you argue

Re: [Python-ideas] Proposal for default character representation

2016-10-12 Thread Ryan Gonzalez
On Oct 12, 2016 9:25 PM, "Chris Angelico"  wrote:
>
> On Thu, Oct 13, 2016 at 12:56 PM, Mikhail V  wrote:
> >  But as said I find this Unicode only some temporary happening,
> >  it will go to history in some future and be
> > used only to study extinct glyphs.
>
> And what will we be using instead?
>

Emoji, of course! What else?

> Morbid curiosity trumping a plonking, for the moment.
>
> ChrisA
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

--
Ryan (ライアン)
[ERROR]: Your autotools build scripts are 200 lines longer than your
program. Something’s wrong.
http://kirbyfan64.github.io/
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Proposal for default character representation

2016-10-12 Thread Ryan Gonzalez
On Oct 12, 2016 4:33 PM, "Mikhail V"  wrote:
>
> Hello all,
>
> *snip*
>
> PROPOSAL:
> 1. Remove all hex notation from printing functions, typing,
> documentation.
> So for printing functions leave the hex as an "option",
> for example for those who feel the need for hex representation,
> which is strange IMO.
> 2. Replace it with decimal notation, in this case e.g:
>
> u'\u0430\u0431\u0432.txt' becomes
> u'\u1072\u1073\u1074.txt'
>
> and similarly for other cases where raw bytes must be printed/inputed
> So to summarize: make the decimal notation standard for all cases.
> I am not going to go deeper, such as what digit amount (leading zeros)
> to use, since it's quite secondary decision.
>

If decimal notation isn't used for parsing, only for printing, it would be
confusing as heck, but using it for both would break a lot of code in
subtle ways (the worst kind of code breakage).

> MOTIVATION:
> 1. Hex notation is hardly readable. It was not designed with readability
> in mind, so for reading it is not appropriate system, at least with the
> current character set, which is a mix of digits and letters (curious who
> was that wise person who invented such a set?).

The Unicode standard.

I agree that hex is hard to read, but the standard uses it to refer to the
code points. It's great to be able to google code points and find the
characters easily, and switching to decimal would screw it up.

And I've never seen someone *need* to figure out the decimal version from
the hex before. It's far more likely to google the hex #.

TL;DR: I think this change would induce a LOT of short-term issues, despite
it being up in the air if there's any long-term gain.

So -1 from me.

> 2. Mixing of two notations (hex and decimal) is a _very_ bad idea,
> I hope no need to explain why.
>

Indeed, you don't. :)

> So that's it, in short.
> Feel free to discuss and comment.
>
> Regards,
> Mikhail
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

--
Ryan (ライアン)
[ERROR]: Your autotools build scripts are 200 lines longer than your
program. Something’s wrong.
http://kirbyfan64.github.io/
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Proposal for default character representation

2016-10-12 Thread Chris Angelico
On Thu, Oct 13, 2016 at 12:56 PM, Mikhail V  wrote:
>  But as said I find this Unicode only some temporary happening,
>  it will go to history in some future and be
> used only to study extinct glyphs.

And what will we be using instead?

Morbid curiosity trumping a plonking, for the moment.

ChrisA
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Proposal for default character representation

2016-10-12 Thread Brendan Barnwell

On 2016-10-12 18:56, Mikhail V wrote:

Please don't mix the readability and personal habit, which previous
repliers seem to do as well. Those two things have nothing
to do with each other.


	You keep saying this, but it's quite incorrect.  The usage of decimal 
notation is itself just a convention, and the only reason it's easy for 
you (and for many other people) is because you're used to it.  If you 
had grown up using only hexadecimal or binary, you would find decimal 
awkward.  There is nothing objectively better about base 10 than any 
other place-value numbering system.  Decimal is just a habit.


	Now, it's true that base-10 is at this point effectively universal 
across human societies, and that gives it a certain claim to primacy. 
But base-16 (along with base 2) is also quite common in computing 
contexts.  Saying we should dump hex notation because everyone 
understands decimal is like saying that all signs in Prague should only 
be printed in English because there are more English speakers in the 
entire world than Czech speakers.  But that ignores the fact that there 
are more Czech speakers *in Prague*.  Likewise, decimal may be more 
common as an overall numerical notation, but when it comes to referring 
to Unicode code points, hexadecimal is far and away more common.


	Just look at the Wikipedia page for Unicode, which says: "Normally a 
Unicode code point is referred to by writing "U+" followed by its 
hexadecimal number."  That's it.  You'll find the same thing on 
unicode.org.  The unicode code point is hardly even a number in the 
usual sense; it's just a label that identifies the character.  If you 
have an issue with using hex to represent unicode code points, your 
issue goes way beyond Python, and you need to take it up with the 
Unicode consortium.  (Good luck with that.)


--
Brendan Barnwell
"Do not follow where the path may lead.  Go, instead, where there is no 
path, and leave a trail."

   --author unknown
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Proposal for default character representation

2016-10-12 Thread Mikhail V
On 13 October 2016 at 01:50, Chris Angelico  wrote:
> On Thu, Oct 13, 2016 at 10:09 AM, Mikhail V  wrote:
>> On 12 October 2016 at 23:58, Danilo J. S. Bellini
>>  wrote:
>>
>>> Decimal notation is hardly
>>> readable when we're dealing with stuff designed in base 2 (e.g. due to the
>>> visual separation of distinct bytes).
>>
>> Hmm what keeps you from separating the logical units to be represented each
>> by a decimal number? like 001 023 255 ...
>> Do you really think this is less readable than its hex equivalent?
>> Then you are probably working with hex numbers only, but I doubt that.
>
> Way WAY less readable, and I'm comfortable working in both hex and decimal.

Please don't mix the readability and personal habit, which previous
repliers seem to do as well. Those two things have nothing
to do with each other. If you are comfortable with old roman numbering
system this does not make it readable.
And I am NOT comfortable with hex, as well as most people would
be glad to use single notation.
But some of them think that they are cool because they know several
numbering notations ;) But I bet few can actually understand which is more
readable.

> You're the one who's non-standard here. Most of the world uses hex for
> Unicode codepoints.
No I am not the one, many people find it silly to use different notations
for the same thing - the index of the element, and they are very right about that.
I am not silly, I refuse to use it and luckily I can. Also I know that decimal
is more readable than hex so my choice is supported by the
understanding and not simply refusing.

>
>> PS:
>> that is rather peculiar, three negative replies already but with no strong
>> arguments why it would be bad to stick to decimal only, only some
>> "others do it so" and "tradition" arguments.
>
> "Others do it so" is actually a very strong argument. If all the rest
> of the world uses + to mean addition, and Python used + to mean
> subtraction, it doesn't matter how logical that is, it is *wrong*.

This actually supports my proposal perfectly, if everyone uses decimal
why suddenly use hex for the same thing - the index of an array. I don't see how
your analogy contradicts my proposal, it's rather supporting it.


> quote; if you us 0x93, you are annoyingly wrong,

Please don't make personal assessments here, I can use whatever I want,
moreover I find this notation as silly as using different measurement
systems without any reason and within one activity, and in my eyes
this is annoyingly wrong and stupid, but I don't call anybody here stupid.

But I do want that you could abstract yourself from your habit for a while
and talk about what would be better for the future usage.

> everyone has to do the conversion from that to 201C.

Nobody needs to do ANY conversions if we use decimal,
and as said everything is decimal: numbers, array indexes,
the ord() function returns decimal, you can imagine more examples
so it is not only more readable but also more traditional.
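
Concretely, a tiny sketch of what ord() gives back, assuming Python 3 (the
same integer can then be rendered in either base):

n = ord("а")          # CYRILLIC SMALL LETTER A
print(n, hex(n))      # 1072 0x430
print(n == 0x0430)    # True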


> How many decimal digits would you use to denote a single character?

for text, three decimal digits would be enough for me personally,
and in the long perspective when the world's alphabetical garbage will
disappear, two digits would be ok.

> you have to pad everything to seven digits (\u0000034 for an ASCII
> quote)?

Depends on case, for input  -
 some separator, or padding is also ok,
I don't have problems with both. For printing obviously don't show
leading zeros, but rather spaces.
 But as said I find this Unicode only some temporary happening,
 it will go to history in some future and be
used only to study extinct glyphs.

Mikhail
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Proposal for default character representation

2016-10-12 Thread Chris Angelico
On Thu, Oct 13, 2016 at 10:09 AM, Mikhail V  wrote:
> On 12 October 2016 at 23:58, Danilo J. S. Bellini
>  wrote:
>
>> Decimal notation is hardly
>> readable when we're dealing with stuff designed in base 2 (e.g. due to the
>> visual separation of distinct bytes).
>
>> Hmm what keeps you from separating the logical units to be represented each
> by a decimal number? like 001 023 255 ...
> Do you really think this is less readable than its hex equivalent?
> Then you are probably working with hex numbers only, but I doubt that.

Way WAY less readable, and I'm comfortable working in both hex and decimal.

>> I agree that mixing representations for the same abstraction (using decimal
>> in some places, hexadecimal in other ones) can be a bad idea.
> "Can be"? It is indeed a horrible idea. Also not only for same abstraction
> but at all.
>
>> makes me believe "decimal unicode codepoint" shouldn't ever appear in string
>> representations.
> I use this site to look the chars up:
> http://www.tamasoft.co.jp/en/general-info/unicode-decimal.html

You're the one who's non-standard here. Most of the world uses hex for
Unicode codepoints.

http://unicode.org/charts/

HTML entities permit either decimal or hex, but other than that, I
can't think of any common system that uses decimal for Unicode
codepoints in strings.

> PS:
> that is rather peculiar, three negative replies already but with no strong
> arguments why it would be bad to stick to decimal only, only some
> "others do it so" and "tradition" arguments.

"Others do it so" is actually a very strong argument. If all the rest
of the world uses + to mean addition, and Python used + to mean
subtraction, it doesn't matter how logical that is, it is *wrong*.
Most of the world uses U+201C or "\u201C" to represent a curly double
quote; if you us 0x93, you are annoyingly wrong, and if you use 8220,
everyone has to do the conversion from that to 201C. Yes, these are
all differently-valid standards, but that doesn't make it any less
annoying.
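
A sketch of those three spellings side by side, assuming Python 3 (0x93 is
the Windows-1252 byte that is often confused with the Unicode code point):

print("\u201C", chr(8220))              # both are the curly double quote
print(b"\x93".decode("cp1252"))         # the cp1252 byte maps to the same character
print(ord("\u201C") == 8220 == 0x201C)  # True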

> Please note, I am talking only about readability _of the character
> set_ actually.
> And it is not including your habit issues, but rather is an objective
> criteria for using this or that character set.
> And decimal is objectively way more readable than hex standard character set,
> regardless of  how strong your habits are.

How many decimal digits would you use to denote a single character? Do
you have to pad everything to seven digits (\u0000034 for an ASCII
quote)? And if not, how do you mark the end? This is not "objectively
more readable" if the only gain is "no A-F" and the loss is
"unpredictable length".

ChrisA
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Proposal for default character representation

2016-10-12 Thread Mikhail V
On 12 October 2016 at 23:50, Thomas Nyberg  wrote:
> Since when was decimal notation "standard"?
Depends on what planet you live on. I live on planet Earth. And you?

> opposite. For unicode representations, byte notation seems standard.
How does this make it a good idea?
Consider the unicode table as an array of glyphs.
Now the index of the array is suddenly represented in some
obscure character set. How is this index different from the index of any
array, or any natural number? Think about it...

>> 2. Mixing of two notations (hex and decimal) is a _very_ bad idea,
>> I hope no need to explain why.
>
> Still not sure which "mixing" you refer to.

Still not sure? These two words in brackets. Mixing those two systems.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Proposal for default character representation

2016-10-12 Thread Mikhail V
On 12 October 2016 at 23:58, Danilo J. S. Bellini
 wrote:

> Decimal notation is hardly
> readable when we're dealing with stuff designed in base 2 (e.g. due to the
> visual separation of distinct bytes).

Hmm what keeps you from separating the logical units to be represented each
by a decimal number? like 001 023 255 ...
Do you really think this is less readable than its hex equivalent?
Then you are probably working with hex numbers only, but I doubt that.

> I agree that mixing representations for the same abstraction (using decimal
> in some places, hexadecimal in other ones) can be a bad idea.
"Can be"? It is indeed a horrible idea. Also not only for same abstraction
but at all.

> makes me believe "decimal unicode codepoint" shouldn't ever appear in string
> representations.
I use this site to look the chars up:
http://www.tamasoft.co.jp/en/general-info/unicode-decimal.html

PS:
that is rather peculiar, three negative replies already but with no strong
arguments why it would be bad to stick to decimal only, only some
"others do it so" and "tradition" arguments.
The "base 2" argument could work at some grade but if stick to this
criteria why not speak about octal/quoternary/binary then?

Please note, I am talking only about readability _of the character
set_ actually.
And it does not include your habit issues, but rather is an objective
criterion for using this or that character set.
And decimal is objectively way more readable than the standard hex character set,
regardless of how strong your habits are.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Proposal for default character representation

2016-10-12 Thread Mikhail V
Forgot to reply to all, duping my mesage...

On 12 October 2016 at 23:48, M.-A. Lemburg  wrote:

> Hmm, in Python3, I get:
>
 s = "абв.txt"
 s
> 'абв.txt'

I posted output with Python2 and Windows 7
BTW , In Windows 10 'print'  won't work in cmd console at all by default
with unicode but thats another story, let us not go into that.
I think you get my idea right, it is not only about printing.


> The hex notation for \u is a standard also used in many other
> programming languages, it's also easier to parse, so I don't
> think we should change this default.

In programming literature it is used often, but let me point out that
decimal is THE standard and is a much, much better standard
in the sense of readability. And there is no solid reason to use 2 standards
at the same time.

>
> Take e.g.
>
 s = "\u123456"
 s
> 'ሴ56'
>
> With decimal notation, it's not clear where to end parsing
> the digit notation.

How is it not clear if the digit amount is fixed? It's not very clear what
you meant.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Proposal for default character representation

2016-10-12 Thread Danilo J. S. Bellini
I'm -1 on this.

Just type "0431 unicode" on your favorite search engine. U+0431 is the
codepoint, not whatever digits 0x431 has in decimal. That's a tradition and
something external to Python.

As a related concern, I think using decimal/octal on raw data is a terrible
idea (e.g. On Linux, I always have to re-format the "cmp -l" to really
grasp what's going on, changing it to hexadecimal). Decimal notation is
hardly readable when we're dealing with stuff designed in base 2 (e.g. due
to the visual separation of distinct bytes). How many people use "hexdump"
(or any binary file viewer) with decimal output instead of hexadecimal?
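
A rough illustration of the difference, as a Python 3 sketch: the same four
bytes rendered in fixed-width hex and in decimal.

data = bytes([0, 4, 255, 16])
print(" ".join("%02x" % b for b in data))   # 00 04 ff 10
print(" ".join(str(b) for b in data))       # 0 4 255 16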

I agree that mixing representations for the same abstraction (using decimal
in some places, hexadecimal in other ones) can be a bad idea. Actually,
that makes me believe "decimal unicode codepoint" shouldn't ever appear in
string representations.

--
Danilo J. S. Bellini
---
"*It is not our business to set up prohibitions, but to arrive at
conventions.*" (R. Carnap)
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Proposal for default character representation

2016-10-12 Thread M.-A. Lemburg
On 12.10.2016 23:33, Mikhail V wrote:
> Hello all,
> 
> I want to share my thoughts about syntax improvements regarding
> character representation in Python.
> I am new to the list so if such a discussion or a PEP exists already,
> please let me know.
> 
> So in short:
> 
> Currently Python uses hexadecimal notation
> for characters for input and output.
> For example let's take a unicode string "абв.txt"
> (a file named with first three Cyrillic letters).
> 
> Now printing  it we get:
> 
> u'\u0430\u0431\u0432.txt'

Hmm, in Python3, I get:

>>> s = "абв.txt"
>>> s
'абв.txt'

> So one sees that we have hex numbers here.
> Same is for typing in the strings which obviously also uses hex.
> Same is for some parts of the Python documentation,
> especially those about unicode strings.
> 
> PROPOSAL:
> 1. Remove all hex notation from printing functions, typing,
> documentation.
> So for printing functions leave the hex as an "option",
> for example for those who feel the need for hex representation,
> which is strange IMO.
> 2. Replace it with decimal notation, in this case e.g:
> 
> u'\u0430\u0431\u0432.txt' becomes
> u'\u1072\u1073\u1074.txt'
> 
> and similarly for other cases where raw bytes must be printed/inputed
> So to summarize: make the decimal notation standard for all cases.
> I am not going to go deeper, such as what digit amount (leading zeros)
> to use, since it's quite secondary decision.
> 
> MOTIVATION:
> 1. Hex notation is hardly readable. It was not designed with readability
> in mind, so for reading it is not appropriate system, at least with the
> current character set, which is a mix of digits and letters (curious who
> was that wise person who invented such a set?).
> 2. Mixing of two notations (hex and decimal) is a _very_ bad idea,
> I hope no need to explain why.
> 
> So that's it, in short.
> Feel free to discuss and comment.

The hex notation for \u is a standard also used in many other
programming languages, it's also easier to parse, so I don't
think we should change this default.

Take e.g.

>>> s = "\u123456"
>>> s
'ሴ56'

With decimal notation, it's not clear where to end parsing
the digit notation.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Oct 12 2016)
>>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
>>> Python Database Interfaces ...   http://products.egenix.com/
>>> Plone/Zope Database Interfaces ...   http://zope.egenix.com/


::: We implement business ideas - efficiently in both time and costs :::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/
  http://www.malemburg.com/

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Proposal for default character representation

2016-10-12 Thread Thomas Nyberg

On 10/12/2016 05:33 PM, Mikhail V wrote:

Hello all,


Hello! New to this list so not sure if I can reply here... :)



Now printing  it we get:

u'\u0430\u0431\u0432.txt'



By "printing it", do you mean "this is the string representation"? I 
would presume printing it would show characters nicely rendered. Does it 
not for you?


and similarly for other cases where raw bytes must be printed/inputed
So to summarize: make the decimal notation standard for all cases.
I am not going to go deeper, such as what digit amount (leading zeros)
to use, since it's quite secondary decision.


Since when was decimal notation "standard"? It seems to be quite the 
opposite. For unicode representations, byte notation seems standard.



MOTIVATION:
1. Hex notation is hardly readable. It was not designed with readability
in mind, so for reading it is not appropriate system, at least with the
current character set, which is a mix of digits and letters (curious who
was that wize person who invented such a set?).


This is an opinion. I should clarify that for many cases I personally 
find byte notation much simpler. In this case, I view it as a toss-up, 
though for something like utf8-encoded text I would hate it if I saw 
decimal numbers and not bytes.



2. Mixing of two notations (hex and decimal) is a _very_ bad idea,
I hope no need to explain why.


Still not sure which "mixing" you refer to.



So that's it, in short.
Feel free to discuss and comment.

Regards,
Mikhail


Cheers,
Thomas
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/