[Python-ideas] Re: Proposal: -X importcache to supplement -X importtime for loaded modules

2023-02-14 Thread Inada Naoki
`-X importtime=2`

This is an expert-only tool, so there is no need for a verbose name.

On Sat, Feb 11, 2023 at 10:01 PM Noah Kim  wrote:
>
> All,
>
> I'm writing to propose an adjacent interpreter flag to `-X importtime`: `-X 
> importcache` (open to alternative naming).
>
> While `-X importtime` is incredibly useful for analyzing module import times, 
> by design, it doesn't log anything if an imported module has already been 
> loaded. `-X importcache` would provide additional output for every module 
> that's already been loaded:
>
> ```
> >>> import uuid
> import time:    cached |     cached |   _io
> import time:    cached |     cached |   _io
> import time:    cached |     cached |   os
> import time:    cached |     cached |   sys
> import time:    cached |     cached |   enum
> import time:    cached |     cached | _io
> import time:    cached |     cached | _io
> import time:    cached |     cached | collections
> import time:    cached |     cached | os
> import time:    cached |     cached | re
> import time:    cached |     cached | sys
> import time:    cached |     cached | functools
> import time:    cached |     cached | itertools
> import time:       151 |        151 | _wmi
> import time:     18290 |      18440 |   platform
> import time:       372 |        372 |   _uuid
> import time:     10955 |      29766 | uuid
> ```
>
> In codebases with convoluted/poorly managed import graphs (and consequently, 
> workloads that suffer from long import times), the ability to record all 
> paths to an expensive dependency--not just the first-imported--can help 
> expedite refactoring (and help scale identification of this type of issue). 
> More generally, this flag would provide a more efficient path to tracking 
> runtime dependencies.
>
> As a proof of concept, I was able to hack this functionality into `-X 
> importtime` by adding a couple lines to `import_ensure_initialized` in 
> `Python/import.c` (hence the output above). A separate flag is probably 
> desirable to preserve backwards compatibility.
>
> Looking forward to your feedback,
> Noah
>
> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-ideas@python.org/message/GEISYQ5BXWGKT33RWF77EOSOMMMFUBUS/
> Code of Conduct: http://python.org/psf/codeofconduct/



-- 
Inada Naoki  
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/FO3FOBAQDOUROEHZRF3SZHNSKVW364CE/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Add a line_offsets() method to str

2022-06-21 Thread Inada Naoki
On Tue, Jun 21, 2022 at 3:49 AM Marco Sulla
 wrote:
>
> On Sun, 19 Jun 2022 at 03:06, Inada Naoki  wrote:
> > FWIW, I had proposed str.iterlines() to fix incompatibility between
> > IO.readlines() and str.splitlines().
>
> It's a good idea IMHO. In your mind, str.iterlines() will find only
> \n, \r and \r\n, as IO.readlines() does? I ask this because
> str.splitlines() finds all line boundary chars.
>

I don't remember the old discussion, but I think so.

Another idea is adding an option to `splitlines()` so that it splits only
on universal newlines (e.g. str.splitlines(keepends=False, *,
ascii=False)).
Then we can add `str.iterlines` that takes the same arguments as
`str.splitlines`.
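
To make the incompatibility concrete, here is a small sketch of how the
two APIs differ today. It assumes `io.StringIO` opened with `newline=None`
as a stand-in for a text file read with universal newlines; the sample
string is made up for illustration.

```python
import io

# Sample text containing \n, \r\n, a vertical tab and U+2028.
text = "a\nb\r\nc\x0bd\u2028e"

# str.splitlines() treats \x0b and \u2028 (among others) as line boundaries.
split = text.splitlines()

# Universal-newlines text I/O only recognizes \n, \r and \r\n.
read = io.StringIO(text, newline=None).readlines()

print(split)
print(read)
```

The two results have different lengths, which is the mismatch a
universal-newlines-only option would address.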

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/RFCO3BHIKH2KZNIA2X5LCEY2OFMRJGDX/


[Python-ideas] Re: Add a line_offsets() method to str

2022-06-18 Thread Inada Naoki
On Sat, Jun 18, 2022 at 5:13 AM Jonathan Slenders  wrote:
> First time, it was needed in prompt_toolkit, where I spent a crazy amount of 
> time looking for the most performant solution.
> Third time is for the Rich/Textual project from Will McGugan. (See: 
> https://twitter.com/willmcgugan/status/1537782771137011715 )

Would you give me a pointer to the code?
I want to know its use cases.

>
> The fastest solution I've been using for some time does this (simplified): 
> `accumulate(chain([0], map(len, text.splitlines(True))))`. The performance is 
> great here, because the number of Python instructions is O(1). Everything is 
> chained in C code thanks to itertools. Because of that, it can outperform the 
> regex solution by a factor of ~2.5. (Regex isn't slow, but iterating over 
> the results is.)
>
> The bad things about this solution are, however:
> - Very cumbersome syntax.
> - We call `splitlines()`, which internally allocates a huge number of strings, 
> only to use their lengths. That is still much more overhead than a simple 
> for-loop in C would be.
>
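
For readers who want to try the quoted idiom, a minimal runnable version
follows. The function name `line_offsets` is borrowed from the proposal;
here it is only a hypothetical pure-Python helper, not an existing `str`
method.

```python
from itertools import accumulate, chain


def line_offsets(text):
    """Return the start offset of each line, followed by len(text)."""
    # splitlines(True) keeps the line endings, so the lengths add up
    # to the full length of the text.
    return list(accumulate(chain([0], map(len, text.splitlines(True)))))


text = "ab\ncde\n\nf"
offsets = line_offsets(text)
print(offsets)  # → [0, 3, 7, 8, 9]
```

Note that the final entry is the total length of the text, not the start
of a line.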

FWIW, I had proposed str.iterlines() to fix incompatibility between
IO.readlines() and str.splitlines().
That would be much more efficient than splitlines because it doesn't
allocate a huge number of strings at once. It allocates one line string
at a time.
https://discuss.python.org/t/changing-str-splitlines-to-match-file-readlines/174/2

Of course, it would still be slower than your line_offsets() idea
because it still needs to allocate line strings many times.
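
As a rough illustration of the lazy behaviour, the sketch below yields
one line at a time and splits only on \n, \r and \r\n. `iterlines` here
is a hypothetical free function written with `re`; it is not the proposed
C implementation and makes no claim about its performance.

```python
import re

# Only the three universal-newline sequences count as line boundaries.
_NEWLINE = re.compile(r"\r\n|\r|\n")


def iterlines(text, keepends=False):
    """Yield the lines of text lazily, one string at a time."""
    start = 0
    for match in _NEWLINE.finditer(text):
        end = match.end() if keepends else match.start()
        yield text[start:end]
        start = match.end()
    # Trailing text without a final newline is still a line.
    if start < len(text):
        yield text[start:]


print(list(iterlines("a\nb\r\nc\x0bd")))  # → ['a', 'b', 'c\x0bd']
```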

> Performance matters here, because for this kind of problem, the list of 
> integers that gets produced is typically used as an index to quickly find 
> character offsets in the original string, depending on which line is 
> displayed/processed. The bisect library also helps to quickly convert any 
> index position in that string into a line number. The point is that for big 
> inputs, the number of Python instructions executed is not O(n), but O(1). Of 
> course, some of the C code remains O(n).
>
> So, my ask here.
> Would it make sense to add a `line_offsets()` method to `str`?
> Or even `character_offsets(character)` if we want to do that for any 
> character?
> Or `indexes(...)/indices(...)` if we would allow substrings of arbitrary 
> lengths?
>
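
The bisect lookup mentioned in the quoted message can be sketched like
this. The helper name `line_of` and the sample text are assumptions made
for the example.

```python
from bisect import bisect_right
from itertools import accumulate

text = "ab\ncde\n\nf"

# Start offset of every line; the trailing total length is dropped.
starts = list(accumulate(map(len, text.splitlines(True)), initial=0))[:-1]


def line_of(index):
    """Return the 0-based line number that contains the given offset."""
    return bisect_right(starts, index) - 1


print(starts)      # → [0, 3, 7, 8]
print(line_of(4))  # → 1
```

Each lookup is a binary search over `starts`, so it stays cheap even for
large inputs once the offsets have been computed.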

I don't like string offsets, so I don't like adding more methods
that return offsets.
Currently, string offsets in Python are counted in code points. This is
not efficient for Python implementations that use UTF-8 (PyPy and
MicroPython) or UTF-16 (maybe Jython and IronPython, but I don't know)
internally.
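
A short example of why the two kinds of offset differ: the same newline
sits at a different position depending on whether one counts code points
or UTF-8 bytes. The sample string is made up for illustration.

```python
text = "héllo\nwörld"

# Offset counted in code points, as str does today.
codepoint_offset = text.index("\n")

# Offset counted in UTF-8 bytes; "é" takes two bytes.
byte_offset = text.encode("utf-8").index(b"\n")

print(codepoint_offset)  # → 5
print(byte_offset)       # → 6
```

An implementation that stores strings as UTF-8 has to translate between
these two numbers whenever a code point offset is used.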

CPython uses PEP 393 for now, so offsets are efficient.
But I want to change it to UTF-8 in the future. A UTF-8 internal
encoding is much more efficient for many Python use cases, like:

* Read UTF-8 string from text and write it to UTF-8 console.
* Read UTF-8 from database and write it to UTF-8 JSON.

Additionally, there are many very fast string algorithms written in C
or Rust that work with UTF-8.
Python is a glue language. Reducing overhead with such libraries is good
for Python.

For now, my recommendation is to use a library written in Cython if
it is performance critical.

Regards,
-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/57JHIUS5RAOH4IV6NSOYGNVTPAEQTCMC/


[Python-ideas] Re: Less is more? Smaller code and data to fit more into the CPU cache?

2022-03-22 Thread Inada Naoki
On Wed, Mar 23, 2022 at 12:59 AM Jonathan Fine  wrote:
>
> Does anyone know of any work that's been done to research or make critical 
> Python code and data smaller so that more of it fits in the CPU cache? I'm 
> particularly interested in measured benefits.
>

I reduced the size of the namespace dict in Python 3.11. This will
increase cache efficiency.
https://bugs.python.org/issue46845

And I deprecated the cached hash in the bytes object. It will be removed in
Python 3.13 if there are no objections.
Bytes objects are used for bytecode, so this will increase cache efficiency too.

Sadly, I cannot confirm the benefits. We have a macro benchmark suite
(pyperformance), but it is still small. Most hot data fits into the L2
cache.

Regards,
-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/FSPYHX7PVWSFFVAVTYT575SJR4O6PQH7/


[Python-ideas] Re: Revisiting a frozenset display literal

2022-01-17 Thread Inada Naoki
On Mon, Jan 17, 2022 at 8:49 PM Steven D'Aprano  wrote:
>
> On Mon, Jan 17, 2022 at 08:04:50PM +0900, Inada Naoki wrote:
>
> > Name lookup is faster than building a set in most cases.
> > So I don't think the cost of looking the name up is important at all.
>
> But the cost to look up the name is *in addition* to building the set.
>

I meant it is negligible, so we can just ignore it in this discussion.

> If you saw this code in a review:
>
> t = tuple([1, 2, 3, 4, 5])
>
> would you say "that is okay, because the name lookup is smaller than the
> cost of building the list"?
>
> I wouldn't. I would change the code to `(1, 2, 3, 4, 5)`.
>

* I never said that. I only said that name lookup cost is not a good
reason, because you listed name lookup cost as rationale. Please stop
the strawman.
* Tuple construction is much faster than set construction, so name
lookup speed matters more for tuples.
* Constant tuples are used much, much more frequently than constant sets.

>
> > Proposed literal might have significant efficiency benefit only when:
> >
> > * It is used in the function scope. and,
> > * It can not be optimized by the compiler now.
>
> Sometimes, now, the compiler *pessimizes* the construction of the frozen
> set. See b.p.o #46393.
>

I saw it. And I already know all the discussions on b.p.o.
But how important it is for Python depends on how often it is used,
especially in hot code.

Regards,

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/DSUYXZBLQ62MMRUYJ2ZNLDXYSEEOGDHW/


[Python-ideas] Re: Revisiting a frozenset display literal

2022-01-17 Thread Inada Naoki
On Mon, Jan 17, 2022 at 7:10 PM Steven D'Aprano  wrote:
>
> Out of those 29 calls, I think that probably 13 would be good candidates
> to use a frozenset display form (almost half). For example:
>
> ast.py:binop_rassoc = frozenset(("**",))  # f{("**",)}
> asyncore.py:   ignore_log_types = frozenset({'warning'})  # f{'warning'}
>

Both are in class scope so the overhead is very small.

> Not all of them are purely literals, e.g.
>
> asyncore.py:   _DISCONNECTED = frozenset({ECONNRESET, ENOTCONN, ...})
>
> would still have to generate the frozenset at runtime, but it wouldn't
> need to look up the frozenset name to do so, so there would still be some
> benefit.

Name lookup is faster than building a set in most cases.
So I don't think the cost of looking the name up is important at all.

The proposed literal might have a significant efficiency benefit only when:

* It is used in function scope, and
* It cannot be optimized by the compiler now.

I am not sure how many such usages there are in the stdlib.
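
To show the kind of usage meant here, the sketch below puts a
`frozenset(...)` call inside a function. The function name `uses_call`
is made up. On CPython the code object lists `frozenset` among the names
it has to look up, which suggests the set is built on every call rather
than stored as a constant.

```python
def uses_call(n):
    # frozenset is an ordinary name, so it is looked up and called
    # each time the function runs.
    return n in frozenset((3, 5, 7))


# The name appears in co_names because it must be resolved at run time.
print("frozenset" in uses_call.__code__.co_names)  # → True
```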

Regards,

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/GL6KUTJZ67CQ37NK64LJF4XTPAS564OA/


[Python-ideas] Re: Revisiting a frozenset display literal

2022-01-17 Thread Inada Naoki
On Mon, Jan 17, 2022 at 7:44 PM Paul Moore  wrote:
>
> On Mon, 17 Jan 2022 at 10:12, Steven D'Aprano  wrote:
> > That would make the creation of frozensets more efficient, possibly
> > encourage people who currently are writing slow and inefficient code
> > like
> >
> > targets = (3, 5, 7, 11, 12, 18, 27, 28, 30, 35, 57, 88)
> > if n in targets:
> >     do_something()
> >
> > to use a frozenset, as they probably should already be doing.
>
> More realistically, would they not use a set already, as in
>
> targets = {3, 5, 7, 11, 12, 18, 27, 28, 30, 35, 57, 88}
> if n in targets:
>     do_something()
>
> ?
>

This is very inefficient because building a set is much heavier than `n in tuple`.
We should write `if n in {3, 5, 7, 11, 12, 18, 27, 28, 30, 35, 57, 88}` for now.
Or we should write `_TARGETS = frozenset((3, 5, 7, 11, 12, 18, 27, 28,
30, 35, 57, 88))` in global scope and use it as `if n in _TARGETS`.
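
The two recommended forms can be sketched as follows. The function names
are hypothetical. The check on `co_consts` relies on CPython compiler
behaviour (a set display used directly in an `in` test is stored as a
frozenset constant), so treat it as an implementation detail rather than
a language guarantee.

```python
def uses_display(n):
    # CPython can turn this set display into a frozenset constant.
    return n in {3, 5, 7, 11}


# Built once at import time, then only looked up by name.
_TARGETS = frozenset((3, 5, 7, 11))


def uses_global(n):
    return n in _TARGETS


# Expected to be True on CPython, where the constant is folded.
folded = any(isinstance(c, frozenset) for c in uses_display.__code__.co_consts)
print(folded)
```

In both forms the set is not rebuilt on every call, which is the point
of the recommendation.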

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/RAW2KG77YI5GS4AXLLFKLBR3PTVE65OA/


[Python-ideas] Re: Revisiting a frozenset display literal

2022-01-16 Thread Inada Naoki
On Mon, Jan 17, 2022 at 9:58 AM Oscar Benjamin
 wrote:
>
> On Mon, 17 Jan 2022 at 00:46, Christopher Barker  wrote:
>>
>> I’m a bit confused — would adding a “literal” form for frozenset  provide 
>> much, if any, of an optimization? If not,
>> that means it’s only worth doing for convenience.
>>
>> How often do folks need a frozen set literal? I don’t think I’ve ever used 
>> one.
>
>
> You won't have used one because they have not yet existed (hence this thread).
>

Although we don't have a frozenset literal now, we do have frozenset.
So we can estimate how useful a frozenset literal would be by looking at
how frozenset is used.

Unless it is demonstrated how the literal improves code, I am -0.5 on a
new literal added only for consistency.

Regards,
-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/AYO2GIDUZZ43KTBP7P2TRYGORSWVFYO5/


[Python-ideas] Re: PEP 671 (late-bound arg defaults), next round of discussion!

2021-12-05 Thread Inada Naoki
I read PEP 671 today, and I feel PEP 671 is not as useful as some expect.

For example, PEP 671 has this example:

   def bisect_right(a, x, lo=0, hi=>len(a), *, key=None):

But I think we cannot change the signature, because of backward
compatibility. For example,

   def search(self, x, lo=0, hi=None):
       return bisect_right(self._data, x, lo=lo, hi=hi)

If we just change the signature of bisect_right, this wrapper method
will break.
So bisect_right would have to support None for several versions and emit
a frustrating DeprecationWarning.
I don't think this change is worth its cost.
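
A runnable version of the wrapper pattern, for illustration. The class
name `SortedData` is made up; the point is that forwarding `hi=None`
works today only because `bisect_right` accepts None as "use len(a)".

```python
from bisect import bisect_right


class SortedData:
    """Hypothetical container that wraps bisect_right."""

    def __init__(self, data):
        self._data = sorted(data)

    def search(self, x, lo=0, hi=None):
        # hi=None is passed through unchanged; this relies on
        # bisect_right treating None as the end of the list.
        return bisect_right(self._data, x, lo=lo, hi=hi)


s = SortedData([1, 3, 3, 7])
print(s.search(3))        # → 3
print(s.search(3, hi=2))  # → 2
```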

Additionally, this example illustrates that PEP 671 is not friendly to
wrapper functions.
If the wrapped function uses PEP 671, wrapper functions must either:

* Copy & paste all default expressions, or
  * But default expression may contain module private variables...
* Use **kwds and hide real signatures.

I have not read all of the PEP 671 threads, so I am sorry if this has
already been discussed.
But this topic is not covered in the current PEP 671 yet.

Generally speaking, I don't want to add anything to the Python language
that makes Python more complex.
But if I had to choose one PEP, I would prefer PEP 505 to PEP 671.
PEP 505 can be used for default parameters (e.g. `hi ??= len(a)`) and
many other places.
I feel it has a far better benefit / language complexity ratio.

Regards,
Regards,
-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/JRIKPS5GCMTHUX25UVBBIQ527ZI6VZHN/


[Python-ideas] Re: dict_items.__getitem__?

2021-10-04 Thread Inada Naoki
On Tue, Oct 5, 2021 at 9:46 AM Erik Demaine  wrote:
> Of course, the universal way to get the
> first item from an iterable x is
>
> item = next(iter(x))
>
> I can't say this is particularly readable, but it is functional and fast.

I think we can add `itertools.first()` for this idiom, and
`itertools.last()` for `next(iter(reversed(x)))` idiom.
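
A minimal sketch of what such helpers could look like. `first` and `last`
are hypothetical names; nothing like them exists in `itertools` today.
This version raises StopIteration on empty input, which a real API would
probably handle differently.

```python
def first(iterable):
    """Return the first item of an iterable."""
    return next(iter(iterable))


def last(reversible):
    """Return the last item of an object that supports reversed()."""
    return next(reversed(reversible))


d = {"a": 1, "b": 2, "c": 3}
print(first(d))          # → a
print(last(d))           # → c
print(first(d.items()))  # → ('a', 1)
```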

>
> Given the dictionary order guarantee from Python 3.7, adding indexing
> (__getitem__) to dict views seems natural.  The potential catch is that (I
> think) it would require linear time to access an item in the middle, because
> you need to count the dummy elements.  But accessing [i] and [-i] should be
> doable in O(|i|) time.

That is not true. They are O(n), where n is the maximum dict size.

>
> Python is also full of operations that take linear time to do: list.insert(0,
> x), list.pop(0), list.index(), etc.  But it may be that __getitem__ takes
> constant time on all built-in data structures, and the apparent symmetry but
> very different performance between dict()[i] and list()[i] might be confusing.
> That said, I really just want d[0] and d[-1], which is when these are fast.

They are not fast. Since `next(iter(d))` is O(n), the following code is O(n^2):

```
while d:
    del d[next(iter(d))]
```

Making a list is much faster. The following code is O(n):

```
for k in list(d):
    del d[k]
```

>
> I found some related discussion in
> https://mail.python.org/archives/list/python-ideas@python.org/thread/QVTGZD6USSC34D4IJG76UPKZRXBBB4MM/
> but not this exact idea.

What is the difference between your idea and the previous discussion?


-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/PHJONYONXXJ5Y6Z76NIORHLC2C6WKH6U/


[Python-ideas] Re: dict.sort()?

2021-05-29 Thread Inada Naoki
FWIW, this is a previous thread.

https://discuss.python.org/t/add-a-dict-sort-method/5747

On Sun, May 30, 2021 at 1:57, Marco Sulla wrote:

> Since `dict` now is ordered, how about a `sort()` method?
> It could have the same signature of list.sort(), with an optional
> parameter "by" that can be "keys" or "values" ("keys" could be the
> default).
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/XXA2E5ILMAEMFLPWZIZN3T67FERJPBFF/
>
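
For reference, the effect of the quoted proposal can be approximated
today with `sorted()` and the `dict` constructor. `sorted_dict` and its
`by` parameter are hypothetical and only mirror the wording of the
proposal. Unlike an in-place `sort()` method, this returns a new dict.

```python
def sorted_dict(d, by="keys", reverse=False):
    """Return a new dict with the items of d sorted by key or by value."""
    if by == "keys":
        key = lambda item: item[0]
    elif by == "values":
        key = lambda item: item[1]
    else:
        raise ValueError("by must be 'keys' or 'values'")
    return dict(sorted(d.items(), key=key, reverse=reverse))


d = {"b": 1, "a": 2}
print(sorted_dict(d))               # → {'a': 2, 'b': 1}
print(sorted_dict(d, by="values"))  # → {'b': 1, 'a': 2}
```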
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/NLRSYBPC7PZKCRZSROXQTERHDXYEDZMF/


[Python-ideas] Re: Make UTF-8 mode more accessible for Windows users.

2021-02-11 Thread Inada Naoki
I looked at some Python courses for children. They don't use venvs.
For example, they put a .py file in a specified directory, then run it
in Minecraft or other graphical applications.

Now I think we should promote putting PYTHONUTF8=1 in the user environment
before thinking about complex per-site ideas.

Since it is a user environment variable, it won't break legacy
applications running in a parent account.
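
The effect of the variable can be checked from Python itself. This
sketch starts a child interpreter with PYTHONUTF8=1 in its environment
and asks it to report `sys.flags.utf8_mode`; it assumes the current
interpreter can be launched as a subprocess.

```python
import os
import subprocess
import sys

# Copy the current environment and enable UTF-8 mode for the child only.
env = dict(os.environ, PYTHONUTF8="1")

result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.flags.utf8_mode)"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
utf8_mode = result.stdout.strip()
print(utf8_mode)  # → 1
```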

Is anyone against adding "Enable UTF-8 mode" to the Start menu?
-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/PYW32DM7QH5JD763V466ORCVF2E7AE7H/


[Python-ideas] Re: Make UTF-8 mode more accessible for Windows users.

2021-02-10 Thread Inada Naoki
On Wed, Feb 10, 2021 at 8:39 PM Paul Moore  wrote:
>
> On Wed, 10 Feb 2021 at 11:01, Inada Naoki  wrote:
> >
> > On Wed, Feb 10, 2021 at 5:33 PM Paul Moore  wrote:
> > >
> > > So get PYTHONUTF8 added to the environment activate script. That's a
> > > simple change to venv. And virtualenv, and conda - yes, it needs to
> > > happen in multiple places, but that's still easier IMO than proposing
> > > a change to Python's already complex (and slower than many of us would
> > > like) startup process.
> >
> > I am not sure this idea works fine. Is the activate script always
> > called when venv is used on Windows?
> >
> > When I use venv on Unix, I often just execute .venv/bin/some-script
> > without activating the venv.
>
> So in your training course, tell users to activate the environment.

I am not sure about that. It's not my training course. The target users
are thousands of students.
They may not use the command prompt at all.

> Experienced users (like you) who can run scripts directly aren't the
> target of this change, are they? This is one of the frustrating points
> here, I'm not clear who the target is. When I say it wouldn't help me,
> I'm told I'm not the target. When I suggest an alternative, it
> apparently isn't useful because it wouldn't work for you...
>

I'm sorry about that. I didn't mean "it doesn't work for me". I just meant
that I am not sure the activation script is always executed.

I looked at vscode-python and found that it executes the activation script.
I am not sure about PyCharm yet, but this will work if it behaves like vscode-python.

Another story is clicking .exe files in the Scripts/ directory.
But that can be fixed by changing only the launcher exe.

Adding per-venv UTF-8 mode is one attractive option. We can keep
python.exe untouched.


> > Students may need to learn about encoding at some point.
> > But when they learn "how to read/write file" first time, they don't
> > need to know what encoding is.
>
> Agreed.
>
> > VSCode, Notepad, and PyCharm use UTF-8 by default.
> > Students don't need to learn how to use encodings other than UTF-8
> > until they really need to.
>
> If they only use ASCII files and a system codepage that is the same as
> ASCII for the first 127 characters, then it's irrelevant. If they read
> data from a legacy system, that is quite likely to be in the system
> codepage (most of the local files I use at work, for example, are not
> UTF-8).

But students don't know what ASCII is yet.

>
> So I'd say that many students don't need to learn how to use *any*
> encoding until they need it. But I'm not a professional trainer, so my
> experience is limited.
>
> > We can add "Enable the UTF-8 mode" checkbox to the installer.
> > And we can have "Enable the UTF-8 mode" tool in the start menu.
> > So students don't need to edit the ini file manually.
>
> Those options could set the environment variable. After all, that's
> what "Add Python to PATH" does, and people seem OK with that. No need
> for an ini file (that adds an extra file read to the startup time, as
> has already been mentioned as a downside).
>
> > The problem is; should we recommend to enable UTF-8 mode globally by
> > setting environment variable, or provide a per-site UTF-8 mode
> > setting?
>
> What precisely do you mean by "per site"? Do you mean "per Python
> interpreter"? Do you view separate virtual environments as "sites"?

One installation is one site. One venv is one site. One conda env is one site.
I don't know the proper term for it, but I call it a "site" because each of
them has one "site-packages".


> > They may not want to promote UTF-8 mode until official Python promote
> > UTF-8 mode.
> > So I think venv should support UTF-8 mode first.
>
> That's fair enough. Although I'd like to point out the parallel here -
> you're saying "environment tools might not want to make UTF8 the
> default until Python does". I'm saying "Python might not want to make
> UTF8 the default until the OS does". I'm not completely sure why your
> argument is stronger than mine :-)
>

Oh, I don't propose changing the default encoding for now.

Microsoft provides the "Beta: use unicode UTF-8 for worldwide language
support in my PC" option.
It affects all applications. It is similar to a global PYTHONUTF8
environment variable.

Microsoft provides a UTF-8 code page (*) too. It affects only one application.
It is similar to the per-site UTF-8 mode idea.

(*) 
https://docs.microsoft.com/en-us/windows/uwp/design/globalizing/use-utf8-code-page

So what I am proposing is not more aggressive than what Microsoft does.
Microsoft provides similar options 

[Python-ideas] Re: Make UTF-8 mode more accessible for Windows users.

2021-02-10 Thread Inada Naoki
On Wed, Feb 10, 2021 at 5:33 PM Paul Moore  wrote:
>
> So get PYTHONUTF8 added to the environment activate script. That's a
> simple change to venv. And virtualenv, and conda - yes, it needs to
> happen in multiple places, but that's still easier IMO than proposing
> a change to Python's already complex (and slower than many of us would
> like) startup process.

I am not sure this idea works well. Is the activate script always
called when a venv is used on Windows?

When I use venv on Unix, I often just execute .venv/bin/some-script
without activating the venv.


> > Personally, I do not start out with environments with my beginning students 
> > -- they really only need one at the early stages. But other instructors do.
> >
> > Others have to work with a locked down system provided by their employer 
> > that might be an older version of Python, or need some particular 
> > configuration that they don't want to override.
> >
> > And all the examples given here of how to set environment variables and 
> > shortcuts, etc on Windows is EXACTLY the kind of information  I don't want 
> > to have to provide for my students :-( -- I'm teaching Python, not Windows 
> > administration.
>
> So teach Python as it actually is, surely? If you teach people how to
> use "Python-with-UTF8-mode", won't they struggle when introduced to
> the real world where UTF8 mode isn't set? Won't they assume the
> default encoding for open() is UTF-8, and be confused when they are
> wrong? Yes, I know your job as an instructor is to omit confusing
> details, and UTF8 mode would help with that. I get that. But that's
> just one case.
>

Students may need to learn about encodings at some point.
But when they first learn how to read and write files, they don't
need to know what an encoding is.

VSCode, Notepad, and PyCharm use UTF-8 by default.
Students don't need to learn how to use encodings other than UTF-8
until they really need to.
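
For context, this is what students otherwise have to write to get
reliable results on Windows without UTF-8 mode. The file name and text
are made up for the example. The first print shows the locale encoding
that `open()` falls back to when no encoding is given; its value depends
on the machine.

```python
import locale
import os
import tempfile

# The default used by open() when UTF-8 mode is off; varies by system.
print(locale.getpreferredencoding(False))

path = os.path.join(tempfile.mkdtemp(), "hello.txt")

# An explicit encoding makes the result independent of the locale.
with open(path, "w", encoding="utf-8") as f:
    f.write("こんにちは\n")

with open(path, encoding="utf-8") as f:
    content = f.read()

print(content == "こんにちは\n")  # → True
```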


> And anyway, would you not have to explain how to set UTF-8 mode for
> the training environment one way or another anyway? Sure, you may not
> have to explain how to set an environment variable. But you have to
> explain how to configure an ini file instead.

We can add "Enable the UTF-8 mode" checkbox to the installer.
And we can have "Enable the UTF-8 mode" tool in the start menu.
So students don't need to edit the ini file manually.

The question is: should we recommend enabling UTF-8 mode globally by
setting an environment variable, or provide a per-site UTF-8 mode
setting?


> >> > I don't want to recommend env vars and registry for conda and portable
> >> > Python users...
> >
> > and a lot of newbies learning Python for data science are starting out with 
> > conda as well ...
>
> So conda could set UTF-8 mode with "conda env --new --utf8". No
> changes to core Python interpreter startup needed.
>

They may not want to promote UTF-8 mode until official Python promotes
it.
So I think venv should support UTF-8 mode first.


> >> I'm not sure what you mean here. Why is this different from (say)
> >> PYTHONPATH? How would conda and portable python users configure
> >> PYTHONPATH? Why is UTF-8 mode any different?
> >
> >
> > It's not -- using PYTHONPATH is a "bad idea" I never recommend it to 
> > anyone. It was a nightmare when folks have Python 2 and 3 on the same 
> > machine, but now, in the age of environments, it's still a really bad idea.
>
> Sure, PYTHONPATH was just an example. Environment variables are how
> you configure Python in many ways. I'm asking why UTF-8 mode is so
> special it needs a different configuration mechanism than every other
> setting for Python.
>

Because it solves many real-world problems that many Windows users suffer from.

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/CXJOASSPLNV4EAE47RNZZ2UEF4TGHE6L/


[Python-ideas] Re: Make UTF-8 mode more accessible for Windows users.

2021-02-09 Thread Inada Naoki
On Wed, Feb 10, 2021 at 6:02 AM Paul Moore  wrote:
>
> On Tue, 9 Feb 2021 at 17:32, Inada Naoki  wrote:
> >
> > On Tue, Feb 9, 2021 at 7:42 PM M.-A. Lemburg  wrote:
> > >
> > > Here's a good blog post about setting env vars on Windows:
> > >
> > > https://www.dowdandassociates.com/blog/content/howto-set-an-environment-variable-in-windows-command-line-and-registry/
> > >
> > > It's not really much harder than on Unix platforms.
> > >
> >
> > But it affects to all Python installs. Can teachers recommend to set
> > PYTHONUTF8 environment variable for students?
>
> Why is that an issue? In the first instance, do the sorts of
> "beginner" we're discussing here have multiple python installs? Would
> they need per-interpreter configuration of UTF-8 mode?
>

Hmm, I was afraid of breaking applications that use an existing Python on the system.

But if no one cares about that, I'm OK with just adding something like
"enable-utf8-mode.bat" / "disable-utf8-mode.bat".


> >
> > I don't want to recommend env vars and registry for conda and portable
> > Python users...
>
> I'm not sure what you mean here. Why is this different from (say)
> PYTHONPATH? How would conda and portable python users configure
> PYTHONPATH? Why is UTF-8 mode any different?
>
> Paul

How often is PYTHONPATH needed at all? I have seen many people break their
environment by setting PYTHONPATH. I don't recommend using it at all.

On the other hand, I want teachers to be able to recommend enabling
UTF-8 mode to students.
That is the difference between PYTHONUTF8 and PYTHONPATH.

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/LH3UANAQ4CC62TJBU3AWONGQRRGW3JR2/


[Python-ideas] Re: Make UTF-8 mode more accessible for Windows users.

2021-02-09 Thread Inada Naoki
On Tue, Feb 9, 2021 at 7:42 PM M.-A. Lemburg  wrote:
>
> Here's a good blog post about setting env vars on Windows:
>
> https://www.dowdandassociates.com/blog/content/howto-set-an-environment-variable-in-windows-command-line-and-registry/
>
> It's not really much harder than on Unix platforms.
>

But it affects all Python installs. Can teachers recommend setting the
PYTHONUTF8 environment variable for students?



> The only catch is that Windows users will often not know about such
> env vars or how to use them, because on Windows you typically set up
> your configuration via the application and using the registry.
>
> Perhaps we could have both: an env var to enable UTF-8 mode and
> a registry key set by the installer.
>

I don't want to recommend env vars and registry for conda and portable
Python users...

--
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/IKCNL5QJUGS2OOGOAHQJVE3NGEPVECFN/


[Python-ideas] Re: Make UTF-8 mode more accessible for Windows users.

2021-02-09 Thread Inada Naoki
On Tue, Feb 9, 2021 at 4:53 PM Christopher Barker  wrote:
>
>>
>> because UTF-8 mode helps many Windows users but it is not accessible
>> enough for Windows users.
>
>
> It's not just accessibility, but discoverability -- Windows users -- and even 
> more so developers that don't generally use Windows often don't know utf-8 
> mode exists. That's why I'm pushing for a way for an application developer 
> to be able to set up their project so that it will run under utf-8 mode 
> everywhere. With only one way, and without having to add Windows specific 
> code or documentation.
>
> As has been discussed, there are very few cases where it would make any 
> difference under Linux (and zero for the Mac?) -- but why not have "one way 
> to do it"?
>

It makes the problem too hard and complex. It means we cannot fix anything
at all by Python 3.10.
We can add Unix support later if it is really worthwhile. Adding it later
would not be a backward-incompatible change.

>>
>> Can you provide some realistic use cases where UTF-8 mode helps Unix
>> users but it is not accessible?
>
>
> It's not accessible to the application developer. It is to the deployer / 
> devops person. These are often one and the same, but not always.
>
> My major project had exactly this problem -- the bare bones docker images 
> used on the CI (and for deployment) were set up with an ASCII locale (or 
> something like that) -- and our application failed. In the end we figured out 
> how to configure the images for utf-8, but as it happens, I know Python, and 
> don't know much Linux administration, and the linux sys admins didn't know 
> Python much -- so it took a fair bit of back and forth to figure out.
>

When using Docker, it's very easy to set an environment variable.
You don't need to worry that "it will break an existing legacy Python
application in the same container." You can just create one container for
one application.
So I don't think it is enough reason to add complexity.

As I said before, the use case of UTF-8 mode differs between Unix and Windows.


> We use conda for CI and deployment -- if I had been able to put a "utf-mode" 
> package in the conda requirements file, we wouldn't have had this issue, and 
> our Windows users (yes we have those too) would also get their systems set up 
> to "do the right thing" without their even knowing about it.
>
> Other folks use pipenv and the like -- it would be helpful to them if they 
> could do the same thing with their requirements files as well.
>

Without a more concrete idea, such a rough proposal leads this thread into a maze.

Note that UTF-8 mode must be enabled before any path config on Unix.
So it is almost impossible to enable UTF-8 mode using tools like pip.

If your idea is just putting `python.ini` (or `python.cfg`) in the bin/ or
Scripts/ directory from a pip/conda package, I think it is just a
hack, not a best practice. It will cause file conflict errors very
easily.
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/7ZG7CXJECV3WBAFXDSOOFL5QMWWGB5IR/


[Python-ideas] Re: class(obj) could return obj.__class__

2021-02-08 Thread Inada Naoki
Please use type(o) instead.
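
A minimal sketch of that suggestion (the class name `C` is only
illustrative). For an ordinary class, `type(o)` gives the same object as
`o.__class__`:

```python
class C:
    pass

o = C()

# For ordinary classes, type(o) and o.__class__ are the same object.
same = type(o) is o.__class__
print(type(o).__name__, same)
```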

On Tue, Feb 9, 2021 at 4:36 PM Hans Ginzel  wrote:
>
> Please, consider class(obj) to return obj.__class__
> consistently with dir(), vars(), repr(), str(),…
>
> >>> class c: pass
> >>> o = c()
> >>> o.__class__
> <class '__main__.c'>
> >>> class(o)
>   File "<stdin>", line 1
>     class(o)
>          ^
> SyntaxError: invalid syntax
>
> H.
> Message archived at 
> https://mail.python.org/archives/list/python-ideas@python.org/message/TMFUKID6KMTEAZAS4ILBHSG23GGYNCS4/



-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/LMNS3XTBSIOGLNFFSG7P2MEVO3TWVIN5/


[Python-ideas] Re: Make UTF-8 mode more accessible for Windows users.

2021-02-08 Thread Inada Naoki
On Tue, Feb 9, 2021 at 3:37 PM Christopher Barker  wrote:
>
> On Mon, Feb 8, 2021 at 6:11 PM Inada Naoki  wrote:
>>
>> >
>> Unlike Windows, environment variables work very fine for such use cases.
>
>
> Windows has environment variables, doesn't it?
>

But they don't work well for Windows users.
Unix and Windows have different use cases.

> I think it's MUCH better to have ONE way to do something that works, for 
> Python, on all platforms. That way people that only know one platform can 
> still write and document code that can work on all platforms.
>

This thread is only about making UTF-8 mode accessible for Windows users,
because UTF-8 mode helps many Windows users but it is not accessible
enough for Windows users.

Can you provide some realistic use cases where UTF-8 mode helps Unix
users but it is not accessible? If not, please focus on helping
Windows users.

Time is a limited resource.  I have no time to discuss helping
zero Unix users.

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/HHBTX4JE7YAUUSPD3FZMSBZJ72NBVRRM/


[Python-ideas] Re: Make UTF-8 mode more accessible for Windows users.

2021-02-08 Thread Inada Naoki
On Tue, Feb 9, 2021 at 2:28 AM Christopher Barker  wrote:
>
>> >> And beginners should use a UTF-8 locale.
>> > Beginners may not know how to do that / have a choice.
>> >
>> > This is a question I still don't know the answer to -- I think that most 
>> > (all?) non Windows platforms currently supported use utf-8 -- but is that 
>> > guaranteed? That is, might some platform come up that does need utf-8 
>> > mode? So why not have it available everywhere, even though it will be a 
>> > no-op on most systems.
>>
>> UTF-8 mode is provided for Unix because there are environments for
>> *deployment*, like minimal Unix container images. They have only the C
>> locale.
>>
>> For desktop use, I think all Unix environments suited for beginners
>> use a UTF-8 locale by default.
>> There is no guarantee. But if the default locale is not UTF-8, I don't
>> think the environment is suited for beginners who are learning Python.
>
>
> That's true, but not in Python's control.
>
> But this is not just newbies -- see above, deployment and test  (CI) 
> environments might need it too.
>

Unlike on Windows, environment variables work fine for such use cases.

On Unix, there are direnv, dotenv, and maybe more tools. They are not
specific to Python; they work per project.

> Which is another good reason that having it be something that can be "turned 
> on" by a virtual environment / requirements file would be very helpful.
>

There are direnv and dotenv.

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/MCE4T7YEF5XCNDFC4S3YCVU2XUOF5A6A/


[Python-ideas] Re: Make UTF-8 mode more accessible for Windows users.

2021-02-07 Thread Inada Naoki
On Mon, Feb 8, 2021 at 3:58 PM Christopher Barker  wrote:
>
>> Should we support it in Unix? I don't think so.
>> Command-line and environment variables are easy to use on Unix.
>
>
> maybe, but we have many of the same issues -- we want the configuration tied 
> to the environment, not to the user and all environments. And I'd rather have 
> things done the same way on all platforms, rather than the native way on each 
> platform, if I have to make a choice. That is, if there is a way to configure 
> Python on Windows, I'd really like the SAME way to be available on all 
> platforms.
>

On Unix, there are already N ways (e.g. .envrc). Is an N+1th way really worthwhile?
At least, `python.cfg` (or `python.ini`) in the bin/ directory is not good
for Unix environments.


>>
>> And beginners should use a UTF-8 locale.
>
> Beginners may not know how to do that / have a choice.
>
> This is a question I still don't know the answer to -- I think that most 
> (all?) non Windows platforms currently supported use utf-8 -- but is that 
> guaranteed? That is, might some platform come up that does need utf-8 mode? 
> So why not have it available everywhere, even though it will be a no-op on 
> most systems.
>

UTF-8 mode is provided for Unix because there are environments for
*deployment*, like minimal Unix container images. They have only the C
locale.

For desktop use, I think all Unix environments suited for beginners
use a UTF-8 locale by default.
There is no guarantee. But if the default locale is not UTF-8, I don't
think the environment is suited for beginners who are learning Python.

Regards,

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/3H7IX6MRFOH3ILZNLXRV6QSKAKVDILLJ/


[Python-ideas] Re: Make UTF-8 mode more accessible for Windows users.

2021-02-07 Thread Inada Naoki
On Sun, Feb 7, 2021 at 4:16 PM Eryk Sun  wrote:
>
> On 2/6/21, Christopher Barker  wrote:
> > On Sat, Feb 6, 2021 at 11:47 AM Eryk Sun  wrote:
> >
> >> Relative to the installation, "python.cfg" should only be found in the
> >> same directory as the base executable, not its parent directory.
> >
> > OK, my mistake — I thought that was already the case with pyvenv.cfg.
> > Though I don’t get why it matters.
>
> Chiefly, I don't want to overload "pyvenv.cfg" with new behavior
> that's unrelated to virtual environments.
>
> I also dislike the way this file is found. If the parent directory is
> "C:\Program Files", then I'm not worried about finding "C:\Program
> Files\pyvenv.cfg" when the interpreter tries to open it. But this
> pattern is not safe in general when installed to an arbitrary
> directory, or with a portable distribution.
>

OK, then how about just the same directory as python.exe?
In this case, we need to put python.ini in the Scripts directory for venvs.
It seems a bit odd, but it is much simpler than looking in the parent directory.


> The presence of a "._pth" file (Windows only) beside the DLL or
> executable bypasses the search for "pyvenv.cfg", among other things.
> The embedded distribution includes a ._pth that locks it down. This is
> another reason to use a different file to configure defaults for -X
> settings such as "utf8", a file that's guaranteed to always be read.
>

Thank you, I didn't know that.
If we need to search a parent directory, we need to check ._pth too.

> >> Add an option in the installed "python.cfg" to set the name of the
> >> organization and application.
> >
> > That would work for, e.g. pyinstaller (which I hope already ignores these
> > kinds of configuration).
> >
> > But not for, e.g. web applications that expect to use virtual environments
> > to isolate themselves.
>
> The idea to use the profile data directories %ProgramData% and
> %LocalAppData% was for symmetry with how this could be supported in
> POSIX, which doesn't use the application directory as Windows does.
>

Should we support it in Unix? I don't think so.
Command-line and environment variables are easy to use on Unix.
And beginners should use a UTF-8 locale.

> The application "python.cfg" (in the directory of the executable,
> including a virtual environment) can support a setting to isolate it
> from system and user "python.cfg" files.

I know that. But I don't think it's enough reason to put a new config
file in the user profile.
If users don't have system privileges, they can still install another Python.

A config file in the user profile is fragile. If all venvs start using
the profile directory, it will soon become
unmaintainable.

We can just recommend per-user install for new users.

Regards,

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/65TZSOWYHQI7KL2QCHQDRJQQUZDMTDPZ/


[Python-ideas] Re: Make UTF-8 mode more accessible for Windows users.

2021-02-06 Thread Inada Naoki
On Sat, Feb 6, 2021 at 5:59 AM Eryk Sun  wrote:
>
>
> I would have preferred for the py launcher to read and merge settings
> for all existing configuration files in the order of
> "%ProgramData%\Python\py.ini" (all installations),
> "%__AppDir__%\py.ini" (particular installation), and
> "%LocalAppData%\Python\py.ini" (user).

Note that this is a setting of Python, not of the py launcher.

And there is no need for all-installations and per-user settings.
The environment variable already covers that.
I don't want to add many ways to configure one option without a strong need.

Currently, a per-install setting is not possible. That is the only problem.

If adding an option to pyvenv.cfg does not make sense, we can add
`python.ini` in the same place as pyvenv.cfg,
i.e., the directory containing python.exe, or the directory one level above.

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/DUM6PJDNVBCZSNHLSAVYMEUDMWVTWV5T/


[Python-ideas] Re: Make UTF-8 mode more accessible for Windows users.

2021-02-05 Thread Inada Naoki
On Fri, Feb 5, 2021 at 8:15 PM Barry Scott  wrote:
>
> >
> > The main limitation is that users can not write config file in install
> > location when Python is installed for system, not for user.
>
> This is the problem that I was thinking about when I proposed using
> a py.ini like solution where the file is looked for in the users config 
> folder.
> I think that is the %LOCALAPPDATA% folder for py.exe.
>
> As Chris points out in his summary of the issue.
>
> How would this work for different version of python being installed and 
> needing different config?

Each installation has its own config file.

> How would this work for python installed from different vendors?
>

The vendor's installer should provide an option for it.

> Maybe the answer is that there is only one user defined override possible and 
> all versions use it.
>
> Also am I right to assume that the impact of these changes would only impact 
> on Windows?
>

I think we don't have any reason to restrict this to Windows.
But since this idea is proposed only for Windows users, only the Windows
installer will have an "Enable UTF-8 mode" option.

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/A33F3EH3DZGF5F6GC3ZGWPVBTIZISJCI/


[Python-ideas] Re: Make UTF-8 mode more accessible for Windows users.

2021-02-05 Thread Inada Naoki
On Fri, Feb 5, 2021 at 7:59 PM Barry Scott  wrote:
>
> I'm under the impression that new users will not create a venv.
> Indeed I run a lot of python scripts outside of venv world.
> I only use venv as part of my development pipe lines.
>
> I'm not sure that a venv cfg file would help.
> But a python.ini could.
>

python.exe looks up pyvenv.cfg even outside of a venv.
So we could write utf8mode=1 in pyvenv.cfg even outside of a venv.
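
A rough sketch of how such a setting could be read, assuming the simple
`key = value` line format that pyvenv.cfg uses. The `utf8mode` key is only
the proposal from this thread, and `read_cfg` is a made-up helper name; this
is not what the interpreter does at startup.

```python
import os
import tempfile


def read_cfg(path):
    """Parse simple `key = value` lines, as used by pyvenv.cfg."""
    settings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            key, sep, value = line.partition("=")
            if sep:  # skip lines without an "=" sign
                settings[key.strip().lower()] = value.strip()
    return settings


with tempfile.TemporaryDirectory() as tmp:
    cfg_path = os.path.join(tmp, "pyvenv.cfg")
    with open(cfg_path, "w", encoding="utf-8") as f:
        f.write("home = C:\\Python310\n")
        f.write("include-system-site-packages = false\n")
        f.write("utf8mode = 1\n")  # hypothetical key proposed in this thread
    settings = read_cfg(cfg_path)

utf8_requested = settings.get("utf8mode") == "1"
print(utf8_requested)
```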

The main limitation is that users cannot write a config file in the install
location when Python is installed for the system, not for the user.

Regards,
-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/KQXW4JWWWLQSIQPMEZGCS7W4JDBNOPV2/


[Python-ideas] Re: Add a couple of options to open()'s mode parameter to deal with common text encodings

2021-02-04 Thread Inada Naoki
On Fri, Feb 5, 2021 at 8:20 AM Ben Rudiak-Gould  wrote:
>
>
> It seems as though most of those commenting in the other thread don't 
> actually use Python on Windows. I do, and I can say it's a royal pain to have 
> to write open(path, encoding='utf-8') all the time. If you could write 
> open(path, 'r', 'utf-8'), that would be slightly better, but the third 
> parameter is buffering, not encoding, and open(path, 'r', -1, 'utf-8') is not 
> very readable.
>

FWIW, I had another idea with the same motivation: adding an `open_utf8()` function.
`open_utf8(filename)` is easier to type than `open(filename, encoding="utf-8")`.
But no one supported the idea. Everyone thought `encoding="utf-8"` is
better than this alias function.
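
For illustration only, such an alias can be sketched in user code with
`functools.partial`. `open_utf8` is not part of the standard library, and
the file name here is arbitrary:

```python
import functools
import os
import tempfile

# Hypothetical helper: the built-in open() with encoding fixed to UTF-8.
open_utf8 = functools.partial(open, encoding="utf-8")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open_utf8(path, "w") as f:
        f.write("こんにちは")
    with open_utf8(path) as f:
        text = f.read()

print(text)
```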

See this thread.
https://mail.python.org/archives/list/python-ideas@python.org/thread/PZUYJ5XDY3WDUSBFW7BAFHP3QRYES2GZ/

Regards,
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/ERGAF3IHDNDPH2MJ647KGPMFF4OWEIBE/


[Python-ideas] Re: Make UTF-8 mode more accessible for Windows users.

2021-02-04 Thread Inada Naoki
On Fri, Feb 5, 2021 at 6:17 AM Barry Scott  wrote:
>
> Rather than reply point by point I will summarise my input.
>
> I think that utf-8 mode is a great idea.
>
> I think that an .INI file in the style that py.exe uses is better than an env 
> var.
>
> Env vars on Windows could be used but there can be surprises with the way
> Windows merges user and system env vars. Maybe it is only with PATH that
> this is very odd.
>
> I'm hoping that the solution implemented allows new users to get a great
> experience and also that advanced users can get control of the mode.
>
> Personally I'd prefer to have files that I edit to configure Python rather than
> registry keys. I can put files into git, for example. I have to work hard to
> manage registry keys via git.
>

I 100% agree with you. And pyvenv.cfg satisfies all your needs.

Comparing pyvenv.cfg with a new py.ini-like config file:

Cons:

* Need system privilege to change the setting of system installed Python.
  * But user can install another Python, or create venv anyway.

Pros:

* The file is already supported.
  * No need to lookup another file at startup.
* No need to edit any file outside the install location.
  * Easy to uninstall cleanly
  * Portable app friendly
* One file per environment
  * Breaking the config file affects only one environment.

So I still prefer pyvenv.cfg.

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/BT75AQXDG6ZZ5FUF2HCO75XFR777YQ27/


[Python-ideas] Re: Make UTF-8 mode more accessible for Windows users.

2021-02-01 Thread Inada Naoki
On Tue, Feb 2, 2021 at 6:31 AM Barry Scott  wrote:
>
>> On 30 Jan 2021, at 12:05, Inada Naoki  wrote:
>>
>> Where would Python look for a "configuration file like `pyvenv.cfg`" ?
>>
>> I am not a Windows expert so I am not sure. But I think it should be
>> the same directory where `python.exe` is in.
>
> You can put the system default there but each user needs to have a file that 
> they can control
> to set the per user config.
>
> py.exe uses %LOCALAPPDATA%\py.ini
>
> I'd suggest that you could have a %LOCALAPPDATA%\python.ini.
>

But what happens if the user installed Python from python.org and Python from conda?
The user may have two or more Pythons of the same version.

In my idea, if the user cannot change the config of the system-installed
Python, the user can still create a venv and change the setting for the
venv.

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/GQUYPIRT7IOWTXNHQCW73M7DOUMUMMFB/


[Python-ideas] Re: Make UTF-8 mode more accessible for Windows users.

2021-01-30 Thread Inada Naoki
On Sat, Jan 30, 2021 at 3:45 PM Christopher Barker  wrote:
>
> On Thu, Jan 28, 2021 at 4:25 PM Inada Naoki  wrote:
>>
>> > The "real" solution is to change the defaults not to use the system 
>> > encoding at all -- which, of course, we are moving towards with PEP 597. 
>> > So first a plug to do that as fast as possible! I myself would love to see 
>> > PEP 597 implemented tomorrow -- for all supported versions of Python.
>> >
>>
>> Note that PEP 597 doesn't change the default encoding. It just adds an
>> option to emit a warning when the default encoding is used.
>
>
> I know -- and THAT could be done soon, yes?
>

Sorry for the delay. I want to do it in Python 3.10, but I am not sure
the PEP will be accepted.
I updated the PEP today and am working on the reference implementation now.


>>
>> I think it might take about 10 years to change it.
>
> I hope it's not that long -- having code that runs differently in different 
> environments is not good ...
>

I agree. But backward compatibility is important too.


>>
>> Much code is written by other people. It causes
>> UnicodeDecodeError on Windows.
>> And UTF-8 mode rescues it.
>
> exactly. But the trick is that UTF-8 mode is under the control of the end user / 
> installer of Python, not the writer of the code.
>

Yes. But the writer of the code can specify `encoding="utf-8"`. PEP
597 helps with that.

Additionally, I am proposing a per-environment option. If the code owner
distributes the application with embeddable Python or a venv, they can use
UTF-8 mode too.

>>
>> Please don't discuss PEP 597 in this thread. Let's focus on UTF-8 mode.
>> They are different approaches and they are not mutually exclusive.
>
> Sure, but they are related. But I'll try to find the right thread for PEP 597.
>

The thread for the PEP is
https://discuss.python.org/t/pep-597-raise-a-warning-when-encoding-is-omitted/3880

>>
>> > Imagine someone runs some code in Jupyter, and it's fine, and then they 
>> > run it in plain Python, on the same machine, and it breaks -- ouch!
>>
>> You are right. UTF-8 mode must be accessible for both of Jupyter on
>> conda Python and Python installed by official installer.
>> If UTF-8 mode is accessible enough, user can fix it by enabling UTF-8 mode.
>
>
> Sure -- but these days folks may have multiple environments and multiple ways 
> to run code (Jupyter, IDEs), so it's way too easy to have UTF-8 mode on in 
> some but not others -- all on the same machine.
>

If the user doesn't have legacy applications, they can set PYTHONUTF8 as
a user environment variable.
Then all environments run in UTF-8 mode.


> I'm not a Windows user (much), but users of my library are, and my students 
> are, and I'm having a hard time figuring out what will make this work for 
> them.
>
> In the case of my students, I can encourage UTF-8 mode for all installations.
>

Yes, I think UTF-8 mode will help teachers and students.  Maybe
WSL2 will be another option.


> In the case of my library users -- it's harder, but I can do the same to some 
> extent -- I do currently suggest a conda environment for my code -- so yes, 
> making it easier to turn it on in an environment would be good.
>

If PEP 597 is accepted, you can find all code omitting
`encoding="utf-8"`. Once that is fixed, your library users can run it
without UTF-8 mode.


>> It is powerful/flexible for power users. But not for beginners.
>> Imagine users execute Jupyter from the start menu.
>>
>> * Command-line `-Xutf8` or `set PYTHONUTF8=1` is not accessible.
>> * User environment variable is not accessible too, and it may affect
>> other Python installations.
>
> which is actually what I like about environment variables -- it could apply 
> to all Python installations on the system -- which would be a good thing!
>

It will be great for many users. But it will not be good for some
users who use legacy Python applications.
So it is difficult to recommend UTF-8 mode to everyone.


> Where would Python look for a "configuration file like `pyvenv.cfg`" ?

I am not a Windows expert so I am not sure. But I think it should be
the same directory that `python.exe` is in.

>
> Back to my idea above -- any way to have that be a pip (and conda) 
> installable package? So it could be in a requirements file?
>

I have no idea.

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/F2TMSDGTPX55EZNXCDRUBWL46HGORDIH/


[Python-ideas] Re: Make UTF-8 mode more accessible for Windows users.

2021-01-28 Thread Inada Naoki
On Fri, Jan 29, 2021 at 12:54 PM Ben Rudiak-Gould  wrote:
>
> On Wed, Jan 27, 2021 at 11:36 PM Inada Naoki  wrote:
>>
>> * UnicodeDecodeError is raised when trying to open a text file written in 
>> UTF-8, such as JSON.
>> * UnicodeEncodeError is raised when trying to save text data retrieved from 
>> the web, etc.
>> * User run `pip install` and `setup.py` reads README.md or LICENSE file 
>> written in UTF-8 without `encoding="UTF-8"`
>>
>> Users can use UTF-8 mode to solve these problems.
>
>
> They can use it to solve *those* problems, but probably at the cost of 
> creating different problems.
>
> There's a selection bias here, because you aren't seeing cases where a script 
> worked because the default encoding was the correct one. If you switch a lot 
> of ordinary users (not power users who already use it) to UTF-8 mode, I think 
> a lot of scripts that currently work will start failing, or worse, silently 
> producing bogus output that won't be understood by a downstream tool. I'm not 
> convinced this wouldn't be a bigger problem than the problem you're trying to 
> solve.
>

I understand that, so I proposed per-install UTF-8 mode.
Users can set the PYTHONUTF8=1 user environment variable for now.  But it
may break existing applications.

My proposal is per-environment UTF-8 mode.
When users want to install a new Python to learn Python, they can enable
UTF-8 mode only for the new Python environment without breaking
existing applications.


>> * Put a checkbox in the installer?
>
> Do I want Python to assume that everything is UTF-8? Probably not.
>

Even if you don't want that, many developers already assume the default is always UTF-8.
And you can enable UTF-8 mode in only one venv to run such code, if
UTF-8 mode can be enabled by pyvenv.cfg.


-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/O3H7N5FSV2QT6PFJDTBBKPLO3JSTTVBS/


[Python-ideas] Re: Make UTF-8 mode more accessible for Windows users.

2021-01-28 Thread Inada Naoki
alse)` is better.
But note that `locale.getpreferredencoding(False)` may return "utf8",
"utf-8", "utf_8", "UTF-8"...

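One way to cope with the different spellings is to normalize the name
through `codecs.lookup()` before comparing. This is only a sketch; the
helper name `is_utf8` is made up here:

```python
import codecs
import locale


def is_utf8(name):
    """Return True if `name` is any spelling of the UTF-8 codec."""
    try:
        return codecs.lookup(name).name == "utf-8"
    except LookupError:
        # Unknown encoding names are treated as "not UTF-8".
        return False


spellings = ["utf8", "utf-8", "utf_8", "UTF-8"]
results = [is_utf8(s) for s in spellings]

# Whether the current locale encoding is UTF-8 depends on the machine.
current_is_utf8 = is_utf8(locale.getpreferredencoding(False))
print(results, current_is_utf8)
```
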
> That wouldn't be hard to do, but it might be worth having a small utility 
> that does it in a `__future__` import:
>
> from __future__ import warn_if_not_utf8

It seems you are misusing the __future__ import. __future__ imports are for
compilers and parsers. They are not for runtime behavior.
And I don't think we should add `warn_if_not_utf8()` for now.

>>
>> Is it possible to enable UTF-8 mode in a configuration file like 
>> `pyvenv.cfg`?
>
> I can't see how that's any more powerful/flexible than an environment 
> variable.
>

It is powerful/flexible for power users. But not for beginners.
Imagine users execute Jupyter from the start menu.

* Command-line `-Xutf8` or `set PYTHONUTF8=1` is not accessible.
* A user environment variable is not accessible either, and it may affect
other Python installations.


>> Is it possible to make it easier to configure?
>>
>> * Put a checkbox in the installer?
>> * Provide a small tool to allow configuration after installation?
>>   * python3 -m utf8mode enable|disable?
>> * Accessible only for CLI user
>>   * Add "Enable UTF-8 mode" and "Disable UTF-8 mode" to Start menu?
>
>
> This is still going to have the same fundamental problems of the same code 
> running differently on different machines or even the same machine in 
> different environments, installs -- someone upgrades and forgets to check 
> that box again, etc 
>

There are pros and cons.

If we use a user-wide (or system-wide) setting like `PYTHONUTF8` in a user
environment variable, all Python environments use UTF-8 mode
consistently.
But it will break legacy applications running on old Python environments.
If we have a per-environment option, it's easy to recommend that users
enable UTF-8 mode.

> Maybe this would be a good thing to do once there are Warnings in place?
>

Do you mean that programs that run only in UTF-8 mode should warn if UTF-8 mode is
not enabled? e.g.

```
import sys
# Exit early on Windows when UTF-8 mode is off.
if sys.platform == "win32" and not sys.flags.utf8_mode:
    sys.exit("This program runs only in UTF-8 mode. Please enable UTF-8 mode.")
```

Then, I don't like it... A Windows-only API to enable UTF-8 mode at
runtime seems better.

```
import sys
# sys._win32_enable_utf8mode() is only a proposed API, not an existing one.
if sys.platform == "win32":
    sys._win32_enable_utf8mode()
```

Regards,

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/KZYWEPFI4TNBBOJB3ZFGVTRWKL73XXRO/


[Python-ideas] Make UTF-8 mode more accessible for Windows users.

2021-01-27 Thread Inada Naoki
The fact that Python does not use UTF-8 as the default encoding when
opening text files is an obstacle for many Windows users, especially
beginners in programming.

If you search for UnicodeDecodeError, you will see that many Windows
users have encountered the problem.
This list is only part of many search results.

* https://qiita.com/Yuu94/items/9ffdfcb2c26d6b33792e
* https://www.mikan-partners.com/archives/3212
* https://teratail.com/questions/268749
* https://github.com/neovim/pynvim/issues/443
* https://www.coder.work/article/1284080
* https://teratail.com/questions/271375
* https://qiita.com/shiroutosan/items/51358b24b0c3defc0f58
* https://github.com/jsvine/pdfplumber/issues/304
* https://ja.stackoverflow.com/questions/69281/73612
* https://trend-tracer.com/pip-error/

Looking at the errors, the following are the most common cases.

* UnicodeDecodeError is raised when trying to open a text file written
in UTF-8, such as JSON.
* UnicodeEncodeError is raised when trying to save text data retrieved
from the web, etc.
* Users run `pip install`, and `setup.py` reads a README.md or LICENSE
file written in UTF-8 without `encoding="UTF-8"`.
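
A minimal sketch of the first case, for readers who have not hit it
themselves. The file name `data.json` and the codec `cp1252` are
assumptions chosen for illustration; on a real Windows machine the
default codec depends on the system locale, so here it is passed
explicitly to make the failure reproducible on any platform.

```python
import json
import os
import tempfile

# A small JSON document saved as UTF-8, as most editors do today.
payload = {"greeting": "こんにちは"}
path = os.path.join(tempfile.mkdtemp(), "data.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(payload, f, ensure_ascii=False)

# Simulate a legacy default by passing a code page explicitly.
# cp1252 is only a stand-in for whatever the locale encoding is.
try:
    with open(path, encoding="cp1252") as f:
        json.load(f)
    legacy_error = None
except UnicodeDecodeError as exc:
    legacy_error = exc

# Passing encoding="utf-8" reads the same file without problems.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

print(type(legacy_error).__name__, loaded == payload)
```

With other legacy code pages the same bytes may decode silently into
mojibake instead of raising, which is arguably worse.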

Users can use UTF-8 mode to solve these problems.
I wrote a section for UTF-8 mode in the "3. Using Python on Windows" document.
https://docs.python.org/3/using/windows.html#utf-8-mode

However, UTF-8 mode is still not very well known. How can we make
UTF-8 mode more user-friendly?

Right now, UTF-8 mode can be enabled using the `-Xutf8` option or the
`PYTHONUTF8` environment variable. This is a hurdle for beginners. In
particular, Jupyter users may not use the command line at all.
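
As a rough illustration of the two switches mentioned above, the
following sketch starts a child interpreter and reports its
`sys.flags.utf8_mode`. The helper name `utf8_mode_with` is made up for
this example; only `-X utf8`, `PYTHONUTF8` and `sys.flags.utf8_mode`
come from Python itself.

```python
import os
import subprocess
import sys

CHECK = "import sys; print(sys.flags.utf8_mode)"

def utf8_mode_with(args=(), env_overrides=None):
    """Start a child interpreter and return its sys.flags.utf8_mode."""
    env = dict(os.environ)
    env.pop("PYTHONUTF8", None)  # ignore any setting inherited from the parent
    env.update(env_overrides or {})
    result = subprocess.run(
        [sys.executable, *args, "-c", CHECK],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())

via_option = utf8_mode_with(args=["-X", "utf8"])
via_env = utf8_mode_with(env_overrides={"PYTHONUTF8": "1"})
print(via_option, via_env)
```

Both switches need a terminal or a system settings dialog, which is
exactly the hurdle described above for users who start Jupyter from
the Start menu.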

Is it possible to enable UTF-8 mode in a configuration file like `pyvenv.cfg`?

* Users can enable UTF-8 mode per-install and per-venv.
* But it is difficult to write the settings file when Python is
installed for the system (not for the user), or for Windows Store Python.
  * Users can still enable UTF-8 mode in a venv. But many beginners
don't need a venv.

Is it possible to make it easier to configure?

* Put a checkbox in the installer?
* Provide a small tool to allow configuration after installation?
  * python3 -m utf8mode enable|disable?
* Accessible only for CLI user
  * Add "Enable UTF-8 mode" and "Disable UTF-8 mode" to Start menu?

Any ideas are welcome.

-- 
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/LQVK2UKPSOI2AHYFUWK6ZII2U6QKK6BP/


[Python-ideas] Re: Provide UTF-8 version of Python for Windows.

2021-01-26 Thread Inada Naoki
On Tue, Jan 26, 2021 at 5:53 PM M.-A. Lemburg  wrote:
>
> Overall, I think the approach with two different binaries
> is not going work well. Users will get confused and many problems
> will arise due to users installing the wrong version for the apps
> they use.
>
> We already let them choose between 64-bit and 32-bit and embedded
> vs. installer. Some may understand the consequences of installing
> a 32-bit version on a 64-bit OS, but I suppose most don't know
> what "embedded" is for (including myself :-)).
>
> If you add UTF-8 vs. Locale dimensions on top, you'd create even
> more confusion.
>

OK. Confusing users only for getting stats is a bad idea...


> I think it would be better to have the Windows installers
> get an option to set the PYTHONUTF8 env var for the user
> or system-wide. This would be off initially and default to on
> a few years later.
>

I like the idea, and it is almost the same as what I proposed in PEP
597 (2nd) last year.
https://discuss.python.org/t/pep-597-enable-utf-8-mode-by-default-on-windows/3122

The difference from PEP 597 (2nd) is that I now propose not to change
the subprocess.PIPE encoding in UTF-8 mode.
I need to reconsider the stdin/stdout encoding when they are redirected.
Maybe we can use GetConsoleCP() for the stdin encoding, and
GetConsoleOutputCP() for the output encoding. I will write another
proposal for it.

--
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/OTNUYZNIQ5VG3SOK75EVNRL2WU43J3X4/


[Python-ideas] Re: Provide UTF-8 version of Python for Windows.

2021-01-26 Thread Inada Naoki
On Tue, Jan 26, 2021 at 4:36 PM Eryk Sun  wrote:
>
> One concern is what to do for the special "ansi" and "oem" encodings.
> If scripts rely on them for IPC, such as with subprocess.Popen(), then
> it could be frustrating if they're just synonyms for UTF-8 (code page
> 65001). I've tested that it's possible for Python to peg "ansi" and
> "oem" to the system ANSI and OEM code pages via GetLocaleInfoEx() with
> LOCALE_NAME_SYSTEM_DEFAULT and the LCType constants
> LOCALE_IDEFAULTANSICODEPAGE and LOCALE_IDEFAULTCODEPAGE (OEM). But
> then they're no longer accurate within the current process, for which
> ANSI and OEM are UTF-8.

You are right. That's why I didn't change the default encoding of
subprocess in PEP 597.
The UTF-8 version of Python should change only the default text
encoding, so it shouldn't use the UTF-8 code page.

Current UTF-8 mode has the same problem. It affects PIPE encoding too.
But we can change its behavior on Windows to:

* The default encoding of TextIOWrapper and most wrappers (e.g.
open(), Path.open(), Path.read_text(), gzip.open(), ...) becomes
"utf-8".
* locale.getpreferredencoding(False) returns the code page encoding (e.g. "cp932")
* The subprocess module uses `locale.getpreferredencoding(False)` for the
default PIPE encoding.

And we can provide two versions of Python for Windows.

* "Python (UTF-8 version)" will enable the UTF-8 mode by default.
* "Python (ANSI version)" will disable the UTF-8 mode by default.

User can override the default by `-Xutf8` option and `PYTHONUTF8`
environment variable.

Does this idea make sense?

-- 
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/KRT45DM5NJMLM22BHYHSWVYLPAXDK23A/


[Python-ideas] Re: Provide UTF-8 version of Python for Windows.

2021-01-25 Thread Inada Naoki
On Tue, Jan 26, 2021 at 4:01 PM Eryk Sun  wrote:
>
> > * Windows team needs to maintain more versions.
>
> I suppose the installer could install both sets of binaries, and copy
> to "python[w][_d].exe" based on an installer option. But then the
> UTF-8 selection statistics wouldn't be tracked, unless the installer
> phones home.

Can pip send `locale.getpreferredencoding(False)` to PyPI?

If so, we can set `PYTHONUTF8` environment variable from the installer too.
Or we can provide small tool to set/unset `PYTHONUTF8` environment variable.

-- 
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/35FW2SAYH5JR7FLNZGMSPDMUE2NNVHQN/


[Python-ideas] Re: Adding `open_text()` builtin function. (relating to PEP 597)

2021-01-25 Thread Inada Naoki
On Tue, Jan 26, 2021 at 3:07 PM Guido van Rossum  wrote:
>
>>
>> I agree that. But until we switch to the default encoding of open(),
>> we must recommend to avoid `open(filename)` anyway.
>> The default encoding of VS Code, Atom, Notepad is already UTF-8.
>>
>> Maybe, we need to update the tutorial (*) to use `encoding="utf-8"`.
>
>
> Telling people to always add `encoding='utf8'` makes much more sense to me 
> than introducing a new function and telling them to do that.
>

Ok, I will not add open_utf8() to PEP 597, and update the tutorial to
recommend `encoding="utf-8"`.

-- 
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/HXJKDIZUF6TMMHHPDZWQ3PYPFLXX6C66/


[Python-ideas] Re: Provide UTF-8 version of Python for Windows.

2021-01-25 Thread Inada Naoki
As I understand it, the "Fusion manifest for an unpackaged Win32 app" (*)
works for non-Store apps too.
(*) 
https://docs.microsoft.com/ja-jp/windows/uwp/design/globalizing/use-utf8-code-page#examples
-- 
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/MMZBS2QUXP73S6H6YDFUCW5HY2S7RADQ/


[Python-ideas] Provide UTF-8 version of Python for Windows.

2021-01-25 Thread Inada Naoki
Sorry for posting multiple threads so quickly.

Microsoft provides a UTF-8 code page per process. It can be enabled by
a manifest file.
https://docs.microsoft.com/ja-jp/windows/uwp/design/globalizing/use-utf8-code-page

How about providing Python binaries in both a "UTF-8 version" and an "ANSI version"?
This idea can provide a smoother transition of the default encoding.

1. Provide UTF-8 version since Python 3.10
2. (Some years later) Recommend UTF-8 version
3. (Some years later) Provide only UTF-8 version
4. (Some years later, maybe) Change the default encoding

The upsides of this idea are:

* We don't need to emit a warning for `open(filename)`.
* We can see the download stats.

In particular, the last point is a huge advantage compared to the
current UTF-8 mode (e.g. PYTHONUTF8=1).
We can know how many users need the legacy behavior in new Python
versions. That is very important information for us.

Of course, there are some downsides:

* Windows team needs to maintain more versions.
* More divisions for "Python on Windows" environment.

Regards,
-- 
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/KMYPF7RKDUHHXLPELA2RZC7TSPUWSHNU/


[Python-ideas] Re: Adding `open_text()` builtin function. (relating to PEP 597)

2021-01-25 Thread Inada Naoki
On Tue, Jan 26, 2021 at 10:22 AM Guido van Rossum  wrote:
>
>
> Older Pythons may be easy to drop, but I'm not so sure about older unofficial 
> docs. The open() function is very popular and there must be millions of blog 
> posts with examples using it, most of them reading text files (written by 
> bloggers naive in Python but good at SEO).
>
> I would be very sad if the official recommendation had to become "[for the 
> most common case] avoid open(filename), use open_text(filename)".
>

I agree. But until we switch the default encoding of open(),
we must recommend avoiding `open(filename)` anyway.
The default encoding of VS Code, Atom, Notepad is already UTF-8.

Maybe, we need to update the tutorial (*) to use `encoding="utf-8"`.

(*)  
https://docs.python.org/3.10/tutorial/inputoutput.html#reading-and-writing-files


> BTW remind me what open_text() would do? How would it differ from open() with 
> the same arguments? That's too many messages back.
>

Current proposal is "open_utf8()". The differences from open() are:

* There is no encoding parameter. It uses "utf-8" always. (*)
* "b" is not allowed for mode.

(*) Another option is to use "utf-8-sig" for reading and "utf-8" for
writing. But it has some drawbacks. utf-8-sig has overhead because it
is a wrapper implemented in Python. And TextIOWrapper has fast paths
for utf-8, but not for utf-8-sig. "utf-8-sig" may also not be tested
as well as "utf-8".

Regards,
-- 
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/BCMUOSHJOA36AKOWKQINNJZYAC2WIBUF/


[Python-ideas] Changing the default text encoding of pathlib

2021-01-24 Thread Inada Naoki
My previous thread was hijacked by the "auto guessing" idea, so I am
splitting off this thread for pathlib.

Path.open() was added in Python 3.4. Path.read_text() and
Path.write_text() were added in Python 3.5.
Their history is shorter than that of built-in open(). Changing their
default encoding should be easier than for built-in open() and TextIOWrapper.
The new default encodings would be:

* read_text() default encoding is "utf-8-sig"
* write_text() default encoding is "utf-8"
* open() default encoding is "utf-8-sig" when mode is "r" or None,
"utf-8" otherwise.

Of course, we need a regular deprecation period.
When encoding is omitted, they emit DeprecationWarning (or
EncodingWarning, which is a subclass of DeprecationWarning) for three
versions (Python 3.10~3.12).

What do you think of this idea?
Should we "change all at once" rather than "step-by-step"?

Regards,
--
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/J5VR56YRXA3PVPUH3KM72OX7SUBAZUKL/


[Python-ideas] Re: Adding `open_text()` builtin function. (relating to PEP 597)

2021-01-23 Thread Inada Naoki
On Sun, Jan 24, 2021 at 10:17 AM Guido van Rossum  wrote:
>
> I have definitely seen BOMs written by Notepad on Windows 10.
>
> Why can’t the future be that open() in text mode guesses the encoding?

I don't like guessing. As a Japanese user, I have seen a lot of
mojibake caused by wrong guesses.
I don't think guessing the encoding is a good part of reliable software.

On the other hand, if we add `open_utf8()`, it's easy to ignore BOM:

* When reading, use "utf-8-sig". (it can read UTF-8 without bom)
* When writing, use "utf-8".

Although UTF-8 with BOM is not recommended, and Notepad has used UTF-8
without BOM as its default encoding since Windows 10 version 1903,
UTF-8 with BOM is still used in some cases.
For example, Excel reads CSV files as UTF-8 with BOM or as a legacy
encoding. So some CSV files are written with a BOM.
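
The difference between the two codecs can be sketched as follows. The
file names and CSV content are invented for the example; the codecs
"utf-8" and "utf-8-sig" are the standard ones.

```python
import os
import tempfile

text = "name,price\nりんご,120\n"
folder = tempfile.mkdtemp()
with_bom = os.path.join(folder, "with_bom.csv")
without_bom = os.path.join(folder, "without_bom.csv")

# "utf-8-sig" writes a BOM (EF BB BF) first; "utf-8" does not.
with open(with_bom, "w", encoding="utf-8-sig", newline="") as f:
    f.write(text)
with open(without_bom, "w", encoding="utf-8", newline="") as f:
    f.write(text)

def read(path, encoding):
    """Read the whole file without newline translation."""
    with open(path, encoding=encoding, newline="") as f:
        return f.read()

# Plain "utf-8" keeps the BOM as a leading U+FEFF character.
leaked = read(with_bom, "utf-8")
# "utf-8-sig" strips a BOM if present and accepts files without one.
clean_a = read(with_bom, "utf-8-sig")
clean_b = read(without_bom, "utf-8-sig")

print(repr(leaked[0]), clean_a == text, clean_b == text)
```

This is why reading with "utf-8-sig" and writing with "utf-8" is a
tolerant combination: it accepts both kinds of input and never
produces a BOM.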

Regards,
-- 
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/BJC6LCYNO2HHRLHF4TFHWTG53M4YL6LL/


[Python-ideas] Re: Adding `open_text()` builtin function. (relating to PEP 597)

2021-01-23 Thread Inada Naoki
On Sat, Jan 23, 2021 at 7:31 PM Paul Sokolovsky  wrote:
> >
> > * Replacing open with open_text is easier than adding `,
> > encoding="utf-8"`.
>
> How is it easier, if "open_text" exists only in imagination, while
> encoding="utf-8" has been there all this time?
>

Note that the warning will not be enabled by default anytime soon.
If we decide to change the default encoding and enable the
EncodingWarning by default in Python 3.15, users can use `open_text()`
on 3.10~3.15.
That will be enough backward compatibility for most users.

>
> > * Teachers can teach to use `open_text` to open text files. Students
> > can use "utf-8" by default without knowing about what encoding is.
>
> Let's also add max_int(), min_int(), max_float(), min_float() builtins.

It is off-topic. Please don't compare apples and oranges.

>
> > So `open_text()` can provide better developer experience, without
> > waiting 10 years.
>
> Except that in 10 years, when the default encoding is finally changed,
> open_text() is a useless function, which now needs to be deprecated and
> all the fun process repeated again.

Yes, if we can change the default encoding in 2030, having two open
functions will become messy.
But there is no promise of that change. Without mitigating the pain,
we can never change the default encoding.

Anyway, thank you for your feedback.
Two people prefer `encoding="utf-8"` to `open_text()`.

I will still wait for feedback from more people before updating PEP 597.

Regards,
-- 
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/6UKLKB6JRAJZOCSYPTZTS6XA6VJPQYR3/


[Python-ideas] Re: Adding `open_text()` builtin function. (relating to PEP 597)

2021-01-23 Thread Inada Naoki
On Sat, Jan 23, 2021 at 7:13 PM Chris Angelico  wrote:
>
> > On the other hand, if we add `open_text()`:
> >
> > * Replacing open with open_text is easier than adding `, encoding="utf-8"`.
> > * Teachers can teach to use `open_text` to open text files. Students
> > can use "utf-8" by default without knowing about what encoding is.
> >
> > So `open_text()` can provide better developer experience, without
> > waiting 10 years.
>
> But this has a far worse end goal - two open functions with subtly
> incompatible defaults, and a big question of "why should I choose this
> over that". And if you start using open_text, suddenly your code won't
> work on older Pythons.
>

Yes. There are cons too.
That's why I posted this thread before including the idea in the PEP.
Thank you for your feedback.


> >
> > Ultimate goal is make the "utf-8" default. But I don't know when we
> > can change it.
> > So I focus on what we can do in near future (< 5 years, I hope).
> >
>
> Okay. If the goal is to make UTF-8 the default, may I request that PEP
> 597 say so, please? With a heading of "deprecation", it's not really
> clear what its actual goal is.

No. I avoid it intentionally. I am making the PEP useful even if we
cannot change the default encoding.
The PEP can be discussed without discussing whether we can change the
default encoding or not.

Please read the first motivation section in the PEP.
https://www.python.org/dev/peps/pep-0597/#using-the-default-encoding-is-a-common-mistake

Regards,
-- 
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/6OIKAWIQ6OPVDJ5ZUJECZPAY4FDUOZVD/


[Python-ideas] Re: Adding `open_text()` builtin function. (relating to PEP 597)

2021-01-23 Thread Inada Naoki
On Sat, Jan 23, 2021 at 2:43 PM Random832  wrote:
>
> On Fri, Jan 22, 2021, at 20:34, Inada Naoki wrote:
> > * Default encoding is "utf-8".
>
> it might be worthwhile to be a little more sophisticated than this.
>
> Notepad itself uses character set detection [it might not be reasonable to do 
> this on the whole file as notepad does, but maybe the first 512 bytes, or the 
> result of read1(512)?] when opening a file of unknown encoding, and msvcrt's 
> "ccs=UTF-8" option to fopen will at least detect at the presence of UTF-8 and 
> UTF-16 BOMs [and treat the file as UTF-16 in the latter case].

I meant that Notepad (and VS Code) use UTF-8 without BOM when creating a new text file.
Students learning Python cannot read it with `open()`.

-- 
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/5WYWXLCHL6MORJDU4V7JFRI2XD7E3G5Z/


[Python-ideas] Re: Adding `open_text()` builtin function. (relating to PEP 597)

2021-01-23 Thread Inada Naoki
On Sat, Jan 23, 2021 at 10:47 AM Chris Angelico  wrote:
>
>
> Highly dubious. I'd rather focus on just moving to UTF-8 as the
> default, rather than bringing in a new function - especially with such
> a confusing name.
>
> What exactly are the blockers on making open(fn) use UTF-8 by default?

Backward compatibility. That's what PEP 597 tries to solve.

1. Add optional warning for `open()` call without specifying
`encoding` option. (PEP 597)
2. (Several years later) Make the warning default.
3. (Several years later) Change the default encoding.

When (2) happens, users are forced to write `encoding="utf-8"` to
suppress the warning.
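
Step (1) can be tried out with the opt-in switch that Python 3.10 and
later provide, `-X warn_default_encoding`. The sketch below is only an
illustration: the helper name `warning_names` and the child script are
made up here, and on interpreters older than 3.10 the option is
ignored and no warning is recorded.

```python
import os
import subprocess
import sys

# Script run in a child interpreter: open a file with and without
# an explicit encoding and report which warning categories were seen.
CHILD = r'''
import os, tempfile, warnings
fd, path = tempfile.mkstemp()
os.close(fd)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    open(path).close()                    # encoding omitted
    open(path, encoding="utf-8").close()  # encoding given
os.remove(path)
print(",".join(w.category.__name__ for w in caught))
'''

def warning_names(*interpreter_args):
    """Run CHILD with the given interpreter options; return warning names."""
    env = {k: v for k, v in os.environ.items()
           if k != "PYTHONWARNDEFAULTENCODING"}
    result = subprocess.run(
        [sys.executable, *interpreter_args, "-c", CHILD],
        env=env, capture_output=True, text=True, check=True,
    )
    return [name for name in result.stdout.strip().split(",") if name]

opt_in = warning_names("-X", "warn_default_encoding")
default = warning_names()
print(opt_in, default)
```

Only the call that omits `encoding` is reported, and only when the
option is given, so existing programs stay quiet until they opt in.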

But note that the default encoding is already "utf-8" on (most) Linux
including WSL, macOS, iOS, and Android.
And Windows users can read ASCII text files without specifying
`encoding`, regardless of whether the default encoding is a legacy
codec or "utf-8".
So adding `, encoding="utf-8"` everywhere `open()` is used might be a tedious job.

On the other hand, if we add `open_text()`:

* Replacing open with open_text is easier than adding `, encoding="utf-8"`.
* Teachers can teach to use `open_text` to open text files. Students
can use "utf-8" by default without knowing about what encoding is.

So `open_text()` can provide better developer experience, without
waiting 10 years.

> Can the proposals be written with that as the ultimate goal (even if
> it's going to take X versions and multiple deprecation phases), rather
> than aiming for a messy goal where people aren't sure which function
> to use?
>

The ultimate goal is to make "utf-8" the default. But I don't know when
we can change it.
So I focus on what we can do in the near future (< 5 years, I hope).

Regards,

-- 
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/KGQFKMX2GBDIYITJCM6MHAS5ZGUA6YDL/


[Python-ideas] Adding `open_text()` builtin function. (relating to PEP 597)

2021-01-22 Thread Inada Naoki
Hi, all.

I am rewriting PEP 597 to introduce a new EncodingWarning, which is a
subclass of DeprecationWarning and is used to warn about a future
default encoding change.
But I don't think we can change the default encoding of
`io.TextIOWrapper` and built-in `open()` anytime soon. It is a
disruptive change. It may take 10 or more years.

To ease the pain caused by "default encoding is not UTF-8 (almost)
only on Windows" (*), I came up with another idea. This idea is not
mutually exclusive with PEP 597, but I want to include it in the PEP
because both ideas use EncodingWarning.

(*) Imagine that a new Python user writes a text file with notepad.exe
(default encoding is UTF-8 without BOM already) or VS Code, and tries to
read it in Jupyter Notebook. They will see UnicodeDecodeError. They
might not know what an encoding is yet.


## 1. Add `io.open_text()`, builtin `open_text()`, and
`pathlib.Path.open_text()`.

All functions are the same as `io.open()` or `Path.open()`, except:

* Default encoding is "utf-8".
* "b" is not allowed in the mode option.

These functions have two benefits:

* `open_text(filename)` is shorter than `open(filename,
encoding="utf-8")`. It's easy to type, especially with autocompletion.
* The type annotation for the returned value is simpler than for `open`. It is
always TextIOWrapper.
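
To make the proposal concrete, here is a rough pure-Python sketch of
what such a function could look like. `open_text()` is hypothetical
and is not part of the standard library; the parameter list simply
mirrors `io.open()` with the two differences listed above, and the
file name `memo.txt` is made up.

```python
import io
import os
import tempfile

def open_text(file, mode="r", buffering=-1, encoding="utf-8",
              errors=None, newline=None):
    """Hypothetical helper: like open(), but text-only and UTF-8 by default."""
    if "b" in mode:
        raise ValueError("open_text() does not support binary mode")
    return io.open(file, mode, buffering, encoding, errors, newline)

# Usage: no encoding argument is needed for a UTF-8 round trip.
path = os.path.join(tempfile.mkdtemp(), "memo.txt")
with open_text(path, "w") as f:
    f.write("メモ")
with open_text(path) as f:
    content = f.read()
    returned_type = type(f)

print(content, returned_type.__name__)
```

Because binary mode is rejected, the return value is always an
`io.TextIOWrapper`, which is the typing benefit mentioned in the
second bullet.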


## 2. Change the default encoding of `pathlib.Path.read_text()`.

For convenience and consistency with `Path.open_text()`, change the
default encoding of `Path.read_text()` to "utf-8" with regular
deprecation period.

* Python 3.10: `Path.read_text()` emits EncodingWarning when the
encoding option is omitted.
* Python 3.13: `Path.read_text()` changes the default encoding to "utf-8".

If PEP 597 is accepted, users can pass `encoding="locale"` instead of
`encoding=locale.getpreferredencoding(False)` when they need to use
locale encoding.

We might change more places where the default encoding is used. But it
should be done slowly and carefully.

---

What do you think about this idea? Is it worth adding a new
built-in function?

Regards,

-- 
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/PZUYJ5XDY3WDUSBFW7BAFHP3QRYES2GZ/


[Python-ideas] Re: Making TYPE_CHECKING builtin.

2021-01-18 Thread Inada Naoki
Thank you! I didn't know that.
I will use `if False:  # TYPE_CHECKING` so the compiler will remove
all imports inside it.

But the official way is preferred so that all typing ecosystems follow it.

-- 
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/WQQW5DW4KLHW4QMBQ7KYVGH4ICVUVHCA/


[Python-ideas] Making TYPE_CHECKING builtin.

2021-01-18 Thread Inada Naoki
Hi, all.

I want to write type hints without worrying about runtime overhead.
Current best practice is:

```
from __future__ import annotations

import typing

if typing.TYPE_CHECKING:
    import xxx  # modules used only in type hints.
```

But it would be nice if I can avoid importing even "typing" module.
How about adding TYPE_CHECKING builtin that is False?

```
from __future__ import annotations

if TYPE_CHECKING:
    from typing import Any, Optional

# We can use Any, Optional, etc here.
```

I wonder if we can make TYPE_CHECKING a constant like True, False, and None.
But it would break existing `from typing import TYPE_CHECKING` code.

Regards,

-- 
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/LS2UZYWX3VHNIMKBGLYEE75N4E7D6CEE/


[Python-ideas] Re: asyncio.to_process function, similar to asyncio.to_thread function

2021-01-03 Thread Inada Naoki
On Sun, Jan 3, 2021 at 6:44 PM Abdulla Al Kathiri
 wrote:
>
> I suppose asyncio.to_thread uses concurrent.futures.ThreadPoolExecutor under 
> the hood, so it saves us from getting the running event loop and calling the 
> loop.run_in_executor function . Why don’t we have a similar function for the 
> ProcessPoolExecutor, e.g. asyncio.to_process?
>

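asyncio.to_thread does not create its own executor. It submits the
call to the event loop's default executor, which is a
ThreadPoolExecutor unless it has been replaced. Threads are cheap, so
sharing a default thread pool is fine.
On the other hand, processes are not cheap. That's why only threads
have such a convenient way to run a function.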

Regards,

-- 
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/YPGAST223BKCD7GFIEYYOZ75ZXVYZNQX/


[Python-ideas] Re: Fixing and improving statement caching in sqlite3

2020-12-30 Thread Inada Naoki
On Thu, Dec 31, 2020 at 8:43 AM James Oldfield
 wrote:
>
> 1. My first suggestion is to replace the odd cache logic with a
> straightforward classic LRU cache. I'm hoping this is uncontroversial,
> but I can imagine there might be programs that somehow depend on the
> current behaviour. But my suspicion is that many more suffer and just no
> one looked hard enough to notice.
>

LRU has a worst-case scenario too. And implementing LRU requires some memory.
For example, functools.lru_cache() uses a doubly-linked list internally.
There are some LRU-like algorithms, like clock or second chance. But
all algorithms have a worst-case scenario anyway.

Another option is a random eviction algorithm. It chooses a random entry to evict.

* Frequently used keys have a high chance for cache hit, like LFU
(least frequently used).
* When frequently used keys are changed in time, old frequently used
keys are eventually evicted. (behaves like least recently frequently
used?)
* Minimum overhead for managing keys.

So I think a random eviction algorithm is better than LRU.

Off topic: Can we add random_cache to functools, like lru_cache?
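
For illustration only, a random-eviction decorator could be sketched
like this. `random_cache` does not exist in functools; the name, the
`cache_size()` helper and the positional-arguments-only key are
assumptions made to keep the example short.

```python
import functools
import random

def random_cache(maxsize=128, rng=None):
    """Hypothetical decorator: evict a uniformly random entry when full."""
    if maxsize < 1:
        raise ValueError("maxsize must be at least 1")
    rng = rng or random.Random()

    def decorator(func):
        values = {}  # key -> cached result
        keys = []    # all keys, in arbitrary order, for O(1) random choice
        index = {}   # key -> position of that key in `keys`

        @functools.wraps(func)
        def wrapper(*args):
            if args in values:
                return values[args]
            result = func(*args)
            if len(keys) >= maxsize:
                # Move the last key into the victim's slot, then pop the tail.
                i = rng.randrange(len(keys))
                victim, last = keys[i], keys[-1]
                keys[i] = last
                index[last] = i
                keys.pop()
                del index[victim]
                del values[victim]
            index[args] = len(keys)
            keys.append(args)
            values[args] = result
            return result

        wrapper.cache_size = lambda: len(values)
        return wrapper

    return decorator

calls = []

@random_cache(maxsize=2)
def square(x):
    calls.append(x)
    return x * x

first = square(3)
second = square(3)  # served from the cache, no second call
```

The bookkeeping per entry is one list slot and one index, and a cache
hit touches nothing but the dictionary, which is the "minimum
overhead" argument made above.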

Regards,

-- 
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/PM64J2I77AIOMN7XXZSY2F6O4SFXEEHX/


[Python-ideas] Re: Making "Any" a builtin

2020-11-29 Thread Inada Naoki
On Mon, Nov 30, 2020 at 4:27 PM Paul Bryan  wrote:
>
> I believe the __future__ import makes any annotation a string, it doesn't 
> make Any magically resolvable later. If you don't import Any into the scope 
> of the annotation, it won't be resolved when getting type hints.
>

I am not saying it works for 100% of use cases. But when you want to use
a type checker (mypy, pyre) or code completion (e.g. pylance), it "works".
When you need to support typing.get_type_hints, you can still `from
typing import Any`.

And we can consider resolving typing.Any (and any other globals) in
typing.get_type_hints(), instead of adding Any to builtins.

As Abdulla said, having both `any` and `Any` in builtins makes Python
more confusing.
If typing.get_type_hints() is the problem, why don't we change the
behavior of typing.get_type_hints()?
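
The limitation being discussed can be reproduced with a small sketch.
The annotations are written as explicit strings here, which is what
`from __future__ import annotations` would store, so that the snippet
behaves the same way without the future import. The function `foo` is
taken from the earlier message; the `localns` mapping is just one
possible way to supply the missing names.

```python
import typing

def foo(a: "Any", b: "Dict[Any, Any]") -> "Any":
    pass

# The raw annotations are plain strings; nothing has been looked up yet.
raw = dict(foo.__annotations__)

# get_type_hints() evaluates the strings and fails on the missing names.
try:
    typing.get_type_hints(foo)
    error = None
except NameError as exc:
    error = exc

# Supplying a namespace is enough to make the lookup succeed.
hints = typing.get_type_hints(
    foo, localns={"Any": typing.Any, "Dict": typing.Dict}
)

print(raw["a"], type(error).__name__, hints["a"])
```

Type checkers and editors never evaluate the strings, which is why the
unimported `Any` "works" for them but not for runtime introspection.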


Regards,

-- 
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/2QAWJJK4TWHD2OS7TFXRJCUJK2AF5CR6/


[Python-ideas] Re: Making "Any" a builtin

2020-11-29 Thread Inada Naoki
Oh, note that Abdulla said: "we can annotate our **functions** with
“Any" right away without the extra step."

Python 3.9.0 (default, Nov 21 2020, 14:01:55)
>>> from __future__ import annotations
>>> def foo(a: Any, b: Dict[Any, Any]) -> Any: pass

-- 
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/VYGGSSP22LUD6L2C244RTCUGYYIRENGL/


[Python-ideas] Re: Making "Any" a builtin

2020-11-29 Thread Inada Naoki
Under PEP 563, which is planned to become the default in Python 3.10,
you can use "Any" in annotations without "from typing import Any".
You can already do it since Python 3.7 with
"from __future__ import annotations".

See https://www.python.org/dev/peps/pep-0563/

Regards,

On Mon, Nov 30, 2020 at 12:29 AM Abdulla Al Kathiri
 wrote:
>
> Instead of importing “Any" from the typing module, we can annotate our 
> functions with “Any" right away without the extra step. What do you think? We 
> have the builtin function “any” which some Python users could mistakenly 
> use, but static type checkers should catch that.
> Message archived at 
> https://mail.python.org/archives/list/python-ideas@python.org/message/ELI474TKP2OKHP4NW5HOVUPKDPLYE2JP/



-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/VENGRL6T54XQUYDXONZRZE7LUCO6MKWI/


[Python-ideas] Re: Fwd: Re: Experimenting with dict performance, and an immutable dict

2020-09-22 Thread Inada Naoki
On Thu, Sep 17, 2020 at 11:50 PM Marco Sulla
 wrote:
>
> I do not like code duplication, but dictobject.c has already a lot of
> duplicated copies of the same function for optimization (see
> lookdict).

lookdict is a very special case. It is used for namespace lookup. It is
a performance-critical part of CPython.
We should always take the cost/benefit ratio into account.

Regards,
-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/45NMAIRSHRNFKYAGRA2EGPQDJ7IS7EXY/


[Python-ideas] Re: A shortcut to load a JSON file into a dict : json.loadf

2020-09-17 Thread Inada Naoki
On Fri, Sep 18, 2020 at 12:07 AM Paolo Lammens  wrote:
>
> Besides, I don't understand what the downside of overloading is, apart from 
> purism (?).

I am one of those who are conservative about overloading. I agree this
is purism, but I want to explain the reasoning behind it.

In statically and nominally typed languages, overloading is simple and
clear because the compiler chooses exactly one type.
On the other hand, a compiler or VM cannot choose a single type in
duck-typed (or structurally typed) languages.

For example,

* A str subtype can implement read/write methods. It is both PathLike
and file-like.
* A File subtype can implement `.__fspath__`. It is both PathLike and File.

Of course, statically typed languages like Java allow implementing
multiple interfaces. But a Java programmer must choose one interface
explicitly when the call is ambiguous. So it is explicit which type is
used in overloading.

On the other hand, in the case of Python, there is no compiler/VM
support for overloading, because Python is a duck-typed language.

* `load(f, ...)` uses `f.read()`
* `dump(obj, f, ...)` uses `f.write()`
* `loadf(path, ...)` and `dumpf(obj, path, ...)` use `open(path, ...)`

This is a very natural design for a duck-typed language.
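
A small sketch of the ambiguity described above. `AmbiguousSource` is a
made-up class for illustration, not something from the stdlib.

```python
import io
import json
import os


class AmbiguousSource(io.StringIO):
    """Hypothetical object that is both file-like and path-like."""

    def __fspath__(self):
        return "some/other/file.json"


src = AmbiguousSource('{"origin": "read()"}')

# Both duck-type checks succeed, so an overloaded load() would have to guess.
looks_file_like = hasattr(src, "read")
looks_path_like = isinstance(src, os.PathLike)
print(looks_file_like, looks_path_like)

# With separate functions the caller states the intent:
# json.load() always calls .read(); a loadf() would always call open().
data = json.load(src)
print(data)
```

With one function per behavior there is nothing to guess, which is the
point of keeping `load` and `loadf` separate.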

Regards,
-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/ZZOHJITXXEYEAW37KM6JB3HWS54TOFU6/


[Python-ideas] Re: A shortcut to load a JSON file into a dict : json.loadf

2020-09-17 Thread Inada Naoki
On Thu, Sep 17, 2020 at 3:02 PM Wes Turner  wrote:
>
> Something like this in the docstring?: "In order to support the historical 
> JSON specification and closed ecosystem JSON, it is possible to specify an 
> encoding other than UTF-8."
>

I don't think dumpf should support an encoding parameter.

1. Output is ASCII unless `ensure_ascii=False` is specified.
2. Writing a new JSON file using the obsolete spec is not recommended.
3. If users really need it, they can write obsolete JSON with `dump` or
`dumps` anyway.

I am against adding an `encoding` parameter to dumpf and loadf. They are
just shortcuts for common cases.
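
A quick check of the first point, as a sketch using only `json.dumps`
from the stdlib (the sample data is made up):

```python
import json

data = {"city": "Zürich"}

# Default (ensure_ascii=True): non-ASCII characters are escaped.
escaped = json.dumps(data)
print(escaped, escaped.isascii())

# Opt-out: characters are written as-is, so the file encoding matters.
raw = json.dumps(data, ensure_ascii=False)
print(raw, raw.isascii())

# Both forms decode to the same object.
assert json.loads(escaped) == json.loads(raw) == data
```

With the default settings the output is pure ASCII, so it is valid in
UTF-8 and there is nothing an `encoding` parameter would need to change.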

Regards,

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/7V3UAEBUKYEQDYYJIGJUVW3DYCDDQN46/


[Python-ideas] Re: A shortcut to load a JSON file into a dict : json.loadf

2020-09-16 Thread Inada Naoki
On Thu, Sep 17, 2020 at 6:54 AM Wes Turner  wrote:
>
>
> Why would we impose UTF-8 when the spec says UTF-8, UTF-16, or UTF-32?

The obsolete JSON spec said UTF-8, UTF-16, or UTF-32. The current spec
says UTF-8.
See https://tools.ietf.org/html/rfc8259#section-8.1

So `dumpf` must use UTF-8, although `loadf` can support UTF-16 and
UTF-32 like `loads`.

>
> How could this be improved? (I'm on my phone, so)
>
> def dumpf(obj, path, *args, **kwargs):
>     with open(getattr(path, '__path__', path), 'w',
>               encoding=kwargs.get('encoding', 'utf8')) as _file:
>         return dump(_file, *args, **kwargs)
>
> def loadf(obj, path, *args, **kwargs):
>     with open(getattr(path, '__path__', path),
>               encoding=kwargs.get('encoding', 'utf8')) as _file:
>         return load(_file, *args, **kwargs)
>

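from json import dump, load

def dumpf(obj, path, **kwargs):
    # Always write UTF-8, as the current JSON spec says.
    with open(path, "w", encoding="utf-8") as f:
        return dump(obj, f, **kwargs)

def loadf(path, **kwargs):
    # Binary mode: load() detects UTF-8, UTF-16 and UTF-32 by itself.
    with open(path, "rb") as f:
        return load(f, **kwargs)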
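
A round-trip sketch of how such helpers would behave. `dumpf` and
`loadf` are the proposed names from this thread, not existing stdlib
functions, and the file names and sample data are made up.

```python
import json
import tempfile
from pathlib import Path


def dumpf(obj, path, **kwargs):
    # Sketch of the proposed helper: always write UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        return json.dump(obj, f, **kwargs)


def loadf(path, **kwargs):
    # Binary mode lets json.load() detect UTF-8, UTF-16 or UTF-32 itself.
    with open(path, "rb") as f:
        return json.load(f, **kwargs)


data = {"greeting": "こんにちは"}

with tempfile.TemporaryDirectory() as tmp:
    utf8_path = Path(tmp) / "utf8.json"
    dumpf(data, utf8_path, ensure_ascii=False)
    from_utf8 = loadf(utf8_path)

    # A legacy UTF-16 file written by some other tool still loads.
    utf16_path = Path(tmp) / "utf16.json"
    utf16_path.write_text(json.dumps(data, ensure_ascii=False), encoding="utf-16")
    from_utf16 = loadf(utf16_path)

print(from_utf8 == data, from_utf16 == data)
```

Writing is fixed to UTF-8, while reading stays tolerant of the legacy
encodings because `json.load` accepts binary files.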

Regards,
-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/5RPHOVBMAC3USBKY7S2G4WVEC4JR4IV6/


[Python-ideas] Re: Fwd: Re: Experimenting with dict performance, and an immutable dict

2020-09-16 Thread Inada Naoki
On Thu, Sep 17, 2020 at 8:03 AM Marco Sulla
 wrote:
>
> Well, it seems ok now:
> https://github.com/python/cpython/compare/master...Marco-Sulla:master
>
> I've done a quick speed test and speedup is quite high for a creation
> using keywords or a dict with "holes": about 30%:

30% on a microbenchmark is not that high.

For example, I optimized "copy dict with holes", but I rejected my own
PR because I was not sure the performance / maintenance cost ratio was
good enough.

https://bugs.python.org/issue41431#msg374556
https://github.com/python/cpython/pull/21669

>
> python -m timeit -n 2000  --setup "from uuid import uuid4 ; o =
> {str(uuid4()).replace('-', '') : str(uuid4()).replace('-', '') for i
> in range(1)}" "dict(**o)"
>

I don't think this use case is worth optimizing, because `dict(o)` or
`o.copy()` is the Pythonic way.


> python -m timeit -n 1  --setup "from uuid import uuid4 ; o =
> {str(uuid4()).replace('-', '') : str(uuid4()).replace('-', '') for i
> in range(1)} ; it = iter(o) ; key0 = next(it) ; o.pop(key0)"
> "dict(o)"
>

This one is debatable. If the optimization is very simple, it might be
worth it.

Regards,

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/LE6RLLKF4QRRA4P2EXUK5MXVH6X4CSUZ/


[Python-ideas] Re: Fwd: Re: Experimenting with dict performance, and an immutable dict

2020-09-15 Thread Inada Naoki
On Tue, Sep 15, 2020 at 5:08 AM Marco Sulla
 wrote:
>
> 1. How can we check the size of an object only if it's an iterable
> using the Python C API?

There is no good way. Additionally, we need to know the number of
distinct keys if we want to preallocate the hash table.
For example, `len(dict.fromkeys(["foo"]*1000))` is 1, not 1000.
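
A small illustration of why the length of the input is a poor hint for
preallocation (the variable names are only for this sketch):

```python
keys = ["foo"] * 1000

# The iterable reports 1000 items...
print(len(keys))

# ...but only one distinct key ends up in the dict.
d = dict.fromkeys(keys)
print(len(d))

# And a generator does not even have a length to ask for.
gen = (k for k in keys)
print(hasattr(gen, "__len__"))
```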


> 2. Why, in your opinion, no relevant speedup was done?
>

We already have "one big resize" logic in dict_merge.
And I use a dummy empty dictkeys object for a new empty dict.
So we don't allocate any temporary, intermediate dictkeys object.

Bests,
-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/DT3VMOMUJG7R7V2RLVRXJZAFXEPKLBKA/


[Python-ideas] Re: A shortcut to load a JSON file into a dict : json.loadf

2020-09-13 Thread Inada Naoki
On Mon, Sep 14, 2020 at 10:52 AM David Mertz  wrote:
>
> Yes, that is a design flaw in the stdlib. There ought to be an opt-in switch 
> for accepting/producing those special values, not the current opt-out for 
> strictness... And the misnamed parameter is 'allow_nan' whereas it also 
> configures 'Infinity'.
>

In the case of encoding, json.loads has ignored it (and deprecated it)
since Python 3.1, and we removed it in 3.9.
Users can still load/save JSON in legacy encodings with open() + dump/load.

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/OGXJMTTAHYVPOY7HW7PSJK56KJ3HSGPD/


[Python-ideas] Re: A shortcut to load a JSON file into a dict : json.loadf

2020-09-11 Thread Inada Naoki
On Sat, Sep 12, 2020 at 7:59 AM Guido van Rossum  wrote:
>
> What happened to "not every three-line function needs to be a built-in"? This 
> is *literally* a three-line function.

This is not just a common two-line idiom. It creates a huge number of
potential bugs.

```
with open("myfile.json") as f:
    data = json.load(f)

with open("myfile.json", "w") as f:
    json.dump(data, f, ensure_ascii=False)
```

Both `open()` calls have a bug: they don't specify `encoding` [1]. They
use the locale encoding to read/write JSON, although the JSON file must
be encoded in UTF-8.
The locale encoding is a legacy encoding on Windows. It is very easy to
write code that does not work on Windows.

My PEP 597 [2] will help to find such bugs. But it warns only in dev
mode, to avoid overly noisy DeprecationWarnings.
Huge numbers of DeprecationWarnings make people dismiss DeprecationWarning.

So helper functions will save people from this kind of bug too.
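
For comparison, a sketch of the same idiom written without the encoding
bug. The temporary directory and sample data are only there to make the
snippet self-contained.

```python
import json
import os
import tempfile

data = {"name": "稲田"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "myfile.json")

    # Writing: say UTF-8 explicitly instead of relying on the locale.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False)

    # Reading, option 1: text mode with an explicit encoding.
    with open(path, encoding="utf-8") as f:
        from_text = json.load(f)

    # Reading, option 2: binary mode, json.load() decodes the bytes itself.
    with open(path, "rb") as f:
        from_binary = json.load(f)

print(from_text == data, from_binary == data)
```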

[1] In case of `json.load(f)`, we can use binary file instead.
[2] https://www.python.org/dev/peps/pep-0597/

Regards,

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/X5GEMWMPBGQGXPQLBRKV7D7KHKC6ODOA/


[Python-ideas] Re: Access (ordered) dict by index; insert slice

2020-09-01 Thread Inada Naoki
On Mon, Aug 31, 2020 at 9:30 PM  wrote:
> I have a use case which relates to this request: iterating over a dict 
> starting from a given key. I would like to achieve this without having to pay 
> the full O(n) cost if I'm going to be iterating over only a few items. My 
> understanding is that this should be achievable without needing to iterate 
> through the entire dict, since the dict's internal key lookup points to a 
> particular index of dk_entries anyway.
>
[snip]
> Doing this efficiently would require either the addition of indexing to dicts 
> as well as some sort of get_key_index operation, or else could be done 
> without knowing indices if an iter_from_key operation were introduced (which 
> used the internal dk_indices to know where to start iterating over 
> dk_entries). I think this thread touches on the same sorts of debates however 
> so I'm mentioning this here.
>

Note that the proposed index access is O(n), not O(1). So `get_key_index`
doesn't match your use case.

On the other hand, iter_from_key is a possible idea.

Another API idea is `d.next_item(key)` and `d.prev_item(key)`. They
return None if the key is at the right/left end. They raise KeyError if
the key is not found. Otherwise, they return a (key, item) pair.
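
To make the semantics concrete, here is a pure-Python sketch written as
plain functions. These are hypothetical helpers, not existing dict
methods, and they are O(n); a real implementation inside dict could use
the internal index to find the starting point.

```python
def next_item(d, key):
    """Return the (key, value) pair after `key`, or None at the right end."""
    if key not in d:
        raise KeyError(key)
    it = iter(d.items())
    for k, _ in it:
        if k is key or k == key:
            return next(it, None)


def prev_item(d, key):
    """Return the (key, value) pair before `key`, or None at the left end."""
    if key not in d:
        raise KeyError(key)
    previous = None
    for k, v in d.items():
        if k is key or k == key:
            return previous
        previous = (k, v)


d = {"a": 1, "b": 2, "c": 3}
print(next_item(d, "a"))  # ('b', 2)
print(next_item(d, "c"))  # None
print(prev_item(d, "b"))  # ('a', 1)
print(prev_item(d, "a"))  # None
```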


> I also think that even if adding new features to the built-in dict is 
> undesirable, adding a collections.IndexableDict would be very useful 
> (sortedcollections.IndexableDict exists but sorting is not acceptable for 
> many use cases).

Maybe OrderedDict could have od.next_item(key) and od.prev_item(key).

Regards,

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/2BIYAHUZTPRR5P6Y3MDVFBVY7PEECEDE/


[Python-ideas] Re: Deferred, coalescing, and other very recent reference counting optimization

2020-08-23 Thread Inada Naoki
Python's cyclic garbage collector uses the exact reference count.
See https://devguide.python.org/garbage_collector/ for details.
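
A small CPython demo of the collector at work, as a sketch (the `Node`
class is made up). As the devguide page describes, the collector
compares each object's reference count with the references coming from
inside the candidate set, which is why the counts have to be exact.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.partner = None


gc.disable()  # keep the automatic collector out of the way for the demo
try:
    a, b = Node(), Node()
    a.partner, b.partner = b, a      # reference cycle
    probe = weakref.ref(a)

    del a, b
    # Each object is still referenced by the other one, so reference
    # counting alone never frees them.
    alive_before = probe() is not None

    gc.collect()                     # the cycle collector finds the garbage
    alive_after = probe() is not None
finally:
    gc.enable()

print(alive_before, alive_after)
```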

On Mon, Aug 24, 2020 at 12:29 AM Raihan Rasheed Apurbo
 wrote:
>
> In CPython we have reference counting. My question is can we optimize current 
> RC using strategies like Deferred RC and Coalescing? If no then where would I 
> face problem if I try to implement these sorts of strategies?
>
> These strategies all depend on the concept that we don't need the exact value 
> of reference count all the time. So far in my observation, we only need exact 
> value before running a cycle collector.  If we can manage to make sure that 
> we have exact value before entering the cycle collector then in my opinion we 
> can add these optimizations strategies to some extent.  Is there something 
> that I am missing? Or It is quite possible? If not possible please tell me 
> the factors I should consider.
>
> Thanks in advance.
> Message archived at 
> https://mail.python.org/archives/list/python-ideas@python.org/message/G2BCHK766Z6ABFEF5KOKC27W4VBNNVSE/



-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/2Y64LLWFGPUGLOCAC42NPBFSYZGOYWLY/


[Python-ideas] Re: Access (ordered) dict by index; insert slice

2020-08-03 Thread Inada Naoki
On Tue, Aug 4, 2020 at 3:35 AM Christopher Barker  wrote:
>
> On Sat, Aug 1, 2020 at 6:10 PM Inada Naoki  wrote:
>>
>> Repacking is a mutation, and mutating a dict while iterating over it breaks the 
>> iterator.
>> But `d.items()[42]` doesn't look like a mutation.
>
>
> Pardon my ignorance, but IS repacking a mutation? Clearly it's mutating the 
> internal state, but at the logical structure of the dict will not have 
> changed.
>

You are totally right.  Repacking mutates internal state, so it breaks
iterators. But it doesn't look like a mutation from the user's point of
view. That is the main problem.
We cannot repack silently.

> Though I suppose if it's being iterated over, then the iterator is keeping an 
> index into the internal array, which would change on repacking?
>

Yes.

> which means that it's worse than not looking like a mutation, but it could 
> make active iterators return results that are actually incorrect.
>

In most cases, the iterator can detect it and raise RuntimeError.
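
For example, a size change during iteration is detected like this (a
minimal sketch; the variable names are only for illustration):

```python
d = dict.fromkeys(range(5))

caught = None
try:
    for key in d:
        d[key + 100] = None   # inserting changes the size of the dict
except RuntimeError as exc:
    caught = exc

print(type(caught).__name__, caught)
```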

> I have to think that this could be accommodated somehow, but with ugly 
> special case code, so yeah, not worth it. Though you could keep track of if 
> there are any active views (or even active iterators) and not repack in that 
> case. I'm sure most dict iterators are most commonly used right away.
>
> Is repacking ever currently done with dicts? If so, then how is this issue 
> worked around?
>

Repacking happens only on an insertion resize, i.e. when inserting an
item and there is no space left for it.
dict.clear() also creates a clean empty dict.

Currently, del/pop doesn't cause repacking.
https://bugs.python.org/issue32623

Regards,
-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/SQKYMBAN4ROLRKPWBT7OVVCSTU6JKJFT/


[Python-ideas] Re: Access (ordered) dict by index; insert slice

2020-08-01 Thread Inada Naoki
On Sun, Aug 2, 2020 at 2:34 AM Christopher Barker  wrote:
>
> On Sat, Aug 1, 2020 at 2:28 AM Marco Sulla  
> wrote:
>>
>> On Sat, 1 Aug 2020 at 03:00, Inada Naoki  wrote:
>>>
>>> Please teach me if you know any algorithm that has no holes, O(1)
>>> deletion, preserves insertion order, and is as efficient and fast as an array.
>
>
> I would think the goal here would be to re-order once in a while to remove 
> the holes. But that would take time, of course, so you wouldn't want to do it 
> on every deletion. But when?
>
> One option: maybe too specialized, but it could re-pack the array when an 
> indexing operation is made -- since that operation is O(N) anyway. And that 
> would then address the issue of performance for multiple indexing operations 
> -- if you made a bunch of indexing operation in a row without deleting (which 
> would be the case, if this is an alternative to making a copy in a Sequence 
> first), then the first one would repack the internal array (presumably faster 
> than making a copy) and the rest would have O(1) access.
>

Repacking is a mutation, and mutating a dict while iterating over it
breaks the iterator.
But `d.items()[42]` doesn't look like a mutation.

> Given that this use case doesn't appear to be very important, I doubt it's 
> worth it, but it seems it would be possible.
>
> Another thought -- could the re-packing happen whenever the entire dict is 
> iterated through? Though maybe there's no way to know when that's going to 
> happen -- all you get are the individual calls for the next one, yes?
>

You are right. It couldn't.


-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/ED2GRWD4RARR2LGP45PK4M6R3MLTAF75/


[Python-ideas] Re: Access (ordered) dict by index; insert slice

2020-08-01 Thread Inada Naoki
On Sat, Aug 1, 2020 at 12:40 PM Steven D'Aprano  wrote:
>
> On Fri, Jul 31, 2020 at 08:08:58PM -0700, Guido van Rossum wrote:
>
> > > The other simple solution is `next(iter(mydict.items()))`.
> > >
> >
> > That one always makes me uncomfortable, because the StopIteration it raises
> > when the dict is empty might be misinterpreted. Basically I never want to
> > call next() unless there's a try...except StopIteration: around it, and
> > that makes this a lot less simple.
>
> Acknowledged. But there are ways to solve that which perhaps aren't as
> well known as they should be.
>
> * Use a default: `next(iter(mydict.items()), MISSING)`
>
> * Use a helper to convert StopIteration to something else.
>

There is an even simpler solution when the dict has exactly one item
(otherwise the unpacking raises ValueError):

* `[first] = mydict.items()`, or `first, = mydict.items()`

Anyway, should we add some tools to itertools, instead of leaving them
as "itertools recipes"?

* `first(iterable, default=None)` -- like `[first] = iterable`, but it
returns the first item even when there are more, and returns the default
value instead of raising ValueError when the iterable is empty.
* `nth(iterable, n, default=None)`
* `consume(iterator, n=None)`
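
For reference, a sketch of what these three helpers could look like,
following the style of the recipes in the itertools documentation. They
are not part of the itertools module itself.

```python
from collections import deque
from itertools import islice


def first(iterable, default=None):
    """Return the first item, or `default` if the iterable is empty."""
    return next(iter(iterable), default)


def nth(iterable, n, default=None):
    """Return the n-th item (0-based), or `default` if there is none."""
    return next(islice(iterable, n, None), default)


def consume(iterator, n=None):
    """Advance the iterator n steps; if n is None, consume it entirely."""
    if n is None:
        deque(iterator, maxlen=0)
    else:
        next(islice(iterator, n, n), None)


mydict = {"a": 1, "b": 2, "c": 3}
print(first(mydict.items()))        # ('a', 1)
print(first({}.items(), "empty"))   # empty
print(nth(mydict.items(), 2))       # ('c', 3)

it = iter(mydict)
consume(it, 2)
print(next(it))                     # c
```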

Regards,
-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/75EFRNZQS7FZWVS5BL2RZ73QKA3D4NZR/


[Python-ideas] Re: Access (ordered) dict by index; insert slice

2020-07-31 Thread Inada Naoki
On Sat, Aug 1, 2020 at 10:19 AM Wes Turner  wrote:
>
> We should be reading the source: 
> https://github.com/python/cpython/blob/master/Objects/dictobject.c
>
> AFAIU, direct subscripting / addressing was not a use case in the design 
> phase of the current dict?
>
> Could a __getitem__(slice_or_int_index) be implemented which just skips over 
> the NULLs?
> Or would that be no faster than or exactly what islice does when next()'ing 
> through?
>

There are two major points to optimize.

* Evaluating `next(islice(dict.items(), n, n+1))` produces n
temporary tuples.
* (CPython implementation detail) dict can detect whether it has any
holes. Index access is O(1) if there is no hole.

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/BL24ZADLYFMRXQXHJ3HNQQMCEDTORLZH/


[Python-ideas] Re: Access (ordered) dict by index; insert slice

2020-07-31 Thread Inada Naoki
On Sat, Aug 1, 2020 at 8:14 AM Christopher Barker  wrote:
>
>
> If that's all we've come up with in this lengthy thread, it's probably not 
> worth it -- after all, both of those can be efficiently accomplished with a 
> little help from itertools.islice and/or next().
>

100% agree.

> But I think we all agree that those tools are less newbie-friendly -- but 
> they should be learned at some point, so maybe that's OK (and there is always 
> the wrap it with a list approach, which is pretty newbie friendly)

I don't agree with that.  It is newbie-friendly in some ways, but
newbie-unfriendly in others.

Newbies will use random.sample(d.items()) or get-by-index without
knowing it is O(n), even when they could create a list once and use it
repeatedly.
When people learn islice, they also learn that it is not an O(1) operation.

If dict views support direct indexing, it is very difficult to notice
that it is an O(n) operation.  They will just think "Oh Python is fucking slow!".
So it is newbie-unfriendly at some point.
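
A sketch of the two approaches side by side. `item_at` is a made-up
helper for illustration.

```python
import random
from itertools import islice

d = {f"key{i}": i for i in range(10_000)}


def item_at(mapping, n):
    # O(n) per call: walks the dict from the start every time.
    return next(islice(mapping.items(), n, None))


# Repeated access by position: convert once, then index in O(1).
items = list(d.items())          # one O(n) pass
samples = [items[i] for i in (0, 5_000, 9_999)]

# Same results, but each call below pays the O(n) walk again.
walked = [item_at(d, i) for i in (0, 5_000, 9_999)]
print(samples == walked)

# random.choice() needs a sequence, which the list already is.
pick = random.choice(items)
print(pick in items)
```

The explicit `list(...)` call makes the O(n) cost visible in the code,
which hidden indexing on a view would not.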

Regards,
-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/4FR2RPVA3HXVSSZMERSEYCPRHOVOX5A2/


[Python-ideas] Re: Access (ordered) dict by index; insert slice

2020-07-31 Thread Inada Naoki
On Sat, Aug 1, 2020 at 10:04 AM Wes Turner  wrote:
>
> Actually, I think the reverse traversal case is the worst case because: it's 
> not possible to use negative subscripts with islice (because that would 
> require making a full copy).
>
> This doesn't work:
> >>> islice(dict.keys(), -1, -5)
>
> Reverse traversal did work in Python 2 but was foregone when making .keys() a 
> view in Python 3 in order to avoid lulling users into making usually 
> unnecessary copies.
>

dict is reversible now. You can do `islice(reversed(d), 0, 5)`.
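
A short example of reverse traversal without a full copy (needs
Python 3.8 or later; the sample dict is made up):

```python
from itertools import islice

d = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6}

# Last three keys, newest first, without copying the whole dict.
last_keys = list(islice(reversed(d), 0, 3))
print(last_keys)    # ['f', 'e', 'd']

# The views are reversible too.
last_items = list(islice(reversed(d.items()), 0, 2))
print(last_items)   # [('f', 6), ('e', 5)]
```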

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/7F2UCQPDZ3SWSQHVPBBQQKNHVIQWKLV3/


[Python-ideas] Re: Access (ordered) dict by index; insert slice

2020-07-31 Thread Inada Naoki
On Sat, Aug 1, 2020 at 9:55 AM Marco Sulla  wrote:
>
> On Sat, 1 Aug 2020 at 02:30, Stestagg  wrote:
>>
>> The dict keys is compact only *until* you delete an item, at which point, a 
>> hole is left in the array
>
> No, the array of items has no hole. The hole is inserted in the hashtable.

Yes, the array of items has holes. Otherwise, `del d[k]` would become
`O(n)`, or `del d[k]` wouldn't preserve insertion order.
Please teach me if you know any algorithm that has no holes, O(1)
deletion, preserves insertion order, and is as efficient and fast as an array.
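
A toy model of the trade-off, as a rough sketch. This is not CPython's
implementation: `ToyOrderedMap` is a made-up class, and a builtin dict
stands in for the hash table part.

```python
from itertools import islice


class ToyOrderedMap:
    """Toy model only: an entries array plus a key -> slot index."""

    def __init__(self):
        self._entries = []  # (key, value) in insertion order; None marks a hole
        self._index = {}    # key -> slot; stands in for the real hash table

    def __setitem__(self, key, value):
        slot = self._index.get(key)
        if slot is None:
            self._index[key] = len(self._entries)
            self._entries.append((key, value))
        else:
            self._entries[slot] = (key, value)

    def __delitem__(self, key):
        slot = self._index.pop(key)  # raises KeyError for a missing key
        self._entries[slot] = None   # O(1): leave a hole, shift nothing

    def nth(self, n):
        if len(self._index) == len(self._entries):
            return self._entries[n]  # no holes: direct O(1) access
        live = (entry for entry in self._entries if entry is not None)
        return next(islice(live, n, None))  # holes: O(n) skip


m = ToyOrderedMap()
for i, key in enumerate("abcd"):
    m[key] = i
del m["b"]

print(m._entries)  # [('a', 0), None, ('c', 2), ('d', 3)]
print(m.nth(1))    # ('c', 2)
```

Deletion stays O(1) and order is preserved, at the price of a hole that
positional access has to step over.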

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/TH27RES3NQYIVZE7YD2QEQRAK7UCDVXL/


[Python-ideas] Re: Fwd: Re: Experimenting with dict performance, and an immutable dict

2020-07-28 Thread Inada Naoki
On Sun, Jul 26, 2020 at 4:44 AM Marco Sulla
 wrote:
>
> I also remembered another possible use-case: kwargs in CPython. In C code, 
> kwargs are PyDictObjects. I suppose they are usually not modified; if so, 
> fdict could be used, since it seems to be faster at creation.
>

I have not confirmed why frozendict is faster than dict for creation.
But note that kwargs is not created by `dict(d)`.
It is created by PyDict_New() and PyDict_SetItem(). Please benchmark
these C APIs if you want to propose the idea.

https://github.com/python/cpython/blob/a74eea238f5baba15797e2e8b570d153bc8690a7/Python/ceval.c#L4155
https://github.com/python/cpython/blob/a74eea238f5baba15797e2e8b570d153bc8690a7/Python/ceval.c#L4245


FWIW, I optimized dict(d) in https://bugs.python.org/issue41431
(https://github.com/python/cpython/pull/21674 )

$ ./python -m pyperf timeit --compare-to ./python-master -s
'd=dict.fromkeys(range(1000))' -- 'dict(d)'
python-master: . 21.5 us +- 0.2 us
python: . 4.52 us +- 0.16 us
Mean +- std dev: [python-master] 21.5 us +- 0.2 us -> [python] 4.52 us
+- 0.16 us: 4.76x faster (-79%)

Regards,

--
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/J6OOBGCTLEUX7CBDIDNKYPH4KNAZV4GQ/


[Python-ideas] Re: Experimenting with dict performance, and an immutable dict

2020-07-22 Thread Inada Naoki
On Wed, Jul 22, 2020 at 4:29 PM Marco Sulla
 wrote:
>
> Furthermore, it seems that pyperf has not disabled ASLR. After `sudo python 
> -m pyperf system tune`, ASLR continues to be in "Full randomization" mode.
>

You are right. pyperf doesn't disable ASLR, because code performance
is affected by code layout.
Instead, pyperf runs the benchmark multiple times in isolated processes
and computes statistics over the results.

Victor Stinner, the author of pyperf, wrote a lot of information about
measuring performance.
It's well worth reading before benchmarking.
https://vstinner.readthedocs.io/benchmark.html

Regards,
-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/BQJFBCGFFGVD4BVSJYDJHC7TQG3WDIBV/


[Python-ideas] Re: Experimenting with dict performance, and an immutable dict

2020-07-21 Thread Inada Naoki
On Wed, Jul 22, 2020 at 7:31 AM Marco Sulla
 wrote:
>
> For benchmarks, I used simply timeit, with autorange and repeat and, as 
> suggested in the module documentation, I got the minimum of the results. Here 
> is the code:
>
> https://github.com/Marco-Sulla/cpython/blob/master/frozendict/test/bench.py
>

I strongly recommend using pyperf for benchmarking.
Otherwise, you will see random performance changes caused by random
factors, including ASLR.

https://pypi.org/project/pyperf/
https://pyperf.readthedocs.io/en/latest/

Regards,
-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/HJJFTU42RMB5QTZ5HL7H2J73XMNT2BNW/


[Python-ideas] Re: Access (ordered) dict by index; insert slice

2020-07-11 Thread Inada Naoki
On Sun, Jul 12, 2020 at 4:43 AM Christopher Barker  wrote:
>
> > The existing dictionary memory layout doesn't support direct indexing 
> > (without stepping), so this functionality is not being added as a 
> > requirement.
>
> But it does make it much more efficient if the stepping is done inside the 
> dict object by code that knows its internal structure. Both because it can be 
> in C, and can be done without any additional references or copying. yes, it's 
> all O(n) but a very different constant.
>

It's just a difference in constant factors.
If the performance difference is really important, dict views should
get `d.items().tolist()` (a replacement for `list(d.items())`) before
random indexing. It is used much more often.

Currently, list(iterable) doesn't have any specialized algorithm for
dict views. (As far as I know, it doesn't have a specialized algorithm
even for dict.)

>
> > If random.choice should support non-sequence ordered container,
> > just propose it to random.choice.
>
> That would indeed solve the usability issue, and so may be a good idea,
>
> The problem here is that there is no way for random.choice to efficiently 
> work with generic Mappings. This whole discussion started because now that 
> dicts preserve order, there is both a logical reason, and a practical 
> implementation for indexing. But if that is not exposed, then 
> random.choice(), nor any other function, can take advantage of it.
>

Ditto.  Iterating over the internal structure directly is not so important.
And there is only a small performance difference in iteration between
the current dict implementation and the previous one.
I suppose the new dict implementation is only 20~40% faster, when the
dict is clean (no item has been removed yet).

So I don't think preserving order is a good reason to support indexing
while `random.choice` is the only use case.


> Which would lead to adding a random_choice protocol -- but THAT sure seems 
> like overkill.
> (OK, you could have the builtin random.choice check for an actual dict, and 
> then use custom code to make a random selection, but that would really be a 
> micro-optimization!)

I already wrote sample code using `itertools.islice()`. It works for
all containers that support len() and iteration.
No need to add a protocol.


> If a feature is useful, and doesn't conflict with another feature, then we 
> can add it.

I believe this is a bad idea. It leads software to become huge,
unmaintainable, and unusable.
A relatively higher bar must be set for adding a feature to a builtin type
than for adding a third-party package on PyPI.


Regards,
-- 
Inada Naoki  
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/EHUQLA2W4YQFXTCRF6W5BOY4UH7C6RKU/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Access (ordered) dict by index; insert slice

2020-07-10 Thread Inada Naoki
I'm -1 too.

Making ordered containers sequence-like only for random.choice is the
wrong idea.

If random.choice should support non-sequence ordered containers,
just propose it to random.choice.
And if the proposal is rejected, you can do it yourself with
helper functions.

>>> import random
>>> from itertools import islice
>>> def nth(iterable, n):
...     return next(islice(iterable, n, None))
...
>>> def choice_from_container(c):
...     return nth(c, random.randrange(len(c)))
...
>>> d = dict.fromkeys(range(10000))
>>> choice_from_container(d)
61
>>> choice_from_container(d)
9858
>>> choice_from_container(d.keys())
2436
>>> choice_from_container(d.items())
(7685, None)

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/XEXTKEOLJBNWHYPUQMGFTNN4SZQHWLMH/


[Python-ideas] Re: Access (ordered) dict by index; insert slice

2020-07-10 Thread Inada Naoki
Oh, sorry. I read that you did "random pick from dict", but I had not
checked whether your use case was "random pick and mutate dict" or not.

On Fri, Jul 10, 2020 at 3:08 PM Chris Angelico  wrote:
>
> On Fri, Jul 10, 2020 at 4:04 PM Inada Naoki  wrote:
> >
> > On Fri, Jul 10, 2020 at 2:56 PM Chris Angelico  wrote:
> > >
> > > but it might be
> > > mutated in between. So the list would have to be reconstructed fresh
> > > every time.
> > >
> >
> > Do you think the use case is common enough to add a feature to builtin type?
> > I can not imagine.
> >
>
> Go back and reread my post, I already answered this and several other
> questions :)
>
> ChrisA



-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/JAIJB2R6JDRGN7HANOBGA53DZUCBXKHA/


[Python-ideas] Re: Access (ordered) dict by index; insert slice

2020-07-10 Thread Inada Naoki
On Fri, Jul 10, 2020 at 2:56 PM Chris Angelico  wrote:
>
> but it might be
> mutated in between. So the list would have to be reconstructed fresh
> every time.
>

Do you think the use case is common enough to add a feature to a builtin type?
I cannot imagine it.

Anyway, if you want to benchmark that use case, the benchmark should
include the time to mutate the dict too. Otherwise, the performance benefit
is exaggerated and no one will trust the benchmark.

Regards,
-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/RCJRT4AAKZ6GEZIYENBOBPXRZJTMPRN4/


[Python-ideas] Re: Access (ordered) dict by index; insert slice

2020-07-09 Thread Inada Naoki
On Fri, Jul 10, 2020 at 1:53 PM Chris Angelico  wrote:
>
>
> And immediately above that part, I said that I had made use of this,
> and had used it in Python by listifying the dict first. Okay, so I
> didn't actually dig up the code where I'd done this, but that's a use
> case. I have *actually done this*. In both languages.
>

Do you pick a random item repeatedly from the same dictionary?
If so, you can create a list once in Python.

Or do you pick a random item from various dictionaries?
If so, your application must create many dictionaries, not only pick
random items.
Unless the random picking part is the bottleneck, micro-optimization for
this part is not important.

That's why a higher-level use case is needed to design a useful benchmark.

Regards,

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/TWCTMD4O2LQ2FATD6BG2BX4VEZPFZWNX/


[Python-ideas] Re: Access (ordered) dict by index; insert slice

2020-07-09 Thread Inada Naoki
On Thu, Jul 9, 2020 at 12:45 PM Christopher Barker  wrote:
>
> On Wed, Jul 8, 2020 at 7:13 PM Inada Naoki  wrote:
>>
>> I think this comparison is unfair.
>
> well, benchmarks always lie 
>
>> > d.items()[0]   vs   list(d.items())[0]
>>
>> Should be compared with `next(iter(d.items()))`
>
> why? the entire point of this idea is to have indexing syntax -- we can 
> already use the iteration protocol to do this. Not that it's a bad idea to 
> time that too, but since under the hood it's doing the same or less work, I'm 
> not sure what the point is.
>

Because this code solves "take the first item in the dict".

If you need to benchmark index access, you should compare your
dict.items()[0] with list indexing.
You shouldn't create a list from d.items() on every loop.

>> > d.keys()[-1] vs  list(d.keys())[-1]
>>
>> Should be compared with `next(reversed(d.keys()))`, or `next(reversed(d))`.
>
>
> Same point - the idea is to have indexing syntax. Though yes, it would be 
> good to see how it compares. But I know predicting performance is usually 
> wrong, but this is going to require a full traversal of the underlying keys 
> in either case.
>

Same here.  And note that dict and dict views now support reversed().

>>
>> > random.choice(d.items())   vs   random.choice(list(d.items()))
>>
>> Should be compared with `random.choice(items_list)` with `items_list =
>> list(d.items())` setup too.
>
> I don't follow this one -- could you explain? what is items_list ?

I explained it: `items_list = list(d.items())`.  Do it in setup (i.e. before the loop).
("setup" is the term used by the timeit module.)
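A minimal sketch of what "do it in setup" means with timeit. The dict size and the repeat count are assumptions chosen only for illustration; the absolute numbers will differ on every machine.

```python
import timeit

# Assumed setup for illustration: a dict with 1000 keys and a list built once.
setup = (
    "import random; "
    "d = dict.fromkeys(range(1000)); "
    "items_list = list(d.items())"
)

# The list is rebuilt on every call, so the O(n) copy is part of the timing.
rebuild = timeit.timeit("random.choice(list(d.items()))", setup=setup, number=1000)

# The list is built once in setup, so only the pick itself is timed.
reuse = timeit.timeit("random.choice(items_list)", setup=setup, number=1000)

print(f"rebuild each time: {rebuild:.6f}s")
print(f"reuse list:        {reuse:.6f}s")
```

The second statement is the "current recommended way" for repeated picks from an unchanged dict, so it is the fair baseline for that case.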

>
> But what this didn't check is how bad the performance could be for what I 
> expect would be a bad performance case -- indexing the keys repeatedly:
>
> for i in lots_of_indexes:
>     a_dict.keys()[i]
>
> vs:
>
> keys_list = list(a_dict.keys())
> for i in lots_of_indexes:
>     keys_list[i]
>

You should do this.

> I suspect it wouldn't take all that many indexes for making a list a better 
> option.
>

If you need index access many times, creating a list is the recommended way.
You shouldn't ignore it.  That's why I said it's an unfair comparison.
You should compare "current recommended way" vs "proposed way".

> But again, we are back to use cases. As Stephen pointed out no one has 
> produced an actual production code use case.

I agree.

Regards,

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/CVADSB7KDXTN6Y7VGY3M5DO76DXBLU5Y/


[Python-ideas] Re: Access (ordered) dict by index; insert slice

2020-07-08 Thread Inada Naoki
I think this comparison is unfair.

> d.items()[0]   vs   list(d.items())[0]

Should be compared with `next(iter(d.items()))`

> d.keys()[-1] vs  list(d.keys())[-1]

Should be compared with `next(reversed(d.keys()))`, or `next(reversed(d))`.

> random.choice(d.items())   vs   random.choice(list(d.items()))

Should be compared with `random.choice(items_list)` with `items_list =
list(d.items())` setup too.
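For reference, a small sketch of the first two alternatives named above. The dict contents are made up for illustration, and reversed() on dicts and dict views assumes Python 3.8 or later.

```python
d = {"a": 1, "b": 2, "c": 3}

first_item = next(iter(d.items()))     # first (key, value) pair, no list copy
last_key = next(reversed(d.keys()))    # last key via the keys view
last_key_short = next(reversed(d))     # same result, directly on the dict

print(first_item, last_key, last_key_short)
```

Neither form copies the whole dict, which is why they are the fair baseline for "first item" and "last item".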

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/73YTQDLWN6SUSOZ62C23SFHK3FIJGY3Y/


[Python-ideas] Re: Access (ordered) dict by index; insert slice

2020-07-07 Thread Inada Naoki
On Tue, Jul 7, 2020 at 10:52 PM Dominik Vilsmeier
 wrote:
>
> Surely that must be a relic from pre-3.7 days where dicts were unordered
> and hence order-based comparison wouldn't be possible (though PEP 3106
> describes an O(n*m) algorithm). However the current behavior is
> unfortunate because it might trick users into believing that this is a
> meaningful comparison between distinct objects (given that it works with
> `dict.keys` and `dict.items`) when it isn't.
>
> So why not make dict_values a Sequence, providing __getitem__ and
> additionally order-based __eq__ comparison?

It was rejected in this thread.
https://mail.python.org/archives/list/python-...@python.org/thread/R2MPDTTMJXAF54SICFSAWPPCCEWAJ7WF/#K3SYX4DER3WAOWGQ4SPKCKXSXLXTIVAQ

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/UTIVOHIB24BZETOGXMJJWRMGOM72UEHB/


[Python-ideas] Re: Access (ordered) dict by index; insert slice

2020-06-29 Thread Inada Naoki
On Mon, Jun 29, 2020 at 9:12 PM Hans Ginzel  wrote:
>
> Thank you,
>
> On Fri, Jun 26, 2020 at 10:45:07AM -0700, Brett Cannon wrote:
> >Why can't you do `tuple(dict.items())` to get your indexable pairs?
>
> of course, I can.
> But how expensive/efficient is it?
> What are the reasons why the object dict.items() is not subscriptable – 
> dict.items()[0]?
>
>

Because dict is optimized for random access by key and iteration, but not for
random access by index.

For example:

sample 1:

items = [*d.items()]
for i in range(len(items)):
    do(items[i])

sample 2:

for i in range(len(d)):
    do(d.items()[i])  # if dict_items supports index access.

sample 1 is O(n) where n = len(d), but sample 2 is O(n^2).

By not supporting index access, dict_items prevents writing such
inefficient code.
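A rough sketch of the difference, counting entries visited instead of measuring time. `item_at` is a hypothetical stand-in for `d.items()[i]` (which does not exist); it assumes index access has to step from the start, as discussed above.

```python
from itertools import islice

def item_at(d, i, counter):
    """Hypothetical stand-in for d.items()[i]: step through i + 1 entries."""
    counter[0] += i + 1
    return next(islice(d.items(), i, None))

d = dict.fromkeys(range(100))
n = len(d)

# sample 1: one O(n) copy, then O(1) list indexing.
items = [*d.items()]
steps_list = n  # the copy visits each entry once

# sample 2: every index access walks from the start.
counter = [0]
for i in range(n):
    item_at(d, i, counter)
steps_indexed = counter[0]

print(steps_list, steps_indexed)  # 100 5050
```

The second count is n * (n + 1) / 2, which is the O(n^2) behavior.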

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/KCCVM6PRXZPMTUQEKOF4VAG6ATS4XFSK/


[Python-ideas] Re: Permanent code objects (less memory, quicker load, less Unix Copy On Write)

2020-06-22 Thread Inada Naoki
On Mon, Jun 22, 2020 at 8:27 PM Barry Scott  wrote:
>
> * New code and pyc format
>  * pyc has "rodata" segment
>* It can be copied into single memory block, or can be mmapped.
>  * co_code should be aligned at least 2 bytes.
>
>
> Would higher alignment help? malloc is using 8 or 16 byte alignment isn't it?
> Would that be better  for packing the byte code into cache lines?
>

It may.  But I am not sure.
I said "at least 2 bytes" because we use "word code".  We read the "word code"
via `uint16_t *`, not `unsigned char *`.
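
The 2-byte unit can be observed from Python as well. This is only a sketch and assumes CPython 3.6 or later, where each instruction is a 2-byte code unit; the function `f` is made up for illustration.

```python
def f(x):
    return x + 1

code = f.__code__.co_code            # bytes object holding the bytecode
units = memoryview(code).cast("H")   # view the same bytes as unsigned 16-bit units

print(len(code), len(units))
```

The cast only succeeds because the length of co_code is a multiple of 2.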
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/PMPWPP2YF7JSTT6XOVCXNF6G7X6WSSEU/


[Python-ideas] Re: Permanent code objects (less memory, quicker load, less Unix Copy On Write)

2020-06-22 Thread Inada Naoki
On Mon, Jun 22, 2020 at 12:00 AM Guido van Rossum  wrote:
>
>
> I believe this was what Greg Stein's idea here was about. (As well as 
> Jonathan Fine's in this thread?) But the current use of code objects makes 
> this hard. Perhaps the code objects could have a memoryview object to hold 
> the bytecode instead of a bytes object.
>

memoryview is a heavy object.  Using memoryview instead of a bytes object
will increase memory usage.
I think a lightweight bytes-like object is better.  My rough idea is:

* New code and pyc format
  * pyc has "rodata" segment
    * It can be copied into single memory block, or can be mmapped.
  * co_code should be aligned at least 2 bytes.
  * code.co_code can point to memory block in "rodata".
  * docstring, signature, and lnotab can be lazy load from "rodata".
    * signature is serialized in JSON like format.
  * Allow multiple modules in single file
    * Reduce fileno when using mmap
    * Merge more constants
  * New Python object: PyROData.
    * It is like read-only bytearray.
      * But body may be mmap-ped, instead of malloc-ed
    * code objects owns reference of PyROData.
      * When PyROData is deallocated, it munmap or free "rodata" segment.
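
To illustrate only the "mmap-ped, read-only, no copy" part of this idea at the Python level: the sketch below is not PyROData (which does not exist); the file contents and the 8-byte header are made-up assumptions.

```python
import mmap
import os
import tempfile

# Write a small stand-in for a "rodata" segment (the contents are made up).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"HEADER--" + b"\x64\x00\x53\x00")
    path = tmp.name

try:
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            with memoryview(mapped) as view:
                # Slicing a memoryview does not copy the underlying bytes.
                with view[8:] as code_view:
                    is_readonly = code_view.readonly
                    code_bytes = bytes(code_view)  # copy only to keep a result
finally:
    os.remove(path)

print(code_bytes, is_readonly)
```

The views must be released before the mapping is closed, which mirrors the ownership rule in the last two bullets.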

Regards,
-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/VKBXY7KDI2OGESB7IPAMAIIHKR4TC7TQ/


[Python-ideas] Re: Doc preview in Github PR

2020-06-16 Thread Inada Naoki
It was suspended because of a limitation of Netlify.

For now, you can download zipped HTML files from GitHub Actions.
For example, visit this page and click "Artifacts" on the right side.
https://github.com/python/cpython/pull/20879/checks?check_run_id=771369070

Regards,

On Wed, Jun 17, 2020 at 8:19 AM  wrote:
>
> I think the possibility of publishing a preview when a PR changes the 
> documentation was considered at one time but I can't find anything about this 
> now. Has this been scrapped?
> Message archived at 
> https://mail.python.org/archives/list/python-ideas@python.org/message/4KUGG5M3ECLXMH4M2JWPKA7B6GRLZHYD/



-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/VWXNSFESEQHSMP4TQDWGDNVAVJKCUOL6/


[Python-ideas] PEP 597 -- Soft deprecation of omitting encoding

2020-05-03 Thread Inada Naoki
    with self.open(mode='r', encoding=encoding, errors=errors) as f:
        return f.read()

subprocess module doesn't warn

While the subprocess module uses TextIOWrapper, it doesn't raise
PendingDeprecationWarning. It uses the "locale" encoding by default.

Rationale

"locale" is not a codec alias

We don't add "locale" as a codec alias because the locale can be changed
at runtime.

Additionally, TextIOWrapper checks os.device_encoding() when encoding=None.
This behavior cannot be implemented in a codec.

Use a PendingDeprecationWarning

This PEP doesn't make a decision about changing the default text encoding. So we
use PendingDeprecationWarning instead of DeprecationWarning for now.

Raise warning only in dev mode

This PEP will produce a huge number of PendingDeprecationWarnings. It will
be too noisy for most Python developers.

We need to fix warnings in the standard library, pip, and major dev tools like
pytest before raising this warning by default.

subprocess module doesn't warn

The default encoding for PIPE is related to the encoding of the stdio. It
should be discussed later.

Reference Implementation

https://github.com/python/cpython/pull/19481

References

[1] "Packages can't be installed when encoding is not UTF-8"
    (https://github.com/methane/pep597-pypi-ascii)
[2] "Logging - Inconsistent behaviour when handling unicode"
    (https://bugs.python.org/issue37111)
[3] Packaging tutorial in packaging.python.org didn't specify encoding to
    read a README.md (https://github.com/pypa/packaging.python.org/pull/682)
[4] json.tool had used locale encoding to read JSON files.
    (https://bugs.python.org/issue33684)

Copyright

This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/master/pep-0597.rst
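
As a rough illustration of what the PEP nudges people toward (the file name and the text are made-up assumptions), passing `encoding` explicitly makes reading and writing independent of the locale encoding:

```python
import os
import tempfile

text = "naïve café"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "readme.txt")
    # Passing encoding explicitly avoids depending on the locale encoding.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, encoding="utf-8") as f:
        round_tripped = f.read()

print(round_tripped == text)
```

With the encoding omitted, the same code can fail or silently produce different bytes on a system whose locale encoding is not UTF-8.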

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/PUYQRUEIUKI6UHI44QWWCZDZR2XKZKI5/


[Python-ideas] Re: Allow using the or operator to denote unions in type annotations

2020-03-13 Thread Inada Naoki
On Sat, Mar 14, 2020 at 1:12 PM Inada Naoki  wrote:
> I'm sorry, I meant (a) looks more consistent with PEP 560.
>

Sorry again, I meant PEP 585, not PEP 560 as Guido explained already.

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/CW2OXROXBPVIWXDF2KMNRHTHTCBPMBSR/


[Python-ideas] Re: Allow using the or operator to denote unions in type annotations

2020-03-13 Thread Inada Naoki
On Sat, Mar 14, 2020 at 11:24 AM Inada Naoki  wrote:
>
[snip]
> a) Add `|` to all types.
> b) Support it only statically (`from __future__ import annotations`).
>
[snip]
>  But (b) seems more consistent with PEP 560.
>

I'm sorry, I meant (a) looks more consistent with PEP 560.

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/RT7IC22JWPIW5EYTDUZUG4FXC3XRXPK7/


[Python-ideas] Re: Allow using the or operator to denote unions in type annotations

2020-03-13 Thread Inada Naoki
First of all, I am not so happy that typing is increasing
Python runtime complexity.

TypeScript is the most successful language with gradual typing.
Its typing has almost zero runtime cost.  It doesn't make the JavaScript
runtime complex.  I hoped Python would go the same way.
But Python went a different way.  typing affected Python runtime
complexity, application memory footprint and startup time...


Anyway, I like the idea of using `|` for union.
I think there are two approaches to support it.

a) Add `|` to all types.
b) Support it only statically (`from __future__ import annotations`).

Both ideas are explained in PEP 604 already.
I prefer (b) to (a) because I don't want to increase runtime
behavior for static typing.  But (b) seems more consistent with
PEP 560.
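
A minimal sketch of what "only statically" means for approach (b). It uses a quoted annotation, which is how `from __future__ import annotations` treats every annotation; the function `describe` is made up for illustration.

```python
def describe(value: "int | str") -> str:
    # The quoted annotation is stored as a string and never evaluated at
    # definition time, so `int | str` needs no runtime support here.
    return f"{value!r}"

print(describe.__annotations__["value"])
```

Approach (a) would instead require `int.__or__` to exist at runtime, so that the unquoted expression can be evaluated.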

Regards,
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/H7ZNYEXC6UQ43I2TUOJUASOTAVBF/


[Python-ideas] Re: List - append

2020-01-18 Thread Inada Naoki
On Sun, Jan 19, 2020 at 2:45 PM Siddharth Prajosh  wrote:
>
> Moreover, shouldn't it work?
> How do I add that feature in Python?

You can do it with the walrus operator.

>>> (xs := list(range(10))).append(42)
>>> xs
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 42]

Regards,

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/24YMTGYKCNFHS2JROYSNSP6MUPZAFUCV/


[Python-ideas] Re: Recommend UTF-8 mode on Windows

2020-01-14 Thread Inada Naoki
On Sun, Jan 12, 2020 at 9:32 PM Eryk Sun  wrote:
>
> In both of the above cases, what I'd prefer is for UTF-8 mode to take
> precedence over legacy modes, i.e. to disable
> config->legacy_windows_fs_encoding and config->legacy_windows_stdio in
> the startup configuration.
>

UTF-8 mode shouldn't take precedence over legacy FS encoding.

Mercurial uses legacy encoding for file paths.  They use
sys._enablelegacywindowsfsencoding() on Windows.
https://www.mercurial-scm.org/repo/hg/rev/8d5489b048b7

Since Mercurial almost always uses binary files, I think UTF-8 mode
doesn't break Mercurial.  But I'm not sure. (Note that Mercurial on Python 3
on Windows is still beta.)

Regards,

-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/6DK4FJIPACW7ELGFLT5I3QFHWYFWEWYJ/


[Python-ideas] Re: Recommend UTF-8 mode on Windows

2020-01-11 Thread Inada Naoki
On Sat, Jan 11, 2020 at 11:03 AM Kyle Stanley  wrote:
>
> > 1. Recommend it in the official document "Using Python on Windows" [2].
> > 2. Show the UTF-8 mode status in the command line mode header [3] on 
> > Windows.
> > 3. Show the link to the UTF-8 mode document in the command line mode header 
> > too.
> > 4. Add checkbox to set "PYTHONUTF8=1" environment variable in the installer.
>
> > How do you think?
>
> At the least, I'm in favor of recommending UTF-8 mode in the documentation 
> for Windows users. It seems like it would fit well under the "Configuring 
> Python" section of the page 
> (https://docs.python.org/3/using/windows.html#configuring-python).
>
> I'm undecided on the others, as I don't know what (2) and (3) would 
> specifically entail.

The current header is:

  Python 3.8.1 (tags/v3.8.1:1b293b6, Dec 18 2019, 23:11:46) [MSC v.1916 64 bit (AMD64)] on win32
  Type "help", "copyright", "credits" or "license" for more information.

I'm proposing adding one more line:

  UTF-8 mode is disabled.  (See https://url.to/utf8mode)


> As far as (4) goes:
>
> > If setting "PYTHONUTF8=1" environment variable is too danger
> > to recommend widely, we may be able to add per-installation
> > (and per-venv if needed) option file (site.cfg in the directory same to
> > python.exe) to enable UTF-8 mode.
>
> Would you mind elaborating on this point? In particular, what specific 
> dangers/risks might be associated with setting that specific env var during 
> installation? IIRC, the installer already configures a few others by default 
> (I don't recall their names).
>

If the Python installer sets the PYTHONUTF8=1 environment variable,
it may affect applications using embeddable Python or py2exe.

So it may break applications which assume the default text
encoding is "mbcs".


> > But it may make Python startup process more complex...
>
> I would definitely prefer to have a checkbox to configure "PYTHONUTF8" during 
> installation rather than requiring it to be done manually, assuming it can be 
> done safely and effectively across different systems. Not only for the sake 
> of lowering complexity, but also because it takes less time and effort. That 
> can add up significantly when you consider the volume of users.
>

Even when we add per-installation config file, we can add it from
the Python installer.


--
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/YCPTZJGHA3TMRPEU7JWQE3PUKICSU3HR/


[Python-ideas] Recommend UTF-8 mode on Windows

2020-01-10 Thread Inada Naoki
Hi, all.

I believe UTF-8 should be chosen by default for text encoding.

* The default encoding for Python source files is UTF-8.
* VS Code and even Notepad use UTF-8 by default.
* Text files downloaded from the Internet are probably UTF-8.
* UTF-8 is used when you use WSL, regardless of your system code page.
* Windows 10 (1903) adds a per-process option to change the active code page
to UTF-8 and calls the system code page "legacy". [1]

But it is difficult to change the default text encoding of Python
for backward compatibility.  So I want to recommend the UTF-8 mode:

* The default text encoding becomes UTF-8.
* When you need to use the legacy ANSI code page, you can
  use the "mbcs" codec.
* You can disable it when you need to run a Python application
  relying on the legacy system encoding.
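
For reference, a small sketch of how the current status can be checked from Python itself. It assumes Python 3.7 or later, where `sys.flags.utf8_mode` exists; the printed encoding name depends on the platform and locale.

```python
import locale
import sys

# sys.flags.utf8_mode is 1 when UTF-8 mode is enabled (PYTHONUTF8=1 or -X utf8).
utf8_mode = sys.flags.utf8_mode
preferred = locale.getpreferredencoding(False)

print(f"UTF-8 mode: {utf8_mode}, default text encoding: {preferred}")
```

Running it once normally and once with `python -X utf8` shows the difference on a Windows machine with a legacy code page.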

But it is not well known yet.  And setting the environment variable
is a bit difficult for people who are learning programming with Python.

So I want to propose this:

1. Recommend it in the official document "Using Python on Windows" [2].
2. Show the UTF-8 mode status in the command line mode header [3] on Windows.
3. Show the link to the UTF-8 mode document in the command line mode header too.
4. Add checkbox to set "PYTHONUTF8=1" environment variable in the installer.

What do you think?

If setting the "PYTHONUTF8=1" environment variable is too dangerous
to recommend widely, we may be able to add a per-installation
(and per-venv if needed) option file (site.cfg in the same directory as
python.exe) to enable UTF-8 mode.
But it may make the Python startup process more complex...

Regards,

[1]: 
https://docs.microsoft.com/en-us/windows/uwp/design/globalizing/use-utf8-code-page
[2]: https://docs.python.org/3/using/windows.html
[3]: Currently, Python version and "Type "help",..." are printed.

--
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/524JQFZ4RMRLU7DBPSQDR735Z2UMSWQG/


[Python-ideas] Re: Moving PEP 584 forward (dict + and += operators)

2019-12-03 Thread Inada Naoki
I think the set operation of dict_keys accepts any iterable by accident.

There is an issue for it: https://bugs.python.org/issue38538
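
A small sketch of the behavior in question, as observed in current CPython; the dict and list contents are made up for illustration.

```python
keys = {"a": 1}.keys()

# The right-hand operand is a plain list, not a set.
merged = keys | ["b"]

print(type(merged).__name__, sorted(merged))  # set ['a', 'b']
```

By contrast, `{"a"} | ["b"]` with a real set on the left raises TypeError.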


On Wed, Dec 4, 2019 at 4:32 PM Serhiy Storchaka  wrote:
>
> 03.12.19 21:04, Andrew Barnert via Python-ideas пише:
> > On Dec 3, 2019, at 02:00, Serhiy Storchaka  wrote:
> >> What it will return if implement | for dicts? It should be mentioned in 
> >> the PEP. It should be tested with a preliminary implementation what 
> >> behavior is possible and more natural.
> >
> > What is there to document or test here? There’s no dicts involved in either 
> > operator, only a set and a key view, both of which are set types and 
> > implement set union.
>
> Oh, sorry, it was a wrong example. Here is the right one:
>
>  >>> {(1, 2): 3}.keys() | {4: 5}
> {(1, 2), 4}
>
>
>
>  >>> {4: 5} | {(1, 2): 3}.keys()
> {(1, 2), 4}
>
>
>
>
> How the results will change after implementing PEP 584? It all should be
> considered in the PEP. Note that dictkeys.__or__ does not return
> NotImplemented (there is an issue for this), so the PEP can require more
> wider changes than just adding __or__, __ror__ and __ior__ to dict.
> Message archived at 
> https://mail.python.org/archives/list/python-ideas@python.org/message/OYUGN5FOIQGEHS5XJFGHNLQO4AUJMUON/



-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/ETRZMH6T4JTMNFCTXO6Q4S5ZCIB2CKK4/


[Python-ideas] Re: Sets for easy interning(?)

2019-12-02 Thread Inada Naoki
FWIW, you can do it with dict already.

o = memo.setdefault(o, o)
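
A slightly longer sketch of the same idea. The helper name `intern_obj` and the tuples are made up for illustration.

```python
memo = {}

def intern_obj(o):
    """Return the first equal object seen, storing o if it is new."""
    return memo.setdefault(o, o)

a = tuple([1, 2, 3])
b = tuple([1, 2, 3])   # equal to a, but built as a separate object

first = intern_obj(a)
second = intern_obj(b)
print(first is a, second is a)  # True True
```

The dict maps each object to itself, so the lookup by an equal key returns the stored original.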

On Tue, Dec 3, 2019 at 9:29 AM Soni L.  wrote:
>
> This is an odd request but it'd be nice if, given a set s = {"foo"},
> s["foo"] returned the "foo" object that is actually in the set, or
> KeyError if the object is not present.
>
> Even use-cases where you have different objects whose differences are
> ignored for __eq__ and __hash__ and you want to grab the one from the
> set ignoring their differences would benefit from this.
> Message archived at 
> https://mail.python.org/archives/list/python-ideas@python.org/message/T3Z32DEMWK46EBPULYB4CVI2QF4FS3WJ/



-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/Q2EPKWPSVG55A3CKVCLJGRX6SPKKSIEE/


[Python-ideas] Re: PEP 584: Add + and += operators to the built-in dict class.

2019-10-24 Thread Inada Naoki
On Thu, Oct 24, 2019 at 1:20 AM Christopher Barker  wrote:
>
> On Wed, Oct 23, 2019 at 5:42 AM Rhodri James  wrote:
>>
> frankly, the | is obscure to most of us. And it started as "bitwise or", and 
> evokes the __or__ magic method -- so why are we all convinced that somehow 
> it's inextricably linked to "set union"?

It is because "bitwise or" is very similar to "set union".

You can regard an integer as a bitset (set of bits).
5 is {bit 1, bit 3} and 6 is {bit 2, bit 3}.  So 5 | 6 is 7 {bit 1,
bit 2, bit 3}.

So reusing | for set union is very natural to me.
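
The correspondence can be checked with a short sketch. The helper `bits` is made up for illustration and numbers the bits from 1, as in the text above.

```python
def bits(n):
    """Return the set of 1-based positions of the bits set in n."""
    return {i + 1 for i in range(n.bit_length()) if n >> i & 1}

print(bits(5), bits(6))                   # {1, 3} {2, 3}
print(bits(5) | bits(6) == bits(5 | 6))   # True
```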

But if we use + for dict merging, I think we should add + to set too.
Then the set has `.union()`, `|` and `+` for the same behavior.

Regards,
-- 
Inada Naoki  
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/WWCQBMRZGLAKBK5M4LNZLG3CBURHR3FK/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: PEP 584: Add + and += operators to the built-in dict class.

2019-10-20 Thread Inada Naoki
On Mon, Oct 21, 2019 at 7:07 AM Guido van Rossum  wrote:
>
> So the choice is really only three way.
>
> So if we want to cater to what most beginners will know, + and += would be 
> the best choice. But if we want to be more future-proof and consistent, | and 
> |= are best -- after all dicts are closer to sets (both are hash tables) than 
> to lists. (I know you can argue that dicts are closer to lists because both 
> support __getitem__ -- but I find that similarity shallower than the hash 
> table nature.)
>
> In the end I'm +0.5 on | and |=, +0 on + and +=, and -0 on doing nothing.
>

If we choose `+`, then `+` would mean "merge two containers",
not just "concatenate two sequences".
So it would look very inconsistent for set to use `|` instead of `+`.
That inconsistency looks very ugly to me.

How do you feel about this?
I think we should add + to set too.

Regards,
-- 
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/AB7O4FM5RMCJ5HNYGXWDOLFFNCZY3JSL/


[Python-ideas] Re: PEP 584: Add + and += operators to the built-in dict class.

2019-10-19 Thread Inada Naoki
https://www.python.org/dev/peps/pep-0584/#use-a-merged-method-instead-of-an-operator

In this section, the unbound method form is introduced first.

But the unbound method is just an option.  It is not so important.
The method form is the key part of this proposal.

So please introduce the (bound) method first, or even remove
the unbound method from the PEP.
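To make the two forms concrete, here is a minimal sketch. `MergingDict` and its `merged` method are hypothetical and exist only for this illustration; the builtin dict has no such method, and the PEP's own wording may differ.

```python
class MergingDict(dict):
    """Hypothetical dict subclass with a `merged` method (not a real dict API)."""

    def merged(self, *others):
        # Shallow-copy self first so that no operand is modified.
        new = type(self)(self)
        for other in others:
            new.update(other)
        return new


a = MergingDict(x=1, y=2)
b = {"y": 20, "z": 30}

bound = a.merged(b)                 # bound method form
unbound = MergingDict.merged(a, b)  # unbound form: same function, explicit self
```

Both calls run the same function and give the same result; the unbound form only spells out `self` explicitly.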

Regards,
-- 
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/3ZOXL47CW7J6F434KGO556WJZ5OTE5TP/


[Python-ideas] Re: PEP 584: Add + and += operators to the built-in dict class.

2019-10-19 Thread Inada Naoki
I think this PEP omits one big disadvantage of the + operator.

If we use + for dict merging, it looks strange and inconsistent
that set doesn't support +.

And if we add + to set too, set has three ways (method, |, and +) to do merging.

Then all builtin container classes would support +.  It would look like + is the
common way to merge two containers in some way.  Shouldn't we add it to the ABCs?

So I think we shouldn't focus just on adding `+` to dict.  It comes with a huge
side effect.  We should think about the general API design of the (esp. builtin)
containers.
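As a quick check of where the builtin containers stand, the sketch below probes which of them accept `+` in current CPython and whether the Set ABC defines it. The helper `supports_plus` is a hypothetical name used only here.

```python
import collections.abc


def supports_plus(a, b):
    """Return True if `a + b` succeeds, False if it raises TypeError."""
    try:
        a + b
    except TypeError:
        return False
    return True


results = {
    "list": supports_plus([1], [2]),
    "tuple": supports_plus((1,), (2,)),
    "str": supports_plus("a", "b"),
    "set": supports_plus({1}, {2}),
    "dict": supports_plus({"a": 1}, {"b": 2}),
}

# The Set ABC provides `|` as a mixin method but defines no `+`.
set_abc_has_or = hasattr(collections.abc.Set, "__or__")
set_abc_has_add = hasattr(collections.abc.Set, "__add__")
```

The sequences accept `+`, while set and dict raise TypeError, which is the gap that adding `+` to dict alone would leave half closed.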

Regards,
-- 
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/5UWXVBMPRCVA36VAAIFG3GGSCVBR4WLA/


[Python-ideas] Re: PEP 584: Add + and += operators to the built-in dict class.

2019-10-17 Thread Inada Naoki
I think this PEP is closely related to language design philosophy.

(a) Overload operators heavily for convenience.

(b) Prefer methods over operators.  Set a high bar for
overloading operators on core types.

I prefer philosophy (b).  And I don't think the described usefulness
is enough to justify adding the operator.

I know this is a subjective opinion, but I'm -1 on this PEP.

Regards,

On Thu, Oct 17, 2019 at 2:37 PM Brandt Bucher  wrote:
>
> At long last, Steven D'Aprano and I have pushed a second draft of PEP 584 
> (dictionary addition):
>
> https://www.python.org/dev/peps/pep-0584/
>
> The accompanying reference implementation is on GitHub:
>
> https://github.com/brandtbucher/cpython/tree/addiction
>
> This new draft incorporates much of the feedback that we received during the 
> first round of debate here on python-ideas. Most notably, the difference 
> operators (-/-=) have been dropped from the proposal, and the implementations 
> have been updated to use "new = self.copy(); new.update(other)" semantics, 
> rather than "new = type(self)(); new.update(self); new.update(other)" as 
> proposed before. It also includes more background information and summaries 
> of major objections (with rebuttals).
>
> Please let us know what you think – we'd love to hear any *new* feedback that 
> hasn't yet been addressed in the PEP or the related discussions it links to! 
> We plan on updating the PEP at least once more before review.
>
> Thanks!
>
> Brandt
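The two semantics quoted above can be compared on a dict subclass. The sketch below uses `collections.defaultdict` to show one observable difference; the function names are hypothetical, and this is not taken from the PEP's reference implementation.

```python
from collections import defaultdict


def merge_via_copy(left, right):
    """new = self.copy(); new.update(other)"""
    new = left.copy()
    new.update(right)
    return new


def merge_via_constructor(left, right):
    """new = type(self)(); new.update(self); new.update(other)"""
    new = type(left)()
    new.update(left)
    new.update(right)
    return new


counts = defaultdict(int, a=1)
extra = {"b": 2}

via_copy = merge_via_copy(counts, extra)
via_ctor = merge_via_constructor(counts, extra)
```

Both results hold the same items, but only the copy-based version keeps `default_factory`, because `defaultdict.copy()` carries it over while a bare `type(left)()` call starts without one.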



-- 
Inada Naoki  
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/YXEZZMEMOLHNTFZ25GUZ7ZMNZQ5CKTR3/

