Re: on str.format and f-strings

2022-09-06 Thread Chris Angelico
On Wed, 7 Sept 2022 at 03:52, Meredith Montgomery  wrote:
>
> It seems to me that str.format is not completely made obsolete by the
> f-strings that appeared in Python 3.6.  But I'm not thinking that this
> was the objective of the introduction of f-strings: the PEP at
>
>   https://peps.python.org/pep-0498/#id11
>
> says so explicitly.

Precisely. It was never meant to obsolete str.format, and it does not.

> My question is whether f-strings can do the
> following nice thing with dictionaries that str.format can do:
>
> --8<---cut here---start->8---
> def f():
>   d = { "name": "Meredith", "email": "mmontgom...@levado.to" }
>   return "The name is {name} and the email is {email}".format(**d)
> --8<---cut here---end--->8---
>
> Is there a way to do this with f-strings?

No. That's not their job. That's str.format's job.
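For completeness, an f-string can still reach into the dict by subscripting; it just has no **-unpacking form. A sketch (the dict values here are placeholders, not the OP's real data):

```python
d = {"name": "Meredith", "email": "m@example.com"}

# str.format can unpack a mapping into named placeholders:
s1 = "The name is {name} and the email is {email}".format(**d)

# An f-string has no unpacking form, but can index the dict directly:
s2 = f"The name is {d['name']} and the email is {d['email']}"

assert s1 == s2
```

(str.format_map(d) does the same job as .format(**d) without building an intermediate keyword dict.)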

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Local variable definition in Python list comprehension

2022-09-01 Thread Chris Angelico
On Fri, 2 Sept 2022 at 06:55, James Tsai  wrote:
>
> 在 2022年9月1日星期四 UTC+2 18:34:36, 写道:
> > On 9/1/22, James Tsai  wrote:
> > >
> > > I would find it very useful to be able to define new local variables in
> > > a list comprehension. For example, I wish to have something like
> > > [(x, y) for x in range(10) for y := x ** 2 if x + y < 80], or
> > > [(x, y) for x in range(10) with y := x ** 2 if x + y < 80].
> > >
> > > For now this functionality can be achieved by writing
> > > [(x, y) for x in range(10) for y in [x ** 2] if x + y < 80].
> > You can assign a local variable in the `if` expression. For example:
> >
> > >>> [(x, y) for x in range(10) if x + (y := x**2) < 30]
> > [(0, 0), (1, 1), (2, 4), (3, 9), (4, 16)]
>
> Yeah, this works great, but like the [(x, y) for x in range(10) for y in
> [x**2]] I wrote before, it's kind of a hack. And if I don't initially need
> an "if" condition in the list comprehension, this becomes less convenient.
> I can still write
> >>> [(x, y) for x in range(10) if (y := x**2) or True]
>
> But I wonder if Python could have a specific syntax to support this.
>

But why would you need to assign to y in that example? If you're using
it more than once, you can use :=, and if you aren't, you don't need
to. But do be aware that := does not create a comprehension-local name
binding, but a nonlocal instead.
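A quick illustration of that last point (a sketch; the names are arbitrary):

```python
def demo():
    pairs = [(x, y) for x in range(10) if x + (y := x**2) < 30]
    # x is comprehension-local and unreachable here, but the := target
    # binds in the function's scope (PEP 572), so y is still visible:
    return pairs, y

pairs, leaked = demo()
print(leaked)  # 81: the last value assigned during iteration (x = 9)
```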

> No, but very often when I have written a neat list/dict/set comprehension,
> I find it necessary to define local variable(s) to make it clearer and more
> concise. Otherwise I have to break it down into several incrementally
> indented lines of for loops, if statements, and variable assignments, which
> I think look less nice.

Well, if it's outgrown a list comp, write it on multiple lines. Like I
said, not everything has to be a one-liner.
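For example, the comprehension from earlier in the thread, unrolled into plain statements (a sketch):

```python
# One-liner with the for-y-in-[...] trick:
result = [(x, y) for x in range(10) for y in [x ** 2] if x + y < 80]

# The same thing as ordinary statements -- longer, but each step is
# obvious and y is just a regular local variable:
result2 = []
for x in range(10):
    y = x ** 2
    if x + y < 80:
        result2.append((x, y))

assert result == result2
```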

ChrisA


Re: Local variable definition in Python list comprehension

2022-09-01 Thread Chris Angelico
On Fri, 2 Sept 2022 at 02:10, James Tsai  wrote:
>
> Hello,
>
> I would find it very useful to be able to define new local variables in a
> list comprehension. For example, I wish to have something like
> [(x, y) for x in range(10) for y := x ** 2 if x + y < 80], or
> [(x, y) for x in range(10) with y := x ** 2 if x + y < 80].
>
> For now this functionality can be achieved by writing
> [(x, y) for x in range(10) for y in [x ** 2] if x + y < 80].
>
> Is it worthwhile to add a new feature like this to Python? If so, how can I
> propose it as a PEP?

Not everything has to be a one-liner.

ChrisA


Re: Running two separate servers (was Re: venv questions)

2022-08-30 Thread Chris Angelico
On Tue, 30 Aug 2022 at 19:51, gene heskett  wrote:
> So I'm thinking of venvs named rock64prusa and rock64ender5+, each with
> "port#" on my local net. So chromium could have two tabs open, one to
> localhost:5000 and one to localhost:5001, totally independent of each other.
>

As I said, that has absolutely nothing to do with venvs, so you'd have
to figure out how to change their port numbers independently.

(Although you could probably add an env var to the venv's activation
script, if that would help.)

ChrisA


Running two separate servers (was Re: venv questions)

2022-08-29 Thread Chris Angelico
On Tue, 30 Aug 2022 at 12:59, gene heskett  wrote:
>
> But that might create another problem: how to differentiate the servers,
> both of which will want to use localhost:5000 to serve up the web pages we
> run things with.
>
> Suggested solutions?

This is nothing to do with venvs, so I'm forking the thread.

By far the easiest way to differentiate them is to NOT have them both
on localhost:5000. Depending on how you invoke the servers, you should
be able to find a way to configure one (or both) of them to a
different port; common methods include a "--port" argument, setting
the PORT environment variable, and poking in the code to find the
number 5000 and changing it to some other value.
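A common pattern for making the port configurable, if you control the server script (a sketch; the function name and the 5000 default are illustrative, not anything framework-specific):

```python
import argparse
import os

def get_port() -> int:
    """Port from --port if given, else $PORT, else 5000."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--port", type=int,
                        default=int(os.environ.get("PORT", "5000")))
    return parser.parse_args().port

# e.g.  python server.py --port 5001   (first printer's server)
#       PORT=5001 python server.py     (same effect via the environment)
```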

(Less common methods include poking in ctypes to find the number 5000
and changing it to some other value. Mentioned only because I realise
the alternative interpretation of my previous comment.)

Another method would be to change the "localhost" part. The standard
for IP addresses is that 127.x.y.z means localhost, regardless of what
x, y, and z are; so you could have one of them bind to 127.0.0.2 and
the other to 127.0.0.3, which you could then use in your browser the
same way (http://127.0.0.2:5000/ and http://127.0.0.3:5000/
respectively).
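In socket terms that looks like this (a sketch; this works out of the box on Linux, while macOS requires the extra loopback aliases to be configured first):

```python
import socket

# Two listeners on the same port, distinguished only by which
# 127.x.y.z loopback address each one binds to:
a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
a.bind(("127.0.0.2", 5000))
a.listen()

b = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
b.bind(("127.0.0.3", 5000))  # no EADDRINUSE: different local address
b.listen()
```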

But if you can't change anything else, you'll have to make the two
processes cooperate in some way, or worst case, just make sure you
shut one down before you start the other up.

ChrisA


Re: venv questions

2022-08-29 Thread Chris Angelico
On Tue, 30 Aug 2022 at 12:59, gene heskett  wrote:
>
> Greetings all;
>
> The command to setup a venv, "python -m venv venv" has no man page that
> I have
> found.
>

$ python3 -m venv --help
usage: venv [-h] [--system-site-packages] [--symlinks | --copies] [--clear]
[--upgrade] [--without-pip] [--prompt PROMPT] [--upgrade-deps]
ENV_DIR [ENV_DIR ...]

Creates virtual Python environments in one or more target directories.

positional arguments:
  ENV_DIR   A directory to create the environment in.

(chomp all the explanation of options)

ChrisA


Re: How to make a variable's late binding crosses the module boundary?

2022-08-29 Thread Chris Angelico
On Tue, 30 Aug 2022 at 02:38, Jach Feng  wrote:
>
> Chris Angelico 在 2022年8月29日 星期一下午1:58:58 [UTC+8] 的信中寫道:
> > On Mon, 29 Aug 2022 at 15:54, Jach Feng  wrote:
> > >
> > > Richard Damon 在 2022年8月29日 星期一上午10:47:08 [UTC+8] 的信中寫道:
> > > > On 8/27/22 7:42 AM, Mark Bourne wrote:
> > > > > Jach Feng wrote:
> > > > >> I have two files: test.py and test2.py
> > > > >> --test.py--
> > > > >> x = 2
> > > > >> def foo():
> > > > >> print(x)
> > > > >> foo()
> > > > >>
> > > > >> x = 3
> > > > >> foo()
> > > > >>
> > > > >> --test2.py--
> > > > >> from test import *
> > > > >> x = 4
> > > > >> foo()
> > > > >>
> > > > >> -
> > > > >> Run test.py under Windows 8.1, I get the expected result:
> > > > >> e:\MyDocument>py test.py
> > > > >> 2
> > > > >> 3
> > > > >>
> > > > >> But when run test2.py, the result is not my expected 2,3,4:-(
> > > > >> e:\MyDocument>py test2.py
> > > > >> 2
> > > > >> 3
> > > > >> 3
> > > > >>
> > > > >> What to do?
> > > > >
> > > > > `from test import *` does not link the names in `test2` to those in
> > > > > `test`. It just binds objects bound to names in `test` to the same
> > > > > names in `test2`. A bit like doing:
> > > > >
> > > > > import test
> > > > > x = test.x
> > > > > foo = test.foo
> > > > > del test
> > > > >
> > > > > Subsequently assigning a different object to `x` in one module does
> > > > > not affect the object assigned to `x` in the other module. So `x = 4`
> > > > > in `test2.py` does not affect the object assigned to `x` in `test.py`
> > > > > - that's still `3`. If you want to do that, you need to import `test`
> > > > > and assign to `test.x`, for example:
> > > > >
> > > > > import test
> > > > > test.x = 4
> > > > > test.foo()
> > > > >
> > > > Yes, fundamental issue is that the statement
> > > >
> > > > from x import y
> > > >
> > > > makes a binding in this module to the object CURRENTLY bound to x.y to
> > > > the name y, but if x.y gets rebound, this module does not track the 
> > > > changes.
> > > >
> > > > You can mutate the object x.y and see the changes, but not rebind it.
> > > >
> > > > If you need to see rebindings, you can't use the "from x import y" form,
> > > > or at a minimum do it as:
> > > >
> > > >
> > > > import x
> > > >
> > > > from x import y
> > > >
> > > > then later to get rebindings to x.y do a
> > > >
> > > > y = x.y
> > > >
> > > > to rebind to the current x.y object.
> > > >
> > > > --
> > > > Richard Damon
> > > Yes, an extra "import x" will solve my problem too! Sometimes I wonder
> > > why "from x import y" hides x? Hmm... I can't figure out the reason :-)
> > >
> > "from x import y" doesn't hide x - it just grabs y. Python does what
> > you tell it to. :)
> >
> > ChrisA
> But I had heard people say that "from x import y" imports the whole x
> module into memory, just as "import x" does, not just "grabs y". Is this
> correct?
>

In order to do any sort of import, Python has to run the whole module.
But after that, something gets set in your module so that you can get
access to it.

import x
# is kinda like
go_and_run("x")
x = fetch_module("x")

from x import y
# is kinda like
go_and_run("x")
y = fetch_module("x").y

Either way, the whole module gets run, but then there's an assignment
into your module that depends on what you're importing.
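The "go_and_run"/"fetch_module" pseudo-functions above correspond roughly to importlib (a sketch, ignoring details like packages and import hooks; the standard math module stands in for "x"):

```python
import importlib

# "import math", spelled out:
math = importlib.import_module("math")

# "from math import pi", spelled out: the whole module is run and
# cached exactly as before; only the final binding differs.
_mod = importlib.import_module("math")
pi = _mod.pi
```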

ChrisA


Re: How to make a variable's late binding crosses the module boundary?

2022-08-28 Thread Chris Angelico
On Mon, 29 Aug 2022 at 15:54, Jach Feng  wrote:
>
> Richard Damon 在 2022年8月29日 星期一上午10:47:08 [UTC+8] 的信中寫道:
> > On 8/27/22 7:42 AM, Mark Bourne wrote:
> > > Jach Feng wrote:
> > >> I have two files: test.py and test2.py
> > >> --test.py--
> > >> x = 2
> > >> def foo():
> > >>  print(x)
> > >> foo()
> > >>
> > >> x = 3
> > >> foo()
> > >>
> > >> --test2.py--
> > >> from test import *
> > >> x = 4
> > >> foo()
> > >>
> > >> -
> > >> Run test.py under Windows 8.1, I get the expected result:
> > >> e:\MyDocument>py test.py
> > >> 2
> > >> 3
> > >>
> > >> But when run test2.py, the result is not my expected 2,3,4:-(
> > >> e:\MyDocument>py test2.py
> > >> 2
> > >> 3
> > >> 3
> > >>
> > >> What to do?
> > >
> > > `from test import *` does not link the names in `test2` to those in
> > > `test`.  It just binds objects bound to names in `test` to the same
> > > names in `test2`.  A bit like doing:
> > >
> > > import test
> > > x = test.x
> > > foo = test.foo
> > > del test
> > >
> > > Subsequently assigning a different object to `x` in one module does
> > > not affect the object assigned to `x` in the other module. So `x = 4`
> > > in `test2.py` does not affect the object assigned to `x` in `test.py`
> > > - that's still `3`.  If you want to do that, you need to import `test`
> > > and assign to `test.x`, for example:
> > >
> > > import test
> > > test.x = 4
> > > test.foo()
> > >
> > Yes, fundamental issue is that the statement
> >
> > from x import y
> >
> > makes a binding in this module to the object CURRENTLY bound to x.y to
> > the name y, but if x.y gets rebound, this module does not track the changes.
> >
> > You can mutate the object x.y and see the changes, but not rebind it.
> >
> > If you need to see rebindings, you can't use the "from x import y" form,
> > or at a minimum do it as:
> >
> >
> > import x
> >
> > from x import y
> >
> > then later to get rebindings to x.y do a
> >
> > y = x.y
> >
> > to rebind to the current x.y object.
> >
> > --
> > Richard Damon
> Yes, an extra "import x" will solve my problem too! Sometimes I wonder why
> "from x import y" hides x? Hmm... I can't figure out the reason :-)
>

"from x import y" doesn't hide x - it just grabs y. Python does what
you tell it to. :)

ChrisA


Re: What can I do about this?

2022-08-28 Thread Chris Angelico
On Mon, 29 Aug 2022 at 08:41, gene heskett  wrote:
>
> Greetings all;
>
> Pursuant to my claim that py3.10 is busted, here is a sample. This is me,
> trying to make pronterface inside a venv. The package-manager version will
> only run the gui-less "pronsole"; nothing else from that all-python kit
> runs as it should, or at all.
> From the package-manager's install in /usr/share/doc/printrun-common/ I
> copied requirements.txt into the venv, and ran this command line:
>
> gene@rock64:~/venv$ pip3 install -r requirements.txt
> Defaulting to user installation because normal site-packages is not
> writeable

I don't think Python 3.10 is busted; it's more likely your venv is not
providing a pip3 command. Try "pip3 --version", "python3 --version",
and then "python3 -m pip install -r requirements.txt".

Why do you keep blaming Python as if it's fundamentally broken?

ChrisA


Re: subprocess.popen how wait complete open process

2022-08-21 Thread Chris Angelico
On Mon, 22 Aug 2022 at 13:41, Dan Stromberg  wrote:
>
>
>
> On Sun, Aug 21, 2022 at 2:05 PM Chris Angelico  wrote:
>>
>> On Mon, 22 Aug 2022 at 05:39, simone zambonardi
>>  wrote:
>> >
> Hi, I am running a program with subprocess.Popen(...). What I need to do is
> pause the script until the launched program is fully open. How can I do
> this? I used a time.sleep() function, but I think there are other ways.
> Thanks
>> >
>>
>> First you have to define "fully open". How would you know?
>
>
> If you're on X11, you could conceivably use:
>  xwininfo -tree -root
>

That's only one possible definition: it has some sort of window. But
to wait until a program is "fully open", you might have to wait past a
splash screen until it has its actual application window. Or maybe
even then, it's not ready for operation. Only the OP can know what
defines "fully open".

ChrisA


Re: Mutating an HTML file with BeautifulSoup

2022-08-21 Thread Chris Angelico
On Mon, 22 Aug 2022 at 10:04, Buck Evan  wrote:
>
> I've had much success doing round trips through the lxml.html parser.
>
> https://lxml.de/lxmlhtml.html
>
> I ditched bs for lxml long ago and never regretted it.
>
> If you find that you have a bunch of invalid html that lxml inadvertently 
> "fixes", I would recommend adding a stutter-step to your project: perform a 
> noop roundtrip thru lxml on all files. I'd then analyze any diff by 
> progressively excluding changes via `grep -vP`.
> Unless I'm mistaken, all such changes should fall into no more than a dozen 
> groups.
>

Will this round-trip mutate every single file and reorder the tag
attributes? Because I really don't want to manually eyeball all those
changes.

ChrisA


Re: Mutating an HTML file with BeautifulSoup

2022-08-21 Thread Chris Angelico
On Mon, 22 Aug 2022 at 05:43, Jon Ribbens via Python-list
 wrote:
>
> On 2022-08-21, Chris Angelico  wrote:
> > On Sun, 21 Aug 2022 at 09:31, Jon Ribbens via Python-list
> > wrote:
> >> On 2022-08-20, Chris Angelico  wrote:
> >> > On Sun, 21 Aug 2022 at 03:27, Stefan Ram  wrote:
> >> >> 2qdxy4rzwzuui...@potatochowder.com writes:
> >> >> >textual representations.  That way, the following two elements are the
> >> >> >same (and similar with a collection of sub-elements in a different 
> >> >> >order
> >> >> >in another document):
> >> >>
> >> >>   The /elements/ differ. They have the /same/ infoset.
> >> >
> >> > That's the bit that's hard to prove.
> >> >
> >> >>   The OP could edit the files with regexps to create a new version.
> >> >
> >> > To you and Jon, who also suggested this: how would that be beneficial?
> >> > With Beautiful Soup, I have the line number and position within the
> >> > line where the tag starts; what does a regex give me that I don't have
> >> > that way?
> >>
> >> You mean you could use BeautifulSoup to read the file and identify the
> >> bits you want to change by line number and offset, and then you could
> >> use that data to try and update the file, hoping like hell that your
> >> definition of "line" and "offset" are identical to BeautifulSoup's
> >> and that you don't mess up later changes when you do earlier ones (you
> >> could do them in reverse order of line and offset I suppose) and
> >> probably resorting to regexps anyway in order to find the part of the
> >> tag you want to change ...
> >>
> >> ... or you could avoid all that faff and just do re.sub()?
> >
> > Stefan answered in part, but I'll add that it is far FAR easier to do
> > the analysis with BS4 than regular expressions. I'm not sure what
> > "hoping like hell" is supposed to mean here, since the line and offset
> > have been 100% accurate in my experience;
>
> Given the string:
>
> b"\n \r\r\n\v\n\r\xed\xa0\x80\xed\xbc\x9f\xcc\x80e\xc3\xa8?"
>
> what is the line number and offset of the question mark - and does
> BeautifulSoup agree with your answer? Does the answer to that second
> question change depending on what parser you tell BeautifulSoup to use?

I'm not sure, because I don't know how to ask BS4 about the location
of a question mark. But I replaced that with a tag, and:

>>> raw = b"\n \r\r\n\v\n\r\xed\xa0\x80\xed\xbc\x9f\xcc\x80e\xc3\xa8<body>"
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(raw, "html.parser")
>>> soup.body.sourceline
4
>>> soup.body.sourcepos
12
>>> raw.split(b"\n")[3]
b'\r\xed\xa0\x80\xed\xbc\x9f\xcc\x80e\xc3\xa8<body>'
>>> raw.split(b"\n")[3][12:]
b'<body>'

So, yes, it seems to be correct. (Slightly odd in that the sourceline
is 1-based but the sourcepos is 0-based, but that is indeed the case,
as confirmed with a much more straightforward string.)
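That 1-based/0-based mix comes straight from the stdlib parser underneath: html.parser reports positions via getpos(), which returns a (1-based line, 0-based column) pair, and BS4's sourceline/sourcepos inherit that convention. A stdlib-only sketch:

```python
from html.parser import HTMLParser

class TagLocator(HTMLParser):
    def __init__(self):
        super().__init__()
        self.positions = {}

    def handle_starttag(self, tag, attrs):
        # getpos() -> (lineno, offset): lineno is 1-based, offset is
        # 0-based, matching BS4's sourceline/sourcepos convention.
        self.positions[tag] = self.getpos()

locator = TagLocator()
locator.feed("line one\n<p>hi</p>")
print(locator.positions["p"])  # (2, 0)
```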

And yes, it depends on the parser, but I'm using html.parser and it's fine.

> (If your answer is "if the input contains \xed\xa0\x80\xed\xbc\x9f then
> I am happy with the program throwing an exception" then feel free to
> remove that substring from the question.)

Malformed UTF-8 doesn't seem to be a problem. Every file here seems to
be either UTF-8 or ISO-8859, and in the latter case, I'm assuming
8859-1. So I would probably just let this one go through as 8859-1.
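That policy is easy to express as a decode-with-fallback (a sketch; Latin-1 can never fail, since every byte maps to a character, which is what makes it a safe last resort):

```python
def read_text(data: bytes) -> str:
    """Decode as UTF-8 when valid, otherwise assume ISO-8859-1."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("iso-8859-1")

print(read_text(b"caf\xc3\xa9"))  # 'café'  (valid UTF-8)
print(read_text(b"caf\xe9"))      # 'café'  (invalid UTF-8 -> Latin-1)
```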

> > the only part I'm unsure about is where the _end_ of the tag is (and
> > maybe there's a way I can use BS4 again to get that??).
>
> There doesn't seem to be. More to the point, there doesn't seem to be
> a way to find out where the *attributes* are, so as I said you'll most
> likely end up using regexps anyway.

I'm okay with replacing an entire tag that needs to be changed.
Especially if I can replace just the opening tag, not the contents and
closing tag. And in fact, I may just do that part by scanning for an
unencoded greater-than, on the assumptions that (a) BS4 will correctly
encode any greater-thans in attributes, and (b) if there's a
mis-encoded one in the input, the diff will be small enough to
eyeball, and a human should easily notice that the text has been
massively expanded and duplicated.
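The splice itself could look like this (a sketch; deliberately naive, in that it assumes the opening tag sits on one line and contains no unencoded '>' inside an attribute value, which is exactly the assumption described above):

```python
def replace_opening_tag(text, sourceline, sourcepos, new_tag):
    """Replace one opening tag located by BS4-style coordinates
    (1-based line, 0-based offset within that line)."""
    lines = text.split("\n")
    line = lines[sourceline - 1]
    end = line.index(">", sourcepos) + 1  # first '>' after the tag start
    lines[sourceline - 1] = line[:sourcepos] + new_tag + line[end:]
    return "\n".join(lines)

html = 'text\n<img src="http://x/a.png"> tail'
fixed = replace_opening_tag(html, 2, 0, '<img src="https://x/a.png">')
```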

ChrisA


Re: subprocess.popen how wait complete open process

2022-08-21 Thread Chris Angelico
On Mon, 22 Aug 2022 at 05:39, simone zambonardi
 wrote:
>
> Hi, I am running a program with subprocess.Popen(...). What I need to do is
> pause the script until the launched program is fully open. How can I do
> this? I used a time.sleep() function, but I think there are other ways.
> Thanks
>

First you have to define "fully open". How would you know?

ChrisA


Re: Mutating an HTML file with BeautifulSoup

2022-08-21 Thread Chris Angelico
On Sun, 21 Aug 2022 at 17:26, Barry  wrote:
>
>
>
> > On 19 Aug 2022, at 22:04, Chris Angelico  wrote:
> >
> > On Sat, 20 Aug 2022 at 05:12, Barry  wrote:
> >>
> >>
> >>
> >>>> On 19 Aug 2022, at 19:33, Chris Angelico  wrote:
> >>>
> >>> What's the best way to precisely reconstruct an HTML file after
> >>> parsing it with BeautifulSoup?
> >>
> >> I recall that in bs4 it parses into an object tree and loses the detail of 
> >> the input.
> >> I recently ported from very old bs to bs4 and hit the same issue.
> >> So no it will not output the same as went in.
> >>
> >> If you can trust the input to be parsed as xml, meaning all the rules of 
> >> closing
> >> tags have been followed. Then I think you can parse and unparse thru xml to
> >> do what you want.
> >>
> >
> >
> > Yeah, no I can't, this is HTML 4 with a ton of inconsistencies. Oh
> > well. Thanks for trying, anyhow.
> >
> > So I'm left with a few options:
> >
> > 1) Give up on validation, give up on verification, and just run this
> > thing on the production site with my fingers crossed
>
> Can you build a beta site with the original intact?

In a naive way, a full copy would be quite a few gigabytes. I could
cut that down a good bit by taking only HTML files and the things they
reference, but then we run into the same problem of broken links,
which is what we're here to solve in the first place.

But I would certainly not want to run two copies of the site and then
manually compare.

> Also wonder if using selenium to walk the site may work as a verification 
> step?
> I cannot recall if you can get an image of the browser window to do image 
> compares with to look for rendering differences.

Image recognition won't necessarily even be valid; some of the changes
will have visual consequences (eg a broken image reference now
becoming correct), and as soon as that happens, the whole document can
reflow.

> From my one task using bs4 I did not see it produce any bad results.
> In my case the problems where in the code that built on bs1 using bad 
> assumptions.

Did that get run on perfect HTML, or on messy real-world stuff that
uses quirks mode?

ChrisA


Re: Mutating an HTML file with BeautifulSoup

2022-08-20 Thread Chris Angelico
On Sun, 21 Aug 2022 at 13:41, dn  wrote:
>
> On 21/08/2022 13.00, Chris Angelico wrote:
> > Well, I don't like headaches, but I do appreciate what the G&S Archive
> > has given me over the years, so I'm taking this on as a means of
> > giving back to the community.
>
> This point will be picked-up in the conclusion. NB in the same way that
> you want to 'give back', so also do others - even if in minor ways or
> 'when-relevant'!

Very true.

> >> In fact, depending upon frequency, making the changes manually (and with
> >> improved confidence in the result).
> >
> > Unfortunately the frequency is very high.
>
> Screechingly so? Like you're singing Three Little Maids?

You don't want to hear me singing that, although I do recall once
singing Lady Ella's part at a Qwert, to gales of laughter.

> > Yeah. I do a first pass to enumerate all domains that are ever linked
> > to with http:// URLs, and then I have a script that goes through and
> > checks to see if they redirect me to the same URL on the other
> > protocol, or other ways of checking. So yes, the list of valid domains
> > is part of the program's effective input.
>
> Wow! Having got that far, you have achieved data-validity. Is there a
> need to perform a before-after check or diff?

Yes, to ensure that nothing has changed that I *didn't* plan. The
planned changes aren't the problem here, I can verify those elsewhere.

> Perhaps start making the one-for-one replacements without further
> anxiety. As long as there's no silly-mistake, eg failing to remove an
> opening or closing angle-bracket; isn't that about all the checking needed?
> (for this category of updates)

Maybe, but probably not.

> BTW in talk of "line-number", you will have realised the need to re-run
> the identification of such after each of these steps - in case the 'new
> stuff' relating to earlier steps (assuming above became also a temporal
> sequence) is shorter/longer than the current HTML.

Yep, that's not usually a problem.

> >>> And there'll be other fixes to be done too. So it's a bit complicated,
> >>> and no simple solution is really sufficient. At the very very least, I
> >>> *need* to properly parse with BS4; the only question is whether I
> >>> reconstruct from the parse tree, or go back to the raw file and try to
> >>> edit it there.
> >>
> >> At least the diffs would give you something to work-from, but it's a bit
> >> like git-diffs claiming a 'change' when the only difference is that my
> >> IDE strips blanks from the ends of code-lines, or some-such silliness.
> >
> > Right; and the reconstructed version has a LOT of those unnecessary
> > changes. I'm seeing a lot of changes to whitespace. The only problem
> > is whether I can be confident that none of those changes could ever
> > matter.
>
> "White-space" has lesser-meaning in HTML - this is NOT Python! In HTML
> if I write "HTML  file" (with two spaces), the browser will shorten the
> display to a single space (hence some uses of   - non-broken
> space). Similarly, if attempt to use "\n" to start a new line of text...

Yes, whitespace has less meaning... except when it doesn't.

https://developer.mozilla.org/en-US/docs/Web/CSS/white-space

Text can become preformatted by the styling, and there could be
nothing whatsoever in the HTML page that shows this. I think most of
the HTML files in this site have been created by a WYSIWYG editor,
partly because of clues like a single bold space in a non-bold
sequence of text, and the styles aren't consistent everywhere. Given
that poetry comes up a lot on this site, I wouldn't put it past the
editor to have set a whitespace rule on something.

But I'm probably going to just ignore that and hope that any such
errors are less significant than the current set of broken links.

> Is there a danger of 'chasing your own tail', ie seeking a solution to a
> problem which really doesn't matter (particularly if we add the phrase:
> at the user-level)?

Unfortunately not. I now know of three categories of change that, in
theory, shouldn't affect anything: whitespace, order of attributes
(the same tag re-emitted with its attributes in a different order), and
self-closing tags. Whitespace probably won't matter, until it does.
Order of attributes is absolutely fine unless one of them is
miswritten and now we've lost a lot of information about how it ought
to have been written. And self-closing tags are probably
insignificant, but I don't know how browsers handle a self-closed form
of a normally paired tag - and I wouldn't know wh

Re: Mutating an HTML file with BeautifulSoup

2022-08-20 Thread Chris Angelico
On Sun, 21 Aug 2022 at 09:48, dn  wrote:
>
> On 20/08/2022 12.38, Chris Angelico wrote:
> > On Sat, 20 Aug 2022 at 10:19, dn  wrote:
> >> On 20/08/2022 09.01, Chris Angelico wrote:
> >>> On Sat, 20 Aug 2022 at 05:12, Barry  wrote:
> >>>>> On 19 Aug 2022, at 19:33, Chris Angelico  wrote:
> >>>>>
> >>>>> What's the best way to precisely reconstruct an HTML file after
> >>>>> parsing it with BeautifulSoup?
> ...
>
> >>> well. Thanks for trying, anyhow.
> >>>
> >>> So I'm left with a few options:
> >>>
> >>> 1) Give up on validation, give up on verification, and just run this
> >>> thing on the production site with my fingers crossed
> >>> 2) Instead of doing an intelligent reconstruction, just str.replace()
> >>> one URL with another within the file
> >>> 3) Split the file into lines, find the Nth line (elem.sourceline) and
> >>> str.replace that line only
> >>> 4) Attempt to use elem.sourceline and elem.sourcepos to find the start
> >>> of the tag, manually find the end, and replace one tag with the
> >>> reconstructed form.
> >>>
> >>> I'm inclined to the first option, honestly. The others just seem like
> >>> hard work, and I became a programmer so I could be lazy...
> >> +1 - but I've noticed that sometimes I have to work quite hard to be
> >> this lazy!
> >
> > Yeah, that's very true...
> >
> >> Am assuming that http -> https is not the only 'change' (if it were,
> >> you'd just do that without BS). How many such changes are planned/need
> >> checking? Care to list them?
>
> This project has many of the same 'smells' as a database-harmonisation
> effort. Particularly one where 'the previous guy' used to use field-X
> for certain data, but his replacement decided that field-Y 'sounded
> better' (or some such user-logic). Arrrggg!
>
> If you like head-aches, and users coming to you with ifs-buts-and-maybes
> AFTER you've 'done stuff', this is your sort of project!

Well, I don't like headaches, but I do appreciate what the G&S Archive
has given me over the years, so I'm taking this on as a means of
giving back to the community.

> > Assumption is correct. The changes are more of the form "find all the
> > problems, add to the list of fixes, try to minimize the ones that need
> > to be done manually". So far, what I have is:
>
> Having taken the trouble to identify this list of improvements and given
> the determination to verify each, consider working through one item at a
> time, rather than in a single pass. This will enable individual logging
> of changes, a manual check of each alteration, and the ability to
> choose/tailor the best tool for that specific task.
>
> In fact, depending upon frequency, making the changes manually (and with
> improved confidence in the result).

Unfortunately the frequency is very high.

> The presence of (or allusion to) the word "some" in this list-items is
> 'the killer'. Automation doesn't like 'some' (cf "all") unless the
> criteria can be clearly and unambiguously defined. Ouch!
>
> (I don't think you need to be told any of this, but hey: dreams are free!)

Right; the criteria are quite well defined, but I omitted the details
for brevity.

> > 1) A bunch of http -> https, but not all of them - only domains where
> > I've confirmed that it's valid
>
> The search-criteria is the list of valid domains, rather than the
> "http/https" which is likely the first focus.

Yeah. I do a first pass to enumerate all domains that are ever linked
to with http:// URLs, and then I have a script that goes through and
checks to see if they redirect me to the same URL on the other
protocol, or other ways of checking. So yes, the list of valid domains
is part of the program's effective input.

> > 2) Some absolute to relative conversions:
> > https://www.gsarchive.net/whowaswho/index.htm should be referred to as
> > /whowaswho/index.htm instead
>
> Similarly, if you have a list of these.

It's more just the pattern "https://www.gsarchive.net/" and
"https://gsarchive.net/", and the corresponding "http://";
URLs, plus a few other malformed versions that are worth correcting
(if ever I find a link to "www.gsarchive.net/", it's almost
certainly missing its protocol).
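As a sketch of that normalisation (the regex and function are illustrative, not the real script's code):

```python
import re

# Absolute links to the site's own domain, in either protocol,
# with or without "www.":
SITE = re.compile(r"^https?://(?:www\.)?gsarchive\.net(?=/|$)",
                  re.IGNORECASE)

def relativize(url):
    """Turn an absolute same-site URL into a site-relative one."""
    stripped = SITE.sub("", url, count=1)
    return stripped or "/"

print(relativize("https://www.gsarchive.net/whowaswho/index.htm"))
# -> /whowaswho/index.htm
```

Off-site URLs pass through untouched, which keeps the rewrite safely scoped to the one domain.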

> > 3) A few outdated URLs for which we know the replacement, eg
> > http://www.cr

Re: Mutating an HTML file with BeautifulSoup

2022-08-20 Thread Chris Angelico
On Sun, 21 Aug 2022 at 09:31, Jon Ribbens via Python-list
 wrote:
>
> On 2022-08-20, Chris Angelico  wrote:
> > On Sun, 21 Aug 2022 at 03:27, Stefan Ram  wrote:
> >> 2qdxy4rzwzuui...@potatochowder.com writes:
> >> >textual representations.  That way, the following two elements are the
> >> >same (and similar with a collection of sub-elements in a different order
> >> >in another document):
> >>
> >>   The /elements/ differ. They have the /same/ infoset.
> >
> > That's the bit that's hard to prove.
> >
> >>   The OP could edit the files with regexps to create a new version.
> >
> > To you and Jon, who also suggested this: how would that be beneficial?
> > With Beautiful Soup, I have the line number and position within the
> > line where the tag starts; what does a regex give me that I don't have
> > that way?
>
> You mean you could use BeautifulSoup to read the file and identify the
> bits you want to change by line number and offset, and then you could
> use that data to try and update the file, hoping like hell that your
> definition of "line" and "offset" are identical to BeautifulSoup's
> and that you don't mess up later changes when you do earlier ones (you
> could do them in reverse order of line and offset I suppose) and
> probably resorting to regexps anyway in order to find the part of the
> tag you want to change ...
>
> ... or you could avoid all that faff and just do re.sub()?

Stefan answered in part, but I'll add that it is far FAR easier to do
the analysis with BS4 than regular expressions. I'm not sure what
"hoping like hell" is supposed to mean here, since the line and offset
have been 100% accurate in my experience; the only part I'm unsure
about is where the _end_ of the tag is (and maybe there's a way I can
use BS4 again to get that??).
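For what it's worth, turning that (sourceline, sourcepos) pair into a position in the raw text, and then scanning for the end of the tag, needs no further help from BS4. A rough sketch, assuming the html.parser builder's convention of a 1-based sourceline and 0-based sourcepos (the naive quote-aware scan at the end is exactly the fiddly part I'm unsure about):

```python
def tag_start_offset(html_text, sourceline, sourcepos):
    # html.parser reports a 1-based line and 0-based column for each tag
    lines = html_text.split("\n")
    return sum(len(line) + 1 for line in lines[:sourceline - 1]) + sourcepos

def tag_end_offset(html_text, start):
    # Scan for the '>' that closes the tag, skipping any '>' that
    # appears inside a quoted attribute value
    quote = None
    for i in range(start, len(html_text)):
        ch = html_text[i]
        if quote:
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == ">":
            return i + 1
    return -1  # malformed: the tag never closed
```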

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Mutating an HTML file with BeautifulSoup

2022-08-20 Thread Chris Angelico
On Sun, 21 Aug 2022 at 03:27, Stefan Ram  wrote:
>
> 2qdxy4rzwzuui...@potatochowder.com writes:
> >textual representations.  That way, the following two elements are the
> >same (and similar with a collection of sub-elements in a different order
> >in another document):
>
>   The /elements/ differ. They have the /same/ infoset.

That's the bit that's hard to prove.

>   The OP could edit the files with regexps to create a new version.

To you and Jon, who also suggested this: how would that be beneficial?
With Beautiful Soup, I have the line number and position within the
line where the tag starts; what does a regex give me that I don't have
that way?

>   Soup := BeautifulSoup.
>
>   Then have Soup read both the new version and the old version.
>
>   Then have Soup also edit the old version read in, the same way as
>   the regexps did and verify that now the old version edited by
>   Soup and the new version created using regexps agree.
>
>   Or just use Soup as a tool to show the diffs for visual inspection
>   by having Soup read both the original version and the version edited
>   with regexps. Now both are normalized by Soup and Soup can show the
>   diffs (such a diff feature might not be a part of Soup, but it should
>   not be too much effort to write one using Soup).
>

But as mentioned, the entire problem *is* the normalization, as I have
no proof that it has had no impact on the rendering of the page.
Comparing two normalized versions is no better than my original option
1, whereby I simply ignore the normalization and write out the
reconstructed content.

It's easy if you know for certain that the page is well-formed. Much
harder if you do not - or, as in some cases, if you know the page is
badly-formed.

ChrisA


Re: Mutating an HTML file with BeautifulSoup

2022-08-19 Thread Chris Angelico
On Sat, 20 Aug 2022 at 10:19, dn  wrote:
>
> On 20/08/2022 09.01, Chris Angelico wrote:
> > On Sat, 20 Aug 2022 at 05:12, Barry  wrote:
> >>
> >>
> >>
> >>> On 19 Aug 2022, at 19:33, Chris Angelico  wrote:
> >>>
> >>> What's the best way to precisely reconstruct an HTML file after
> >>> parsing it with BeautifulSoup?
> >>
> >> I recall that in bs4 it parses into an object tree and loses the detail of 
> >> the input.
> >> I recently ported from very old bs to bs4 and hit the same issue.
> >> So no it will not output the same as went in.
> >>
> >> If you can trust the input to be parsed as xml, meaning all the rules of 
> >> closing
> >> tags have been followed. Then I think you can parse and unparse thru xml to
> >> do what you want.
> >>
> >
> >
> > Yeah, no I can't, this is HTML 4 with a ton of inconsistencies. Oh
> > well. Thanks for trying, anyhow.
> >
> > So I'm left with a few options:
> >
> > 1) Give up on validation, give up on verification, and just run this
> > thing on the production site with my fingers crossed
> > 2) Instead of doing an intelligent reconstruction, just str.replace()
> > one URL with another within the file
> > 3) Split the file into lines, find the Nth line (elem.sourceline) and
> > str.replace that line only
> > 4) Attempt to use elem.sourceline and elem.sourcepos to find the start
> > of the tag, manually find the end, and replace one tag with the
> > reconstructed form.
> >
> > I'm inclined to the first option, honestly. The others just seem like
> > hard work, and I became a programmer so I could be lazy...
> +1 - but I've noticed that sometimes I have to work quite hard to be
> this lazy!

Yeah, that's very true...

> Am assuming that http -> https is not the only 'change' (if it were,
> you'd just do that without BS). How many such changes are planned/need
> checking? Care to list them?
>

Assumption is correct. The changes are more of the form "find all the
problems, add to the list of fixes, try to minimize the ones that need
to be done manually". So far, what I have is:

1) A bunch of http -> https, but not all of them - only domains where
I've confirmed that it's valid
2) Some absolute to relative conversions:
https://www.gsarchive.net/whowaswho/index.htm should be referred to as
/whowaswho/index.htm instead
3) A few outdated URLs for which we know the replacement, eg
http://www.cris.com/~oakapple/gasdisc/ to
http://www.gasdisc.oakapplepress.com/ (this one can't go on
HTTPS, which is one reason I can't shortcut that)
4) Some internal broken links where the path is wrong - anything that
resolves to /books/ but can't be found might be better
rewritten as /html/perf_grps/websites/ if the file can be
found there
5) Any external link that yields a permanent redirect should, to save
clientside requests, get replaced by the destination. We have some
Creative Commons badges that have moved to new URLs.
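Most of those categories boil down to a pure URL -> URL function, which is easy to test in isolation before letting it anywhere near the HTML. A rough sketch (the data and names here are illustrative, not the real fix lists, and category 4 is omitted since it needs a look at the filesystem):

```python
def rewrite_url(url, https_ok_domains, known_moves):
    # 3/5) outdated URLs and permanent redirects with known replacements
    url = known_moves.get(url, url)
    # 2) absolute links to our own site become site-relative
    for own in ("https://www.gsarchive.net", "http://www.gsarchive.net",
                "https://gsarchive.net", "http://gsarchive.net"):
        if url.startswith(own + "/"):
            return url[len(own):]
    # 1) http -> https, but only for domains confirmed to support it
    if url.startswith("http://"):
        domain = url[len("http://"):].split("/", 1)[0]
        if domain in https_ok_domains:
            return "https://" + url[len("http://"):]
    return url
```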

And there'll be other fixes to be done too. So it's a bit complicated,
and no simple solution is really sufficient. At the very very least, I
*need* to properly parse with BS4; the only question is whether I
reconstruct from the parse tree, or go back to the raw file and try to
edit it there.

For the record, I have very long-term plans to migrate parts of the
site to Markdown, which would make a lot of things easier. But for
now, I need to fix the existing problems in the existing HTML files,
without doing gigantic wholesale layout changes.

ChrisA


Re: Mutating an HTML file with BeautifulSoup

2022-08-19 Thread Chris Angelico
On Sat, 20 Aug 2022 at 10:04, David  wrote:
>
> On Sat, 20 Aug 2022 at 04:31, Chris Angelico  wrote:
>
> > What's the best way to precisely reconstruct an HTML file after
> > parsing it with BeautifulSoup?
>
> > Note two distinct changes: firstly, whitespace has been removed, and
> > secondly, attributes are reordered (I think alphabetically). There are
> > other canonicalizations being done, too.
>
> > I'm trying to make some automated changes to a huge number of HTML
> > files, with minimal diffs so they're easy to validate. That means that
> > spurious changes like these are very much unwanted. Is there a way to
> > get BS4 to reconstruct the original precisely?
>
> On Sat, 20 Aug 2022 at 07:02, Chris Angelico  wrote:
> > On Sat, 20 Aug 2022 at 05:12, Barry  wrote:
>
> > > I recall that in bs4 it parses into an object tree and loses the detail
> > > of the input.  I recently ported from very old bs to bs4 and hit the
> > > same issue.  So no it will not output the same as went in.
>
> > So I'm left with a few options:
>
> > 1) Give up on validation, give up on verification, and just run this
> >thing on the production site with my fingers crossed
>
> > 2) Instead of doing an intelligent reconstruction, just str.replace() one
> >URL with another within the file
>
> > 3) Split the file into lines, find the Nth line (elem.sourceline) and
> >str.replace that line only
>
> > 4) Attempt to use elem.sourceline and elem.sourcepos to find the start of
> >the tag, manually find the end, and replace one tag with the
> >reconstructed form.
>
> > I'm inclined to the first option, honestly. The others just seem like
> > hard work, and I became a programmer so I could be lazy...
>
> Hi, I don't know if you will like this option, but I don't see it on the
> list yet so ...

Hey, all options are welcomed :)

> I'm assuming that the phrase "with minimal diffs so they're easy to
> validate" means being eyeballed by a human.
>
> Have you considered two passes through BS? Do the first pass with no
> modification, so that the intermediate result gets the BS default
> "spurious" changes.
>
> Then do the second pass with the desired changes, so that the human will
> see only the desired changes in the diff.

I'm 100% confident of the actual changes, so that wouldn't really
solve anything. The problem is that, without eyeballing the actual
changes, I can't easily see if there's been something else changed or
broken. This is a scripted change that will affect probably hundreds
of HTML files across a large web site, so making sure I don't break
anything means either (a) minimize the diff so it's clearly correct,
or (b) eyeball the rendered versions of every page - manually - to see
if there were any unintended changes. (There WILL be intended visual
changes, so I can't render the page to bitmap and ensure that it
hasn't changed. This is not React snapshot testing, which IMO is one
of the most useless testing features ever devised. No, actually, that
can't be true, someone MUST have made a worse one.)

Appreciate the suggestion, though!

ChrisA


Re: Mutating an HTML file with BeautifulSoup

2022-08-19 Thread Chris Angelico
On Sat, 20 Aug 2022 at 05:12, Barry  wrote:
>
>
>
> > On 19 Aug 2022, at 19:33, Chris Angelico  wrote:
> >
> > What's the best way to precisely reconstruct an HTML file after
> > parsing it with BeautifulSoup?
>
> I recall that in bs4 it parses into an object tree and loses the detail of 
> the input.
> I recently ported from very old bs to bs4 and hit the same issue.
> So no it will not output the same as went in.
>
> If you can trust the input to be parsed as xml, meaning all the rules of 
> closing
> tags have been followed. Then I think you can parse and unparse thru xml to
> do what you want.
>


Yeah, no I can't, this is HTML 4 with a ton of inconsistencies. Oh
well. Thanks for trying, anyhow.

So I'm left with a few options:

1) Give up on validation, give up on verification, and just run this
thing on the production site with my fingers crossed
2) Instead of doing an intelligent reconstruction, just str.replace()
one URL with another within the file
3) Split the file into lines, find the Nth line (elem.sourceline) and
str.replace that line only
4) Attempt to use elem.sourceline and elem.sourcepos to find the start
of the tag, manually find the end, and replace one tag with the
reconstructed form.

I'm inclined to the first option, honestly. The others just seem like
hard work, and I became a programmer so I could be lazy...

ChrisA


Mutating an HTML file with BeautifulSoup

2022-08-19 Thread Chris Angelico
What's the best way to precisely reconstruct an HTML file after
parsing it with BeautifulSoup?

Using the Alice example from the BS4 docs:

>>> html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and
their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""
>>> print(soup)
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and
their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
>>>

Note two distinct changes: firstly, whitespace has been removed, and
secondly, attributes are reordered (I think alphabetically). There are
other canonicalizations being done, too.

I'm trying to make some automated changes to a huge number of HTML
files, with minimal diffs so they're easy to validate. That means that
spurious changes like these are very much unwanted. Is there a way to
get BS4 to reconstruct the original precisely?

The mutation itself would be things like finding an anchor tag and
changing its href attribute. Fairly simple changes, but might alter
the length of the file (eg changing "http://example.com/" into
"https://example.com/"). I'd like to do them intelligently rather than
falling back on element.sourceline and element.sourcepos, but worst
case, that's what I'll have to do (which would be fiddly).

ChrisA


Re: Problem using cx_Freeze > auto-py-to-exe

2022-08-18 Thread Chris Angelico
On Fri, 19 Aug 2022 at 10:07, Grant Edwards  wrote:
>
> On 2022-08-18, Chris Angelico  wrote:
> > On Fri, 19 Aug 2022 at 05:05, Grant Edwards  
> > wrote:
> >> On 2022-08-18, Chris Angelico  wrote:
> >>
> >> > It's one of the frustrations with JSON, since that format doesn't
> >> > allow the trailing comma :)
> >>
> >> Yep, that's a constant, low-level pain for all the C code I deal with
> >> which generates JSON. You'd think after 10+ years of maintaining code
> >> that outputs JSON, I wouldn't trip over that any longer...
> >
> > With some JSON files, I just cheat and define a shim at the end of arrays...
> >
> > https://raw.githubusercontent.com/Rosuav/MustardMine/master/template.json
>
> That's OK if it's strictly internal. Almost all of the JSON data I
> work with is part of published APIs — many of which are defined by
> industry consortiums or corporate-wide "standards".
>

That's an export/import format that I defined, so I mandated (a) that
there's an empty-string key as a signature (on import, it can be
anywhere, but on export, it's that final shim), and (b) all arrays are
allowed to have an empty string at the end, which is ignored on
import. Saves so much trouble.
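For anyone curious what "ignored on import" looks like in practice, here's a minimal sketch of the import side (assuming, as in my format, that a genuine empty string never legitimately ends an array):

```python
import json

def load_template(text):
    # Parse the export format: every list may end with an empty-string
    # shim, which the importer strips before using the data.
    def strip_shims(obj):
        if isinstance(obj, list):
            if obj and obj[-1] == "":
                obj.pop()
            for item in obj:
                strip_shims(item)
        elif isinstance(obj, dict):
            for value in obj.values():
                strip_shims(value)
        return obj
    return strip_shims(json.loads(text))
```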

That particular export format is actually designed as a git-managed
config file as well, which is why the line breaks are done the way
they are (anything on a single line is intended to be added/removed as
a single unit), which is why I definitely don't want the "add a comma
to the previous line" deltas.

"Strictly internal" is a subset of "protocols/standards that you are
in control of". :)

ChrisA


Re: UTF-8 and latin1

2022-08-18 Thread Chris Angelico
On Fri, 19 Aug 2022 at 08:15, Tobiah  wrote:
>
> > You configure the web server to send:
> >
> >  Content-Type: text/html; charset=...
> >
> > in the HTTP header when it serves HTML files.
>
> So how does this break down?  When a person enters
> Montréal, Quebéc into a form field, what are they
> doing on the keyboard to make that happen?  As the
> string sits there in the text box, is it latin1, or utf-8
> or something else?  How does the browser know what
> sort of data it has in that text box?
>

As it sits there in the text box, it is *a text string*.

When it gets sent to the server, the encoding is defined by the
browser (with reference to the server's specifications) and identified
in a request header.

The server should then receive that and interpret it as a text string.

Encodings should ONLY be relevant when data is stored in files or
transmitted across a network etc, and the rest of the time, just think
in Unicode.
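In Python 3 terms, the whole rule of thumb fits in a few lines:

```python
# Inside the program: text is simply a str, with no encoding attached
form_value = "Montréal, Quebéc"

# Crossing a boundary (network, file): encode to bytes
wire = form_value.encode("utf-8")

# On the receiving side: decode straight back to a str
received = wire.decode("utf-8")
assert received == form_value
```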

Also - migrate to Python 3, your life will become a lot easier.

ChrisA


Re: Problem using cx_Freeze > auto-py-to-exe

2022-08-18 Thread Chris Angelico
On Fri, 19 Aug 2022 at 05:05, Grant Edwards  wrote:
>
> On 2022-08-18, Chris Angelico  wrote:
> > On Fri, 19 Aug 2022 at 04:19, David at Booomer  wrote:
> >
> >> The trailing , does make commenting out arguments easier but
> >> unexpected coming from ‘older’ languages. ;-)
> >
> > It's one of the frustrations with JSON, since that format doesn't
> > allow the trailing comma :)
>
> Yep, that's a constant, low-level pain for all the C code I deal with
> which generates JSON. You'd think after 10+ years of maintaining code
> that outputs JSON, I wouldn't trip over that any longer...
>

With some JSON files, I just cheat and define a shim at the end of arrays...

https://raw.githubusercontent.com/Rosuav/MustardMine/master/template.json

ChrisA


Re: Problem using cx_Freeze > auto-py-to-exe

2022-08-18 Thread Chris Angelico
On Fri, 19 Aug 2022 at 04:19, David at Booomer  wrote:
> > This is really common in modern programming languages (read: programming
> > languages younger than 30 years or so), because it makes it much more
> > convenient to extend/shorten/reorder a list. Otherwise you alway have to
> > remember add or remove a comma in the right place. (Some people
> > (especially SQL programmers for some reason) resorted to put the comma
> > at the start of each line to get around this, which is really ugly.)
> >
> >hp
>
> The trailing , does make commenting out arguments easier but unexpected 
> coming from ‘older’ languages. ;-)
>

It's one of the frustrations with JSON, since that format doesn't
allow the trailing comma :)

ChrisA


Re: setup.py + cython == chicken and the egg problem

2022-08-16 Thread Chris Angelico
On Wed, 17 Aug 2022 at 07:05, Dan Stromberg  wrote:
>
> Hi folks.
>
> I'm attempting to package up a python package that uses Cython.
>
> Rather than build binaries for everything under the sun, I've been focusing
> on including the .pyx file and running cython on it at install time.  This
> requires a C compiler, but I'm OK with that.
>

Is keeping the cythonized file an option? That would still require a C
compiler, but wouldn't require Cython.
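The usual shape of that option: ship the generated .c alongside the .pyx in the sdist, and pick at build time (a sketch only, with the helper name being mine):

```python
import os

def extension_sources(pyx_path):
    # Prefer regenerating from the .pyx when Cython is available;
    # otherwise fall back to the pre-generated .c shipped in the sdist.
    c_path = os.path.splitext(pyx_path)[0] + ".c"
    try:
        import Cython  # noqa: F401 - only probing for availability
        return [pyx_path], True   # setup.py should cythonize() these
    except ImportError:
        return [c_path], False    # compile the shipped C directly
```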

ChrisA


Re: Trying to understand nested loops

2022-08-06 Thread Chris Angelico
On Sat, 6 Aug 2022 at 22:39, Richard Damon  wrote:
>
> On 8/6/22 8:12 AM, Chris Angelico wrote:
> > On Sat, 6 Aug 2022 at 22:08, Richard Damon  wrote:
> >> On 8/6/22 12:01 AM, Chris Angelico wrote:
> >>> On Sat, 6 Aug 2022 at 13:54, Dan Stromberg  wrote:
> >>>> On Fri, Aug 5, 2022 at 12:54 PM Grant Edwards 
> >>>> wrote:
> >>>>
> >>>>> In C, this doesn't do what it looks like it's supposed to do.
> >>>>>
> >>>>>  if (foo)
> >>>>>do_this();
> >>>>>and_this();
> >>>>>  then_do_this();
> >>>>>
> >>>> It's been quite a while since I used C, but with the right compiler
> >>>> flag(s), I think this may be a thing of the past when compiling with gcc:
> >>>> https://developers.redhat.com/blog/2016/02/26/gcc-6-wmisleading-indentation-vs-goto-fail
> >>> Ah yes, because compiler warnings are always viewed and acted upon.
> >>>
> >>> Have you ever watched the compilation of a large open-source project,
> >>> done using the project's own build system and therefore the team's
> >>> preferred warning settings? It's normal to have such a spew of
> >>> warnings that you can't find anything interesting, or to have new
> >>> warnings in new versions of GCC be utterly useless for the same
> >>> reason.
> >>>
> >>> ChrisA
> >> You make it so you HAVE to fix the warning by adding the option to make
> >> warnings into errors.
> >>
> >> This does mean that you need to fix all the warnings that don't actually
> >> mean anything,
> >>
> >> Good code shouldn't generate many warnings, either you have warnings
> >> enabled that you don't care about, or your code is doing things you have
> >> told the complier you shouldn't do.
> >>
> > I say again: have you ever watched the compilation of a large
> > open-source project? You cannot turn warnings into errors, because
> > there are ALWAYS warnings. Maybe, once upon a time, the policy was to
> > ensure that there were no warnings on any major compiler; but times
> > change, compilers add new warnings, new compilers join the club, and
> > it becomes practically impossible to prevent warnings. Which, in turn,
> > makes all warnings basically meaningless.
> >
> > Hmm. I don't think I've ever compiled gcc from source. Maybe I should
> > do that, just to see whether gcc itself compiles with no warnings
> > under gcc.
> >
> > ChrisA
>
> And for any project, that is a choice THEY made.

Indeed. So you can't really say "good code shouldn't generate many
warnings" unless (a) you're saying that lots of projects are made up
of bad code, or (b) your statement that this is "a thing of the past"
is flat-out false, because it can only be valid if you assume that
everyone has that warning enabled, and preferably set to be an error.

So, for the vast majority of projects out there, indentation errors
are going to continue to go uncaught by C compilers. It's not "a thing
of the past" until most projects use the flag, and preferably, the
flag becomes active by default.

And for the record, I have seen spurious warnings from *that exact
flag* in a large project (an image parsing library). Spurious in that
the code was actually correct, despite the compiler warning about it.

ChrisA


Re: Trying to understand nested loops

2022-08-06 Thread Chris Angelico
On Sat, 6 Aug 2022 at 22:08, Richard Damon  wrote:
>
> On 8/6/22 12:01 AM, Chris Angelico wrote:
> > On Sat, 6 Aug 2022 at 13:54, Dan Stromberg  wrote:
> >> On Fri, Aug 5, 2022 at 12:54 PM Grant Edwards 
> >> wrote:
> >>
> >>> In C, this doesn't do what it looks like it's supposed to do.
> >>>
> >>> if (foo)
> >>>   do_this();
> >>>   and_this();
> >>> then_do_this();
> >>>
> >> It's been quite a while since I used C, but with the right compiler
> >> flag(s), I think this may be a thing of the past when compiling with gcc:
> >> https://developers.redhat.com/blog/2016/02/26/gcc-6-wmisleading-indentation-vs-goto-fail
> > Ah yes, because compiler warnings are always viewed and acted upon.
> >
> > Have you ever watched the compilation of a large open-source project,
> > done using the project's own build system and therefore the team's
> > preferred warning settings? It's normal to have such a spew of
> > warnings that you can't find anything interesting, or to have new
> > warnings in new versions of GCC be utterly useless for the same
> > reason.
> >
> > ChrisA
>
> You make it so you HAVE to fix the warning by adding the option to make
> warnings into errors.
>
> This does mean that you need to fix all the warnings that don't actually
> mean anything,
>
> Good code shouldn't generate many warnings, either you have warnings
> enabled that you don't care about, or your code is doing things you have
> told the complier you shouldn't do.
>

I say again: have you ever watched the compilation of a large
open-source project? You cannot turn warnings into errors, because
there are ALWAYS warnings. Maybe, once upon a time, the policy was to
ensure that there were no warnings on any major compiler; but times
change, compilers add new warnings, new compilers join the club, and
it becomes practically impossible to prevent warnings. Which, in turn,
makes all warnings basically meaningless.

Hmm. I don't think I've ever compiled gcc from source. Maybe I should
do that, just to see whether gcc itself compiles with no warnings
under gcc.

ChrisA


Re: Trying to understand nested loops

2022-08-05 Thread Chris Angelico
On Sat, 6 Aug 2022 at 13:54, Dan Stromberg  wrote:
>
> On Fri, Aug 5, 2022 at 12:54 PM Grant Edwards 
> wrote:
>
> > In C, this doesn't do what it looks like it's supposed to do.
> >
> >if (foo)
> >  do_this();
> >  and_this();
> >then_do_this();
> >
> It's been quite a while since I used C, but with the right compiler
> flag(s), I think this may be a thing of the past when compiling with gcc:
> https://developers.redhat.com/blog/2016/02/26/gcc-6-wmisleading-indentation-vs-goto-fail

Ah yes, because compiler warnings are always viewed and acted upon.

Have you ever watched the compilation of a large open-source project,
done using the project's own build system and therefore the team's
preferred warning settings? It's normal to have such a spew of
warnings that you can't find anything interesting, or to have new
warnings in new versions of GCC be utterly useless for the same
reason.

ChrisA


Re: Dictionary order?

2022-08-01 Thread Chris Angelico
On Tue, 2 Aug 2022 at 07:48, <2qdxy4rzwzuui...@potatochowder.com> wrote:
>
> On 2022-08-01 at 13:41:11 -0700,
> Dan Stromberg  wrote:
>
> > keys = [5, 10, 15, 14, 9, 4, 1, 2, 8, 6, 7, 12, 11]
> >
> > dict_ = {}
> > for key in keys:
> > dict_[key] = 1
>
> $ python
> Python 3.10.5 (main, Jun  6 2022, 18:49:26) [GCC 12.1.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> [hash(x) for x in range(20)]
> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
>
> Just sayin'.  :-)

Yes, but I'm pretty sure that's been true for a LONG time. The hashes
for small integers have been themselves for as long as I can remember.
But the behaviour of the dictionary, when fed such keys, is what's
changed.

ChrisA


Re: Dictionary order?

2022-08-01 Thread Chris Angelico
On Tue, 2 Aug 2022 at 06:50, Skip Montanaro  wrote:
>
> >
> > So I decided to write a little test program to run on a variety of
> > CPythons, to confirm what I was thinking.
> >
> > And instead I got a surprise.
> >
> > On 1.4 through 2.1 I got descending key order.  I expected the keys to be
> > scattered, but they weren't.
> >
> > On 2.2 through 3.5 I got ascending key order.  I expected the keys to be
> > scattered, but they weren't.
> >
> > On 3.6 through 3.10 I got insertion order, as expected.
> >
> > But why are 1.4 through 3.5 ordering so much?
> >
>
> That's long in the past, but I seem to recall that key order was
> unspecified. That would give the implementer (likely Tim Peters much of the
> time) the freedom to do whatever worked best for performance or simplicity
> of implementation.
>

One other thing you might notice is that string keys start coming out
in randomized order with Python 3.3, when hash randomization became the
default. But other than strings, the order has always been "arbitrary"
rather than "random".

ChrisA


Re: PEP about recommended project folder layout

2022-07-31 Thread Chris Angelico
On Sun, 31 Jul 2022 at 20:27, Weatherby,Gerard  wrote:
>
> I’m not aware of any standard convention for laying out packages.
>
> PEP 8 (https://peps.python.org/pep-0008/) specifies conventions for how to 
> write Python, so a standard layout PEP would not be inconsistent.
>

PEP 8 specifies rules for laying out the code of the Python standard
library. Its adoption by other projects does not constitute the Python
developers declaring that it's a convention for how to write all
Python code.

A better example would be PEP 257 https://peps.python.org/pep-0257/
but even that is more for the purpose of tooling. It does at least try
to describe usage conventions, though.

Conventions for laying out packages (as opposed to actual requirements
defined by the packaging system itself) would be better described
somewhere other than a PEP.

ChrisA


Re: Simple TCP proxy

2022-07-29 Thread Chris Angelico
On Sat, 30 Jul 2022 at 04:54, Morten W. Petersen  wrote:
>
> OK.
>
> Well, I've worked with web hosting in the past, and proxies like squid were 
> used to lessen the load on dynamic backends.  There was also a website 
> opensourcearticles.com that we had with Firefox, Thunderbird articles etc. 
> that got quite a bit of traffic.
>
> IIRC, that website was mostly static with some dynamic bits and heavily 
> cached by squid.

Yep, and squid almost certainly won't have a thread for every incoming
connection, spinning and waiting for the back end server. But squid
does a LOT more than simply queue connections - it'll be inspecting
headers and retaining a cache of static content, so it's not really
comparable.

> Most websites don't get a lot of traffic though, and don't have a big budget 
> for "website system administration".  So maybe that's where I'm partly going 
> with this, just making a proxy that can be put in front and deal with a lot 
> of common situations, in a reasonably good way.
>
> If I run into problems with threads that can't be managed, then a switch to 
> something like the queue_manager function which has data and then functions 
> that manage the data and connections is an option.
>

I'll be quite frank with you: this is not production-quality code. It
should not be deployed by anyone who doesn't have a big budget for
"website system administration *training*". This code is good as a
tool for YOU to learn how these things work; it shouldn't be a tool
for anyone who actually has server load issues.

I'm sorry if that sounds harsh, but the fact is, you can do a lot
better by using this to learn more about networking than you'll ever
do by trying to pitch it to any specific company.

That said though: it's still good to know what your (theoretical)
use-case is. That'll tell you what kinds of connection spam to throw
at your proxy (lots of idle sockets? lots of HTTP requests? billions
of half open TCP connections?) to see what it can cope with.

Keep on playing with this code. There's a lot you can gain from it, still.

ChrisA


Re: Simple TCP proxy

2022-07-28 Thread Chris Angelico
On Fri, 29 Jul 2022 at 11:42, Andrew MacIntyre  wrote:
>
> On 29/07/2022 8:08 am, Chris Angelico wrote:
> > It takes a bit of time to start ten thousand threads, but after that,
> > the system is completely idle again until I notify them all and they
> > shut down.
> >
> > (Interestingly, it takes four times as long to start 20,000 threads,
> > suggesting that something in thread spawning has O(n²) cost. Still,
> > even that leaves the system completely idle once it's done spawning
> > them.)
>
> Another cost of threads can be memory allocated as thread stack space,
> the default size of which varies by OS (see e.g.
> https://ariadne.space/2021/06/25/understanding-thread-stack-sizes-and-how-alpine-is-different/).
>
> threading.stack_size() can be used to check and perhaps adjust the
> allocation size.
>

Yeah, they do have quite a few costs, and a naive approach of "give a
thread to every client", while very convenient, will end up limiting
throughput. (But I'll be honest: I still have a server that's built on
exactly that model, because it's much much safer than risking one
client stalling out the whole server due to a small bug. But that's a
MUD server.) Thing is, though, it'll most likely limit throughput to
something in the order of thousands of concurrent connections (or
thousands per second if it's something like HTTP where they tend to
get closed again), maybe tens of thousands. So if you have something
where every thread needs its own database connection, well, you're
gonna have database throughput problems WAY before you actually run
into thread count limitations!

ChrisA


Re: Simple TCP proxy

2022-07-28 Thread Chris Angelico
On Fri, 29 Jul 2022 at 07:24, Morten W. Petersen  wrote:
>
> Forwarding to the list as well.
>
> -- Forwarded message -
> From: Morten W. Petersen 
> Date: Thu, Jul 28, 2022 at 11:22 PM
> Subject: Re: Simple TCP proxy
> To: Chris Angelico 
>
>
> Well, an increase from 0.1 seconds to 0.2 seconds on "polling" in each
> thread whether or not the connection should become active doesn't seem like
> a big deal.

Maybe, but polling *at all* is the problem here. It shouldn't be
hammering the other server. You'll quickly find that there are limits
that simply shouldn't exist, because every connection is trying to
check to see if it's active now. This is *completely unnecessary*.
I'll reiterate the advice given earlier in this thread (of
conversation): Look into the tools available for thread (of execution)
synchronization, such as mutexes (in Python, threading.Lock) and
condition variables (in Python, threading.Condition). A poll interval enforces a
delay before the thread notices that it's active, AND causes inactive
threads to consume CPU, neither of which is a good thing.
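As a sketch of the alternative being suggested (the class name and API are made up for the example), threads can block on a condition variable instead of polling:

```python
import threading

class Gate:
    """Workers block here until released; no polling, no CPU burned."""
    def __init__(self):
        self._cond = threading.Condition()
        self._open = False

    def wait_until_open(self):
        with self._cond:
            # wait_for() releases the lock and sleeps until notified,
            # so a blocked thread costs essentially nothing.
            self._cond.wait_for(lambda: self._open)

    def release_all(self):
        with self._cond:
            self._open = True
            self._cond.notify_all()
```

A worker calls wait_until_open() instead of sleeping in a loop, and the controller calls release_all() the instant a slot frees up -- no delay, no wasted cycles.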

> And there's also some point where it is pointless to accept more
> connections, and where maybe remedies like accepting known good IPs,
> blocking IPs / IP blocks with more than 3 connections etc. should be
> considered.

Firewalling is its own science. Blocking IPs with too many
simultaneous connections should be decided administratively, not
because your proxy can't handle enough connections.

> I think I'll be getting closer than most applications to an eventual
> ceiling for what Python can handle of threads, and that's interesting and
> could be beneficial for Python as well.

Here's a quick demo of the cost of threads when they're all blocked on
something.

>>> import threading
>>> finish = threading.Condition()
>>> def thrd(cond):
... with cond: cond.wait()
...
>>> threading.active_count() # Main thread only
1
>>> import time
>>> def spawn(n):
... start = time.monotonic()
... for _ in range(n):
... t = threading.Thread(target=thrd, args=(finish,))
... t.start()
... print("Spawned", n, "threads in", time.monotonic() - start, "seconds")
...
>>> spawn(10000)
Spawned 10000 threads in 7.548425202025101 seconds
>>> threading.active_count()
10001
>>> with finish: finish.notify_all()
...
>>> threading.active_count()
1

It takes a bit of time to start ten thousand threads, but after that,
the system is completely idle again until I notify them all and they
shut down.

(Interestingly, it takes four times as long to start 20,000 threads,
suggesting that something in thread spawning has O(n²) cost. Still,
even that leaves the system completely idle once it's done spawning
them.)

If your proxy can handle 20,000 threads, I would be astonished. And
this isn't even close to a thread limit.

Obviously the cost is different if the threads are all doing things,
but if you have thousands of active socket connections, you'll start
finding that there are limitations in quite a few places, depending on
how much traffic is going through them. Ultimately, yes, you will find
that threads restrict you and asynchronous I/O is the only option; but
you can take threads a fairly long way before they are the limiting
factor.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Simple TCP proxy

2022-07-28 Thread Chris Angelico
On Thu, 28 Jul 2022 at 21:01, Morten W. Petersen  wrote:
>
> Well, I was thinking of following the socketserver / handle layout of code 
> and execution, for now anyway.
>
> It wouldn't be a big deal to make them block, but another option is to 
> increase the sleep period 100% for every 200 waiting connections while 
> waiting in handle.

Easy denial-of-service attack then. Spam connections and the queue
starts blocking hard. The sleep loop seems like a rather inefficient
way to do things.

> Another thing is that it's nice to see Python handling 500+ threads without 
> problems. :)

Yeah, well, that's not all THAT many threads, ultimately :)

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Simple TCP proxy

2022-07-28 Thread Chris Angelico
On Thu, 28 Jul 2022 at 19:41, Morten W. Petersen  wrote:
>
> Hi Martin.
>
> I was thinking of doing something with the handle function, but just this
> little tweak:
>
> https://github.com/morphex/stp/commit/9910ca8c80e9d150222b680a4967e53f0457b465
>
> made a huge difference in CPU usage.  Hundreds of waiting sockets are now
> using 20-30% of CPU instead of 10x that.

... wait, what?

Why do waiting sockets consume *any* measurable amount of CPU? Why
don't the threads simply block until it's time to do something?

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: random.SystemRandom().randint() inefficient

2022-07-27 Thread Chris Angelico
On Thu, 28 Jul 2022 at 05:36, Cecil Westerhof via Python-list
 wrote:
>
> Roel Schroeven  writes:
>
> > Cecil Westerhof via Python-list schreef op 27/07/2022 om 17:43:
> >> "Michael F. Stemper"  writes:
> >>
> >> > This is orthogonal to your question, but might be of some use to you:
> >> >
> >> > The combination of using len(to_try) as an argument to randint() and
> >> > saving the output to a variable named "index" suggests that you might
> >> > be setting up to select a random element from to_try, as in:
> >> >   something = to_try[index]
> >> >
> >> > If that is the case, you might want to consider using random.choice() 
> >> > instead:
> >> >
> >> >   >>> from random import choice
> >> >   >>> to_try = [2,3,5,7,11,13,"seventeen",19]
> >> >   >>> choice(to_try)
> >> >   2
> >> >   >>> choice(to_try)
> >> >   'seventeen'
> >> >   >>> choice(to_try)
> >> >   13
> >> >   >>> choice(to_try)
> >> >   5
> >> >   >>>
> >>
> >> Yes, I try to select a random element, but it has also to be removed,
> >> because an element should not be used more as once.
> >> This is the code I use:
> >>  # index = randbelow(len(to_try))
> >>  index = randrange(len(to_try))
> >>  found = permutation[to_try.pop(index)]
> > Do you know in advance how many items you'll need, or maybe an upper
> > limit on the amount? In that case it might be more efficient to use
> > random.sample(to_try, k=nr_items_needed).
>
> Something else to try. :-)
> And yes: I will be using half of the list.
>

A perfect job for random.sample() then, if that's *exactly* half the list.

But if you don't know for sure how many you'll need, an easy way to
make it more efficient would be to take the n'th element, then move
the last element down to the n'th position, and finally pop off the
last element. That way, you only ever pop the last element away. The
list will get reordered arbitrarily while you do this, but if the only
purpose of it is to remove elements from random consideration, that
won't be a problem.
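A minimal sketch of that swap-and-pop trick (the function name is my own, not part of any library):

```python
import random

def pop_random(items, rng=random):
    """Remove and return a uniformly random element in O(1).

    Swap the chosen element with the last one, then pop the end,
    so no elements ever need to be shifted down."""
    i = rng.randrange(len(items))
    items[i], items[-1] = items[-1], items[i]
    return items.pop()

pool = list(range(10))
picked = {pop_random(pool) for _ in range(6)}
# 'picked' holds 6 distinct values; 'pool' keeps the other 4,
# now in arbitrary order.
```

Compare list.pop(i) on a random index, which is O(n) because everything after position i has to shift down.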

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Simple TCP proxy

2022-07-27 Thread Chris Angelico
On Thu, 28 Jul 2022 at 04:32, Morten W. Petersen  wrote:
>
> Hi Chris.
>
> You're thinking of the backlog argument of listen?

Yes, precisely.

> Well, STP will accept all connections, but can limit how many of the accepted 
> connections that are active at any given time.
>
> So when I bombed it with hundreds of almost simultaneous connections, all of 
> them were accepted, but only 25 were actively sending and receiving data at 
> any given time. First come, first served.
>

Hmm. Okay. Not sure what the advantage is, but sure.

If the server's capable of handling the total requests-per-minute,
then a queueing system like this should help with burst load, although
I would have thought that the listen backlog would do the same. What
happens if the server actually gets overloaded though? Do connections
get disconnected after appearing connected? What's the disconnect
mode?

BTW, you probably don't want to be using the _thread module - Python
has a threading module which is better suited to this sort of work.
Although you may want to consider asyncio instead, as that has far
lower overhead when working with large numbers of sockets.
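To illustrate the asyncio route, here is a rough sketch of a connection-limiting proxy in that style -- the addresses, the limit of 25, and the structure are all assumptions for the example, not the STP project's design:

```python
import asyncio

async def pipe(reader, writer):
    """Copy bytes one way until EOF, then propagate the half-close."""
    while data := await reader.read(4096):
        writer.write(data)
        await writer.drain()
    writer.write_eof()

def make_handler(backend_host, backend_port, max_active):
    sem = asyncio.Semaphore(max_active)

    async def handle(client_reader, client_writer):
        # Excess connections simply wait on the semaphore; unlike a
        # sleep loop, waiting coroutines consume no CPU at all.
        async with sem:
            up_reader, up_writer = await asyncio.open_connection(
                backend_host, backend_port)
            try:
                await asyncio.gather(pipe(client_reader, up_writer),
                                     pipe(up_reader, client_writer))
            finally:
                up_writer.close()
                client_writer.close()

    return handle

async def main():
    # Hypothetical addresses: proxy listens on 9999, backend on 8080.
    server = await asyncio.start_server(
        make_handler("127.0.0.1", 8080, 25), "127.0.0.1", 9999)
    async with server:
        await server.serve_forever()
```

Run it with asyncio.run(main()). Each connection is a coroutine rather than a thread, so tens of thousands of idle connections are cheap.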

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Simple TCP proxy

2022-07-27 Thread Chris Angelico
On Thu, 28 Jul 2022 at 02:15, Morten W. Petersen  wrote:
>
> Hi.
>
> I'd like to share with you a recent project, which is a simple TCP proxy
> that can stand in front of a TCP server of some sort, queueing requests and
> then allowing n number of connections to pass through at a time:

How's this different from what the networking subsystem already does?
When you listen, you can set a queue length. Can you elaborate?

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: random.SystemRandom().randint() inefficient

2022-07-26 Thread Chris Angelico
On Wed, 27 Jul 2022 at 09:28, Dennis Lee Bieber  wrote:
>
> On Tue, 26 Jul 2022 16:38:38 +0200, Cecil Westerhof 
> declaimed the following:
>
> >I need to get a random integer. At first I tried it with:
> >from secrets import randbelow
> >index = randbelow(len(to_try))
> >
> >This works perfectly, but it took some time. So I thought I try:
> >from random  import SystemRandom
> >index = SystemRandom().randint(0, len(to_try) - 1)
> >
> >A first indication is that the second version would take about two
> >times as much time as the first. Is there a reason for this, or should
> >this not be happening?
>
> Well, off the top of my head...
>
> For one generation of "index" you are first creating an instance of
> SystemRandom(), using it to generate your random integer, and then
> disposing of the instance.
>
> If you only need ONE random integer, the time difference probably
> doesn't matter. OTOH, if you need many during the run, using
>
> sr = SystemRandom()
> #stuff in some loop that generates multiple ints
> index = sr.randint(...)
>
> Hmmm, wonder if there is a speed difference between
> .randint(0, len(to_try) - 1)
> and
> .randint(1, len(to_try)) - 1
>

Probably not significant, since the same amount of arithmetic gets
done either way. But switching to single-arg randrange(len(to_try))
will definitely help, and IMO is clearer as well (since the
implication is selecting one from a group of items).

Incidentally - if you are actually trying to select a specific item,
you may want to consider random.choice.
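Side by side (list contents invented for the example), the two spellings of "pick one element" look like this:

```python
import random

to_try = ["red", "green", "blue", "yellow"]

# Equivalent selections, but choice() states the intent directly
# instead of going through an index.
via_index = to_try[random.randrange(len(to_try))]
via_choice = random.choice(to_try)
```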

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: random.SystemRandom().randint() inefficient

2022-07-26 Thread Chris Angelico
On Wed, 27 Jul 2022 at 08:18, Cecil Westerhof via Python-list
 wrote:
>
> Chris Angelico  writes:
>
> > On Wed, 27 Jul 2022 at 06:06, Cecil Westerhof via Python-list
> >  wrote:
> >>
> >> Chris Angelico  writes:
> >>
> >> > On Wed, 27 Jul 2022 at 01:06, Cecil Westerhof via Python-list
> >> >  wrote:
> >> >>
> >> >> I need to get a random integer. At first I tried it with:
> >> >> from secrets import randbelow
> >> >> index = randbelow(len(to_try))
> >> >>
> >> >> This works perfectly, but it took some time. So I thought I try:
> >> >> from random  import SystemRandom
> >> >> index = SystemRandom().randint(0, len(to_try) - 1)
> >> >>
> >> >> A first indication is that the second version would take about two
> >> >> times as much time as the first. Is there a reason for this, or should
> >> >> this not be happening?
> >> >>
> >> >
> >> > You're setting up a brand new SystemRandom instance just for a single
> >> > random number. For a fairer comparison, set up the instance, then
> >> > generate far more than just a single number, and see how that goes.
> >>
> >> Thanks. I thought I did something wrong and I did.
> >> I will try to implement like you said and look what the result will
> >> be. (And share it.)
> >
> > Thanks! Don't feel bad; performance testing is *hard*, getting
> > meaningful results takes a lot of of fiddling with parameters, and
> > getting interesting AND meaningful results can sometimes seem about
> > impossible.
> >
> >> (As I understand it both do more, or less the same and should have
> >> comparable performance.)
> >
> > In normal production work? Yes (the SystemRandom object doesn't have
> > any significant state - a seeded RNG could have a lot more overhead
> > here). But for performance testing? The work of instantiating the
> > class could be completely irrelevant, or it could be dominating your
> > results. It's hard to say, hence the suggestion to try it without
> > reinstantiating.
>
> It had a very big influence. Original it took about three times more
> time to run my program. (The program was still running when I posted
> the original post and the difference was higher as I anticipated.)
> Removing that did cut about 45% of the execution time of the program.
> (So the initiation is quite expensive.)
> But it still takes about 50% more time. So I am still a bit
> flabbergasted.
>
> The new code:
> from random  import SystemRandom
> system_random   = SystemRandom()
> index = system_random.randint(0, len(to_try) - 1)
>
> The first two statements are executed once.
> The last statement I think about 75 * 10 ** 6.
>
> So it seems that my first idea of using randbelow was the correct one.
> But if anyone could explain why SystemRandom is so much more
> expensive, I would be interested to know it.
> (Or am I still doing something wrong?)

Hmm. There are still a lot of differences here. Are you able to make
use of randrange() instead, to make them more consistent?

According to the source code, secrets.randbelow is calling on an
internal method _randbelow of the SystemRandom object, but randrange
(if called with only one arg) will go straight into that same method.
Here's my results:

rosuav@sikorsky:~$ python3 -m timeit -s 'from random import randrange'
'randrange(10000)'
1000000 loops, best of 5: 322 nsec per loop
rosuav@sikorsky:~$ python3 -m timeit -s 'from random import
SystemRandom; r = SystemRandom()' 'r.randint(0, 10000)'
200000 loops, best of 5: 1.92 usec per loop
rosuav@sikorsky:~$ python3 -m timeit -s 'from random import
SystemRandom; r = SystemRandom()' 'r.randrange(10000)'
200000 loops, best of 5: 1.87 usec per loop
rosuav@sikorsky:~$ python3 -m timeit -s 'from secrets import
randbelow' 'randbelow(10000)'
200000 loops, best of 5: 1.64 usec per loop

(The difference with the first one is that it isn't using the system
RNG, so it has the limitations of an internal PRNG.)

When you call randint, what happens is (1) the endpoint is incremented
to transform it from inclusive-inclusive to inclusive-exclusive; (2)
randrange is called with two args; (3) fast path 1 is skipped, fast
path 2 is taken, and _randbelow gets called to get an actual random
number, which gets zero added to it before returning.

If, instead, you use randrange(len(to_try)), what would happen is (1)
fast path 1 is used, and (2) _randbelow is called to get the random
number directly.

Re: random.SystemRandom().randint() inefficient

2022-07-26 Thread Chris Angelico
On Wed, 27 Jul 2022 at 06:06, Cecil Westerhof via Python-list
 wrote:
>
> Chris Angelico  writes:
>
> > On Wed, 27 Jul 2022 at 01:06, Cecil Westerhof via Python-list
> >  wrote:
> >>
> >> I need to get a random integer. At first I tried it with:
> >> from secrets import randbelow
> >> index = randbelow(len(to_try))
> >>
> >> This works perfectly, but it took some time. So I thought I try:
> >> from random  import SystemRandom
> >> index = SystemRandom().randint(0, len(to_try) - 1)
> >>
> >> A first indication is that the second version would take about two
> >> times as much time as the first. Is there a reason for this, or should
> >> this not be happening?
> >>
> >
> > You're setting up a brand new SystemRandom instance just for a single
> > random number. For a fairer comparison, set up the instance, then
> > generate far more than just a single number, and see how that goes.
>
> Thanks. I thought I did something wrong and I did.
> I will try to implement like you said and look what the result will
> be. (And share it.)

Thanks! Don't feel bad; performance testing is *hard*, getting
meaningful results takes a lot of fiddling with parameters, and
getting interesting AND meaningful results can sometimes seem about
impossible.

> (As I understand it both do more, or less the same and should have
> comparable performance.)

In normal production work? Yes (the SystemRandom object doesn't have
any significant state - a seeded RNG could have a lot more overhead
here). But for performance testing? The work of instantiating the
class could be completely irrelevant, or it could be dominating your
results. It's hard to say, hence the suggestion to try it without
reinstantiating.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: random.SystemRandom().randint() inefficient

2022-07-26 Thread Chris Angelico
On Wed, 27 Jul 2022 at 01:06, Cecil Westerhof via Python-list
 wrote:
>
> I need to get a random integer. At first I tried it with:
> from secrets import randbelow
> index = randbelow(len(to_try))
>
> This works perfectly, but it took some time. So I thought I try:
> from random  import SystemRandom
> index = SystemRandom().randint(0, len(to_try) - 1)
>
> A first indication is that the second version would take about two
> times as much time as the first. Is there a reason for this, or should
> this not be happening?
>

You're setting up a brand new SystemRandom instance just for a single
random number. For a fairer comparison, set up the instance, then
generate far more than just a single number, and see how that goes.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: list indices must be integers or slices, not str

2022-07-20 Thread Chris Angelico
On Wed, 20 Jul 2022 at 23:50, Peter Otten <__pete...@web.de> wrote:
>
> I found
>
> https://peps.python.org/pep-3101/
>
> """
> PEP 3101 – Advanced String Formatting
> ...
> An example of the ‘getitem’ syntax:
>
> "My name is {0[name]}".format(dict(name='Fred'))
>
> It should be noted that the use of ‘getitem’ within a format string is
> much more limited than its conventional usage. In the above example, the
> string ‘name’ really is the literal string ‘name’, not a variable named
> ‘name’. The rules for parsing an item key are very simple. If it starts
> with a digit, then it is treated as a number, otherwise it is used as a
> string.
>

Cool. I think this is a good justification for a docs patch, since
that really should be mentioned somewhere other than a historical
document.
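The PEP's parsing rule is easy to demonstrate (example data invented): an all-digit item key is looked up as an int, anything else as a string.

```python
data = {"name": "Fred", 1: "one"}

# 'name' is the literal string "name", not a variable reference.
assert "{0[name]}".format(data) == "Fred"

# '1' starts with a digit, so it is treated as the integer 1.
assert "{0[1]}".format(data) == "one"
```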

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: list indices must be integers or slices, not str

2022-07-20 Thread Chris Angelico
On Wed, 20 Jul 2022 at 21:06, Frank Millman  wrote:
> I saw this from Paul Rubin - for some reason his posts appear in google
> groups, but not python-list.
>
> "It seems to only want integer constants. x[2+2] and x[k] where k=2
> don't work either.

Yes, that's for the same reason that x[spam] can be used usefully with
a dictionary. Otherwise you'd need to use quotes. It makes perfect
sense that both 2+2 and k are treated as strings.

> I think the preferred style these days is f'{x[-1]}' which works."

Not true; there's no single "preferred style", and f-strings are
absolutely NOT replacements for everything else. They have their
place, as do the others. Yes, including percent formatting, it is not
deprecated, and it's really tiresome when people claim that it is.

> Unfortunately the 'f' option does not work for me in this case, as I am
> using a string object, not a string literal.

Right. An f-string uses the exact syntax of a Python expression, which
is often too powerful, but also restricts it to the string literal
style (since it's actual code, not a method call). For other purposes,
.format() is a better choice.
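That is, when the template itself is runtime data (the template text here is invented for the example), only .format() will do:

```python
# The template is ordinary data -- it could come from a file, a
# database, or user configuration -- so an f-string literal, which
# must be written in the source code, is not an option.
template = "Hello, {name}! You have {count} new messages."
greeting = template.format(name="Frank", count=3)
# greeting == "Hello, Frank! You have 3 new messages."
```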

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: list indices must be integers or slices, not str

2022-07-20 Thread Chris Angelico
On Wed, 20 Jul 2022 at 20:55, Frank Millman  wrote:
>
> On 2022-07-20 11:37 AM, Chris Angelico wrote:
> > On Wed, 20 Jul 2022 at 18:34, Frank Millman  wrote:
> >>
> >> Hi all
> >>
> >> C:\Users\E7280>python
> >> Python 3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021, 20:19:38) [MSC v.1929 64
> >> bit (AMD64)] on win32
> >> Type "help", "copyright", "credits" or "license" for more information.
> >>   >>>
> >>   >>> x = list(range(10))
> >>   >>>
> >>   >>> '{x[1]}'.format(**vars())
> >> '1'
> >>   >>>
> >>   >>> '{x[-1]}'.format(**vars())
> >> Traceback (most recent call last):
> >> File "", line 1, in 
> >> TypeError: list indices must be integers or slices, not str
> >>   >>>
> >>
> >> Can anyone explain this error? It seems that a negative index is deemed
> >> to be a string in this case.
> >>
> >
> > Yeah, that does seem a little odd. What you're seeing is the same as
> > this phenomenon:
> >
> >>>> "{x[1]} {x[spam]}".format(x={1: 42, "spam": "ham"})
> > '42 ham'
> >>>> "{x[1]} {x[spam]}".format(x={"1": 42, "spam": "ham"})
> > Traceback (most recent call last):
> >File "", line 1, in 
> > KeyError: 1
> >
> > But I can't find it documented anywhere that digits-only means
> > numeric. The best I can find is:
> >
> > https://docs.python.org/3/library/string.html#formatstrings
> > """The arg_name can be followed by any number of index or attribute
> > expressions. An expression of the form '.name' selects the named
> > attribute using getattr(), while an expression of the form '[index]'
> > does an index lookup using __getitem__()."""
> >
> > and in the corresponding grammar:
> >
> > field_name    ::=  arg_name ("." attribute_name | "[" element_index
> > "]")*
> > element_index ::=  digit+ | index_string
> > index_string  ::=  <any source character except "]"> +
> >
> > In other words, any sequence of characters counts as an argument, as
> > long as it's not ambiguous. It doesn't seem to say that "all digits is
> > interpreted as an integer, everything else is interpreted as a
> > string". ISTM that a negative number should be interpreted as an
> > integer too, but that might be a backward compatibility break.
> >
>
> Thanks for investigating this further. I agree it seems odd.
>
> As quoted above, an expression of the form '[index]' does an index
> lookup using __getitem()__.
>
> The only __getitem__() that I can find is in the operator module, and
> that handles negative numbers just fine.

In general, __getitem__ is the method used to handle those sorts of lookups:

class X:
def __getitem__(self, item):
print("Get item", type(item), item)

"{x[0]} {x[1]} {x[-1]} {x[spam]} {x[1.0]}".format(x=X())

Outside of a format directive, you'd need to quote those:

x[0], x[1], x["spam"]

The distinction is that unquoted bare numbers are interpreted as
integers, not as strings. I'm unable to find the exact definition of
that documented.

> Do you think it is worth me raising an issue, if only to find out the
> rationale if there is one?
>

I'd wait for other people's responses first, there may be a better
insight to be found than what I was able to come across.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: list indices must be integers or slices, not str

2022-07-20 Thread Chris Angelico
On Wed, 20 Jul 2022 at 18:34, Frank Millman  wrote:
>
> Hi all
>
> C:\Users\E7280>python
> Python 3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021, 20:19:38) [MSC v.1929 64
> bit (AMD64)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
>  >>>
>  >>> x = list(range(10))
>  >>>
>  >>> '{x[1]}'.format(**vars())
> '1'
>  >>>
>  >>> '{x[-1]}'.format(**vars())
> Traceback (most recent call last):
>File "", line 1, in 
> TypeError: list indices must be integers or slices, not str
>  >>>
>
> Can anyone explain this error? It seems that a negative index is deemed
> to be a string in this case.
>

Yeah, that does seem a little odd. What you're seeing is the same as
this phenomenon:

>>> "{x[1]} {x[spam]}".format(x={1: 42, "spam": "ham"})
'42 ham'
>>> "{x[1]} {x[spam]}".format(x={"1": 42, "spam": "ham"})
Traceback (most recent call last):
  File "", line 1, in 
KeyError: 1

But I can't find it documented anywhere that digits-only means
numeric. The best I can find is:

https://docs.python.org/3/library/string.html#formatstrings
"""The arg_name can be followed by any number of index or attribute
expressions. An expression of the form '.name' selects the named
attribute using getattr(), while an expression of the form '[index]'
does an index lookup using __getitem__()."""

and in the corresponding grammar:

field_name    ::=  arg_name ("." attribute_name | "[" element_index "]")*
element_index ::=  digit+ | index_string
index_string  ::=  <any source character except "]"> +

In other words, any sequence of characters counts as an argument, as
long as it's not ambiguous. It doesn't seem to say that "all digits is
interpreted as an integer, everything else is interpreted as a
string". ISTM that a negative number should be interpreted as an
integer too, but that might be a backward compatibility break.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: NILEARN - WHY THIS CODE THROWS AN ERROR?????

2022-07-08 Thread Chris Angelico
On Sat, 9 Jul 2022 at 10:57, MRAB  wrote:
>
> On 08/07/2022 23:20, Avi Gross via Python-list wrote:
> > Nati Stern has asked several questions here, often about relatively 
> > technical uses of python code that many of us have never used and still is 
> > not providing more exact info that tends to be needed before anyone can 
> > even think of diagnosing the problem.
> >
> > I have learned to stay away from some such questioners. But I am wondering 
> > if some people (others too) think this forum is a generalized help desk 
> > staffed by College Professors with nothing else to do.
> >
> > Many questions are best handled locally where people can look over your 
> > shoulder or use the same software and may have some fluency in your native 
> > language. And sometimes you need to do more investigating on your own, and 
> > perhaps tell us what you tried and why it was not useful, or we end up 
> > making endless suggestions and being told we are not working on your real 
> > issue and so on.
> >
> > The code below is just babel or maybe babble. Something nested in a loop 
> > had a problem. Why not try something drastic and look at the  files and 
> > PICK ONE and use it step by step and see when it fails?
> >
> > It looks like the code wants to ask for all files then ignore some.
> >
> > Why you would import numpy repeatedly in a loop is beyond me! LOL!
> >
> > But which command line failed? My GUESS is:
> >
> > data = img.get_fdata()
> >
> >
> > If so, did you try to see the current value of the filename you call "i" in 
> > the loop and see what name was loaded in what looks like a file ending in 
> > .nii in this code:
> >
> > img = nib.load(path+"/"+i)
> >
> >
> > You need to proceed step by step and see if any previous steps failed.
> > But what is possible is you got a file with .nii in middle of the name that 
> > does not end in .gz, or is not in the format needed.
>
> Indeed, it writes JPEG files whose filename contains the original
> filename (with the ".nii") into the same folder, so if it has already
> been run and produced an output file, the next time it's run, it'll trip
> itself up.
>
> All this would've been clear to the OP if it had printed messages as it
> went.

Or if the OP had renamed them all to "shrubbery" along the way.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Creating lambdas inside generator expression

2022-06-29 Thread Chris Angelico
On Thu, 30 Jun 2022 at 02:49, Johannes Bauer  wrote:
> But now consider what happens when we create the lambdas inside a list
> comprehension (in my original I used a generator expresison, but the
> result is the same). Can you guess what happens when we create conds
> like this?
>
> conds = [ lambda msg: msg.hascode(z) for z in ("foo", "bar") ]
>
> I certainly could not. Here's what it outputs:
>
> Check for bar
> False
> Check for bar
> False
>
> I.e., the iteration variable "z" somehow gets bound inside the lambda
> not by its value, but by its reference. All checks therefore refence
> only the last variable.
>

Yep, that is the nature of closures. (Side point: This isn't actually
a generator expression, it's a list comprehension; current versions of
Python treat them broadly the same way, but there was previously a
difference in the way scoping worked.) What you're seeing is a
consequence of the way that closures work, and it is a very good thing
most of the time :)

The usual way to "snapshot" a variable is what you showed in your
followup: a default argument value.

def f(..., z=z):
... z has been snapshot

(As others have pointed out, this isn't unique to lambdas; any
function will behave that way.)

Antoon offered another variant, but written as a pair of lambda
functions, it's a little hard to see what's going on. Here's the same
technique written as a factory function:

def does_it_have(z):
return lambda msg: msg.hascode(z)

conds = [does_it_have(z) for z in ("foo", "bar")]

Written like this, it's clear that the variable z in the comprehension
is completely different from the one inside does_it_have(), and they
could have different names if you wanted to. This is a fairly clean
way to snapshot too, and has the advantage that it doesn't pretend
that the function takes an extra parameter.
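Yet another spelling of the same snapshot idea is functools.partial, which binds the value eagerly without a wrapper lambda (the Msg class here is a made-up stand-in, since the original message type isn't shown):

```python
from functools import partial

class Msg:
    """Stand-in message type for the example."""
    def __init__(self, code):
        self.code = code
    def hascode(self, z):
        return self.code == z

def check(code, msg):
    return msg.hascode(code)

# partial() captures 'code' by value at creation time, so each
# callable remembers its own z rather than sharing the loop variable.
conds = [partial(check, z) for z in ("foo", "bar")]
results = [cond(Msg("bar")) for cond in conds]  # [False, True]
```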

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: REPL with multiple function definitions

2022-06-28 Thread Chris Angelico
On Wed, 29 Jun 2022 at 11:00, Rob Cliffe via Python-list
 wrote:
>
> On 26/06/2022 23:22, Jon Ribbens via Python-list wrote:
> > On 2022-06-26, Rob Cliffe  wrote:
> >> This 2-line program
> >>
> >> def f(): pass
> >> def g(): pass
> >>
> >> runs silently (no Exception).  But:
> >>
> >> 23:07:02 c:\>python
> >> Python 3.8.3 (tags/v3.8.3:6f8c832, May 13 2020, 22:20:19) [MSC v.1925 32
> >> bit (Intel)] on win32
> >> Type "help", "copyright", "credits" or "license" for more information.
> > def f(): pass
> >> ... def g(): pass
> >> File "", line 2
> >>   def g(): pass
> >>   ^
> >> SyntaxError: invalid syntax
> >> Is there a good reason for this?
> > For some reason, the REPL can't cope with one-line blocks like that.
> > If you put a blank line after each one-block line then it will work.
> It's actually not to do with 1-line blocks, just attempting to define 2
> functions "at once":
>
>
> 22:27:23 C:\>python
> Python 3.8.3 (tags/v3.8.3:6f8c832, May 13 2020, 22:20:19) [MSC v.1925 32
> bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
>  >>> def f():
> ... return 42
> ... def g():
>File "", line 3
>  def g():
>  ^
> SyntaxError: invalid syntax
>  >>>
>
> But you are right that adding a blank line after the first function
> definition solves the "problem".

And if you have something where you want to copy and paste multiple
statements, there are a few ways to do it:

1) Put "if 1:" at the top. That makes it a single block, so you can
paste in as much as you like, as long as the only blank line is at the
end.

2) Put the code into a file and then use "python3 -i setup.py". That
runs all the code, then drops you into the REPL in that context.

3) Put the code into a file, and inside the REPL, "from setup import
*". Unlike option 2, this can be done after the beginning of the
session. Downside: editing setup.py and reimporting won't apply your
changes.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: WHAT THE ERROR ON MY CODE???

2022-06-28 Thread Chris Angelico
‪On Wed, 29 Jun 2022 at 01:37, ‫נתי שטרן‬‎  wrote:‬
> headers["Authorization"] = "Basic
> YjMwMzcwODY3NTUzNDMwNTg5NzA2MjkyNDFmMDE1YWY6VjNKYTk2Y1F4RTFzeTdYbzRnbkt0a2k1djhscXUyU01oSE5VWUwwRg=="
>

The error is that you just revealed your credentials to the whole
world. This is a public mailing list.

In fact, you just revealed your credentials to TWO mailing lists at once.

Good job.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: REPL with multiple function definitions

2022-06-26 Thread Chris Angelico
On Mon, 27 Jun 2022 at 08:15, Rob Cliffe via Python-list
 wrote:
>
> This 2-line program
>
> def f(): pass
> def g(): pass
>
> runs silently (no Exception).  But:
>
> 23:07:02 c:\>python
> Python 3.8.3 (tags/v3.8.3:6f8c832, May 13 2020, 22:20:19) [MSC v.1925 32
> bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
>  >>> def f(): pass
> ... def g(): pass
>File "", line 2
>  def g(): pass
>  ^
> SyntaxError: invalid syntax
>  >>>
>
> Is there a good reason for this?

The REPL compiles one statement at a time. A file is allowed to
contain multiple statements.
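This difference can be illustrated with compile(): the REPL effectively uses 'single' mode (one interactive statement at a time), while a file is compiled in 'exec' mode (any number of statements). A small demonstration:

```python
# 'exec' mode (files) accepts multiple statements; 'single' mode
# (what the REPL uses for each input) does not.
src = "def f(): pass\ndef g(): pass\n"

compile(src, "<demo>", "exec")      # fine: a module may hold many statements

try:
    compile(src, "<demo>", "single")
except SyntaxError as e:
    print("single mode:", e.msg)    # complains about multiple statements
```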

ChrisA


Re: sre_constants MODIFIED CLASS - ERROR

2022-06-24 Thread Chris Angelico
‪On Fri, 24 Jun 2022 at 22:16, ‫נתי שטרן‬‎  wrote:‬
>
> My TARGET  is to bind many code libraries to one Huge code file that works
> optimally and do optimizations if needed.
> In this file have code of huge part of falconpy, ALL code of re, argparse,
> are and many other code libraries
>
> This code file is contained 10k lines of python code
>

Did you check the license terms of all of those libraries? Are you
even legally allowed to do that?

ChrisA


Re: sre_constants MODIFIED CLASS - ERROR

2022-06-24 Thread Chris Angelico
‪On Fri, 24 Jun 2022 at 18:43, ‫נתי שטרן‬‎  wrote:‬
>
> class _NamedIntConstant(int):
> def __new__(cls, value, name):
> self = super(_NamedIntConstant, cls).__new__(cls, value)
> self.name = name
> return self
>
> def __repr__(self):
> return self.name
>
> __reduce__ = None
>  MAXREPEAT = _NamedIntConstant(32,name=str(32))
>
> what's the problem with the code

You ripped a bunch of code from the standard library without
understanding what it does, and now it doesn't work. The problem is
more with your methodology than your code.

Why are you doing this? Why not simply use what's there?

If you REALLY need to make source-level changes, make *changes*, don't
try to lift small parts out. Also, you will need to spend some hours
getting to know the code that you're mutating.

Is there an alternative newsgroup for
lazy-python-users-who-dont-want-to-do-the-w...@groups.google.com ?

ChrisA


Re: argparse modify

2022-06-23 Thread Chris Angelico
On Fri, 24 Jun 2022 at 09:03, Mats Wichmann  wrote:
> Also note that while it's claimed to be fine These Days, inheriting from
> a base type like this is sometimes tricky, sometimes broken... be
> somewhat aware.

Depends on your definition of "broken". If you want to make a custom
integer type, you'll probably find that arithmetic operations on it
just return vanilla ints again, but in this case, it seems to be more
akin to IntEnum than to an arithmetic type, so it should be safe.

But that said: why not just use IntEnum? It looks like the purpose of
it is simply to be a name for a number, and that's something that an
enum does well (even if it's not strictly "enumerated" in that sense).
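As a sketch of the IntEnum alternative (the class and member names here are invented for illustration, not taken from the code under discussion): each member behaves as a real int but also carries its own name, which is essentially what the hand-rolled _NamedIntConstant provides.

```python
from enum import IntEnum

# Hypothetical named-constant class: members are ints with names attached.
class Opcode(IntEnum):
    MAXREPEAT = 32

assert Opcode.MAXREPEAT == 32                 # behaves as the int 32
assert Opcode.MAXREPEAT.name == "MAXREPEAT"   # and knows its own name
assert Opcode.MAXREPEAT + 1 == 33             # arithmetic yields plain ints
```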

ChrisA


Re: "CPython"

2022-06-20 Thread Chris Angelico
On Tue, 21 Jun 2022 at 13:12, Paulo da Silva
 wrote:
>
> Às 03:20 de 21/06/22, MRAB escreveu:
> > On 2022-06-21 02:33, Chris Angelico wrote:
> >> On Tue, 21 Jun 2022 at 11:13, Paulo da Silva
> >>  wrote:
> >>>
> >>> Às 20:01 de 20/06/22, Paulo da Silva escreveu:
> >>> > Às 18:19 de 20/06/22, Stefan Ram escreveu:
> >>> >>The same personality traits that make people react
> >>> >>to troll postings might make them spread unconfirmed
> >>> >>ideas about the meaning of "C" in "CPython".
> >>> >>
> >>> >>The /core/ of CPython is written in C.
> >>> >>
> >>> >>CPython is the /canonical/ implementation of Python.
> >>> >>
> >>> >>The "C" in "CPython" stands for C.
> >>> >>
> >>> >>
> >>> >
> >>> > Not so "unconfirmed"!
> >>> > Look at this article, I recently read:
> >>> >
> >>> https://www.analyticsinsight.net/cpython-to-step-over-javascript-in-developing-web-applications/
> >>>
> >>> >
> >>> >
> >>> > There is a sentence in ther that begins with "CPython, short for Core
> >>> > Python, a reference implementation that other Python distributions are
> >>> > derived from, ...".
> >>> >
> >>> > Anyway, I wrote "IMHO".
> >>> >
> >>> > Do you have any credible reference to your assertion "The "C" in
> >>> > "CPython" stands for C."?
> >>> >
> >>> > Thank you.
> >>>
> >>> Well ... I read the responses and they are not touching the point!
> >>> I just answered, with my opinion based on articles I have read in the
> >>> past. Certainly I could not be sure. That's why I responded as an
> >>> opinion (IMHO) and not as an assertion.
> >>> Stefan Ram responded with a, at least, not very polite post.
> >>> That's why I needed to somehow "defend" why I posted that response, and,
> >>> BTW, trying to learn why he said that the C in CPython means "written
> >>> in C".
> >>>
> >>> I still find very strange, to not say weird, that a compiler or
> >>> interpreter has a name based in the language it was written. But, again,
> >>> is just my opinion and nothing more.
> >>>
> >>
> >> Not sure why it's strange. The point is to distinguish "CPython" from
> >> "Jython" or "Brython" or "PyPy" or any of the other implementations.
> >> Yes, CPython has a special place because it's the reference
> >> implementation and the most popular, but the one thing that makes it
> >> distinct from all the others is that it's implemented in C.
> >>
> > And just to make it clear, the interpreter/compiler _itself_ is still
> > called "python". "CPython" is a name/term that was applied retroactively
> > to that particular implementation when another implementation appeared.
> Yes, but that does not necessarily means that the C has to refer to the
> language of implementation. It may well be a "core" reference to
> distinguish that implementation from others with different behaviors.
>
> Let's say they reimplement "reference python" CPython in Rust. What is
> better? Change the "reference python" CPython name to RPython, for
> example, or let it as CPython?
> It's my opinion that it should stay as CPython.
> After all who cares in which language it is implemented?
>

It is HIGHLY unlikely that the reference implementation would change
overnight. Far far more likely, if the reference implementation were
to change, would be that the new interpreter is built for a number of
years as an alternative, and then eventually becomes the more popular
implementation, and finally, the core devs begin using that more than
CPython, and perhaps deprecating CPython altogether. If that were to
happen, the other implementation would have its own name for all those
years, and would keep it after being promoted to reference
implementation.

Also, "PyPy" is a perfectly fine name and doesn't need to be changed.

ChrisA


Re: "CPython"

2022-06-20 Thread Chris Angelico
On Tue, 21 Jun 2022 at 12:53, Avi Gross via Python-list
 wrote:
>
> I don't even want to think fo what sound a C# Python would make.

Probably about 277 Hz...

ChrisA


Re: "CPython"

2022-06-20 Thread Chris Angelico
On Tue, 21 Jun 2022 at 11:13, Paulo da Silva
 wrote:
>
> Às 20:01 de 20/06/22, Paulo da Silva escreveu:
> > Às 18:19 de 20/06/22, Stefan Ram escreveu:
> >>The same personality traits that make people react
> >>to troll postings might make them spread unconfirmed
> >>ideas about the meaning of "C" in "CPython".
> >>
> >>The /core/ of CPython is written in C.
> >>
> >>CPython is the /canonical/ implementation of Python.
> >>
> >>The "C" in "CPython" stands for C.
> >>
> >>
> >
> > Not so "unconfirmed"!
> > Look at this article, I recently read:
> > https://www.analyticsinsight.net/cpython-to-step-over-javascript-in-developing-web-applications/
> >
> >
> > There is a sentence in ther that begins with "CPython, short for Core
> > Python, a reference implementation that other Python distributions are
> > derived from, ...".
> >
> > Anyway, I wrote "IMHO".
> >
> > Do you have any credible reference to your assertion "The "C" in
> > "CPython" stands for C."?
> >
> > Thank you.
>
> Well ... I read the responses and they are not touching the point!
> I just answered, with my opinion based on articles I have read in the
> past. Certainly I could not be sure. That's why I responded as an
> opinion (IMHO) and not as an assertion.
> Stefan Ram responded with a, at least, not very polite post.
> That's why I needed to somehow "defend" why I posted that response, and,
> BTW, trying to learn why he said that the C in CPython means "written in C".
>
> I still find very strange, to not say weird, that a compiler or
> interpreter has a name based in the language it was written. But, again,
> is just my opinion and nothing more.
>

Not sure why it's strange. The point is to distinguish "CPython" from
"Jython" or "Brython" or "PyPy" or any of the other implementations.
Yes, CPython has a special place because it's the reference
implementation and the most popular, but the one thing that makes it
distinct from all the others is that it's implemented in C.

I could, perhaps, create my own interpreter and name it "RosuavPython"
after myself, but when something's made by a team, it's usually more
useful to pick something that is fundamental to it (Brython is
designed to be run in a browser, Jython is written in Python to make
it easy to call on Java classes, etc).

ChrisA


Re: "CPython"

2022-06-20 Thread Chris Angelico
On Tue, 21 Jun 2022 at 08:01, dn  wrote:
>
> On 21/06/2022 09.47, Roel Schroeven wrote:
> ...
>
> > So we have an untrustworthy site that's the only one to claim that
> > CPython is short for Core Python, and we have an official site that says
> > CPython is so named because it's written in C. Hm, which one to believe?
>
>
> ...and so you can C that the only important part is the Python!

I should have cn that coming.

ChrisA


Re: "CPython"

2022-06-20 Thread Chris Angelico
On Tue, 21 Jun 2022 at 07:48, Roel Schroeven  wrote:
>
> Paulo da Silva schreef op 20/06/2022 om 21:01:
> > Às 18:19 de 20/06/22, Stefan Ram escreveu:
> > >The same personality traits that make people react
> > >to troll postings might make them spread unconfirmed
> > >ideas about the meaning of "C" in "CPython".
> > >
> > >The /core/ of CPython is written in C.
> > >
> > >CPython is the /canonical/ implementation of Python.
> > >
> > >The "C" in "CPython" stands for C.
> > >
> > >
> >
> > Not so "unconfirmed"!
> > Look at this article, I recently read:
> > https://www.analyticsinsight.net/cpython-to-step-over-javascript-in-developing-web-applications/
> >
> > There is a sentence in ther that begins with "CPython, short for Core
> > Python, a reference implementation that other Python distributions are
> > derived from, ...".
>
> Counterpoint: https://wiki.python.org/moin/SummerOfCode/2017/python-core
> says "The reference implementation of Python is CPython, so named
> because it's written in C." Even in the absence of other evidence I'd
> much rather trust a python.org page than a www.analyticsinsight.net page
> on the subject of Python implementations.

Be aware that this is a wiki, so anyone can edit it. But that also
means you can check the "Info" link to see the history of the page,
and in this case, the text in question was added by user TerriOda, who
- as can be confirmed in various places - is heavily involved in GSOC
Python projects and the like, so I would consider this to be fairly
good information.

(Though I can't honestly say whether many of the core Python devs read
that wiki, so it's always possible that false information stays there
untouched.)

> But there's more.
>
> Apart from www.analyticsinsight.net I can't find any website that
> mentions "Core Python" as a Python implementation. That's a strong
> indication that www.analyticsinsight.net is wrong on that point. Frankly
> that website seems very low quality in general. In that same article
> they say:
>
> "CPython is a descendant of Pyscript built on Pyodide, a port of
> CPython, or a Python distribution for the browser and Node.js that is
> based on Webassembly and Emscripten."
>
> CPython is definitely not a descendant of Pyscript! Looks like someone
> found something (semi-) interesting and tried to write something
> insightful about it, but without really understanding any of it. Other
> articles don't seem to be any better.
>
> So we have an untrustworthy site that's the only one to claim that
> CPython is short for Core Python, and we have an official site that says
> CPython is so named because it's written in C. Hm, which one to believe?
>

I think that's about as settled as it'll ever be. Like many things, it
doesn't necessarily have any stronger origin than "someone started
using the term, and it stuck". Reminds me of trying to research the
origin of the name "Idle" (or "IDLE", the "Integrated Development and
Learning Environment") and being unable to find any proof that it was
named after a certain Eric, but nothing to disprove it either...

ChrisA


Re: "CPython"

2022-06-20 Thread Chris Angelico
On Tue, 21 Jun 2022 at 06:31, Stefan Ram  wrote:
>
> Paulo da Silva  writes:
> >Do you have any credible reference to your assertion "The "C" in
> >"CPython" stands for C."?
>
>   Whether a source is considered "credible" is something
>   everyone must decide for themselves.
>
>   I can say that the overwhelming majority of results of Web
>   searches about this topic yields expressions of the view
>   that the "C" in "CPython" stands for C, "overwhelming
>   majority" when compared to expressions of other interpretations
>   of that "C", and "overwhelming majority" meaning something
>   like more than 90 percent.
>
>   For one example, there seems to be a book "CPython Internals"
>   which seems to say, according to one Web search engine:
>
> |The C in CPython is a reference to the C programming
> |language, indicating that this Python distribution is
> |written in the C language.
>

Does python.org count as "credible"?

https://docs.python.org/3/reference/introduction.html

CPython: This is the original and most-maintained implementation of
Python, written in C.

I think that's about as close as you're going to get to an answer.
Given that it is, in that page, being distinguished from Jython
(implemented in Python), PyPy (implemented in Python), Python for .NET
(implemented for the .NET runtime), and IronPython (one of these is
not like the others, whatever, but it's the one originally implemented
for .NET), it seems fairly safe to say that the C in CPython means the
implementation language.

If someone wants to contradict this, they'll need a strong source,
like a post from a core dev back when Jython was brand new.

ChrisA


Re: mapLast, mapFirst, and just general iterator questions

2022-06-20 Thread Chris Angelico
On Tue, 21 Jun 2022 at 06:16, Leo  wrote:
>
> On Wed, 15 Jun 2022 04:47:31 +1000, Chris Angelico wrote:
>
> > Don't bother with a main() function unless you actually need to be
> > able to use it as a function. Most of the time, it's simplest to
> > just have the code you want, right there in the file. :) Python
> > isn't C or Java, and code doesn't have to get wrapped up in
> > functions in order to exist.
>
> Actually a main() function in Python is pretty useful, because Python
> code on the top level executes a lot slower. I believe this is due to
> global variable lookups instead of local.
>
> Here is benchmark output from a small test.
>
> ```
> Benchmark 1: python3 test1.py
>   Time (mean ± σ): 662.0 ms ±  44.7 ms
>   Range (min … max):   569.4 ms … 754.1 ms
>
> Benchmark 2: python3 test2.py
>   Time (mean ± σ): 432.1 ms ±  14.4 ms
>   Range (min … max):   411.4 ms … 455.1 ms
>
> Summary
>   'python3 test2.py' ran
> 1.53 ± 0.12 times faster than 'python3 test1.py'
> ```
>
> Contents of test1.py:
>
> ```
> l1 = list(range(5_000_000))
> l2 = []
>
> while l1:
> l2.append(l1.pop())
>
> print(len(l1), len(l2))
> ```
>
> Contents of test2.py:
>
> ```
> def main():
> l1 = list(range(5_000_000))
> l2 = []
>
> while l1:
> l2.append(l1.pop())
>
> print(len(l1), len(l2))
> main()
> ```
>

To be quite honest, I have never once in my life had a time when the
execution time of a script is dominated by global variable lookups in
what would be the main function, AND it takes long enough to care
about it. Yes, technically it might be faster, but I've probably spent
more time reading your post than I'll ever save by putting stuff into
a function :)

Also, often at least some of those *need* to be global in order to be
useful, so you'd lose any advantage you gain.

ChrisA


PEP 401 origin and authorship

2022-06-17 Thread Chris Angelico
Somewhere around the place, I remember reading something about how PEP
401 (the retirement of the BDFL and the accession of the FLUFL) came
to be. It involved a joke being turned on its originator, I think. But
I can't find it back. Anyone have a reference handy?

ChrisA


Re: ModuleNotFoundError

2022-06-15 Thread Chris Angelico
On Thu, 16 Jun 2022 at 05:00, Zoltan Szenderak  wrote:
>
>
>
> Only on my Windows 10, on my Windows 11 works perfectly. I uninstalled and 
> reinstalled python, it did not help, I tried everything I found online, 
> Stackoverflow, python.org, did not help. Not just this module, others too. 
> They are installed where they are supposed to be and they produce the name 
> not found error. Some modules work. The Path is set in the System Environment 
> Variables list.
>
> (env) C:\Users\zszen>req3.py
> Traceback (most recent call last):
>   File "C:\Users\zszen\env\Scripts\req3.py", line 2, in 
> import requests_html
> ModuleNotFoundError: No module named 'requests_html'
>
> (env) C:\Users\zszen>pip3 install requests_html
> Requirement already satisfied: requests_html in 
> c:\users\zszen\env\lib\site-packages (0.10.0)
> Requirement already satisfied: requests in 
> c:\users\zszen\env\lib\site-packages (from requests_html) (2.27.1)
> Requirement already satisfied: bs4 in c:\users\zszen\env\lib\site-packages 
> (from requests_html) (0.0.1)
> Requirement already satisfied: fake-useragent in 
> c:\users\zszen\env\lib\site-packages (from requests_html) (0.1.11)
> Requirement already satisfied: pyppeteer>=0.0.14 in 
> c:\users\zszen\env\lib\site-packages (from requests_html) (1.0.2)
> Requirement already satisfied: pyquery in 
> c:\users\zszen\env\lib\site-packages (from requests_html) (1.4.3)
> Requirement already satisfied: parse in c:\users\zszen\env\lib\site-packages 
> (from requests_html) (1.19.0)
> Requirement already satisfied: w3lib in c:\users\zszen\env\lib\site-packages 
> (from requests_html) (1.22.0)
> Requirement already satisfied: appdirs<2.0.0,>=1.4.3 in 
> c:\users\zszen\env\lib\site-packages (from pyppeteer>=0.0.14->requests_html) 
> (1.4.4)
> Requirement already satisfied: websockets<11.0,>=10.0 in 
> c:\users\zszen\env\lib\site-packages (from pyppeteer>=0.0.14->requests_html) 
> (10.3)
> Requirement already satisfied: urllib3<2.0.0,>=1.25.8 in 
> c:\users\zszen\env\lib\site-packages (from pyppeteer>=0.0.14->requests_html) 
> (1.26.9)
> Requirement already satisfied: tqdm<5.0.0,>=4.42.1 in 
> c:\users\zszen\env\lib\site-packages (from pyppeteer>=0.0.14->requests_html) 
> (4.64.0)
> Requirement already satisfied: importlib-metadata>=1.4 in 
> c:\users\zszen\env\lib\site-packages (from pyppeteer>=0.0.14->requests_html) 
> (4.11.4)
> Requirement already satisfied: pyee<9.0.0,>=8.1.0 in 
> c:\users\zszen\env\lib\site-packages (from pyppeteer>=0.0.14->requests_html) 
> (8.2.2)
> Requirement already satisfied: certifi>=2021 in 
> c:\users\zszen\env\lib\site-packages (from pyppeteer>=0.0.14->requests_html) 
> (2022.5.18.1)
> Requirement already satisfied: beautifulsoup4 in 
> c:\users\zszen\env\lib\site-packages (from bs4->requests_html) (4.11.1)
> Requirement already satisfied: cssselect>0.7.9 in 
> c:\users\zszen\env\lib\site-packages (from pyquery->requests_html) (1.1.0)
> Requirement already satisfied: lxml>=2.1 in 
> c:\users\zszen\env\lib\site-packages (from pyquery->requests_html) (4.9.0)
> Requirement already satisfied: charset-normalizer~=2.0.0 in 
> c:\users\zszen\env\lib\site-packages (from requests->requests_html) (2.0.12)
> Requirement already satisfied: idna<4,>=2.5 in 
> c:\users\zszen\env\lib\site-packages (from requests->requests_html) (3.3)
> Requirement already satisfied: six>=1.4.1 in 
> c:\users\zszen\env\lib\site-packages (from w3lib->requests_html) (1.16.0)
> Requirement already satisfied: zipp>=0.5 in 
> c:\users\zszen\env\lib\site-packages (from 
> importlib-metadata>=1.4->pyppeteer>=0.0.14->requests_html) (3.8.0)
> Requirement already satisfied: colorama in 
> c:\users\zszen\env\lib\site-packages (from 
> tqdm<5.0.0,>=4.42.1->pyppeteer>=0.0.14->requests_html) (0.4.4)
> Requirement already satisfied: soupsieve>1.2 in 
> c:\users\zszen\env\lib\site-packages (from 
> beautifulsoup4->bs4->requests_html) (2.3.2.post1)
>
> (env) C:\Users\zszen>
>

What does "pip3 --version" tell you, and what happens if you print out
sys.version at the top of your script? Also possibly sys.executable,
in case your script isn't running from inside the venv that it looks
like it ought to be working in.
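A minimal diagnostic sketch following those suggestions: put this at the very top of the failing script to see which interpreter is actually running it, and where it looks for packages.

```python
# Which Python is actually running this script, and from where?
import sys

print(sys.version)      # interpreter version string
print(sys.executable)   # path to the python binary in use
print(sys.prefix)       # should point inside the venv when it is active
```

If sys.executable points outside the venv, the .py file association is launching a different interpreter than the activated environment.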

ChrisA


Re: mapLast, mapFirst, and just general iterator questions

2022-06-14 Thread Chris Angelico
On Wed, 15 Jun 2022 at 05:45, Roel Schroeven  wrote:
>
> Chris Angelico schreef op 14/06/2022 om 20:47:
> > > def main():
> > > for each in (iterEmpty, iter1, iter2, iterMany):
> > > baseIterator = each()
> > > chopFirst = mapFirst(baseIterator, lambda x: x[1:-1])
> > > andCapLast = mapLast(chopFirst, lambda x: x.upper())
> > > print(repr(" ".join(andCapLast)))
> >
> > Don't bother with a main() function unless you actually need to be
> > able to use it as a function. Most of the time, it's simplest to just
> > have the code you want, right there in the file. :) Python isn't C or
> > Java, and code doesn't have to get wrapped up in functions in order to
> > exist.
> Not (necessarily) a main function, but these days the general
> recommendation seems to be to use the "if __name__ == '__main__':"
> construct, so that the file can be used as a module as well as as a
> script. Even for short simple things that can be helpful when doing
> things like running tests or extracting docstrings.

If it does need to be used as a module as well as a script, sure. But
(a) not everything does, and (b) even then, you don't need a main()
function; what you need is the name-is-main check. The main function
is only necessary when you need to be able to invoke your main entry
point externally, AND this main entry point doesn't have a better
name. That's fairly rare in my experience.

My recommendation is to write the code you need, and only add
boilerplate when you actually need it. Don't just start every script
with an if-name-is-main block at the bottom just for the sake of doing
it.
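For reference, a minimal sketch of the name-is-main check being discussed, without a separate main() function (the helper name is invented for illustration):

```python
# Runs when executed as a script, but not when imported as a module.
def useful_helper(x):
    return x * 2

if __name__ == "__main__":
    print(useful_helper(21))
```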

ChrisA


Re: mapLast, mapFirst, and just general iterator questions

2022-06-14 Thread Chris Angelico
On Wed, 15 Jun 2022 at 04:07, Travis Griggs  wrote:
> def mapFirst(stream, transform):
> try:
> first = next(stream)
> except StopIteration:
> return
> yield transform(first)
> yield from stream

Small suggestion: Begin with this:

stream = iter(stream)

That way, you don't need to worry about whether you're given an
iterator or some other iterable (for instance, you can't call next()
on a list, but it would make good sense to be able to use your
function on a list).

(BTW, Python's convention would be to call this "map_first" rather
than "mapFirst". But that's up to you.)
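A sketch of the function with that change applied (and renamed to map_first, per the convention just mentioned):

```python
def map_first(stream, transform):
    # iter() is a no-op on iterators and wraps other iterables,
    # so next() below works on lists, tuples, generators, etc.
    stream = iter(stream)
    try:
        first = next(stream)
    except StopIteration:
        return
    yield transform(first)
    yield from stream

# Now a plain list works too:
print(list(map_first([1, 2, 3], lambda x: x * 10)))  # [10, 2, 3]
```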

> def mapLast(stream, transform):
> try:
> previous = next(stream)
> except StopIteration:
> return
> for item in stream:
> yield previous
> previous = item
> yield transform(previous)

Hmm. This might be a place to use multiple assignment, but what you
have is probably fine too.

> def main():
> for each in (iterEmpty, iter1, iter2, iterMany):
> baseIterator = each()
> chopFirst = mapFirst(baseIterator, lambda x: x[1:-1])
> andCapLast = mapLast(chopFirst, lambda x: x.upper())
> print(repr(" ".join(andCapLast)))

Don't bother with a main() function unless you actually need to be
able to use it as a function. Most of the time, it's simplest to just
have the code you want, right there in the file. :) Python isn't C or
Java, and code doesn't have to get wrapped up in functions in order to
exist.

> Is this idiomatic? Especially my implementations of mapFirst and mapList 
> there in the middle? Or is there some way to pull this off that is more 
> elegant?
>

Broadly so. Even with the comments I've made above, I wouldn't say
there's anything particularly *wrong* with your code. There are, of
course, many ways to do things, and what's "best" depends on what your
code is doing, whether it makes sense in context.

> I've been doing more with iterators and stacking them (probably because I've 
> been playing with Elixir elsewhere), I am generally curious what the 
> performance tradeoffs of heavy use of iterators and yield functions in python 
> is. I know the argument for avoiding big list copies when moving between 
> stages. Is it one of those things where there's also some overhead with them, 
> where for small stuff, you'd just be better list-ifying the first iterator 
> and then working with lists (where, for example, I could do the first/last 
> clamp operation with just indexing operations).
>

That's mostly right, but more importantly: Don't worry about
performance. Worry instead about whether the code is expressing your
intent. If that means using a list instead of an iterator, go for it!
If that means using an iterator instead of a list, go for it! Python
won't judge you. :)

But if you really want to know which one is faster, figure out a
reasonable benchmark, and then start playing around with the timeit
module. Just remember, it's very very easy to spend hours trying to
make the benchmark numbers look better, only to discover that it has
negligible impact on your code's actual performance - or, in some
cases, it's *worse* than before (because the benchmark wasn't truly
representative). So if you want to spend some enjoyable time exploring
different options, go for it! And we'd be happy to help out. Just
don't force yourself to write bad code "because it's faster".

ChrisA


Re: Suggestion. Replace Any with *

2022-06-13 Thread Chris Angelico
On Tue, 14 Jun 2022 at 01:59, h3ck phy  wrote:
>
> It would be nice if we could write something like this
> data: dict[str, *] = {}
> instead of
> data: dict[str, Any] = {}
>
> In import statement asterisk means "all names" in a module.
> But in type closure it should mean "all types".

Type hints are normal Python syntax. For this to be legal in a type
hint, it would need to be legal in the rest of Python. What would it
mean?

some_object[other_object, *]

ChrisA


Re: Function to Print a nicely formatted Dictionary or List?

2022-06-09 Thread Chris Angelico
On Fri, 10 Jun 2022 at 03:44, Dave  wrote:
>
> Hi,
>
> Before I write my own I wondering if anyone knows of a function that will 
> print a nicely formatted dictionary?
>
> By nicely formatted I mean not all on one line!
>

https://docs.python.org/3/library/pprint.html

from pprint import pprint
pprint(thing)

ChrisA


Re: How to test characters of a string

2022-06-08 Thread Chris Angelico
On Thu, 9 Jun 2022 at 04:14, <2qdxy4rzwzuui...@potatochowder.com> wrote:
>
> On 2022-06-09 at 03:18:56 +1000,
> Chris Angelico  wrote:
>
> > On Thu, 9 Jun 2022 at 03:15, <2qdxy4rzwzuui...@potatochowder.com> wrote:
> > >
> > > On 2022-06-08 at 08:07:40 -,
> > > De ongekruisigde  wrote:
> > >
> > > > Depending on the problem a regular expression may be the much simpler
> > > > solution. I love them for e.g. text parsing and use them all the time.
> > > > Unrivaled when e.g. parts of text have to be extracted, e.g. from lines
> > > > like these:
> > > >
> > > >   root:x:0:0:System administrator:/root:/run/current-system/sw/bin/bash
> > > >   dhcpcd:x:995:991::/var/empty:/run/current-system/sw/bin/nologin
> > > >   nm-iodine:x:996:57::/var/empty:/run/current-system/sw/bin/nologin
> > > >   avahi:x:997:996:avahi-daemon privilege separation 
> > > > user:/var/empty:/run/current-system/sw/bin/nologin
> > > >   sshd:x:998:993:SSH privilege separation 
> > > > user:/var/empty:/run/current-system/sw/bin/nologin
> > > >   geoclue:x:999:998:Geoinformation 
> > > > service:/var/lib/geoclue:/run/current-system/sw/bin/nologin
> > > >
> > > > Compare a regexp solution like this:
> > > >
> > > >   >>> g = 
> > > > re.search(r'([^:]*):([^:]*):(\d+):(\d+):([^:]*):([^:]*):(.*)$' , s)
> > > >   >>> print(g.groups())
> > > >   ('geoclue', 'x', '999', '998', 'Geoinformation service', 
> > > > '/var/lib/geoclue', '/run/current-system/sw/bin/nologin')
> > > >
> > > > to the code one would require to process it manually, with all the edge
> > > > cases. The regexp surely reads much simpler (?).
> > >
> > > Uh...
> > >
> > > >>> import pwd # https://docs.python.org/3/library/pwd.html
> > > >>> [x for x in pwd.getpwall() if x[0] == 'geoclue']
> > > [pwd.struct_passwd(pw_name='geoclue', pw_passwd='x', pw_uid=992, 
> > > pw_gid=992, pw_gecos='Geoinformation service', pw_dir='/var/lib/geoclue', 
> > > pw_shell='/sbin/nologin')]
> >
> > That's great if the lines are specifically coming from your system's
> > own /etc/passwd, but not so much if you're trying to compare passwd
> > files from different systems, where you simply have the files
> > themselves.
>
> In addition to pwent to get specific entries from the local password
> database, POSIX has fpwent to get a specific entry from a stream that
> looks like /etc/passwd.  So even POSIX agrees that if you think you have
> to process this data manually, you're doing it wrong.  Python exposes
> neither functon directly (at least not in the pwd module or the os
> module; I didn't dig around or check PyPI).

So... we can go find some other way of calling fpwent, or we can
just parse the file ourselves. It's a very VERY simple format.
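A minimal sketch of parsing the format by hand: a passwd entry is seven colon-separated fields, and none of the fields may contain a colon, so a plain split is enough.

```python
# Parse one /etc/passwd-format line into a dict of its seven fields.
FIELDS = ("name", "passwd", "uid", "gid", "gecos", "dir", "shell")

def parse_passwd_line(line):
    values = line.rstrip("\n").split(":")
    if len(values) != len(FIELDS):
        raise ValueError("malformed passwd line: %r" % line)
    return dict(zip(FIELDS, values))

entry = parse_passwd_line("root:x:0:0:System administrator:/root:/bin/bash")
print(entry["name"], entry["shell"])  # root /bin/bash
```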

ChrisA


Re: How to test characters of a string

2022-06-08 Thread Chris Angelico
On Thu, 9 Jun 2022 at 03:15, <2qdxy4rzwzuui...@potatochowder.com> wrote:
>
> On 2022-06-08 at 08:07:40 -,
> De ongekruisigde  wrote:
>
> > Depending on the problem a regular expression may be the much simpler
> > solution. I love them for e.g. text parsing and use them all the time.
> > Unrivaled when e.g. parts of text have to be extracted, e.g. from lines
> > like these:
> >
> >   root:x:0:0:System administrator:/root:/run/current-system/sw/bin/bash
> >   dhcpcd:x:995:991::/var/empty:/run/current-system/sw/bin/nologin
> >   nm-iodine:x:996:57::/var/empty:/run/current-system/sw/bin/nologin
> >   avahi:x:997:996:avahi-daemon privilege separation 
> > user:/var/empty:/run/current-system/sw/bin/nologin
> >   sshd:x:998:993:SSH privilege separation 
> > user:/var/empty:/run/current-system/sw/bin/nologin
> >   geoclue:x:999:998:Geoinformation 
> > service:/var/lib/geoclue:/run/current-system/sw/bin/nologin
> >
> > Compare a regexp solution like this:
> >
> >   >>> g = re.search(r'([^:]*):([^:]*):(\d+):(\d+):([^:]*):([^:]*):(.*)$' , 
> > s)
> >   >>> print(g.groups())
> >   ('geoclue', 'x', '999', '998', 'Geoinformation service', 
> > '/var/lib/geoclue', '/run/current-system/sw/bin/nologin')
> >
> > to the code one would require to process it manually, with all the edge
> > cases. The regexp surely reads much simpler (?).
>
> Uh...
>
> >>> import pwd # https://docs.python.org/3/library/pwd.html
> >>> [x for x in pwd.getpwall() if x[0] == 'geoclue']
> [pwd.struct_passwd(pw_name='geoclue', pw_passwd='x', pw_uid=992, 
> pw_gid=992, pw_gecos='Geoinformation service', pw_dir='/var/lib/geoclue', 
> pw_shell='/sbin/nologin')]

That's great if the lines are specifically coming from your system's
own /etc/passwd, but not so much if you're trying to compare passwd
files from different systems, where you simply have the files
themselves.
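For instance, a rough sketch of comparing two such files by user name (the helper names and sample records here are invented for illustration):

```python
def passwd_index(lines):
    """Map user name -> split record for /etc/passwd-style lines."""
    return {
        fields[0]: fields
        for fields in (line.rstrip("\n").split(":")
                       for line in lines if line.strip())
    }

def compare_passwd(lines_a, lines_b):
    """Names present in only one of the two files."""
    a, b = passwd_index(lines_a), passwd_index(lines_b)
    return sorted(a.keys() - b.keys()), sorted(b.keys() - a.keys())

only_a, only_b = compare_passwd(
    ["root:x:0:0:root:/root:/bin/bash\n",
     "alice:x:1000:1000::/home/alice:/bin/sh\n"],
    ["root:x:0:0:root:/root:/bin/bash\n",
     "bob:x:1001:1001::/home/bob:/bin/sh\n"],
)
print(only_a, only_b)  # ['alice'] ['bob']
```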

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to replace characters in a string?

2022-06-08 Thread Chris Angelico
On Wed, 8 Jun 2022 at 19:13, Dave  wrote:
>
> Hi,
>
> Thanks for this!
>
> So, is there a copy function/method that returns a MutableString like in 
> objective-C? I’ve solved this problems before in a number of languages like 
> Objective-C and AppleScript.
>
> Basically there is a set of common characters that need “normalizing” and I 
> have a method that replaces them in a string, so:
>
> myString = [myString normalizeCharacters];
>
> Would return a new string with all the “common” replacements applied.
>
> Since the following gives an error :
>
> myString = 'Hello'
> myNewstring = myString.replace(myString,'e','a’)
>
> TypeError: 'str' object cannot be interpreted as an integer
>
> I can’t see of a way to do this in Python?
>

Not sure why you're passing the string as an argument as well as using
it as the object you're calling a method on. All you should need to do
is:

myString.replace('e', 'a')
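Spelled out, since strings are immutable: replace() hands back a new string, so rebind the result if you want to keep it.

```python
myString = 'Hello'
myNewstring = myString.replace('e', 'a')  # bind the new string
print(myNewstring)  # Hallo
print(myString)     # Hello -- the original is unchanged
```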

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to replace characters in a string?

2022-06-08 Thread Chris Angelico
On Wed, 8 Jun 2022 at 18:20, Dave  wrote:
>
> PS
>
> I’ve also tried:
> myCompareFile1 = myTitleName
> myCompareFile1.replace("\u2019", "'")
> myCompareFile2 = myCompareFileName
> myCompareFile2.replace("\u2019", "'")
> Which also doesn’t work, the replace itself work but it still fails the 
> compare?
>

This is a great time to start exploring what actually happens when you
do "myCompareFile2 = myCompareFileName". I recommend doing some poking
around with strings (which are immutable), lists (which aren't), and
tuples (which aren't, but can contain mutable children).
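A small starting point for that exploration (each comment states what actually happens):

```python
s = "abc"          # str: immutable
lst = [1, 2, 3]    # list: mutable
t = (1, [2, 3])    # tuple: immutable container with a mutable child

alias = lst        # assignment copies nothing -- both names see one list
alias.append(4)
print(lst)         # [1, 2, 3, 4]

t[1].append(5)     # the tuple never changes, but its inner list can
print(t)           # (1, [2, 3, 5])

try:
    s[0] = "x"     # strings reject item assignment outright
except TypeError as e:
    print("TypeError:", e)
```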

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to replace characters in a string?

2022-06-08 Thread Chris Angelico
On Wed, 8 Jun 2022 at 18:12, Dave  wrote:

> I tried the but it doesn’t seem to work?
> myCompareFile1 = ascii(myTitleName)
> myCompareFile1.replace("\u2019", "'")

Strings in Python are immutable. When you call ascii(), you get back a
new string, but it's one that has actual backslashes and such in it.
(You probably don't need this step, other than for debugging; check
the string by printing out the ASCII version of it, but stick to the
original for actual processing.) The same is true of the replace()
method; it doesn't change the string, it returns a new string.

>>> word = "spam"
>>> print(word.replace("sp", "h"))
ham
>>> print(word)
spam

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to test characters of a string

2022-06-07 Thread Chris Angelico
On Wed, 8 Jun 2022 at 07:24, Barry  wrote:
>
>
>
> > On 7 Jun 2022, at 22:04, Dave  wrote:
> >
> > It depends on the language I’m using, in Objective C, I’d use isNumeric, 
> > just wanted to know what the equivalent is in Python.
> >
> > If you know the answer why don’t you just tell me and if you don’t, don’t 
> > post!
>
> People ask home work questions here and we try to teach a student with hints 
> not finished answers.
> Your post was confused with a home work question.
>

In the future, to make it look less like a homework question, show
your current code, which would provide context. Last I checked,
homework questions don't usually involve ID3 tags in MP3 files :)

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: min, max with position

2022-06-04 Thread Chris Angelico
On Sun, 5 Jun 2022 at 08:09, dn  wrote:
>
> On 05/06/2022 09.50, Chris Angelico wrote:
> > No, but it shouldn't be too hard to make it if you want it. The
> > obvious option of calling max/min on the enumerated list won't work on
> > its own, since the index comes before the value, but with a key
> > function it would work fine:
> >
> > >>> min(enumerate(l), key=lambda x: x[1])
> > (0, 1.618033)
> > >>> max(enumerate(l), key=lambda x: x[1])
> > (1, 3.141593)
>
> An elegant solution!
>
> But, but, but which of the above characters is an 'el' and which a 'one'???
> (please have pity on us old f...s and the visually-challenged!)
>

Fair point, but I stuck to the OP's example list and kept it called
'l' for list :)

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: min, max with position

2022-06-04 Thread Chris Angelico
On Sun, 5 Jun 2022 at 08:00, dn  wrote:
>
> On 05/06/2022 06.56, Dennis Lee Bieber wrote:
> > On Sat, 4 Jun 2022 13:36:26 -0500, "Michael F. Stemper"
> >  declaimed the following:
> >
> >>
> >> Are there similar functions that return not only the minimum
> >> or maximum value, but also its position?
> >>
> >   If it isn't in the library reference manual, NO...
> >
> >   But it also isn't that difficult to write...
> >
> > >>> def whatAt(function, data):
> > ...   what = function(data)
> > ...   at = data.index(what)
> > ...   return (at, what)
> > ...
> > >>> l = [  1.618033,   3.1415923536,   2.718282]
> > >>> whatAt(min, l)
> > (0, 1.618033)
> > >>> whatAt(max, l)
> > (1, 3.1415923536)
> 
> >
> > (properly, I should either reverse the order of the return value, or change
> > the name to atWhat() )
>
> ...and remembering the special case:
> if the what value appears more than once in the list, the where?at will
> report the first/'left-most' index only.
>

Which is consistent with the vanilla min() function.

>>> min([1, 1.0])
1

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: min, max with position

2022-06-04 Thread Chris Angelico
On Sun, 5 Jun 2022 at 07:46, Michael F. Stemper
 wrote:
>
> Python contains built-in functions that return the minimum or
> maximum items in a list.
>
>   >>> l = [1.618033,3.141593,2.718282]
>   >>> min(l)
>   1.618033
>   >>> max(l)
>   3.141593
>   >>>
>
> Are there similar functions that return not only the minimum
> or maximum value, but also its position?
>
>   >>> specialmin(l)
>   (0,1.618033)
>   >>> specialmax(l)
>   3.141593
>   >>>
>

No, but it shouldn't be too hard to make it if you want it. The
obvious option of calling max/min on the enumerated list won't work on
its own, since the index comes before the value, but with a key
function it would work fine:

>>> min(enumerate(l), key=lambda x: x[1])
(0, 1.618033)
>>> max(enumerate(l), key=lambda x: x[1])
(1, 3.141593)
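An equivalent spelling, if you'd rather avoid the lambda, uses operator.itemgetter for the key function:

```python
from operator import itemgetter

l = [1.618033, 3.141593, 2.718282]
# itemgetter(1) pulls the value out of each (index, value) pair
print(min(enumerate(l), key=itemgetter(1)))  # (0, 1.618033)
print(max(enumerate(l), key=itemgetter(1)))  # (1, 3.141593)
```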

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Automatic Gain Control in Python?

2022-05-31 Thread Chris Angelico
On Wed, 1 Jun 2022 at 11:05, Steve GS  wrote:
>
>
> >Even easier, the few NPR podcasts I just checked now have RSS feeds of
> their episodes (as expected).  It seems it would be much easier to just
> download the latest episode based on the XML file, normalize, send it to
> play, done.
>
> How can that possibly be easier? I am playing the podcast and recording it
> for a one-time replay.
> Now you want me to write a program that automatically downloads 48 files
> then manipulate them for equalization then replay it. It certainly doesn't
> sound easier to me. I already have that working using simple
> computer-generated vocal commands.
>

General principle: If you're asking someone else for help, don't tell
them that your way is easier, because the obvious response is "go
ahead then, do it your own way".

You're technically right in a sense: something that you already have
is, indeed, easier than something else. But downloading files is
*easy* in Python, and audio analysis on files is FAR easier than
real-time audio analysis with hysteresis avoidance.

What you're doing actually reminds me of the old acoustic couplers
[1], which were a messy hack brought about by monopolies that refused
to allow other devices onto the network. Unless you have a really good
reason for sticking to the black-box system, I would strongly
recommend going for the much much easier method of simply downloading
the files as they are.

ChrisA

[1] https://en.wikipedia.org/wiki/Acoustic_coupler
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: .0 in name

2022-05-28 Thread Chris Angelico
On Sun, 29 May 2022 at 08:26, Eryk Sun  wrote:
>
> On 5/28/22, Chris Angelico  wrote:
> >
> > be extremely confusing; so to keep everything safe, the interpreter
> > generates a name you couldn't possibly want - same as for the function
> > itself, which is named "<listcomp>" or "<genexpr>", angle brackets
> > included.
>
> To clarify, "<listcomp>" is the co_name and co_qualname value of the
> code object, which was compiled for the list comprehension. These
> names are also used as the __name__ and __qualname__ of the temporary
> object that's created by MAKE_FUNCTION. They are not identifiers. The
> code object is a constant, which is referenced solely by its index in
> the co_consts tuple. The temporary function is referenced on the
> stack.

Correct. Every function has a name, important for tracebacks and such,
but with lambda functions, the internal functions of comprehensions,
and so on, there's no actual name binding for it. So the interpreter
generates a name that won't collide with any actual name that you'd
have assigned anything to.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: .0 in name

2022-05-28 Thread Chris Angelico
On Sun, 29 May 2022 at 06:41, Ralf M.  wrote:
>
> Am 13.05.2022 um 23:23 schrieb Paul Bryan:
> > On Sat, 2022-05-14 at 00:47 +0800, bryangan41 wrote:
> >
> >> May I know (1) why can the name start with a number?
> >
> > The name of an attribute must be an identifier. An identifier cannot
> > begin with a decimal number.
>
> I'm not sure about the first statement. Feeding
>
> [print("locals:", locals()) or c for c in "ab"]
>
> to the REPL, the result is
>
> locals: {'.0': <str_iterator object at 0x...>, 'c': 'a'}
> locals: {'.0': <str_iterator object at 0x...>, 'c': 'b'}
> ['a', 'b']
>
> i.e. there is a variable of name .0 in the local namespace within the
> list comprehension, and .0 is definitely not an identifier.
>
> I came across this while investigating another problem with list
> comprehensions, and I think the original post was about list comprehensions.
>

There are a few quirks with comprehensions, and to understand that
".0", you have to first understand two very important aspects of
scoping with regard to comprehensions.

(Note: For simplicity, I'm going to refer in general to
"comprehensions", and I am not going to count Python 2. My example
will be a list comp, but a generator expression also behaves like
this, as do other comprehensions.)

Consider this function:

def spam():
ham = "initial"
ham = [locals() for x in "q"]
return ham

The disassembly module can be very helpful here. The precise output
will vary with Python version, but the points I'm making should be
valid for all current versions. Here's how it looks in a December
build of Python 3.11 (yeah, my Python's getting a bit old now, I
should update at some point):

>>> dis.dis(spam)
  2   0 LOAD_CONST   1 ('initial')
  2 STORE_FAST   0 (ham)

  3   4 LOAD_CONST   2 (<code object <listcomp> at
0x7fb6a0cfa6b0, file "<stdin>", line 3>)
  6 MAKE_FUNCTION0
  8 LOAD_CONST   3 ('q')
 10 GET_ITER
 12 CALL_FUNCTION1
 14 STORE_FAST   0 (ham)

  4  16 LOAD_FAST0 (ham)
 18 RETURN_VALUE

Disassembly of <code object <listcomp> at 0x7fb6a0cfa6b0, file
"<stdin>", line 3>:
  3   0 BUILD_LIST   0
  2 LOAD_FAST0 (.0)
>>    4 FOR_ITER 5 (to 16)
  6 STORE_FAST   1 (x)
  8 LOAD_GLOBAL  0 (locals)
 10 CALL_FUNCTION0
 12 LIST_APPEND  2
 14 JUMP_ABSOLUTE2 (to 4)
>>   16 RETURN_VALUE
>>>

Okay, that's a lot of raw data, but let's pull out a few useful things from it.

Line 2 initializes ham in an unsurprising way. Grab a constant, store
it in a local. Easy.

Line three. We grab the code object for the list comp, and make a
function (that's necessary for closures). Then, *still in the context
of the spam function*, we grab the constant "q", and get an iterator
from it. Leaving that on the top of the stack, we call the list
comprehension's function, and store the result into 'ham'.

The comprehension itself loads the fast local from slot zero (name
".0")  and iterates over it. Slot zero is the first argument, so
that's the string iterator that we left there for the function.

So why IS this? There are a few reasons, but the main one is generator
expressions. Replacing the list comp with a genexp gives this result:

>>> spam()
<generator object spam.<locals>.<genexpr> at 0x7fb6a0780890>

The actual iteration (row 4 in the genexp in the above disassembly of
) doesn't happen until you iterate over this value. But it
would be extremely confusing if, in that situation, errors didn't show
up until much later. What if, instead of iterating over a string, you
tried to iterate over a number? Where should the traceback come from?
Or what if you're iterating over a variable, and you change what's in
that variable?

def wat():
stuff = "hello"
ucase = (l.upper() for l in stuff)
stuff = "goodbye"
return "".join(ucase)

Does this return "HELLO" or "GOODBYE"? Since stuff gets evaluated
immediately, it returns HELLO, and that's consistent for list comps
and genexps.

But because of that, there needs to be a parameter to carry that
iterator through, and every parameter needs a name. If the generated
name collided with any identifier that you actually wanted, it would
be extremely confusing; so to keep everything safe, the interpreter
generates a name you couldn't possibly want - same as for the function
itself, which is named "<listcomp>" or "<genexpr>", angle brackets
included.

That's a fairly long-winded way to put it, but that's why you can have
variables with bizarre names :)
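The wat() example above can be checked directly: the outer iterable is turned into an iterator when the genexp is created, so rebinding the name afterwards changes nothing.

```python
def wat():
    stuff = "hello"
    ucase = (l.upper() for l in stuff)   # iter("hello") is captured here
    stuff = "goodbye"                    # rebinding the name has no effect
    return "".join(ucase)

print(wat())  # HELLO
```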

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Now for something completely different...

2022-05-22 Thread Chris Angelico
On Mon, 23 May 2022 at 09:19, Skip Montanaro  wrote:
> That's not too informative (other than its relationship to moi), and I have
> room for probably four or five more characters. (I have a graphic artist in
> mind, so the space need not strictly be text either.)

Aww, not enough room to say "straight line", because (in Euclidean
space) it's the fastest way from B to A.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: how to distinguish return from print()

2022-05-22 Thread Chris Angelico
On Mon, 23 May 2022 at 09:23, Stefan Ram  wrote:
>   You are making it extra hard by wording the question in this
>   way. "What's the difference between the moon and liberty?". Uh ...
>
>   It's much easier to explain the moon and liberty separately.

"You can't tell the difference between a lump on the head and
margarine. The leadership of the Conservative Party is yours for the
asking!"
-- Grytpype Thynne, "The Last Goon Show of All"

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: "py" command for Linux and Mac?

2022-05-20 Thread Chris Angelico
On Sat, 21 May 2022 at 11:22, Michael Torrie  wrote:
> And of course the answer given by the grandparent is that Dan should use
> a normal linux shebang line in his scripts and on Windows the py
> launcher will read that shebang and guestimate the proper python
> interpreter to use and execute the script with that. Thus if I'm reading
> this correctly, a Linux shebang line should function as expected on
> Windows when python files are associated and launched with the py.exe
> launcher, even though there's no such thing as /usr/bin/python3 on
> Windows.  Py launcher makes it work as if there was.
>

That's correct, and when the py.exe launcher was first, well,
launched, the main thrust of it was "it uses the shebang that you
already include for the sake of Unix systems". You don't need extra
directives to tell it what to do.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Discerning "Run Environment"

2022-05-18 Thread Chris Angelico
On Wed, 18 May 2022 at 19:40, Stephen Tucker  wrote:
>
> Hi,
>
> I am a Windows 10 user still using Python 2.x (for good reasons, I assure
> you.)
>
> I have a Python 2.x module that I would like to be able to use in a variety
> of Python 2.x programs. The module outputs characters to the user that are
> only available in the Unicode character set.
>
> I have found that the selection of characters in that set that are
> available to my software depends on whether, for example, the program is
> being run during an IDLE session or at a Command Prompt.

Real solution? Set the command prompt to codepage 65001. Then it
should be able to handle all characters. (Windows-65001 is its alias
for UTF-8.)

> I am therefore needing to include logic in this module that (a) enables it
> to output appropriate characters depending on whether it is being run
> during an IDLE session or at a command prompt, and (b) enables it to
> discern which of these two "run environments" it is running in.
>
> Goal (a) is achieved easily by using string.replace to replace unavailable
> characters with available ones where necessary.
>
> The best way I have found so far to achieve goal (b) is to use sys.modules
> and ascertain whether any modules contain the string "idlelib". If they do,
> that I assume that the software is being run in an IDLE session.
>
> I suspect that there is a more Pythonic (and reliable?) way of achieving
> goal (b).
>
> Can anyone please tell me if there is, and, if there is, what it is?

Ultimately, it's going to depend on where your text is going: is it
going to the console, or to a Tk widget? I don't have a Windows system
handy to check, but I would suspect that you can distinguish these by
seeing whether sys.stdout is a tty, since Idle pipes stdout into its
own handler. That won't be a perfect check, as it would consider
"piped into another command" to be UTF-8 compatible, but that's
probably more right than wrong anyway.
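A rough sketch of that check (run_environment is an invented name, and as noted, isatty() is a heuristic: it is False whenever stdout has been replaced or redirected, not only under Idle):

```python
import sys

def run_environment():
    """Guess where stdout is going: a real console, or something
    else (Idle's own handler, a pipe, a file)."""
    out = sys.stdout
    if hasattr(out, "isatty") and out.isatty():
        return "console"
    return "redirected"

print(run_environment())
```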

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Request for assistance (hopefully not OT)

2022-05-17 Thread Chris Angelico
On Wed, 18 May 2022 at 04:05, Loris Bennett  wrote:
>
> [snip (26 lines)]
>
> > I think you had a problem before that.  Debian testing is not an
> > operating system you should be using if you have a fairly good
> > understanding of how Debian (or Linux in general) works.
>
> Should be
>
>   I think you had a problem before that.  Debian testing is not an
>   operating system you should be using *unless* you have a fairly good
>   understanding of how Debian (or Linux in general) works.
>
> [snip (62 lines)]
>

Oh! My bad, didn't see this correction, sorry. With this adjustment,
the comment is a bit more reasonable, although I'd still say it's
generally fine to run Debian Testing on a personal desktop machine;
there are a number of distros that base themselves on Testing.

But yes, "unless" makes much more sense there.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Request for assistance (hopefully not OT)

2022-05-17 Thread Chris Angelico
On Wed, 18 May 2022 at 04:05, Loris Bennett  wrote:
> > So now I have problems.
>
> I think you had a problem before that.  Debian testing is not an
> operating system you should be using if you have a fairly good
> understanding of how Debian (or Linux in general) works.

I take issue with that! Debian Testing is a perfectly viable operating
system! I wouldn't use it on a server, but it's perfectly fine to use
it on a personal machine. You can generally consider Debian Testing to
be broadly as stable as Ubuntu non-LTS releases, although in my
opinion, it's actually quite a bit more dependable than them.

(Perhaps you're thinking of Debian Unstable?)

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Request for assistance (hopefully not OT)

2022-05-17 Thread Chris Angelico
On Tue, 17 May 2022 at 21:22, o1bigtenor  wrote:
>
> Greetings
>
> I was having space issues in my /usr directory so I deleted some
> programs thinking that the space taken was more an issue than having
> older versions of the program.
>
> So one of the programs I deleted (using rm -r) was python3.9.
> Python3.10 was already installed so I thought (naively!!!) that things
> should continue working.
> (Python 3.6, 3.7 and 3.8 were also part of this cleanup.)

Did you install Python 3.9 using apt? If so, you should definitely
have removed it using apt - if for no reason than to find out if
something's depending on it.

Generally, Linux systems have just one "system Python" that other
applications depend on. Any other installed version is completely
independent.

> So now I have problems.
>
> Following is the system barf that I get when I run '# apt upgrade'.
>
> What can I do to correct this self-inflicted problem?
>
> (running on debian testing 5.17

I presume 5.17 is the Linux kernel version? Depending on how
up-to-date your Debian Testing is, that should theoretically mean that
the system Python is 3.10, which would imply that it should have been
safe to remove 3.9... but only if you had done it with apt.

> Setting up python2.7-minimal (2.7.18-13.1) ...
> Could not find platform independent libraries 
> Could not find platform dependent libraries 
> Consider setting $PYTHONHOME to [:]
> /usr/bin/python2.7: can't open file
> '/usr/lib/python2.7/py_compile.py': [Errno 2] No such file or
> directory

Did you also use rm to get rid of Python 2.7?

> dpkg: error processing package python2.7-minimal (--configure):
>  installed python2.7-minimal package post-installation script
> subprocess returned error exit status 2
> Setting up python3.9-minimal (3.9.12-1) ...
> update-binfmts: warning: /usr/share/binfmts/python3.9: no executable
> /usr/bin/python3.9 found, but continuing anyway as you request
> /var/lib/dpkg/info/python3.9-minimal.postinst: 51: /usr/bin/python3.9: not 
> found
> dpkg: error processing package python3.9-minimal (--configure):
>  installed python3.9-minimal package post-installation script
> subprocess returned error exit status 127
> dpkg: dependency problems prevent configuration of python3.9:
>  python3.9 depends on python3.9-minimal (= 3.9.12-1); however:
>   Package python3.9-minimal is not configured yet.
>
> dpkg: error processing package python3.9 (--configure):
>  dependency problems - leaving unconfigured
> dpkg: dependency problems prevent configuration of python2.7:
>  python2.7 depends on python2.7-minimal (= 2.7.18-13.1); however:
>   Package python2.7-minimal is not configured yet.
>
> dpkg: error processing package python2.7 (--configure):
>  dependency problems - leaving unconfigured
> dpkg: dependency problems prevent configuration of python3.9-dev:
>  python3.9-dev depends on python3.9 (= 3.9.12-1); however:
>   Package python3.9 is not configured yet.
>
> dpkg: error processing package python3.9-dev (--configure):
>  dependency problems - leaving unconfigured
> . . .
> Errors were encountered while processing:
>  python2.7-minimal
>  python3.9-minimal
>  python3.9
>  python2.7
>  python3.9-dev

So, yeah, you're definitely going to need to reinstate some parts of
Python to get this going.

If you can figure out which exact Python versions you need, it might
be possible to restore them manually. Download the packages from
packages.debian.org, then try to manually install them with dpkg, and
if that fails, unpack them and put the files into the right places.

It's going to be a pain. A lot of pain. And next time, use apt to
uninstall what apt installed :)

Something else to consider, though: It might not be Python that's
taking up all the space. On my system, /usr is dominated by /usr/lib
and /usr/local/lib, and while it might look like the pythonx.y
directories there are the large part, it's actually not Python itself
that's so big: it's other libraries, installed using either apt or
pip. So when you're trying to free up space, look to see whether you
have packages installed into every version of Python you have; the
largest directories in my python3.9/site-packages are scipy, plotly,
numpy, pandas, speech_recognition, matplotlib, and Cython - all great
tools, but if you have a copy for 3.9, a copy for 3.10, a copy for
3.11, etc, it adds up fast.

"Ten minutes with a hacksaw will save you thirty with a shovel"
-- Miss Pauling, discussing the art of uninstalling something

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Changing calling sequence

2022-05-14 Thread Chris Angelico
On Sun, 15 May 2022 at 14:27, dn  wrote:
>
> On 15/05/2022 11.34, 2qdxy4rzwzuui...@potatochowder.com wrote:
> > On 2022-05-15 at 10:22:15 +1200,
> > dn  wrote:
> >
> >> That said, a function which starts with a list of ifs-buts-and-maybes*
> >> which are only there to ascertain which set of arguments have been
> >> provided by the calling-routine; obscures the purpose/responsibility
> >> of the function and decreases its readability (perhaps not by much,
> >> but varying by situation).
> >
> > Agreed.
> >
> >> Accordingly, if the function is actually a method, recommend following
> >> @Stefan's approach, ie multiple-constructors. Although, this too can
> >> result in lower readability.
> >
> > (Having proposed that approach myself (and having used it over the
> > decades for functions, methods, procedures, constructors, ...), I also
> > agree.)
> >
> > Assuming good names,¹ how can this lead to lower readability?  I guess
> > if there's too many of them, or programmers have to start wondering
> > which one to use?  Or is this in the same generally obfuscating category
> > as the ifs-buts-and-maybes at the start of a function?
> >
> > ¹ and properly invalidated caches
>
> Allow me to extend the term "readability" to include "comprehension".
> Then add the statistical expectation that a class has only __init__().

(Confusing wording here: a class usually has far more than just
__init__, but I presume you mean that the signature of __init__ is the
only way to construct an object of that type.)

> Thus, assuming this is the first time (or, ... for a while) that the
> class is being employed, one has to read much further to realise that
> there are choices of constructor.

Yeah. I would generally say, though, that any classmethod should be
looked at as a potential alternate constructor, or at least an
alternate way to obtain objects (eg preconstructed objects with
commonly-used configuration - imagine a SecuritySettings class with a
classmethod to get different defaults).
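A sketch of that idea (SecuritySettings and its field names are made up purely for illustration):

```python
class SecuritySettings:
    def __init__(self, min_key_bits, allow_legacy):
        self.min_key_bits = min_key_bits
        self.allow_legacy = allow_legacy

    @classmethod
    def strict(cls):
        """Alternate constructor: a preconfigured hardened profile."""
        return cls(min_key_bits=4096, allow_legacy=False)

    @classmethod
    def compatible(cls):
        """Alternate constructor: defaults tolerating old clients."""
        return cls(min_key_bits=2048, allow_legacy=True)

settings = SecuritySettings.strict()
print(settings.min_key_bits, settings.allow_legacy)  # 4096 False
```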

> Borrowing from the earlier example:
>
> >   This would be quite pythonic. For example, "datetime.date"
> >   has .fromtimestamp(timestamp), .fromordinal(ordinal),
> >   .fromisoformat(date_string), ...
>
> Please remember that this is only relevant if the function is actually a
> module - which sense does not appear from the OP (IMHO).
>
> The alternatives' names are well differentiated and (apparently#)
> appropriately named*.
>
>
> * PEP-008 hobgoblins will quote:
> "Function names should be lowercase, with words separated by underscores
> as necessary to improve readability.

Note the "as necessary". Underscores aren't required when readability
is fine without them (see for instance PEP 616, which recently added
two methods to strings "removeprefix" and "removesuffix", no
underscores - part of the argument here was consistency with other
string methods, but it's also not a major problem for readability
here).

> Variable names follow the same convention as function names."
> - but this is a common observation/criticism of code that has been in
> the PSL for a long time.
>
> # could also criticise as not following the Software Craftsmanship/Clean
> Code ideal of 'programming to the interface rather than the
> implementation' - which we see in PEP-008 as "usage rather than
> implementation"
> (but please don't ask me how to differentiate between them, given that
> the only reason for the different interfaces is the
> function's/parameters' implementation!)
>
> NB usual caveats apply to PEP-008 quotations!

Notably here, the caveat that PEP 8 is not a permanent and unchanging
document. It is advice, not rules, and not all code in the standard
library fully complies with its current recommendations.

> Continuing the 'have to read further' criticism (above), it could
> equally-well be applied to my preference for keyword-arguments, in that
> I've suggested defining four parameters but the user will only call the
> function with either three or one argument(s). Could this be described
> as potentially-confusing?

Yes, definitely. Personally, I'd split it into two, one that takes the
existing three arguments (preferably with the same name, for
compatibility), and one with a different name that takes just the one
arg. That could be a small wrapper that calls the original, or the
original could become a wrapper that calls the new one, or the main
body could be refactored into a helper that they both call. It all
depends what makes the most sense internally, because that's not part
of the API at that point.

But it does depend on how the callers operate. Sometimes it's easier
to have a single function with switchable argument forms, other times
it's cleaner to separate them.
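A tiny sketch of that refactor, with all names invented for illustration: both public entry points funnel through one private helper, so neither is a second copy of the logic.

```python
def _frobnicate(data, scale, offset):
    """Shared helper: the single real implementation."""
    return [x * scale + offset for x in data]

def frobnicate(data, scale, offset):
    """Original three-argument form, kept for compatibility."""
    return _frobnicate(data, scale, offset)

def frobnicate_simple(data):
    """New one-argument form with fixed defaults."""
    return _frobnicate(data, scale=1, offset=0)

print(frobnicate([1, 2], 2, 1))   # [3, 5]
print(frobnicate_simple([1, 2]))  # [1, 2]
```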

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: tail

2022-05-11 Thread Chris Angelico
On Thu, 12 May 2022 at 07:27, Marco Sulla  wrote:
>
> On Wed, 11 May 2022 at 22:09, Chris Angelico  wrote:
> >
> > Have you actually checked those three, or do you merely suppose them to be 
> > true?
>
> I only suppose, as I said. I should do some benchmark and some other
> tests, and, frankly, I don't want to. I don't want to because I'm
> quite sure the implementation is fast, since it reads by chunks and
> cache them. I'm not sure it's 100% free of bugs, but the concept is
> very simple, since it simply mimics the *nix tail, so it should be
> reliable.

If you don't care enough to benchmark it or even debug it, why should
anyone else care?

I'm done discussing. You think that someone else should have done this
for you, but you aren't even willing to put in the effort to make this
useful to anyone else. Just use it yourself and have done with it.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: tail

2022-05-11 Thread Chris Angelico
On Thu, 12 May 2022 at 06:03, Marco Sulla  wrote:
> I suppose this function is fast. It reads the bytes from the file in chunks
> and stores them in a bytearray, prepending them to it. The final result is
> read from the bytearray and converted to bytes (to be consistent with the
> read method).
>
> I suppose the function is reliable. File is opened in binary mode and only
> b"\n" is searched as line end, as *nix tail (and python readline in binary
> mode) do. And bytes are returned. The caller can use them as is or convert
> them to a string using the encoding it wants, or do whatever its
> imagination can think :)
>
> Finally, it seems to me the function is quite simple.
>
> If all my affirmations are true, the three obstacles written by Chris
> should be passed.

Have you actually checked those three, or do you merely suppose them to be true?

> I'd very much like to see a CPython implementation of that function. It
> could be a method of a file object opened in binary mode, and *only* in
> binary mode.
>
> What do you think about it?

Still not necessary. You can simply have it in your own toolkit. Why
should it be part of the core language? How much benefit would it be
to anyone else? All the same assumptions are still there, so it still
isn't general, and you may as well just *code to your own needs* like
I've been saying all along. This does not need to be in the standard
library. Do what you need, assume what you can safely assume, and
other people can write different code.

I don't understand why this wants to be in the standard library.
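For reference, one possible "code to your own needs" version: a sketch of a binary-mode tail that reads fixed-size chunks backwards from the end of the file. It assumes b"\n" line endings, as discussed, and makes no attempt to handle every edge case.

```python
def tail_bytes(path, n, chunk_size=8192):
    """Return the last n lines of a file as bytes, *nix-tail style."""
    if n <= 0:
        return b""
    with open(path, "rb") as f:
        f.seek(0, 2)                # 2 = os.SEEK_END
        pos = f.tell()
        buf = bytearray()
        # Prepend chunks until we have seen enough newlines (or hit BOF).
        while pos > 0 and buf.count(b"\n") <= n:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            buf[:0] = f.read(step)
    lines = bytes(buf).splitlines(keepends=True)
    return b"".join(lines[-n:])
```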

ChrisA


Re: [Python-ideas] Re: New Tool Proposal

2022-05-10 Thread Chris Angelico
On Tue, 10 May 2022 at 19:57, anthony.flury
 wrote:
>
>
> On 10/05/2022 09:20, Chris Angelico wrote:
>
> On Tue, 10 May 2022 at 18:06, anthony.flury via Python-ideas
>  wrote:
>
> A proposal for a new tool to be implemented  -
>
> It is often the case that developers write code in Python and then convert it
> to a C extension module for performance reasons.
>
> A C extension module has a lot of boilerplate code - for instance the
> structures required for each class, the functions for module initialization,
> etc.
>
> My idea is a simple tool that uses introspection to take a Python
> module and generate the relevant boilerplate for the module - including
> blank functions for the module's classes and methods. This tool would use
> type annotations (if given) to make sensible choices for parameter and
> attribute types, including using int and float directly rather than internal
> objects (depending on tool options).
>
> Yep, that's an awesome idea! Are you aware of Cython? You might be
> able to make use of that.
>
> Chris, Thank you.
>
> I am aware of Cython but that isn't quite what I had in mind. I want a tool
> for a developer who doesn't want to continue to support the Python
> 'prototype' for whatever reason, i.e. where they want a complete conversion
> to C.
>
> It might even be possible with inspection of the AST to write some of the 
> code inside the C functions - but that is not for release 0.1 :-)
>

You may still be able to take advantage of Cython as part of the
process. One thing that's really cool about source code is that,
fundamentally, it's all text... and Python is *great* at manipulating
text files :) It might be that you can write a script that transforms
a Python module into a Cython module, which can then be compiled
as-is, or further processed as needed.
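
Just to make the idea concrete, here is a minimal sketch of such a generator
(all names are hypothetical, and a real tool would need far more than this):
it walks the AST and emits C stub declarations for each top-level function.

```python
import ast

def c_stubs(source: str, module_name: str) -> str:
    """Emit skeleton C declarations for each top-level function
    in a Python module (illustrative sketch only)."""
    tree = ast.parse(source)
    lines = [f'/* Auto-generated stubs for module "{module_name}" */']
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            # One CPython-style function stub per top-level def
            lines.append(
                f"static PyObject *{module_name}_{node.name}"
                "(PyObject *self, PyObject *args);"
            )
    return "\n".join(lines)

print(c_stubs("def spam(x): pass\ndef ham(): pass", "demo"))
```

A real tool would also have to emit the method table, the module init
function, and argument-parsing code, which is where the type annotations
would come in.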

BTW, not sure which list you're intending to discuss this on, so I'm
just replying on the same list you sent this message to.

ChrisA


Re: tail

2022-05-09 Thread Chris Angelico
On Tue, 10 May 2022 at 07:07, Barry  wrote:
> POSIX tail just prints the bytes to the output that it finds between \n bytes.
> At no time does it need to care about encodings as that is a problem solved
> by the terminal software. I would not expect utf-16 to work with tail on
> linux systems.

UTF-16-encoded ASCII seems to work fine on my system, which probably
means the terminal is just ignoring all the NUL bytes. But if there's a
random 0x0A byte anywhere, it would probably be counted as a line break.
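
To illustrate that 0x0A hazard (a sketch, not from the original post): plenty
of non-newline characters encode to UTF-16 code units containing an 0x0A
byte, so byte-level line splitting cuts them in half:

```python
# U+010A ("Ċ") encodes in UTF-16-LE as the bytes 0x0A 0x01 -- the first
# byte collides with ASCII LF even though no newline is present.
data = "A\u010aB".encode("utf-16-le")
print(data)               # b'A\x00\n\x01B\x00'
print(data.split(b"\n"))  # a byte-oriented tail would split mid-character
```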

ChrisA


Re: tail

2022-05-09 Thread Chris Angelico
On Tue, 10 May 2022 at 05:12, Marco Sulla  wrote:
>
> On Mon, 9 May 2022 at 19:53, Chris Angelico  wrote:
> >
> > On Tue, 10 May 2022 at 03:47, Marco Sulla  
> > wrote:
> > >
> > > On Mon, 9 May 2022 at 07:56, Cameron Simpson  wrote:
> > > >
> > > > The point here is that text is a very different thing. Because you
> > > > cannot seek to an absolute number of characters in an encoding with
> > > > variable sized characters. _If_ you did a seek to an arbitrary number
> > > > you can end up in the middle of some character. And there are encodings
> > > > where you cannot inspect the data to find a character boundary in the
> > > > byte stream.
> > >
> > > Ooook, now I understand what you and Barry mean. I suppose there's no
> > > reliable way to tail a big file opened in text mode with a decent 
> > > performance.
> > >
> > > Anyway, the previous-previous function I posted worked only for files
> > > opened in binary mode, and I suppose it's reliable, since it searches
> > > only for b"\n", as readline() in binary mode do.
> >
> > It's still fundamentally impossible to solve this in a general way, so
> > the best way to do things will always be to code for *your* specific
> > use-case. That means that this doesn't belong in the stdlib or core
> > language, but in your own toolkit.
>
> Nevertheless, tail is a fundamental tool in *nix. It's fast and
> reliable. Also the tail command can't handle different encodings?

Like most Unix programs, it handles bytes.

ChrisA


Re: tail

2022-05-09 Thread Chris Angelico
On Tue, 10 May 2022 at 03:47, Marco Sulla  wrote:
>
> On Mon, 9 May 2022 at 07:56, Cameron Simpson  wrote:
> >
> > The point here is that text is a very different thing. Because you
> > cannot seek to an absolute number of characters in an encoding with
> > variable sized characters. _If_ you did a seek to an arbitrary number
> > you can end up in the middle of some character. And there are encodings
> > where you cannot inspect the data to find a character boundary in the
> > byte stream.
>
> Ooook, now I understand what you and Barry mean. I suppose there's no
> reliable way to tail a big file opened in text mode with a decent performance.
>
> Anyway, the previous-previous function I posted worked only for files
> opened in binary mode, and I suppose it's reliable, since it searches
> only for b"\n", as readline() in binary mode do.

It's still fundamentally impossible to solve this in a general way, so
the best way to do things will always be to code for *your* specific
use-case. That means that this doesn't belong in the stdlib or core
language, but in your own toolkit.

ChrisA


Re: tail

2022-05-08 Thread Chris Angelico
On Mon, 9 May 2022 at 05:49, Marco Sulla  wrote:
> Anyway, apart from my implementation, I'm curious if you think a tail
> method is worth it to be a method of the builtin file objects in
> CPython.

Absolutely not. As has been stated multiple times in this thread, a
fully general approach is extremely complicated, horrifically
unreliable, and hopelessly inefficient. The ONLY way to make this sort
of thing any good whatsoever is to know your own use-case and code to
exactly that. Given the size of files you're working with, for
instance, a simple approach of just reading the whole file would make
far more sense than the complex seeking you're doing. For reading a
multi-gigabyte file, the choices will be different.
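
For the small-file case, a hedged sketch of that "just read it all" approach
(binary mode, LF line endings assumed, as with binary-mode readline) is only
a few lines:

```python
def tail_bytes(path, n=10):
    """Return the last n lines of a file as bytes.

    A sketch for files small enough to read whole; assumes LF (0x0A)
    line endings, like readline() on a binary-mode file.
    """
    with open(path, "rb") as f:
        lines = f.read().split(b"\n")
    if lines and lines[-1] == b"":  # drop empty piece after trailing newline
        lines.pop()
    return b"\n".join(lines[-n:])
```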

No, this does NOT belong in the core language.

ChrisA


Re: tail

2022-05-08 Thread Chris Angelico
On Mon, 9 May 2022 at 04:15, Barry Scott  wrote:
>
>
>
> > On 7 May 2022, at 22:31, Chris Angelico  wrote:
> >
> > On Sun, 8 May 2022 at 07:19, Stefan Ram  wrote:
> >>
> >> MRAB  writes:
> >>> On 2022-05-07 19:47, Stefan Ram wrote:
> >> ...
> >>>> def encoding( name ):
> >>>>     path = pathlib.Path( name )
> >>>>     for encoding in ( "utf_8", "latin_1", "cp1252" ):
> >>>>         try:
> >>>>             with path.open( encoding=encoding, errors="strict" ) as file:
> >>>>                 text = file.read()
> >>>>             return encoding
> >>>>         except UnicodeDecodeError:
> >>>>             pass
> >>>>     return "ascii"
> >>>> Yes, it's potentially slow and might be wrong.
> >>>> The result "ascii" might mean it's a binary file.
> >>> "latin-1" will decode any sequence of bytes, so it'll never try
> >>> "cp1252", nor fall back to "ascii", and falling back to "ascii" is wrong
> >>> anyway because the file could contain 0x80..0xFF, which aren't supported
> >>> by that encoding.
> >>
> >>  Thank you! It's working for my specific application where
> >>  I'm reading from a collection of text files that should be
> >>  encoded in either utf_8, latin_1, or ascii.
> >>
> >
> > In that case, I'd exclude ASCII from the check, and just check UTF-8,
> > and if that fails, decode as Latin-1. Any ASCII files will decode
> > correctly as UTF-8, and any file will decode as Latin-1.
> >
> > I've used this exact fallback system when decoding raw data from
> > Unicode-naive servers - they accept and share bytes, so it's entirely
> > possible to have a mix of encodings in a single stream. As long as you
> > can define the span of a single "unit" (say, a line, or a chunk in
> > some form), you can read as bytes and do the exact same "decode as
> > UTF-8 if possible, otherwise decode as Latin-1" dance. Sure, it's not
> > perfectly ideal, but it's about as good as you'll get with a lot of
> > US-based servers. (Depending on context, you might use CP-1252 instead
> > of Latin-1, but you might need errors="replace" there, since
> > Windows-1252 has some undefined byte values.)
>
> There is a very common error on Windows where files, and especially web
> pages, that claim to be utf-8 are in fact CP-1252.
>
> There is logic in the HTML standards to try utf-8 and if it fails fall back 
> to CP-1252.
>
> It's usually the left and "smart" quote chars that cause the issue, as they
> encode as invalid utf-8.
>

Yeah, or sometimes, there isn't *anything* in UTF-8, and it has some
sort of straight-up lie in the form of a meta tag. It's annoying. But
the same logic still applies: attempt one decode (UTF-8) and if it
fails, there's one fallback. Fairly simple.

ChrisA


Re: tail

2022-05-07 Thread Chris Angelico
On Sun, 8 May 2022 at 07:19, Stefan Ram  wrote:
>
> MRAB  writes:
> >On 2022-05-07 19:47, Stefan Ram wrote:
> ...
> >>def encoding( name ):
> >>    path = pathlib.Path( name )
> >>    for encoding in ( "utf_8", "latin_1", "cp1252" ):
> >>        try:
> >>            with path.open( encoding=encoding, errors="strict" ) as file:
> >>                text = file.read()
> >>            return encoding
> >>        except UnicodeDecodeError:
> >>            pass
> >>    return "ascii"
> >>Yes, it's potentially slow and might be wrong.
> >>The result "ascii" might mean it's a binary file.
> >"latin-1" will decode any sequence of bytes, so it'll never try
> >"cp1252", nor fall back to "ascii", and falling back to "ascii" is wrong
> >anyway because the file could contain 0x80..0xFF, which aren't supported
> >by that encoding.
>
>   Thank you! It's working for my specific application where
>   I'm reading from a collection of text files that should be
>   encoded in either utf_8, latin_1, or ascii.
>

In that case, I'd exclude ASCII from the check, and just check UTF-8,
and if that fails, decode as Latin-1. Any ASCII files will decode
correctly as UTF-8, and any file will decode as Latin-1.

I've used this exact fallback system when decoding raw data from
Unicode-naive servers - they accept and share bytes, so it's entirely
possible to have a mix of encodings in a single stream. As long as you
can define the span of a single "unit" (say, a line, or a chunk in
some form), you can read as bytes and do the exact same "decode as
UTF-8 if possible, otherwise decode as Latin-1" dance. Sure, it's not
perfectly ideal, but it's about as good as you'll get with a lot of
US-based servers. (Depending on context, you might use CP-1252 instead
of Latin-1, but you might need errors="replace" there, since
Windows-1252 has some undefined byte values.)
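
That decode dance is short enough to sketch (hedged: decode_unit is a made-up
name, and Latin-1 is the "accepts any byte sequence" fallback described
above):

```python
def decode_unit(chunk: bytes) -> str:
    """Decode one unit (line/chunk) from a Unicode-naive source:
    try strict UTF-8 first, then fall back to Latin-1, which maps
    every byte 0x00-0xFF to a code point and so never fails."""
    try:
        return chunk.decode("utf-8")
    except UnicodeDecodeError:
        return chunk.decode("latin-1")

print(decode_unit("héllo".encode("utf-8")))  # héllo (valid UTF-8)
print(decode_unit(b"caf\xe9"))               # café (Latin-1 fallback)
```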

ChrisA


Re: tail

2022-05-07 Thread Chris Angelico
On Sun, 8 May 2022 at 04:37, Marco Sulla  wrote:
>
> On Sat, 7 May 2022 at 19:02, MRAB  wrote:
> >
> > On 2022-05-07 17:28, Marco Sulla wrote:
> > > On Sat, 7 May 2022 at 16:08, Barry  wrote:
> > >> You need to handle the file in bin mode and do the handling of line 
> > >> endings and encodings yourself. It’s not that hard for the cases you 
> > >> wanted.
> > >
> > > >>> "\n".encode("utf-16")
> > > b'\xff\xfe\n\x00'
> > > >>> "".encode("utf-16")
> > > b'\xff\xfe'
> > > >>> "a\nb".encode("utf-16")
> > > b'\xff\xfea\x00\n\x00b\x00'
> > > >>> "\n".encode("utf-16").lstrip("".encode("utf-16"))
> > > b'\n\x00'
> > >
> > > Can I use the last trick to get the encoding of a LF or a CR in any 
> > > encoding?
> >
> > In the case of UTF-16, it's 2 bytes per code unit, but those 2 bytes
> > could be little-endian or big-endian.
> >
> > As you didn't specify which you wanted, it defaulted to little-endian
> > and added a BOM (U+FEFF).
> >
> > If you specify which endianness you want with "utf-16le" or "utf-16be",
> > it won't add the BOM:
> >
> >  >>> # Little-endian.
> >  >>> "\n".encode("utf-16le")
> > b'\n\x00'
> >  >>> # Big-endian.
> >  >>> "\n".encode("utf-16be")
> > b'\x00\n'
>
> Well, ok, but I need a generic method to get LF and CR for any
> encoding a user can input.
> Do you think that
>
> "\n".encode(encoding).lstrip("".encode(encoding))
>
> is good for any encoding?

No, because it is only useful for stateless encodings. Any encoding
which uses "shift bytes" that cause subsequent bytes to be interpreted
differently will simply not work with this naive technique. Also,
you're assuming that the byte(s) you get from encoding LF will *only*
represent LF, which is also not true for a number of other encodings -
they might always encode LF to the same byte sequence, but could use
that same byte sequence as part of a multi-byte encoding. So, no, for
arbitrarily chosen encodings, this is not dependable.
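
A concrete illustration of the shift-byte problem (a sketch added here, not
part of the original exchange): ISO-2022-JP is stateful, so escape sequences
change what the bytes between them mean, and ASCII-looking bytes reappear
inside multi-byte runs:

```python
# In isolation, "\n" encodes to a single LF byte...
assert "\n".encode("iso2022_jp") == b"\n"

# ...but non-ASCII text introduces escape sequences, and the two bytes
# for the hiragana "a" (0x24 0x22) look like ASCII '$"' -- scanning the
# raw bytes without tracking shift state would misread them.
print("あ\n".encode("iso2022_jp"))  # b'\x1b$B$"\x1b(B\n'
```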

> Furthermore, is there a way to get the
> encoding of an opened file object?

Nope. That's fundamentally not possible. Unless you mean in the
trivial sense of "what was the parameter passed to the open() call?",
in which case f.encoding will give it to you; but to find out the
actual encoding, no, you can't.

The ONLY way to 100% reliably decode arbitrary text is to know, from
external information, what encoding it is in. Every other scheme
imposes restrictions. Trying to do something that works for absolutely
any encoding is a doomed project.

ChrisA

