[Python-ideas] Re: Add a .whitespace property to module unicodedata

2023-06-02 Thread Marc-Andre Lemburg

On 01.06.2023 20:06, David Mertz, Ph.D. wrote:

I guess this is pretty general for the described need:


%time unicode_whitespace = [chr(c) for c in range(0x110000) if unicodedata.category(chr(c)) == "Zs"]


Use sys.maxunicode instead of the hard-coded upper bound.
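Following that suggestion, a corrected sketch of the scan (using sys.maxunicode rather than a hard-coded upper bound):

```python
import sys
import unicodedata

# Scan every code point for general category "Zs" (space separators).
unicode_whitespace = [
    chr(c)
    for c in range(sys.maxunicode + 1)
    if unicodedata.category(chr(c)) == "Zs"
]
```

On CPython this takes on the order of tens of milliseconds, matching the timing shown below.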


CPU times: user 19.2 ms, sys: 0 ns, total: 19.2 ms
Wall time: 18.7 ms

unicode_whitespace

[' ', '\xa0', '\u1680', '\u2000', '\u2001', '\u2002', '\u2003',
'\u2004', '\u2005', '\u2006', '\u2007', '\u2008', '\u2009', '\u200a',
'\u202f', '\u205f', '\u3000']

It's milliseconds not nanoseconds, but presumably something you do
once at the start of an application.  Can anyone think of a more
efficient and/or more concise way of doing this?


There isn't. You essentially have to scan the entire database for 
whitespacy chars.



This definitely feels better than making a static sequence of
characters, since the Unicode Consortium may change (and has
changed) the definition.


Which was my point: including the above in a stdlib module wouldn't make 
sense, since it increases module load time (and possibly startup time), 
so it's better to generate a string and put this verbatim into the 
application.


However, this would have to be part of the Unicode database update
dance, and whitespace is only one category of chars which would be
interesting. Digits and numbers are others; letters, linebreaks,
symbols, etc. as well:


https://www.unicode.org/reports/tr44/#GC_Values_Table

It's better to put this into the application in question or to have 
someone maintain such collections outside the stdlib in a package on PyPI.


--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Jun 02 2023)
>>> Python Projects, Coaching and Support ...https://www.egenix.com/
>>> Python Product Development ...https://consulting.egenix.com/


::: We implement business ideas - efficiently in both time and costs :::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   https://www.egenix.com/company/contact/
 https://www.malemburg.com/

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/NPO3RLDFXP7IWHP6X54GXTF6CYKOY75U/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Add a .whitespace property to module unicodedata

2023-06-01 Thread Marc-Andre Lemburg

On 01.06.2023 18:18, Paul Moore wrote:
On Thu, 1 Jun 2023 at 15:09, Antonio Carlos Jorge Patricio
<antonio...@gmail.com> wrote:


I suggest including a simple str variable in the unicodedata module to
mirror string.whitespace, so it would contain all characters defined
in the CPython function
[_PyUnicode_IsWhitespace()](https://github.com/python/cpython/blob/main/Objects/unicodetype_db.h#L6314)
so that:

# existent
string.whitespace = ' \t\n\r\x0b\x0c'

# proposed
unicodedata.whitespace = ' \t\n\x0b\x0c\r\x1c\x1d\x1e\x1f\x85\xa0\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000'
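A string equivalent to the proposed one can be generated at runtime with str.isspace(), which is backed by the same _PyUnicode_IsWhitespace() table (a sketch, not an official API):

```python
import sys

# Collect every code point for which str.isspace() is true.
whitespace = ''.join(
    chr(c) for c in range(sys.maxunicode + 1) if chr(c).isspace()
)
```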



What's the use case? I can't think of a single occasion when I would 
have found this useful.


Same here.

For those few cases, where it might be useful, you can easily put the 
string into your application code.


Putting this into the stdlib would just mean that we'd have to recheck 
whether new Unicode whitespace chars were added, every time the standard 
upgrades. With ASCII, this won't happen in the foreseeable future ;-)


--
Marc-Andre Lemburg
eGenix.com

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/REMDZ2SVFVOIDEJYX3VSB2WUZTQPTTLM/


[Python-ideas] Re: Ampersand operator for strings

2023-03-06 Thread Marc-Andre Lemburg

On 06.03.2023 11:33, Steven D'Aprano wrote:

On Mon, Mar 06, 2023 at 10:33:26AM +0100, Marc-Andre Lemburg wrote:


def join_words(list_of_words):
    return ' '.join([x.strip() for x in list_of_words])


That's not Rob's suggestion either.


I know, but as I mentioned, I use the above often, whereas I find Rob's 
definition not very intuitive or useful.



Rob's suggestion is an operator which concats two substrings with
exactly one space between them, without stripping leading or trailing
whitespace of the result.

Examples:

 a = "\nHeading:"
 b = "Result\n\n"
 a & b

would give "\nHeading: Result\n\n"

 s = "my hovercraft\n"
 t = "is full of eels\n"
 s & t

would give "my hovercraft is full of eels\n"

I find the concept is very easy to understand: "concat with exactly one
space between the operands".  But I must admit I'm struggling to think
of cases where I would use it.

I like the look of the & operator for concatenation, so I want to like
this proposal. But I think I will need to see real world code to
understand when it would be useful.
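The semantics described above are easy to prototype today with a hypothetical str subclass (the class name is illustrative only):

```python
class S(str):
    """Illustrative subclass: '&' concats with exactly one space between
    the operands, without stripping the outer whitespace."""
    def __and__(self, other):
        sep = ' ' if (self.strip() and other.strip()) else ''
        return S(self.rstrip() + sep + other.lstrip())

a = S("\nHeading:")
b = S("Result\n\n")
print(repr(a & b))  # '\nHeading: Result\n\n'
```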


--
Marc-Andre Lemburg
eGenix.com

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/E7DCRBG5I7RJPIHWSNDZPEBG7UVJEGMR/


[Python-ideas] Re: Ampersand operator for strings

2023-03-06 Thread Marc-Andre Lemburg

On 02.03.2023 18:27, Rob Cliffe via Python-ideas wrote:
Tl;dr: Join strings together with exactly one space between non-blank 
text where they join.


I propose a meaning for
     s1 & s2
where s1 and s2 are strings.
Namely, that it should be equivalent to
    s1.rstrip() + (' ' if (s1.strip() and s2.strip()) else '') + s2.lstrip()

Informally, this will join the two strings together with exactly one space
between the last non-blank text in s1 and the first non-blank text in s2.
Example:  " bar " & "    foo    "  ==  " bar foo    "
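Written as a plain function, the proposed equivalence reads (the function name is made up for illustration):

```python
def amp(s1: str, s2: str) -> str:
    # The proposed s1 & s2: exactly one space between non-blank text,
    # keeping s1's leading and s2's trailing whitespace.
    sep = ' ' if (s1.strip() and s2.strip()) else ''
    return s1.rstrip() + sep + s2.lstrip()

print(repr(amp(" bar ", "    foo    ")))  # ' bar foo    '
```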


I don't find these semantics particularly intuitive. Python already has 
the + operator for concatenating strings and this doesn't apply any 
stripping.


If you're emphasizing on joining words with single space delimiters, 
then the usual:


def join_words(list_of_words):
    return ' '.join([x.strip() for x in list_of_words])

works much better. You can also apply this recursively, if needed, or 
add support for list_of_phrases (first splitting these into a 
list_of_words).
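A sketch of the helper, extended to phrases as described (the join_phrases name is made up for illustration):

```python
def join_words(list_of_words):
    return ' '.join(x.strip() for x in list_of_words)

def join_phrases(list_of_phrases):
    # Split each phrase into words first, then join with single spaces.
    return join_words(
        word for phrase in list_of_phrases for word in phrase.split()
    )

print(join_phrases(["my  hovercraft ", " is full   of eels"]))
# my hovercraft is full of eels
```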


The advantage of join_words() is that it's easy to understand and 
applies stripping in a concise way. I use such helpers all the time.


--
Marc-Andre Lemburg
eGenix.com

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/IQWVHHNXNFFVWCSSLHWVFF5LCAPVSR2J/


[Python-ideas] Re: int.to_base, int.from_base

2022-05-02 Thread Marc-Andre Lemburg
On 02.05.2022 08:54, Chris Angelico wrote:
> On Mon, 2 May 2022 at 16:46, Serhiy Storchaka wrote:
>>
>> 02.05.22 08:03, Chris Angelico wrote:
>>> Let's not go as far as a PEP yet, and figure out a couple of things:
>>
>> A PEP is necessary if we add Roman numerals and Cyrillic numerals, and
>> Babylonian cuneiform numerals to the heap.
>>
> 
> I'm aware of PEP 313 for Roman, but not for the others. Was there a
> PEP when the int() constructor started to support other types of
> digits? I can't find one but it wouldn't surprise me.

That was a consequence of PEP 100, the addition of Unicode to the
language. There are now a lot more characters which represent digits
than we had in the 8-bit world.

Just a word of warning: numeric bases are not necessarily the same
as numeric encodings. The latter usually come with other formatting
criteria in addition to representing numeric values, e.g. base64 is
an encoding and not the same as representing numbers in base 64.

We have the binascii module for the encodings.
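The distinction can be made concrete: base64 encodes a byte stream, while a hypothetical to_base() helper represents an integer's digits in a given radix:

```python
import base64

def to_base(n: int, base: int) -> list[int]:
    # Hypothetical helper: digits of n in the given base, most significant first.
    digits = []
    while n:
        n, r = divmod(n, base)
        digits.append(r)
    return digits[::-1] or [0]

# base64 is an *encoding* of bytes...
print(base64.b64encode(b"\x01\x00"))  # b'AQA='
# ...which is not the same as the number 256 written in base 64:
print(to_base(256, 64))               # [4, 0]
```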

-- 
Marc-Andre Lemburg
eGenix.com

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/57LZNQOJISWBWKXHIRSZVXYAKLDTUXMS/


[Python-ideas] Re: Giving Decimal a global context was a mistake?

2022-04-07 Thread Marc-Andre Lemburg
On 07.04.2022 20:55, David Mertz, Ph.D. wrote:
> On Thu, Apr 7, 2022 at 2:47 PM Marc-Andre Lemburg <m...@egenix.com> wrote:
> 
> In high finance, I've never seen decimals being used, only floats.
> Excel is omnipresent, sets the standards and uses IEEE 754 floats
> as well (plus some black magic which sometimes helps, but often
> makes things worse):
> 
> 
> In forex, instantaneous exchange rates are defined as a specific number of
> decimal digits in a currencyA/currencyB exchange rate (on a particular 
> market).
> 
> This is about US$6.6 trillion/day governed by these rules... FAR more than the
> combined size of ALL securities markets.
> 
> It was something like 2007 when the NYSE moved from prices in 1/32 penny to a
> fixed-length decimal representation.

... and then you end up with problems such as these:

https://www.wsj.com/articles/berkshire-hathaways-stock-price-is-too-much-for-computers-11620168548

(a good problem to have, BTW)

Seriously, the actual trades will still use fixed numbers in many
cases (after all, each trade is a legal contract), but everything else
leading up to the trades tends to use floats: models and other decision
making tools, pricing, risk calculation, hedging, etc. etc.

But that's just one application space. There are plenty of others
where decimals are used.

Still, to get back to the original topic: in most cases, a
sufficiently high fixed precision is enough to keep applications,
users, authorities and regulators happy, so the concept of a global
default works well.

You can easily round decimals of this higher precision to whatever
precision you need for a particular purpose or I/O, without changing
the context precision.
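With the decimal module, such rounding at the I/O boundary is a one-liner via quantize(), leaving the context precision untouched (a minimal sketch):

```python
from decimal import Decimal, getcontext

getcontext().prec = 28                      # the global default precision
rate = Decimal(1) / Decimal(3)              # full-precision intermediate result
price = rate.quantize(Decimal("0.0001"))    # rounded for output only
print(price)  # 0.3333
```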

-- 
Marc-Andre Lemburg
eGenix.com

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/APO4P2C766QTIECCMX4SI3DZNDYV4WUZ/


[Python-ideas] Re: Giving Decimal a global context was a mistake?

2022-04-07 Thread Marc-Andre Lemburg
There's theory and math, and then there's reality.

In reality, some accounting systems use decimals with fixed
precision for certain aspects and apply predefined rounding
(usually defined in the contracts between the counterparties
or in accounting/tax regulations), while others use IEEE 754 double
precision floats. Rounding errors are dealt with by booking
corrections where necessary.

As an example, it's possible that VAT regulations mandate doing the
VAT calculation at the per-item level (including rounding at that
level) and not at the summary level. This can result in significant
differences when you have to deal with lots of small amounts. The VAT
sum will diverge considerably from the VAT you'd get by using the sum
of the net items as the basis - but this is intended.
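A worked sketch of that divergence (the amounts and VAT rate are made up):

```python
from decimal import Decimal, ROUND_HALF_UP

net = Decimal("0.10")        # net price per item (hypothetical)
vat_rate = Decimal("0.19")   # 19% VAT (hypothetical)
items = 1000

# Per-item calculation, rounding each item's VAT to cents:
vat_per_item = (net * vat_rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
total_per_item = vat_per_item * items       # 0.02 * 1000 = 20.00

# Summary-level calculation on the net total:
total_summary = (net * items * vat_rate).quantize(
    Decimal("0.01"), rounding=ROUND_HALF_UP)  # 19.00

print(total_per_item, total_summary)  # 20.00 19.00
```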

In high finance, I've never seen decimals being used, only floats.
Excel is omnipresent, sets the standards and uses IEEE 754 floats
as well (plus some black magic which sometimes helps, but often
makes things worse):
https://en.wikipedia.org/wiki/Numeric_precision_in_Microsoft_Excel

As a result, there's no one-fits-all answer to decimal vs.
floats. It depends on your use and the context in which you have
to apply math operations.

-- 
Marc-Andre Lemburg
eGenix.com

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/MXBELPGVMP7NSA6XT3FPEL6KWZKEJ4S3/


[Python-ideas] Re: Giving Decimal a global context was a mistake?

2022-04-07 Thread Marc-Andre Lemburg
On 07.04.2022 02:41, Greg Ewing wrote:
> On 6/04/22 8:58 pm, Mark Dickinson via Python-ideas wrote:
>> I'd be curious to know what alternatives you see. When a user writes `x + y`
>> with both `x` and `y` instances of `decimal.Decimal`, the decimal module 
>> needs
>> to know what precision to compute the result to (as well as what rounding 
>> mode
>> to use, etc.). Absent a thread-local context or task-local context, where
>> would that precision information come from?
> 
> I'm not sure, but my feeling is that if I want to limit results
> to a specific number of digits, I'm going to want much finer
> grained control, like specifying it for each individual
> operation. The current design doesn't fit any use case I can
> see myself needing.

GMP uses a smarter approach
(https://gmplib.org/manual/Floating_002dpoint-Functions):

- GMP floats are mutable objects
- all floats have a variable precision and you can even adjust
  the precision after creation
- operations put the result into an existing float object
  (with defined precision)

Of course, this is not what a Python user would expect (numbers
in Python are usually immutable), so using the approach directly
would break "Python" intuition and likely cause many weird
errors down the line.

However, in practice, you rarely need decimals with more than 64 bits
(or some other fixed upper limit) precision, so the global context
works just fine and you can always adjust the precision for output
purposes at the I/O boundaries of your application.

The MPFR library, which uses a similar strategy for numbers as GMP,
adds more flexibility by also providing a rounding
context (https://www.mpfr.org/#intro). MPFR provides a global default
rounding mode and also allows a per operation rounding mode.

Again, applications will typically just use one rounding method
for consistency purposes, so a global context works well in practice.

Certain algorithms may require special handling of both precision
and rounding, but for those, you can either use a thread-local
or task-local context which you only enable while the algorithm
is running.
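In Python's decimal module, that per-algorithm context is spelled localcontext() (a minimal sketch):

```python
from decimal import Decimal, localcontext

# Temporarily raise the precision for one algorithm only.
with localcontext() as ctx:
    ctx.prec = 50
    inner = Decimal(1) / Decimal(7)   # 50 significant digits

outer = Decimal(1) / Decimal(7)       # back to the default 28 digits
print(len(inner.as_tuple().digits), len(outer.as_tuple().digits))  # 50 28
```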

-- 
Marc-Andre Lemburg
eGenix.com

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/GA53LN4LGZQJFCQM7GC6BPIE3JWSCOPK/


[Python-ideas] Re: Runtime-accessible attribute docstrings

2021-11-18 Thread Marc-Andre Lemburg
On 17.11.2021 15:26, tmkehrenb...@gmail.com wrote:
> Hi all,
> 
> I have seen discussion of docstrings for class attributes before on this
> list, but not with this exact proposal.
> 
> My motivation is that I have a dataclass for which I want to write
> docstrings that can be accessed at runtime.  In my specific case, the
> docstrings would be used to create a help message.
> 
> Sphinx supports a syntax to document class attributes.  It looks like
> this:
> 
> @dataclass
> class A:
> """Docstring for class A."""
> x: int
> """Docstring for x"""
> y: bool = True
> "Docstring for y"

See https://www.python.org/dev/peps/pep-0224/

Perhaps it's time to revive the idea for its 20th anniversary :-)

> It is a bit awkward that the docstring is *below* the definition of the
> attribute, but it can't be put above because it could be confused for
> the class docstring.
> 
> My proposal would be to just enshrine this syntax as the syntax for
> attribute docstring, and to make them available at runtime.  They would
> be stored in a dictionary like the type annotations.  For example like
> this:
> 
> A.__attrdoc__ == {"x": "Docstring for x", "y": "Docstring for y"}
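Absent language support, the convention can be approximated today by parsing the class source; a rough sketch (the attr_docs helper is made up for illustration):

```python
import ast

def attr_docs(source: str) -> dict:
    """Map attribute names to the string literal that directly follows
    their annotated assignment in a class body (the Sphinx convention)."""
    cls = ast.parse(source).body[0]
    docs = {}
    for prev, node in zip(cls.body, cls.body[1:]):
        if (isinstance(node, ast.Expr)
                and isinstance(node.value, ast.Constant)
                and isinstance(node.value.value, str)
                and isinstance(prev, ast.AnnAssign)
                and isinstance(prev.target, ast.Name)):
            docs[prev.target.id] = node.value.value
    return docs

src = '''
class A:
    """Docstring for class A."""
    x: int
    """Docstring for x"""
    y: bool = True
    "Docstring for y"
'''
print(attr_docs(src))  # {'x': 'Docstring for x', 'y': 'Docstring for y'}
```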
-- 
Marc-Andre Lemburg
eGenix.com

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/HLBTXCBOTQYD7JJ5WVH3TZEVJEB6H2XB/


[Python-ideas] Re: Python Shared Objects

2021-10-27 Thread Marc-Andre Lemburg
On 25.10.2021 21:40, byk...@gmail.com wrote:
> Due to https://www.python.org/dev/peps/pep-0554/ multi-interpreters
> implementation going really slow, I had the audacity to try an
> alternative route
> towards the same objective of implementing multicore support of python:
> instead of sharing the memory by running multiple threads, I employed
> an interprocess shared memory with multiple processes.
> 
> I know there are multiprocessing.sharedctypes and 
> multiprocessing.shared_memory,
> but I went much deeper into it by solving two problems they failed to solve:
> sharing of complex dynamic objects and synchronization of data access.
> 
> I have a working prototype to show: 
> https://github.com/byko3y/python-shared-objects
> It's a kind of software transactional memory within shared memory. It has a 
> basic support
> for fundamental python types (bool, int, str, bytes, tuple, list, dict, 
> object),
> for both atomic and non-atomic transactions via fine-grained RW-locks, has a 
> built-in
> protection against deadlock and starvation.
> 
> Most of my code can also be used for cross-interpreter communication after 
> PEP 554
> is successfully implemented, since cross-interpreter communication is still 
> an open question.

This looks interesting. The 32-bit limitation is a bit of a bummer, but I
suppose that can be lifted, right?

Some additional pointers for inspiration:

- Here's an old project trying to do more or less the same:
http://poshmodule.sourceforge.net/

- Another newer one, which is specific to numpy arrays:
https://pypi.org/project/SharedArray/

- For more general purpose types, there's Apache Arrow's
Plasma store:
https://arrow.apache.org/docs/python/plasma.html
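For comparison, the stdlib's shared_memory only provides a flat byte buffer, with object layout and locking left entirely to the user (a minimal sketch):

```python
from multiprocessing import shared_memory

# Create a raw shared buffer; another process could attach by name.
shm = shared_memory.SharedMemory(create=True, size=16)
shm.buf[:5] = b"hello"

# A second attachment (here in-process, normally from another process):
view = shared_memory.SharedMemory(name=shm.name)
roundtrip = bytes(view.buf[:5])

view.close()
shm.close()
shm.unlink()
print(roundtrip)  # b'hello'
```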

-- 
Marc-Andre Lemburg
eGenix.com

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/XTDWVI4LHCBYP6LIRRINVXOZZOGI4OCY/


[Python-ideas] Re: PEP 671: Syntax for late-bound function argument defaults

2021-10-26 Thread Marc-Andre Lemburg
On 26.10.2021 18:36, Erik Demaine wrote:
> On Tue, 26 Oct 2021, Marc-Andre Lemburg wrote:
> 
>> Now, it may not be obvious, but the key advantage of such
>> deferred objects is that you can pass them around, i.e. the
>> "defer os.listdir(DEFAULT_DIR)" could also be passed in via
>> another function.
> 
> Are deferred code pieces dynamically scoped, i.e., are they evaluated in
> whatever scope they end up getting evaluated in?  That would certainly be
> interesting, but also kind of dangerous (about as dangerous as eval), and
> I imagine fairly prone to error if they get passed around a lot.

Yes, they would work more or less like copy & pasting the deferred
code into a new context and running it there.

Sure, you can abuse this, but the function running the deferred
can make sure that it's working in a trusted environment.

> If they're *not* dynamically
> scoped, then I think they're equivalent to lambda, and then they don't solve 
> the
> default parameter problem, because they'll be evaluated in the function's
> enclosing scope instead of the function's scope.

Indeed. Lambdas are similar, but not the same. The important part is
running the code in a different context.
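A tiny sketch of "the same code, run in a different context", using a compiled expression:

```python
# One compiled expression, evaluated in two unrelated namespaces.
code = compile("x * 2", "<defer>", "eval")

print(eval(code, {"x": 3}))     # 6
print(eval(code, {"x": "ab"}))  # abab
```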

-- 
Marc-Andre Lemburg
eGenix.com

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/NUA442RSNDSHKEYNUINRY727XT7OVS63/


[Python-ideas] Re: PEP 671: Syntax for late-bound function argument defaults

2021-10-26 Thread Marc-Andre Lemburg
On 26.10.2021 10:54, Marc-Andre Lemburg wrote:
> [...]
> For the object version, the string would have to be compiled
> as well and then executed at the top of the function somehow :-)
> 
> I think for the latter, we'd need a more generic concept of
> deferred execution in Python, but even then, you'd not really
> save typing:
> 
> def process_files(processor, files=defer os.listdir(DEFAULT_DIR)):
> if deferred(files): files = eval(files)
> ...
> 
> The details are more complex than the above, but it demonstrates
> the idea.
> 
> Note that eval() would evaluate an already compiled expression
> encapsulated in a deferred object, so it's not slow or dangerous
> to use.
> 
> Now, it may not be obvious, but the key advantage of such
> deferred objects is that you can pass them around, i.e. the
> "defer os.listdir(DEFAULT_DIR)" could also be passed in via
> another function.

Here's a better way to write the above pseudo-code, which makes
the intent clearer:

def process_files(processor, files=defer os.listdir(DEFAULT_DIR)):
if isdeferred(files): files = files.eval()
...

isdeferred() would simply check the object for being a deferred
object.

I'm using "eval" for lack of a better word to say "please run
the deferred code now, in this context". Perhaps a second
keyword could be used to wrap the whole "if isdeferred()..."
dance into something more intuitive.

Here's an old recipe which uses this concept:

https://code.activestate.com/recipes/502206/

BTW: While thinking about defer some more, I came up with this
alternative syntax for your proposal:

def process_files(processor, defer files=os.listdir(DEFAULT_DIR)):
# results in adding the deferred statement at the top of the
# function, if the parameter is not given, i.e.
if files is NotGiven: files = os.listdir(DEFAULT_DIR)
...

This has the advantage of making things a lot more obvious than the
small added ">", which is easy to miss and the main obstacle I see
with your PEP.

That said, I still like the idea to be able to "inject" expressions
into functions. This opens up lots of doors to make dynamic
programming more intuitive in Python.
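A rough sketch of such a deferred object (Deferred, isdeferred and DEFAULT_ITEMS are hypothetical names, not proposed APIs; the list stands in for os.listdir(DEFAULT_DIR)):

```python
class Deferred:
    """Wrap an expression: compile once, evaluate later in a
    caller-supplied namespace."""
    def __init__(self, expression: str):
        self.code = compile(expression, "<deferred>", "eval")
    def eval(self, namespace=None):
        return eval(self.code, {} if namespace is None else dict(namespace))

def isdeferred(obj):
    return isinstance(obj, Deferred)

DEFAULT_ITEMS = ["a.txt", "b.txt"]  # stand-in for os.listdir(DEFAULT_DIR)

def process_files(processor, files=Deferred("list(DEFAULT_ITEMS)")):
    if isdeferred(files):
        # Run the deferred code now, in this context.
        files = files.eval({"DEFAULT_ITEMS": DEFAULT_ITEMS})
    return [processor(f) for f in files]

print(process_files(str.upper))  # ['A.TXT', 'B.TXT']
```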

-- 
Marc-Andre Lemburg
eGenix.com

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/ZFVTQF7HARVSGFH7XRUKSK22KB6LW2HZ/


[Python-ideas] Re: PEP 671: Syntax for late-bound function argument defaults

2021-10-26 Thread Marc-Andre Lemburg
On 25.10.2021 15:44, Chris Angelico wrote:
> On Mon, Oct 25, 2021 at 11:53 PM Marc-Andre Lemburg  wrote:
>>
>> On 25.10.2021 14:26, Chris Angelico wrote:
>>> On Mon, Oct 25, 2021 at 11:20 PM Marc-Andre Lemburg  wrote:
>>>>
>>>> On 25.10.2021 13:53, Chris Angelico wrote:
>>>>> On Mon, Oct 25, 2021 at 10:39 PM Marc-Andre Lemburg wrote:
>>>>>> I would prefer to not go down this path.
>>>>>>
>>>>>> "Explicit is better than implicit" and this is too much "implicit"
>>>>>> for my taste :-)
>>>>>>
>>>>>> For simple use cases, this may save a few lines of code, but as soon
>>>>>> as you end up having to think whether the expression will evaluate to
>>>>>> the right value at function call time, the scope it gets executed
>>>>>> in, what to do with exceptions, etc., you're introducing too much
>>>>>> confusion with this syntax.
>>>>>
>>>>> It's always possible to be more "explicit", as long as explicit means
>>>>> "telling the computer precisely what to do". But Python has default
>>>>> arguments for a reason. Instead of simply allowing arguments to be
>>>>> optional, and then ALWAYS having code inside the function to provide
>>>>> values when they are omitted, Python allows us to provide actual
>>>>> default values that are visible to the caller (eg in help()). This is
>>>>> a good thing. Is it "implicit"? Yes, in a sense. But it's very clear
>>>>> what happens if the argument is omitted. The exact same thing is true
>>>>> with these defaults; you can see what happens.
>>>>>
>>>>> The only difference is whether it is a *value* or an *expression* that
>>>>> defines the default. Either way, if the argument is omitted, the given
>>>>> default is used instead.
>>>>
>>>> I guess I wasn't clear enough. What I mean with "implicit" is that
>>>> execution of the expression is delayed by simply adding a ">" to
>>>> the keyword default parameter definition.
>>>>
>>>> Given that this alters the timing of evaluation, a single character
>>>> does not create enough attention to make this choice explicit.
>>>>
>>>> If I instead write:
>>>>
>>>> def process_files(processor, files=deferred(os.listdir(DEFAULT_DIR))):
>>
>> def process_files(processor, files=deferred("os.listdir(DEFAULT_DIR)")):
>>
>> @deferred(files="os.listdir(DEFAULT_DIR)")
> 
> Ahhh, okay. Now your explanation makes sense :)
> 
> This does deal with the problem of function calls looking like
> function calls. It comes at the price of using a string to represent
> code, so unless it has compiler support, it's going to involve eval(),
> which is quite inefficient. (And if it has compiler support, it should
> have syntactic support too, otherwise you end up with weird magical
> functions that don't do normal things.)

The decorator version would not need eval, since the decorator
would actually rewrite the function to include the parameter
defaulting logic right at the top of the function and recompile
it.

For the object version, the string would have to be compiled
as well and then executed at the top of the function somehow :-)

I think for the latter, we'd need a more generic concept of
deferred execution in Python, but even then, you'd not really
save typing:

def process_files(processor, files=defer os.listdir(DEFAULT_DIR)):
    if deferred(files): files = eval(files)
    ...

The details are more complex than the above, but it demonstrates
the idea.

Note that eval() would evaluate an already compiled expression
encapsulated in a deferred object, so it's not slow or dangerous
to use.

Now, it may not be obvious, but the key advantage of such
deferred objects is that you can pass them around, i.e. the
"defer os.listdir(DEFAULT_DIR)" could also be passed in via
another function.
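The deferred-object idea can be sketched as a small wrapper around a pre-compiled expression. Everything here is hypothetical, including the `deferred` name; there is no such stdlib object, and `sorted('abc')` merely stands in for `os.listdir(DEFAULT_DIR)` to keep the sketch deterministic:

```python
# Hypothetical sketch of the deferred-object idea; "deferred" is not
# an existing API.
class deferred:
    """Wrap an expression string; compile now, evaluate at call time."""
    def __init__(self, expression):
        self.code = compile(expression, "<deferred>", "eval")
    def evaluate(self):
        # Evaluates the pre-compiled expression; no parsing cost here.
        return eval(self.code)

def process_files(processor, files=deferred("sorted('abc')")):
    # Parameter-defaulting logic at the top of the function:
    if isinstance(files, deferred):
        files = files.evaluate()
    return processor(files)

print(process_files(list))           # default evaluated at call time
print(process_files(list, ['x']))    # explicit argument used as-is
```

Since the `deferred` instance is an ordinary object, it can also be created elsewhere and passed in as an argument, which is the key property discussed above.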

>>> It's also extremely verbose, given that it's making a very small
>>> difference to the behaviour - all it changes is when something is
>>> calculated (and, for technical reasons, where; but I expect that
>>> intuition will cover that).
>>
>> It is verbose indeed, which is why I still think that putting such
>> code directly at the top of the function is the better way
>> to go :-)
> 
> That's what I want to avoid though. Why go with the incredibly verbose
> 

[Python-ideas] Re: PEP 671: Syntax for late-bound function argument defaults

2021-10-25 Thread Marc-Andre Lemburg
On 25.10.2021 14:26, Chris Angelico wrote:
> On Mon, Oct 25, 2021 at 11:20 PM Marc-Andre Lemburg  wrote:
>>
>> On 25.10.2021 13:53, Chris Angelico wrote:
>>> On Mon, Oct 25, 2021 at 10:39 PM Marc-Andre Lemburg  wrote:
>>>> I would prefer to not go down this path.
>>>>
>>>> "Explicit is better than implicit" and this is too much "implicit"
>>>> for my taste :-)
>>>>
>>>> For simple use cases, this may save a few lines of code, but as soon
>>>> as you end up having to think whether the expression will evaluate to
>>>> the right value at function call time, the scope it gets executed
>>>> in, what to do with exceptions, etc., you're introducing too much
>>>> confusion with this syntax.
>>>
>>> It's always possible to be more "explicit", as long as explicit means
>>> "telling the computer precisely what to do". But Python has default
>>> arguments for a reason. Instead of simply allowing arguments to be
>>> optional, and then ALWAYS having code inside the function to provide
>>> values when they are omitted, Python allows us to provide actual
>>> default values that are visible to the caller (eg in help()). This is
>>> a good thing. Is it "implicit"? Yes, in a sense. But it's very clear
>>> what happens if the argument is omitted. The exact same thing is true
>>> with these defaults; you can see what happens.
>>>
>>> The only difference is whether it is a *value* or an *expression* that
>>> defines the default. Either way, if the argument is omitted, the given
>>> default is used instead.
>>
>> I guess I wasn't clear enough. What I mean by "implicit" is that
>> execution of the expression is delayed by simply adding a ">" to
>> the keyword default parameter definition.
>>
>> Given that this alters the timing of evaluation, a single character
>> does not draw enough attention to make this choice explicit.
>>
>> If I instead write:
>>
>> def process_files(processor, files=deferred(os.listdir(DEFAULT_DIR))):

def process_files(processor, files=deferred("os.listdir(DEFAULT_DIR)")):

>> it is pretty clear that something is happening at a different time
>> than function definition time :-)
>>
>> Even better: the deferred() object can be passed in as a value
>> and does not have to be defined when defining the function, since
>> the function will obviously know what to do with such deferred()
>> objects.
> 
> Actually, I consider that to be far far worse, since it looks like
> deferred() is a callable that takes the *result* of calling
> os.listdir. Maybe it would be different if it were
> deferred("os.listdir(DEFAULT_DIR)"), but now we're losing a lot of
> clarity.

Yes, sorry. I forgot to add the quotes. The idea is to take the
argument and essentially prepend the parameter processing to the
function call logic, or even build a new function with the code
added at the top.

> If it's done with syntax, it can have special behaviour. If it looks
> like a function call (or class constructor), it doesn't look like it
> has special behaviour.
> 
>>>> Having the explicit code at the start of the function is more
>>>> flexible and does not introduce such questions.
>>>
>>> Then use the explicit code! For this situation, it seems perfectly
>>> reasonable to write it either way.
>>>
>>> But for plenty of other examples, it makes a lot of sense to late-bind
>>> in a more visible way. It's for those situations that the feature
>>> would exist.
>>
>> Sure, you can always find examples where late binding may make
>> sense and it's still possible to write explicit code for this as
>> well, but that's not the point.
>>
>> By introducing new syntax, you always increase the potential for
>> readers not knowing about the new syntax, misunderstanding what the
>> syntax means, or even not paying attention to the subtleties it
>> introduces.
>>
>> So whenever new syntax is discussed, I think it's important to
>> look at it from the perspective of a user who hasn't seen it before
>> (could be a programmer new to Python or one who has not worked with
>> the new feature before).
> 
> I actually have a plan for that exact perspective. Was going to
> arrange things tonight, but it may have to wait for later in the week.

Ok :-)

>> In this particular case, I find the syntax not ideal in making
>> it clear that evaluation is deferred. It's also not in

[Python-ideas] Re: PEP 671: Syntax for late-bound function argument defaults

2021-10-25 Thread Marc-Andre Lemburg
On 25.10.2021 13:53, Chris Angelico wrote:
> On Mon, Oct 25, 2021 at 10:39 PM Marc-Andre Lemburg  wrote:
>> I would prefer to not go down this path.
>>
>> "Explicit is better than implicit" and this is too much "implicit"
>> for my taste :-)
>>
>> For simple use cases, this may save a few lines of code, but as soon
>> as you end up having to think whether the expression will evaluate to
>> the right value at function call time, the scope it gets executed
>> in, what to do with exceptions, etc., you're introducing too much
>> confusion with this syntax.
> 
> It's always possible to be more "explicit", as long as explicit means
> "telling the computer precisely what to do". But Python has default
> arguments for a reason. Instead of simply allowing arguments to be
> optional, and then ALWAYS having code inside the function to provide
> values when they are omitted, Python allows us to provide actual
> default values that are visible to the caller (eg in help()). This is
> a good thing. Is it "implicit"? Yes, in a sense. But it's very clear
> what happens if the argument is omitted. The exact same thing is true
> with these defaults; you can see what happens.
> 
> The only difference is whether it is a *value* or an *expression* that
> defines the default. Either way, if the argument is omitted, the given
> default is used instead.

I guess I wasn't clear enough. What I mean by "implicit" is that
execution of the expression is delayed by simply adding a ">" to
the keyword default parameter definition.

Given that this alters the timing of evaluation, a single character
does not draw enough attention to make this choice explicit.

If I instead write:

def process_files(processor, files=deferred(os.listdir(DEFAULT_DIR))):

it is pretty clear that something is happening at a different time
than function definition time :-)

Even better: the deferred() object can be passed in as a value
and does not have to be defined when defining the function, since
the function will obviously know what to do with such deferred()
objects.

>> Having the explicit code at the start of the function is more
>> flexible and does not introduce such questions.
> 
> Then use the explicit code! For this situation, it seems perfectly
> reasonable to write it either way.
> 
> But for plenty of other examples, it makes a lot of sense to late-bind
> in a more visible way. It's for those situations that the feature
> would exist.

Sure, you can always find examples where late binding may make
sense and it's still possible to write explicit code for this as
well, but that's not the point.

By introducing new syntax, you always increase the potential for
readers not knowing about the new syntax, misunderstanding what the
syntax means, or even not paying attention to the subtleties it
introduces.

So whenever new syntax is discussed, I think it's important to
look at it from the perspective of a user who hasn't seen it before
(could be a programmer new to Python or one who has not worked with
the new feature before).

In this particular case, I find the syntax not ideal in making
it clear that evaluation is deferred. It's also not intuitive where
exactly execution will happen (before entering the function, in
which order, in a separate scope, etc).

Why not turn this into a decorator instead ?

@deferred(files=os.listdir(DEFAULT_DIR))
def process_files(processor, files=None):
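A rough sketch of how such a decorator could behave. This is hypothetical: instead of the thread's idea of recompiling the function from a string expression, this simplified version takes zero-argument callables and fills in omitted arguments at call time, which avoids any use of eval():

```python
import functools
import inspect

def deferred(**late_defaults):
    """Hypothetical decorator: fill omitted parameters from
    zero-argument callables, evaluated at call time."""
    def wrap(func):
        sig = inspect.signature(func)
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            for name, make_default in late_defaults.items():
                # Only fill in parameters the caller actually omitted:
                if name not in bound.arguments:
                    bound.arguments[name] = make_default()
            return func(*bound.args, **bound.kwargs)
        return wrapper
    return wrap

@deferred(target=list)          # late-bound default: a fresh list per call
def add_item(item, target=None):
    target.append(item)
    return target

print(add_item(1))   # [1]
print(add_item(2))   # [2] -- the default was re-evaluated, not shared
```

The `None` placeholder in the signature is only there so help() shows the parameter as optional; the decorator never lets `None` through.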

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Oct 25 2021)
>>> Python Projects, Coaching and Support ...https://www.egenix.com/
>>> Python Product Development ...https://consulting.egenix.com/
________

::: We implement business ideas - efficiently in both time and costs :::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   https://www.egenix.com/company/contact/
 https://www.malemburg.com/

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/4Q7S3ATHAC5R77MRZT74XFV3VHDHWY3W/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: PEP 671: Syntax for late-bound function argument defaults

2021-10-25 Thread Marc-Andre Lemburg
On 24.10.2021 02:13, Chris Angelico wrote:
> How to Teach This
> =
> 
> Early-bound default arguments should always be taught first, as they are the
> simpler and more efficient way to evaluate arguments. Building on them, late
> bound arguments are broadly equivalent to code at the top of the function::
> 
>     def add_item(item, target=>[]):
> 
>     # Equivalent pseudocode:
>     def add_item(item, target=<OPTIONAL>):
>         if target was omitted: target = []

I would prefer to not go down this path.

"Explicit is better than implicit" and this is too much "implicit"
for my taste :-)

For simple use cases, this may save a few lines of code, but as soon
as you end up having to think whether the expression will evaluate to
the right value at function call time, the scope it gets executed
in, what to do with exceptions, etc., you're introducing too much
confusion with this syntax.

Example:

def process_files(processor, files=>os.listdir(DEFAULT_DIR)):

Some questions:
- What happens if dir does not exist ? How would I
  be able to process the exception in the context of process_files() ?
- Since the same code is valid without the ">", would a user
  notice that os.listdir() is called in the scope of the function
  call ?
- What if DEFAULT_DIR == '.' ? Would the user notice that the
  current work dir may have changed compared to when the module
  with the function was loaded ?

Having the explicit code at the start of the function is more
flexible and does not introduce such questions.
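For comparison, the explicit spelling at the top of the function (with `DEFAULT_DIR` as an assumed application-level constant) answers all three questions directly:

```python
import os

DEFAULT_DIR = "."   # assumed application-level constant

def process_files(processor, files=None):
    # The late binding, spelled out: the default is computed at call
    # time, inside the function, where exceptions (e.g. a missing
    # directory) can be caught and handled in context.
    if files is None:
        files = os.listdir(DEFAULT_DIR)
    return processor(files)

process_files(sorted)              # lists DEFAULT_DIR at call time
process_files(sorted, ['b', 'a'])  # explicit argument wins
```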

-- 
Marc-Andre Lemburg
eGenix.com

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/KYQ3XFSGZOKFGPPQ5ZL4TVKWVAEIUER4/


[Python-ideas] Re: os.workdir() context manager

2021-10-22 Thread Marc-Andre Lemburg
Just as an update to this thread:

The workdir context manager got implemented for Python 3.11
and now lives in contextlib as contextlib.chdir():

https://github.com/python/cpython/commit/3592980f9122ab0d9ed93711347742d110b749c2

Thanks to everyone who contributed to the thread, the SC for
approving it, Filipe Laíns who implemented it and others who
are still helping to sort out some smaller issues:

https://bugs.python.org/issue45545

Cheers,
-- 
Marc-Andre Lemburg


On 15.09.2021 21:41, Marc-Andre Lemburg wrote:
> On 15.09.2021 21:02, Guido van Rossum wrote:
> To make chdir() return a context manager *and* keep it working without
> calling `__enter__`, it would have to call `getcwd()`, which I've heard
> is expensive.
>>
>> So I don't think that would work, alas.
> 
> At least on Linux, the cost for os.getcwd() is similar to the cost
> of os.chdir(), but yes, since we can't have os.chdir() not change
> the dir when called, the logic would need the extra os.getcwd() call:
> 
> # python3 -m timeit -s 'import os' "os.getcwd()"
> 50 loops, best of 5: 619 nsec per loop
> # python3 -m timeit -s 'import os' "os.chdir('.')"
> 50 loops, best of 5: 726 nsec per loop
> 
> Here's a simple implementation of the chdir() context manager:
> 
> import os
> import pathlib
> 
> # chdir context manager
> PlatformPath = pathlib.WindowsPath if os.name == 'nt' else pathlib.PosixPath
> class chdir(PlatformPath):
>     def __init__(self, dir):
>         self.dir = dir
>         self.olddir = os.getcwd()
>         os.chdir(dir)
>     def __enter__(self):
>         return self
>     def __exit__(self, *exc):
>         os.chdir(self.olddir)
>         return False
> 
> # Normal chdir()
> path = chdir('abc/')
> print (os.getcwd())
> print (path.olddir)
> 
> # chdir() context manager
> with chdir('def/') as wd:
>     print (repr(wd))
>     print (os.getcwd())
>     print (os.listdir('.'))
> 
> For extra perks, I made os.chdir() return a pathlib Path object
> and you get to see the old directory, so you can backtrack
> if needed, even without a context manager.
> 
> 
>> On Wed, Sep 15, 2021 at 11:55 AM Eric V. Smith wrote:
>>
>> On 9/15/2021 2:48 PM, Eric Fahlgren wrote:
>>> On Wed, Sep 15, 2021 at 12:21 AM Eric V. Smith wrote:
>>>
>>> And I'm not crazy about the name "workdir". To me, it sounds like it
>>> returns something, not sets and resets something. But I don't have a
>>> particularly great alternative in mind: in my own code I've used
>>> "change_dir", which isn't awesome either.
>>>
>>>
>>> Our version is named "pushdir", modeled after shell's pushd (even though
>>> pushdir is a context manager and does auto-pop).  Everyone figures out
>>> what it does at a glance.
>>
>> That's a great name!
>>
>> Although I think having os.chdir() return a context manager is a better
>> design (assuming it can work, but at first blush it would seem so).
>>
>> Eric

-- 
Marc-Andre Lemburg
eGenix.com

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/JHN5UOA3DNPI2RLWVKC2BLW5RWCOWZD4/


[Python-ideas] Re: Implementing additional string operators

2021-10-13 Thread Marc-Andre Lemburg
On 13.10.2021 20:47, Paul Moore wrote:
> On Wed, 13 Oct 2021 at 19:02, <2qdxy4rzwzuui...@potatochowder.com> wrote:
> 
>> So aside from filename extensions, what are the real use cases for
>> suffix removal?  Plurals?  No, too locale-dependent and too many
>> exceptions.  Whitespace left over from external data?  No, there's
>> already other functions for that (and regexen and actual parsers if
>> they're not good enough).  Directory traversal?  No, that's what path
>> instances and the os module are for.
> 
> I think this is a good point. Is removesuffix really useful enough to
> warrant having an operator *as well as* a string method? It was only
> added in 3.9, so we've been managing without it at all for years,
> after all...

Sure, but that's not evidence that this kind of operation is not
common.

Some examples:
- removal of file extensions
- removal of end tags
- removal of units
- removal of currencies
- removal of standard suffixes
- removal of wildcard patterns
etc.

I find lots of such uses in the code bases I work with.
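A few of the cases above, spelled with the Python 3.9+ string methods:

```python
# Typical suffix/prefix removals (str.removesuffix / str.removeprefix):
assert "report.txt".removesuffix(".txt") == "report"     # file extension
assert "<b>bold</b>".removesuffix("</b>") == "<b>bold"   # end tag
assert "120 km/h".removesuffix(" km/h") == "120"         # unit
assert "EUR 100".removeprefix("EUR ") == "100"           # currency marker
```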

-- 
Marc-Andre Lemburg
eGenix.com

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/L66YTUILBF2RUVVPGDQIBZCHKUPWQSHS/


[Python-ideas] Re: Implementing additional string operators

2021-10-13 Thread Marc-Andre Lemburg
On 13.10.2021 17:11, Guido van Rossum wrote:
> Maybe we should only accept operators as aliases for existing methods.
> 
> x-y could mean x.removesuffix(y)

That was the idea, yes, in particular to make it similar to "+",
which adds to the end of the string, so that:

s = x - oldend + newend

works as expected.

> I don't think x~y is intuitive enough to use.

True.

I tried to find an operator that looked similar to "-", but
"~" would only work as a unary operator, as Chris correctly pointed
out, and even if it were a binary one, it would look too
similar to "-" and also doesn't play well when used on a single
line.

s = newstart + (x ~ oldstart)

So I withdraw that proposal.

> On Wed, Oct 13, 2021 at 8:03 AM Stephen J. Turnbull wrote:
> 
> Chris Angelico writes:
> 
>  > +1, although it's debatable whether it should be remove suffix or
>  > remove all. I'd be happy with either.
> 
> If by "remove all" you mean "efefef" - "ef" == "", I think that's a
> footgun.  Similarly for "efabcd" - "ef" == "abcdef" - "ef".
> 
> Steve
-- 
Marc-Andre Lemburg
eGenix.com

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/TFUOXQAHBYZFON2BU2MQ62HI4HKNLCZX/


[Python-ideas] Implementing additional string operators

2021-10-13 Thread Marc-Andre Lemburg
The idea to use "-" in the context of strings may have some
merit. Not as a unary minus, but as a sequence operation and
shorthand for str.removesuffix(x):

s = 'abc' + 'def' - 'ef' + 'gh'

giving

s == 'abcdgh'

Removing suffixes from strings is a rather common operation.

Removing prefixes is common as well, so perhaps "~" could be
mapped to str.removeprefix():

s = 'abcdef' ~ 'abc'

giving

s == 'def'

In a similar way, "/" could be mapped to str.split(), since that's
probably even more common:

l = 'a,b,c,d' / ','

giving:

l == ['a', 'b', 'c', 'd']


Looking at the examples, I'm not sure how well this would play out
in the context of just using variables, though:

s = a - s
s = a / c
s = a ~ p

By adding such operators, we could potentially make math functions
compatible with strings by way of duck typing, giving some
really weird results instead of errors.
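The proposed semantics can be emulated today with a toy str subclass; this is purely an illustration of the operator mappings above, not a proposal ("~" is left out because it is a unary operator in Python):

```python
class S(str):
    """Toy str subclass: "-" -> removesuffix, "/" -> split."""
    def __add__(self, other):
        return S(str(self) + other)
    def __sub__(self, suffix):
        return S(self.removesuffix(suffix))
    def __truediv__(self, sep):
        return self.split(sep)

s = S('abc') + 'def' - 'ef' + 'gh'
print(s)                    # abcdgh
print(S('a,b,c,d') / ',')   # ['a', 'b', 'c', 'd']
```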

Cheers,
-- 
Marc-Andre Lemburg
eGenix.com

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/IPOPOUJ7MDOQ3QXXXZWO726725EDNJPY/


[Python-ideas] Re: Syntax Sugar for __name__ == "__main__" boilerplate?

2021-10-03 Thread Marc-Andre Lemburg
Perhaps more people need to be made aware of the __main__.py package
module feature we have in Python:

https://docs.python.org/3/library/__main__.html

Instead of just a single main() function, you get a whole module to
play with :-)
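A minimal sketch of the feature ("mypkg" is a made-up package name):

```python
# Contents of mypkg/__main__.py -- "python -m mypkg" runs this module:
import sys

def main(argv=None):
    if argv is None:
        argv = sys.argv[1:]
    print("mypkg called with arguments:", argv)
    return 0

# python -m executes this file with __name__ == "__main__":
if __name__ == "__main__":
    main()
```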

-- 
Marc-Andre Lemburg
eGenix.com

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/62D46T75NJVFF53V5EXT2FX2TTBPPQRD/


[Python-ideas] Re: os.workdir() context manager

2021-09-15 Thread Marc-Andre Lemburg
On 15.09.2021 21:02, Guido van Rossum wrote:
> To make chdir() return a context manager *and* keep it working without calling
> `__enter__`, it would have to call `getcwd()`, which I've heard is expensive.
> 
> So I don't think that would work, alas.

At least on Linux, the cost for os.getcwd() is similar to the cost
of os.chdir(), but yes, since we can't have os.chdir() not change
the dir when called, the logic would need the extra os.getcwd() call:

# python3 -m timeit -s 'import os' "os.getcwd()"
50 loops, best of 5: 619 nsec per loop
# python3 -m timeit -s 'import os' "os.chdir('.')"
50 loops, best of 5: 726 nsec per loop

Here's a simple implementation of the chdir() context manager:

import os
import pathlib

# chdir context manager
PlatformPath = pathlib.WindowsPath if os.name == 'nt' else pathlib.PosixPath
class chdir(PlatformPath):
    def __init__(self, dir):
        self.dir = dir
        self.olddir = os.getcwd()
        os.chdir(dir)
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        os.chdir(self.olddir)
        return False

# Normal chdir()
path = chdir('abc/')
print (os.getcwd())
print (path.olddir)

# chdir() context manager
with chdir('def/') as wd:
    print (repr(wd))
    print (os.getcwd())
    print (os.listdir('.'))

For extra perks, I made os.chdir() return a pathlib Path object
and you get to see the old directory, so you can backtrack
if needed, even without a context manager.


> On Wed, Sep 15, 2021 at 11:55 AM Eric V. Smith wrote:
> 
> On 9/15/2021 2:48 PM, Eric Fahlgren wrote:
>> On Wed, Sep 15, 2021 at 12:21 AM Eric V. Smith wrote:
>>
>> And I'm not crazy about the name "workdir". To me, it sounds like it
>> returns something, not sets and resets something. But I don't have a
>> particularly great alternative in mind: in my own code I've used
>> "change_dir", which isn't awesome either.
>>
>>
>> Our version is named "pushdir", modeled after shell's pushd (even though
>> pushdir is a context manager and does auto-pop).  Everyone figures out
>> what it does at a glance.
> 
> That's a great name!
> 
>     Although I think having os.chdir() return a context manager is a better
> design (assuming it can work, but at first blush it would seem so).
> 
> Eric
-- 
Marc-Andre Lemburg
eGenix.com

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/KRMBQYQUCSU4GYN3F4OPUKVQ72VLCBIV/


[Python-ideas] Re: os.workdir() context manager

2021-09-15 Thread Marc-Andre Lemburg
 to code in the same
  block which e.g. relies on os.path.abspath() working with your
  modified version of the CWD.

- Main use case ? Shell scripts written in Python instead of
  bash.

- Adding a big warning to the docs. Yes, absolutely, and os.chdir()
  should receive one in the docs as well.

- openat() et al. (mentioned by Cameron and Christian). These could
  be used to implement a thread-safe way of working with CWDs,
  essentially placing the CWD under control of Python. However,
  Python extensions and the OS would now know about this per-thread
  CWD, so you'd most likely run into problems with only part of your
  code working with these per-thread CWDs. This is definitely
  out of scope for os.workdir().

- Bike shedding on the name. "workdir" is what I use for the context
  manager, but e.g. os.cwd() would also do and is more in line with
  os.getcwd(). We could even have os.chdir() return the context
  manager, without introducing any new API in the os module.
  Still, I find "with os.workdir()" quite intuitive :-)

-- 
Marc-Andre Lemburg
eGenix.com

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/UWJB4J6UT7XC7XLA4L5JN55COWYO4OW3/


[Python-ideas] os.workdir() context manager

2021-09-14 Thread Marc-Andre Lemburg
Hello all,

I sometimes write Python scripts which need to work in specific
work directories.

When putting such code into functions, the outer function typically
does not expect the current work dir (CWD) to be changed, so I wrap the
code which needs the (possibly) modified CWD in a simple context
manager along the lines of:

class workdir:
    def __init__(self, dir):
        self.dir = dir
    def __enter__(self):
        self.curdir = os.getcwd()
        os.chdir(self.dir)
    def __exit__(self, *exc):
        os.chdir(self.curdir)
        return False

Would there be interest in adding something like this to the os module
as os.workdir() ?

Example:

def backup_home_dir(account):
    with os.workdir(os.path.join('/home', account)):
        # Create a backup of the account dir, rooted at the account's
        # home dir
        restic.backup(repo, '.')

Notes:
- The context manager is not thread safe. There's no thread safe model
  for the current work dir. OTOH, scripts usually don't use threads,
  so not a big deal.
- The context manager could be made more elaborate, e.g. adding optional
  logging for debugging purposes.
- The same could be added to pathlib's objects as .workdir() method
  returning a context manager.

Thoughts ?

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/C525UVPP3ALGTXDNFL2GFDV23KCHP3RL/


[Python-ideas] Re: Different exceptions for assert

2021-09-11 Thread Marc-Andre Lemburg
On 11.09.2021 15:17, Juancarlo Añez wrote:
> Of course, none of this will make sense to programmers with a strong belief 
> that
> assertions must always be turned off in production runs.

You seem to be missing that the Python optimize mode turns off
all code which is related to debugging (__debug__ is set to False,
the compiler doesn't generate code for "if __debug__: ..." statements).

assert is just one of the instances where this happens:

https://docs.python.org/3/reference/simple_stmts.html#grammar-token-assert-stmt

asserts are meant to help find bugs in programs, not check for
user input errors. They document assumptions a programmer has made
when writing the code, which are then tested with a test suite to
make sure the assumptions hold as invariants of the application.

For anything which can go wrong at production run-time, please use
normal if-statement checks and raise appropriate exceptions.

Using assert in such cases is dangerous and can render your
application broken, while everything appears to be running fine
when testing.

Unlike regular if-statement checks, asserts are meant to never
trigger an exception in production code -- which is why they
are removed with -O.
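A small sketch of the same validation written both ways (the withdraw_*
function names are made up for illustration); under "python -O" the
assert line is compiled away entirely, so only the explicit if/raise
keeps protecting production runs:

```python
def withdraw_assert(balance, amount):
    # Abuses assert for input validation: this check is
    # removed completely when running under -O.
    assert amount <= balance, "insufficient funds"
    return balance - amount

def withdraw_checked(balance, amount):
    # Explicit check: survives -O and always raises on bad input.
    if amount > balance:
        raise ValueError("insufficient funds")
    return balance - amount
```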

This is not about whether or not to use -O in production environments,
it's about preventing user input exceptions from going unnoticed
when code is run with -O (e.g. the deployment team or a user
decided to always use -O in production).

-- 
Marc-Andre Lemburg
eGenix.com

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/CQZDKUPYTPM5A2X55E53SMHLURRSGRDH/


[Python-ideas] Re: Different exceptions for assert

2021-09-10 Thread Marc-Andre Lemburg
On 10.09.2021 05:49, Steven D'Aprano wrote:
> What I question is that allowing assert to raise non-assertions will 
> lead to *more* resilient software rather than less.
> 
> I know far too many people who insist on abusing assertions as a lazy 
> way to validate caller- or user-supplied data, which is **not** a 
> contract.

I concur.

asserts are meant for verifying assumptions which the code designer
has made and wants to verify *during development*.

In C they lead to a core dump which aids in finding the cause
of the problem. In Python, an AssertionError is raised with the
same intent.

In Python, using assert for anything that is not development
related is dangerous, since production code running with
-O (optimized mode) will not even run those assertions -
for a good reason: they are meant only for checking assumptions
in development.

For any non-development related error checking, normal if
statements should be used, not assert.

I very often see code bases, which abuse assert as a quick
way to check types, ranges, valid flag combinations, etc.

Since all these things can go wrong in production as well,
those checks need to be handled with if statements, not
assert.
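The silent failure mode can be demonstrated directly by running the same
abusive assert-based validation once normally and once under -O (the
set_flags() function is hypothetical):

```python
import subprocess
import sys
import textwrap

code = textwrap.dedent("""
    def set_flags(fast=False, safe=False):
        # Abusing assert for input validation
        assert not (fast and safe), "flags are mutually exclusive"
        return (fast, safe)

    print(set_flags(fast=True, safe=True))
""")

# Normal run: the assert fires and the bad call is rejected.
plain = subprocess.run([sys.executable, "-c", code],
                       capture_output=True, text=True)

# Optimized run: the invalid flag combination slips through silently.
optimized = subprocess.run([sys.executable, "-O", "-c", code],
                           capture_output=True, text=True)
```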

If we'd not allow assert to also raise non-AssertionErrors,
we'd get even more abuse.

In fact, I'd be in favor of deprecating assert altogether,
if it were not for pytest using it for testing - which is a
valid use case, since those tests are not run in production,
but again, most likely leads to programmers thinking that
they can use the same logic in the actual production code.

-- 
Marc-Andre Lemburg
eGenix.com

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/466LNIX4FGGSTBFFC5PMLIQKQLPQCKJJ/


[Python-ideas] Re: PEP8 mandatory-is rule

2021-09-01 Thread Marc-Andre Lemburg
On 01.09.2021 10:27, Chris Angelico wrote:
> On Wed, Sep 1, 2021 at 5:19 PM Steven D'Aprano  wrote:
>> Outside of contrived counter-examples, we're not likely to come across
>> anything trying to mimac None in the real world. But that's not really
>> why we use `is`, or at least, it's not the only reason. There are a
>> bunch of reasons, none of which on their own are definitive, but
>> together settle the issue (in my opinion).
>>
>> 1. Avoid rare bugs caused by weird objects.
>> 2. Slightly faster and more efficient.
>> 3. Expresses the programmer's intent.
>> 4. Common idiom, so the reader doesn't have to think about it.
>>
> 
> And in terms of education, these sorts of reasons are the things I
> tend to wrap up behind the description "best practice". For instance,
> why do we write file management using the 'with' statement? You could
> argue that it's because other Python implementations may behave
> differently, or future versions may behave differently, or there's the
> risk that the file wouldn't be closed because of some other
> reference... but the easiest explanation is "that's the standard
> idiom", and then you can explain the reasons as the justifications for
> it being standard.

I'm a bit puzzled by all this discussion around "is None".

None is a singleton, so there can only be one such object in any
Python process. Because there is only one, asking:

if x is None: ...

feels intuitive to me.

if x == None: ...

would also work, but misses the point about None being a singleton.
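The "weird objects" caveat can be made concrete with a contrived class
whose __eq__ claims equality with anything, enough to fool "== None" but
not "is None":

```python
class AlwaysEqual:
    # Contrived object: equality comparison always succeeds
    def __eq__(self, other):
        return True

x = AlwaysEqual()
assert (x == None) is True    # equality can be spoofed ...
assert (x is None) is False   # ... identity cannot
```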

BTW: In SQL you have to use "field IS NULL", "field = NULL" returns
NULL, so you're not any smarter than before :-)

-- 
Marc-Andre Lemburg
eGenix.com

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/46NOM3Y3RMTVEDIZIBHBFHOX2XVH764Q/


[Python-ideas] Re: NAN handling in statistics functions

2021-08-28 Thread Marc-Andre Lemburg
On 28.08.2021 14:33, Richard Damon wrote:
> On 8/28/21 6:23 AM, Marc-Andre Lemburg wrote:
>> To me, the behavior looked a lot like stripping NANs left and right
>> from  the list, but what you're explaining makes this appear even more
>> as a bug in the implementation of median() - basically wrong assumptions
>> about NANs sorting correctly. The outcome could be more or less random, it
>> seems.
> 
> It isn't a 'bug in median()' making the wrong assumption about NANs
> sorting, it is an error in GIVING median a NAN which violates its
> precondition that the input have a total-order by the less than operator.

That precondition is not documented as such, though:

https://docs.python.org/3/library/statistics.html#statistics.median

> Asking for the median value of a list that doesn't have a proper total
> order is a nonsense question, so you get a nonsense answer.

Leaving aside that many programmers will probably not know that
NANs cause the total ordering of Python floats to fail (even though
they are of type float), you'd expect Python to do the right thing
and either:

- raise an exception or
- apply a work-around to regain total ordering, as suggested by Steven, or
- return NAN for the calculation as NumPy does.
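A hedged sketch of what such handling could look like as a wrapper
(median_nan_aware and the policy strings are illustrative names only,
not a proposed statistics-module API):

```python
import math
from statistics import median

def median_nan_aware(data, nan_policy="raise"):
    # Illustrates the three remedies: raise an exception,
    # propagate NaN, or strip NaNs before computing.
    values = list(data)
    has_nan = any(isinstance(v, float) and math.isnan(v) for v in values)
    if has_nan:
        if nan_policy == "raise":
            raise ValueError("data contains NaN")
        if nan_policy == "return_nan":
            return float("nan")
        # nan_policy == "omit": drop the NaNs first
        values = [v for v in values
                  if not (isinstance(v, float) and math.isnan(v))]
    return median(values)
```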

>>> import statistics
>>> statistics.median([1,2,3])
2
>>> nan = float('nan')
>>> statistics.median([1,2,3,nan])
2.5
>>> statistics.median([1,2,nan,3])
nan
>>> statistics.median([1,nan,2,3])
nan
>>> statistics.median([nan,1,2,3])
1.5
>>> nan < 1
False
>>> nan < nan
False
>>> 1 < nan
False

vs.

>>> import numpy as np
>>> nan = np.nan
>>> np.median(np.array([1,2,3,nan]))
nan
>>> np.median(np.array([1,2,nan,3]))
nan
>>> np.median(np.array([1,nan,2,3]))
nan
>>> np.median(np.array([nan,1,2,3]))
nan
>>> nan < nan
False
>>> nan < 1
False
>>> 1 < nan
False

> It costs too much to have median test if the input does have a total
> order, just to try to report this sort of condition, that it won't be
> done for a general purpose operation.

If NumPy can do it, why not Python ?

-- 
Marc-Andre Lemburg
eGenix.com

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/MZKELO2RCFVC6L6NC7AAHLGDHJR6LKMQ/


[Python-ideas] Re: NAN handling in statistics functions

2021-08-28 Thread Marc-Andre Lemburg
On 28.08.2021 05:32, Steven D'Aprano wrote:
> On Thu, Aug 26, 2021 at 09:36:27AM +0200, Marc-Andre Lemburg wrote:
> 
>> Indeed. The NAN handling in median() looks like a bug, more than
>> anything else:
> 
> [slightly paraphrased]
>>>>> l1 = [1,2,nan,4]
>>>>> l2 = [nan,1,2,4]
>>
>>>>> statistics.median(l1)
>> nan
>>>>> statistics.median(l2)
>> 1.5
> 
> Looks can be deceiving, it's actually a feature *wink*
> 
> That behaviour is actually the direct consequence of NANs being 
> unordered. The IEEE-754 standard requires that comparisons with NANs all 
> return False (apart from not-equal, which returns True). So NANs are 
> neither less than, equal to, or greater than other values.
> 
> Which makes sense numerically, NANs do not appear on the number line and 
> are not ordered with numbers.
> 
> So when you sort a list containing NANs, they end up in some arbitrary 
> position that depends on the sort implementation, the other values in 
> the list, and their initial position. NANs can even throw out the order 
> of other values:
> 
>>>> sorted([3, nan, 4, 2, nan, 1])
> [3, nan, 1, 2, 4, nan]
> 
> and *that* violates `median`'s assumption that sorting values actually 
> puts them in sorted order, which is why median returns the wrong value.
> 
> I don't think that Timsort is buggy here. I expect that every sort 
> algorithm on the planet will require a Total Order to get sensible 
> results, and NANs violate that expectation.
> 
> https://eli.thegreenplace.net/2018/partial-and-total-orders/
> 
> If we define the less than operator `<` as "isn't greater than (or equal 
> to)", then we can see that sorted is *locally* correct:
> 
> * 3 isn't greater than nan;
> * nan isn't greater than 1;
> * 1 isn't greater than 2;
> * 2 isn't greater than 4;
> * and 4 isn't greater than nan.
> 
> sorted() has correctly sorted the values in the sense that the invariant 
> "a comes before b iff a isn't greater than b" is satisfied between each 
> pair of consecutive values, but globally the order is violated because 
> NAN's are unordered and mess up transitivity:
> 
> 3 isn't greater than NAN, and NAN isn't greater than 1, 
> but it is not true that 3 isn't greater than 1.
> 
> In the general case of sorting elements, I think that the solution is 
> "don't do that". If you have objects which don't form a total order, 
> then you can't expect to get sensible results from sorting them.
> 
> In the case of floats, it would be nice to have a totalOrder function as 
> specified in the 2008 revision of IEEE-754:
> 
> https://irem.univ-reunion.fr/IMG/pdf/ieee-754-2008.pdf
> 
> Then we could sensibly do:
> 
> sorted(floats_or_decimals, key=totalorder)
> 
> and at least NANs would end up in a consistent place and everything else 
> sorted correctly.

Thanks for the analysis.

To me, the behavior looked a lot like stripping NANs left and right
from  the list, but what you're explaining makes this appear even more
as a bug in the implementation of median() - basically wrong assumptions
about NANs sorting correctly. The outcome could be more or less random, it
seems.

In SQL, NULLs consistently sort before or after all non-NULL values
(which end depends on the database). Perhaps that would be a strategy
to use here as well.

The totalOrder predicate in the IEEE spec would make NANs get shifted
to the left or right part of the sequence, depending on the NAN sign.
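A minimal sketch of such a sort key (an approximation only: it ignores
NaN payloads and the -0.0/+0.0 distinction which the real totalOrder
predicate makes):

```python
import math

def total_order_key(x):
    # IEEE-754-2008-style total order for floats:
    # -NaN sorts before everything, +NaN after everything.
    if math.isnan(x):
        return (1, 0.0) if math.copysign(1.0, x) > 0 else (-1, 0.0)
    return (0, x)

nan = float("nan")
# NaNs now land together in a consistent place instead of
# scrambling the order of the other values
print(sorted([3.0, nan, 4.0, 2.0, nan, 1.0], key=total_order_key))
```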

In any case, +1 on anything which fixes this :-)

-- 
Marc-Andre Lemburg
eGenix.com

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/WDHP7YCBYRFJY74HOHAER4YJHBJNZTMO/


[Python-ideas] Re: NAN handling in statistics functions

2021-08-28 Thread Marc-Andre Lemburg
On 28.08.2021 07:14, Christopher Barker wrote:
> 
> SciPy should probably also be a data-point, it uses:
> 
>     nan_policy : {'propagate', 'raise', 'omit'}, optional
> 
> 
> +1
> 
> Also +1 on a string flag, rather than an Enum.

Same here.

Codecs use strings as well: 'strict', 'ignore', 'replace'
(and a bunch of others).

-- 
Marc-Andre Lemburg
eGenix.com

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/6PUBRL565SBKBW2FWCF3OLTDZ2ZXJ2EA/


[Python-ideas] Re: NAN handling in statistics functions

2021-08-27 Thread Marc-Andre Lemburg
On 27.08.2021 09:58, Serhiy Storchaka wrote:
> 26.08.21 12:05, Marc-Andre Lemburg wrote:
>> Oh, good point. I was under the impression that NAN is handled
>> as a singleton.
>>
>> Perhaps this should be changed to make it easier to
>> detect NANs ?!
> 
> Even ignoring a NaN payload, there are many different NaNs of different
> types. For example, Decimal('nan') cannot be the same as float('nan').

Right, it's a much larger problem than I thought :-)

cmath has its own NANs as well.

Too many NANs... it's probably better to stick with NumPy for handling
data sets with embedded NANs. It provides consistent handling for NANs
across integers, floats, complex and even date/time values (as NATs).
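A quick stdlib-only tour of the distinct NaN flavors mentioned above,
none of which compares equal to any other -- or even to itself:

```python
import cmath
import math
from decimal import Decimal

fnan = float("nan")
dnan = Decimal("nan")
cnan = complex(0.0, float("nan"))

# Each type has its own NaN detection
assert math.isnan(fnan)
assert dnan.is_nan()
assert cmath.isnan(cnan)

# NaNs never compare equal, not even to themselves
assert fnan != fnan
assert dnan != dnan
assert cnan != cnan
```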

-- 
Marc-Andre Lemburg
eGenix.com

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/RU6L6J7JXVLO42PHIEYPHWNGXRWKMCCK/


[Python-ideas] Re: NAN handling in statistics functions

2021-08-27 Thread Marc-Andre Lemburg
On 27.08.2021 03:24, David Mertz, Ph.D. wrote:
> 
> 
> On Thu, Aug 26, 2021, 6:46 AM Marc-Andre Lemburg wrote:
> Fair enough. Would it then make sense to at least have all possible NAN
> objects compare equal, treating the extra error information as an 
> attribute
> value rather than a distinct value and perhaps exposing this as such ?
> 
> 
> No, no, no!
> 
> Almost the entire point of a NaN is that it doesn't compare as equal to
> anything... Not even to itself!

Yeah, you're right, it would break the logic that NANs should "infect"
most (or even all) other operations they are used in, to signal
"no idea what to do here".

-- 
Marc-Andre Lemburg
eGenix.com

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/FUZRCHW7NVZPZMNGS2S2CVIVALELCB74/


[Python-ideas] Re: NAN handling in statistics functions

2021-08-26 Thread Marc-Andre Lemburg
On 26.08.2021 17:36, Christopher Barker wrote:
> There have been a number of discussions on this list, and at least one PEP,
> about NaN (and other special values). 
> 
> Let’s keep this thread about handling them in the statistics lib.
> 
> But briefly:
> 
> NaNs are weird on purpose, and Python should absolutely not deviate from 
> IEEE. 

Agreed. I was just surprised that NANs are more Medusa-like than
expected ;-)

> That’s (one reason) Python has None :-)
>
> If you are that worried about performance, you should probably use numpy 
> anyway :-)

Sure, and pandas, which both have methods to replace NANs in arrays.

> -CHB
> 
> 
> 
> On Thu, Aug 26, 2021 at 3:47 AM Marc-Andre Lemburg wrote:
> 
> On 26.08.2021 12:15, Steven D'Aprano wrote:
> > On Thu, Aug 26, 2021 at 11:05:01AM +0200, Marc-Andre Lemburg wrote:
> >
> >> Oh, good point. I was under the impression that NAN is handled
> >> as a singleton.
> >
> > There are 4503599627370496 distinct quiet NANs (plus about the same
> > signalling NANs). So it would need to be 4-quadrillion-ton :-)
> >
> > (If anyone is concerned about the large number of NANs, it's less than
> > 0.05% of the total number of floats.)
> >
> > Back in the mid-80s, Apple's floating point library, SANE, distinguished
> > different classes of error with distinct NANs. Few systems have followed
> > that lead, but each NAN still has 51 bits available for a diagnostic
> > code, plus the sign bit. While Python itself only generates a single NAN
> > value, if you are receiving data from outside sources it could contain
> > NANs with distinct payloads.
> >
> > The IEEE-754 standard doesn't mandate that NANs preserve the payload,
> > but it does recommend it. We shouldn't gratuitously discard that
> > information. It could be meaningful to whoever is generating the data.
> 
> Fair enough. Would it then make sense to at least have all possible
> NAN objects compare equal, treating the extra error information as an
> attribute value rather than a distinct value and perhaps exposing this
> as such ?
> 
> I'm after the "practicality beats purity" here. The math.isnan() test
> doesn't work well in practice, since you'd have to iterate over all
> sequence members and call that test function, which is expensive when
> done in Python.
> 
> -- 
> Marc-Andre Lemburg
> eGenix.com
> 
> -- 
> Christopher Barker, PhD (Chris)
> 
> Python Language Consulting
>   - Teaching
>   - Scientific Software Development
>   - Desktop GUI and Web Development
>   - wxPython, numpy, scipy, Cython
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/2TBVSPQDO5VKIF46U2XEWEKHARN73UW3/


[Python-ideas] Re: NAN handling in statistics functions

2021-08-26 Thread Marc-Andre Lemburg
On 26.08.2021 12:15, Steven D'Aprano wrote:
> On Thu, Aug 26, 2021 at 11:05:01AM +0200, Marc-Andre Lemburg wrote:
> 
>> Oh, good point. I was under the impression that NAN is handled
>> as a singleton.
> 
> There are 4503599627370496 distinct quiet NANs (plus about the same 
> signalling NANs). So it would need to be 4-quadrillion-ton :-)
> 
> (If anyone is concerned about the large number of NANs, it's less than 
> 0.05% of the total number of floats.)
> 
> Back in the mid-80s, Apple's floating point library, SANE, distinguished 
> different classes of error with distinct NANs. Few systems have followed 
> that lead, but each NAN still has 51 bits available for a diagnostic 
> code, plus the sign bit. While Python itself only generates a single NAN 
> value, if you are receiving data from outside sources it could contain 
> NANs with distinct payloads.
> 
> The IEEE-754 standard doesn't mandate that NANs preserve the payload, 
> but it does recommend it. We shouldn't gratuitously discard that 
> information. It could be meaningful to whoever is generating the data.

Fair enough. Would it then make sense to at least have all possible
NAN objects compare equal, treating the extra error information as an
attribute value rather than a distinct value and perhaps exposing this
as such ?

I'm after the "practicality beats purity" here. The math.isnan() test
doesn't work well in practice, since you'd have to iterate over all
sequence members and call that test function, which is expensive when
done in Python.

-- 
Marc-Andre Lemburg
eGenix.com

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/GX7PAY5ZR76KBK5INWKV2Y67FKCCAK2Y/


[Python-ideas] Re: NAN handling in statistics functions

2021-08-26 Thread Marc-Andre Lemburg
On 26.08.2021 10:02, Peter Otten wrote:
> On 26/08/2021 09:36, Marc-Andre Lemburg wrote:
> 
>> In Python you can use a simple test for this:
> 
> I think you need math.isnan().
> 
>>>>> nan = float('nan')
>>>>> l = [1,2,3,nan]
>>>>> d = {nan:1, 2:3, 4:5, 5:nan}
>>>>> s = set(l)
>>>>> nan in l
>> True
> 
> That only works with identical nan-s, and because the container omits the
> equality check for identical objects:
> 
>>>> nan = float("nan")
>>>> nan in [nan]
> True
> 
> But:
> 
>>>> nan == nan
> False
>>>> nan in [float("nan")]
> False

Oh, good point. I was under the impression that NAN is handled
as a singleton.

Perhaps this should be changed to make it easier to
detect NANs ?!

-- 
Marc-Andre Lemburg
eGenix.com

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/R357YDI5ZV5UG6DZSQISVNICH4IZMNIT/


[Python-ideas] Re: NAN handling in statistics functions

2021-08-26 Thread Marc-Andre Lemburg
On 26.08.2021 02:36, Finn Mason wrote:
> Perhaps a warning could be raised but the NaNs are ignored. For example:
> 
> Input: statistics.mean([4, 2, float('nan')])
> Output: [warning blah blah blah]
> 3
> 
> Or the NaNs could be treated as zeros and a warning raised:
> 
> Input: statistics.mean([4, 2, float('nan')])
> Output: [warning blah blah blah]
> 2
> 
> I do feel there should be a catchable warning but not an outright exception, 
> and
> a non-NaN value should still be returned. This allows calculations to still
> quickly and easily be made with or without NaNs, but an alternative course of
> action can be taken in the presence of a NaN value if desired.

With the keyword argument, you can decide what to do.

As for the default: for codecs we made raising an exception
the default, simply because this highlights the need to make
an explicit decision.

For long running calculations this may not be desirable, but then
getting NAN as end result isn't the best compromise either.

In practice it's better to check for NANs before entering a
calculation and then apply case specific handling, e.g. replace
NANs with fixed default values, remove them, use a different
heuristic for the calculation, stop the calculation and ask
for better input, etc. etc.

There are many ways to process things in the face of NANs.
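One possible pre-filter along those lines (preprocess and _is_nan are
just illustrative names): drop the NaNs, or substitute a use case
specific default value, before handing the data to any statistics call:

```python
import math

def _is_nan(v):
    # Only floats can carry a NaN in this simple sketch
    return isinstance(v, float) and math.isnan(v)

def preprocess(data, replace_with=None):
    if replace_with is None:
        # Remove NaNs entirely
        return [v for v in data if not _is_nan(v)]
    # Substitute a default value for each NaN
    return [replace_with if _is_nan(v) else v for v in data]
```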

In Python you can use a simple test for this:

>>> nan = float('nan')
>>> l = [1,2,3,nan]
>>> d = {nan:1, 2:3, 4:5, 5:nan}
>>> s = set(l)
>>> nan in l
True
>>> nan in d
True
>>> nan in s
True

but this really only makes sense for smaller data sets. If you
have a large data set where you rarely get NANs, using the
keyword argument may indeed be a better way to go about this.

> In any case, the current behavior should definitely be changed.

Indeed. The NAN handling in median() looks like a bug, more than
anything else:

>>> import statistics
>>> statistics.mean(l)
nan
>>> statistics.mean(d)
nan
>>> statistics.mean(s)
nan

>>> l1 = [1,2,nan,4]
>>> statistics.mean(l1)
nan
>>> l2 = [nan,1,2,4]
>>> statistics.mean(l2)
nan

>>> statistics.median(l)
2.5
>>> statistics.median(l1)
nan
>>> statistics.median(l2)
1.5

> On Tue, Aug 24, 2021, 1:46 AM Marc-Andre Lemburg wrote:
> 
> On 24.08.2021 05:53, Steven D'Aprano wrote:
> > At the moment, the handling of NANs in the statistics module is
> > implementation dependent. In practice, that *usually* means that if your
> > data has a NAN in it, the result you get will probably be a NAN.
> >
> >     >>> statistics.mean([1, 2, float('nan'), 4])
> >     nan
> >
> > But there are unfortunate exceptions to this:
> >
> >     >>> statistics.median([1, 2, float('nan'), 4])
> >     nan
> >     >>> statistics.median([float('nan'), 1, 2, 4])
> >     1.5
> >
> > I've spoken to users of other statistics packages and languages, such as
> > R, and I cannot find any consensus on what the "right" behaviour should
> > be for NANs except "not that!".
> >
> > So I propose that statistics functions gain a keyword only parameter to
> > specify the desired behaviour when a NAN is found:
> >
> > - raise an exception
> >
> > - return NAN
> >
> > - ignore it (filter out NANs)
> >
> > which seem to be the three most common preference. (It seems to be
> > split roughly equally between the three.)
> >
> > Thoughts? Objections?
> 
> Sounds good. This is similar to the errors argument we have
> for codecs where users can determine what the behavior should be
> in case of an error in processing.
> 
> > Does anyone have any strong feelings about what should be the default?
> 
> No strong preference, but if the objective is to continue calculations
> as much as possible even in the face of missing values, returning NAN
> is the better choice.
> 
> Second best would be an exception, IMO, to signal: please be explicit
> about what to do about NANs in the calculation. It helps reduce the
> needed backtracking when the end result of a calculation
> turns out to be NAN.
> 
> Filtering out NANs should always be an explicit choice to make.
> Ideally such filtering should happen *before* any calculations
> get applied. In some cases, it's better to replace NANs with
> use case specific default values. In others, removing them is the
> right thing to do.
> 
>

[Python-ideas] Re: We should have an explicit concept of emptiness for collections

2021-08-24 Thread Marc-Andre Lemburg
On 21.08.2021 23:33, Tim Hoffmann via Python-ideas wrote:
> Hi all,
> 
> The Programming Recommendations section in PEP-8 states
> 
> "For sequences, (strings, lists, tuples), use the fact that empty sequences 
> are false:"
> 
>   # Correct:
>   if not seq:
>   if seq:
> 
>   # Wrong:
>   if len(seq):
>   if not len(seq):
> 
> In the talk "When Python Practices Go Wrong" Brandon Rhodes makes a good 
> point against this practice based on "explicit is better than implicit" 
> (https://youtu.be/S0No2zSJmks?t=873). He advocates using
> 
>   if len(seq):
> 
> While that is as explicit as one can get within the current language, it 
> could still be more explicit: Semantically, we're not interested in the 
> (zero) length of the sequence, but want to know if it is empty.
> 
> 
> **Proposal**
> 
> Therefore, I propose syntax for an explicit empty check
> 
>   if isempty(seq):   (i)
> 
> or
> 
>   if seq.is_empty()  (ii)
> 
> This proposal is mainly motivated by the Zen verses "Explicit is better than 
> implicit" and "Readability counts".

I assume your function would first check that the argument is
a sequence and then check its length to determine emptiness.
That doesn't strike me as more explicit. It's just shorter than
first doing the type check and then testing the length.

For the method case, it's completely up to the object to define
what "empty" means, e.g. could be a car object which is fully
fueled but doesn't have passengers. That's very flexible, but
also requires all sequences to play along, which is hard.

When you write "if not seq: ..." in a Python application, you already
assume that seq is a sequence, so the type check is implicit (you
can make it explicit by adding a type annotation and applying
a type checker, either static or dynamic), and you can assume
that seq is empty if the boolean test returns False.

Now, you can easily add a helper function which implements your
notion of "emptiness" to your applications. The question is:
would it make sense to add this as a builtin?

My take on this is: not really, since it just adds a type
check and not much else. This is not enough to warrant the
added complexity for people learning Python.

You may want to propose adding a new operator.is_empty() function
which does this, though.
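For illustration, such a helper can be written today in a few lines; the name is_empty() and the choice to require a sized container are assumptions of this sketch, not an existing stdlib API:

```python
from collections.abc import Sized

def is_empty(container):
    """Explicit emptiness check: type-check first, then test the length.

    Raises TypeError for objects without a length, instead of silently
    falling back to truthiness.
    """
    if not isinstance(container, Sized):
        raise TypeError(
            f"object of type {type(container).__name__!r} has no length")
    return len(container) == 0

print(is_empty([]))     # True
print(is_empty("abc"))  # False
```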

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Aug 24 2021)
>>> Python Projects, Coaching and Support ...https://www.egenix.com/
>>> Python Product Development ...https://consulting.egenix.com/
________

::: We implement business ideas - efficiently in both time and costs :::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   https://www.egenix.com/company/contact/
 https://www.malemburg.com/

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/NHD5AA2FK5DF5IHNBHTFJCXTR65DPKS3/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: NAN handling in statistics functions

2021-08-24 Thread Marc-Andre Lemburg
On 24.08.2021 05:53, Steven D'Aprano wrote:
> At the moment, the handling of NANs in the statistics module is 
> implementation dependent. In practice, that *usually* means that if your 
> data has a NAN in it, the result you get will probably be a NAN.
> 
> >>> statistics.mean([1, 2, float('nan'), 4])
> nan
> 
> But there are unfortunate exceptions to this:
> 
> >>> statistics.median([1, 2, float('nan'), 4])
> nan
> >>> statistics.median([float('nan'), 1, 2, 4])
> 1.5
> 
> I've spoken to users of other statistics packages and languages, such as 
> R, and I cannot find any consensus on what the "right" behaviour should 
> be for NANs except "not that!".
> 
> So I propose that statistics functions gain a keyword only parameter to 
> specify the desired behaviour when a NAN is found:
> 
> - raise an exception
> 
> - return NAN
> 
> - ignore it (filter out NANs)
> 
> which seem to be the three most common preferences. (It seems to be 
> split roughly equally between the three.)
> 
> Thoughts? Objections?

Sounds good. This is similar to the errors argument we have
for codecs where users can determine what the behavior should be
in case of an error in processing.
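For comparison, this is what the codecs errors argument looks like in practice:

```python
text = "héllo"

# errors='strict' is the default and raises UnicodeEncodeError here:
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print(type(exc).__name__)  # UnicodeEncodeError

# The caller chooses how errors are handled:
print(text.encode("ascii", errors="ignore"))   # b'hllo'
print(text.encode("ascii", errors="replace"))  # b'h?llo'
```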

> Does anyone have any strong feelings about what should be the default? 

No strong preference, but if the objective is to continue calculations
as much as possible even in the face of missing values, returning NAN
is the better choice.

Second best would be an exception, IMO, to signal: please be explicit
about what to do about NANs in the calculation. It helps reduce the
needed backtracking when the end result of a calculation
turns out to be NAN.

Filtering out NANs should always be an explicit choice to make.
Ideally such filtering should happen *before* any calculations
get applied. In some cases, it's better to replace NANs with
use case specific default values. In others, removing them is the
right thing to do.

Note that e.g. SQL defaults to ignoring NULLs in aggregate functions
such as AVG(), so there are standard precedents for ignoring NAN values
per default as well. And yes, that default can lead to wrong results
in reports which are hard to detect.
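As a sketch of making that choice explicit: a hypothetical clean() helper (not part of the statistics module; the name and policy strings are invented for this example) which applies one of the three policies discussed above *before* any calculation runs:

```python
import math
import statistics

def clean(data, nan_policy="omit", fill=None):
    """Hypothetical pre-processing helper (not in the stdlib).

    nan_policy='omit'  -- drop NANs
    nan_policy='fill'  -- replace NANs with `fill`
    nan_policy='raise' -- raise ValueError on the first NAN
    """
    result = []
    for x in data:
        if isinstance(x, float) and math.isnan(x):
            if nan_policy == "omit":
                continue
            if nan_policy == "fill":
                result.append(fill)
                continue
            raise ValueError("NAN value in input data")
        result.append(x)
    return result

data = [1, 2, float("nan"), 4]
print(statistics.median(clean(data)))             # 2 (NAN dropped)
print(statistics.median(clean(data, "fill", 0)))  # 1.5 (NAN replaced by 0)
```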



[Python-ideas] Re: multiprocessing: hybrid CPUs

2021-08-20 Thread Marc-Andre Lemburg
On 20.08.2021 09:30, Chris Angelico wrote:
> On Fri, Aug 20, 2021 at 5:22 PM  wrote:
>> I simply tried to understand how processes transfer data between
>> each other. I know they pickle. But how exactly? Which pickle protocol
>> do they use by default? Do they decide the protocol depending on the
>> type/kind/structure of data? Do they compress the pickled data?
>> e.g. I read a PEP about pickle version 5 which is relevant for large
>> data like pandas.DataFrames.
>>
> 
> pickle.DEFAULT_PROTOCOL would be my first guess :)
> 
> When you're curious about this sort of thing, I strongly recommend
> browsing the CPython source code. Sometimes, you'll end up with a
> follow-up question "is this a language guarantee?", but at very least,
> you'll know how the most-used Python implementation does things.
> 
> Don't be put off by the "C" in CPython; a lot of the standard library
> is implemented in Python, including the entire multiprocessing module:
> 
> https://github.com/python/cpython/tree/main/Lib/multiprocessing
> 
> A quick search for the word "pickle" shows this as a promising start:
> 
> https://github.com/python/cpython/blob/main/Lib/multiprocessing/reduction.py

Chris is pointing to the right resources.

In Python 3.9, pickle uses protocol 5 by default, and the reduction
mechanism in multiprocessing always uses that default: even though it
subclasses the Pickler class, it does not touch the protocol variable.

See https://github.com/python/cpython/blob/3.9/Lib/pickle.py for details.
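The default is easy to verify directly. This snippet relies only on documented pickle behavior: every pickle written with protocol >= 2 starts with the PROTO opcode (0x80) followed by the protocol number:

```python
import pickle

data = {"a": 1, "b": [1, 2, 3]}
blob = pickle.dumps(data)  # no protocol argument -> pickle.DEFAULT_PROTOCOL

print(pickle.DEFAULT_PROTOCOL)
# The PROTO opcode (0x80) and the protocol byte are the first two bytes:
print(blob[0] == 0x80, blob[1] == pickle.DEFAULT_PROTOCOL)  # True True
print(pickle.loads(blob) == data)                           # True
```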

Aside:

If you're dealing with data frames, there are a few alternative
tools to consider apart from multiprocessing:

- Prefect: https://www.prefect.io/
- Dask: https://dask.org/
- MPI: https://mpi4py.readthedocs.io/en/stable/

If you have a GPU available, you can also try these frameworks:

- RAPIDS: https://rapids.ai/
- HeAT: https://heat.readthedocs.io/en/latest/

Those tools will do a lot more than multiprocessing and require
extra effort to get up and running, but on the plus side, you
don't have to worry about things like pickling protocols
anymore :-)

If you want to explore the other direction and create an optimized
multiprocessing library, replacing pickle with e.g. Arrow would
give you some advantages:

- pyarrow: https://pypi.org/project/pyarrow/

Alternatively, don't even pass data chunks around via in-process memory;
instead, have your workers read them from a (RAM) disk after converting
them to one of the more efficient formats for this, e.g.

- Parquet: https://github.com/dask/fastparquet

or place the data into shared memory using one of those formats.

Reading Parquet files is much faster than reading CSV or pickle
files.
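A minimal sketch of the shared-memory route, using only the stdlib (multiprocessing.shared_memory, available since Python 3.8). A real setup would place an Arrow or Parquet buffer into the segment instead of raw bytes, and the consumer would run in another process:

```python
from multiprocessing import shared_memory

# Producer: create a named segment and place serialized data into it.
payload = b"column-oriented data would go here"
shm = shared_memory.SharedMemory(create=True, size=len(payload))
shm.buf[: len(payload)] = payload

# Consumer (possibly another process): attach by name and read without
# copying the data through a pipe or pickling it.
reader = shared_memory.SharedMemory(name=shm.name)
data = bytes(reader.buf[: len(payload)])
print(data == payload)  # True

# Cleanup: close all handles, then unlink the segment once.
reader.close()
shm.close()
shm.unlink()
```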



[Python-ideas] Re: multiprocessing: hybrid CPUs

2021-08-18 Thread Marc-Andre Lemburg
On 18.08.2021 15:58, Chris Angelico wrote:
> On Wed, Aug 18, 2021 at 10:37 PM Joao S. O. Bueno  
> wrote:
>>
>> So,
>> It is out of the scope of Python's multiprocessing, and, as I perceive it, of
>> the stdlib as a whole to be able to allocate specific cores for each 
>> subprocess -
>> that is automatically done by the O.S. (and of course, the O.S. having an 
>> interface
>> for it, one can write a specific Python library which would allow this 
>> granularity,
>> and it could even check core capabilities).
> 
> Python does have a way to set processor affinity, so it's entirely
> possible that this would be possible. Might need external tools
> though.

There's os.sched_setaffinity(pid, mask) you could use from within
a Python task scheduler, if this is managing child processes (you need
the right permissions to set the affinity).

Or you could use the taskset command available on Linux to fire
up a process on a specific CPU core. lscpu gives you more insight
into the installed set of available cores.
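A small sketch of the affinity calls (these are Linux-only APIs; the hasattr() guard keeps the snippet portable, and the original mask is restored afterwards):

```python
import os

if hasattr(os, "sched_getaffinity"):
    # CPU cores this process is currently allowed to run on:
    allowed = os.sched_getaffinity(0)  # 0 = the calling process
    print(sorted(allowed))

    # Pin the process to a single core, then restore the original mask:
    os.sched_setaffinity(0, {min(allowed)})
    print(os.sched_getaffinity(0))
    os.sched_setaffinity(0, allowed)
```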

multiprocessing itself does not have functionality to define the
affinity upfront or to select which payload goes to which worker.
I suppose you could implement a Pool subclass to handle such cases,
though.

Changing the calculation model is probably better, as already
suggested. Having smaller chunks of work makes it easier to even
out work load across workers in a cluster of different CPUs. You
then don't have to worry about the details of the CPUs - you just
need to play with the chunk size parameter a bit.



[Python-ideas] Re: disallow assignment to unknown ssl.SSLContext attributes

2021-06-26 Thread Marc-Andre Lemburg
On 26.06.2021 21:32, Ethan Furman wrote:
> On 6/25/21 5:20 PM, Eric V. Smith wrote:
> 
>> It seems like many of the suggestions are SSLContext specific. I don't think
> we should be adding
>> __slots__ or otherwise redefining the interface to that object. Isn't this a
> general "problem" in
>> python...
> 
> In most cases I would agree with you, but in this case the object is security
> sensitive, and security should be much more rigorous in ensuring correctness.

Isn't this more an issue of API design rather than Python's
flexibility when it comes to defining attributes?

IMO, a security relevant API should not use direct attribute
access for adjusting important parameters. Those should always
be done using functions or method calls which apply extra sanity
checks and highlight issues in form of exceptions.

And those checks are possible in Python without calling type checkers,
linters or other extra tools to the rescue.
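As an illustration of that API style (the class and method names here are invented for the example, not the ssl module's actual interface):

```python
class SecureSettings:
    """Sketch: security-relevant parameters are adjustable only through
    methods which validate their input and fail loudly."""

    _SUPPORTED_VERSIONS = ("TLSv1.2", "TLSv1.3")

    def __init__(self):
        self._minimum_version = "TLSv1.2"

    def set_minimum_version(self, version):
        # Sanity check up front; an exception highlights the issue
        # instead of silently storing an unusable value.
        if version not in self._SUPPORTED_VERSIONS:
            raise ValueError(f"unsupported protocol version: {version!r}")
        self._minimum_version = version

    def minimum_version(self):
        return self._minimum_version

settings = SecureSettings()
settings.set_minimum_version("TLSv1.3")  # OK
# settings.set_minimum_version("SSLv3") would raise ValueError
```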



[Python-ideas] Re: Deprecate sum of lists

2021-06-19 Thread Marc-Andre Lemburg
On 19.06.2021 17:17, Serhiy Storchaka wrote:
> 19.06.21 18:12, Marc-Andre Lemburg пише:
>>> But there could be endless debate about whether flatten( ("x", "y") ) should
>>> return a list or a tuple...
>>
>> Have it return an iterator :-)
> 
> flatten = itertools.chain.from_iterable

Well, like I said: modulo the discussion around what "flatten"
should mean; e.g. you will probably want flatten() to go
a certain number of levels deep and not necessarily flatten
strings.

But yes, such a definition is certainly a good start.
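A sketch of one such definition; the depth parameter and the decision to treat strings as atoms are choices of this example, not a settled API:

```python
from itertools import chain

def flatten(iterable, depth=1):
    """Flatten `depth` levels of nesting, returning an iterator.

    Strings are treated as atoms, not re-flattened into characters.
    """
    for _ in range(depth):
        iterable = chain.from_iterable(
            item if isinstance(item, (list, tuple)) else (item,)
            for item in iterable
        )
    return iterable

print(list(flatten([[1, 2], [3, [4, 5]]])))           # [1, 2, 3, [4, 5]]
print(list(flatten([[1, 2], [3, [4, 5]]], depth=2)))  # [1, 2, 3, 4, 5]
print(list(flatten([["ab", "cd"], ["ef"]])))          # ['ab', 'cd', 'ef']
```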



[Python-ideas] Re: Deprecate sum of lists

2021-06-19 Thread Marc-Andre Lemburg
On 19.06.2021 17:03, Guido van Rossum wrote:
> I think I would find flatten(a) more readable than [*chunk for chunk in a], 
> and
> more discoverable: this operation is called "flatten" in other languages, so
> users are going to search the docs or help for that.

+1

> But there could be endless debate about whether flatten( ("x", "y") ) should
> return a list or a tuple...

Have it return an iterator :-)

flatten() would be in the same category of builtins as reversed()
and enumerate().

I think we'll see more discussion about exactly how to flatten
the structures, e.g. do you stop at strings or flatten them into
lists of characters? But I'm sure we'd reach a sensible default
which makes most people happy.
